Energy-Efficient, Delay-aware Packet Scheduling in High-Speed Networks

Qun Yu*, Taieb Znati† and Wang Yang‡

*† University of Pittsburgh, Pittsburgh, PA 15260
Email: {qyu3, znati}@pitt.edu
‡Southeast University, Nanjing, China 210096
Email: wyang@njnet.edu.cn

Abstract—In current commercial routers, increased execution speeds of Network Processor Units (NPUs) in Line Cards (LCs) significantly improve network QoS performance. Achieving high network performance, however, may come at a high cost of routers’ energy consumption. Dynamic Voltage Frequency Scaling (DVFS) and Dynamic Power Management (DPM) have been proposed as schemes to manage power and reduce energy consumption. Excessive reduction in execution rates or extended sleep periods to save energy, however, could result in severe network degradation which in turn may lead to violation of QoS requirements of the underlying applications. To address the energy-QoS dichotomy, we propose a congestion- and energy-aware packet scheduling scheme to achieve a balance between network delay and energy saving. The scheme, referred to as Queue Length (QL)-based Delay-aware packet scheduler (QLDA), uses multiple queue length thresholds to accurately capture network congestion. In response to different levels of network congestion, the QLDA scheme uses carefully designed frequency adjustment strategies to control execution rates in line cards and achieve high energy savings, without violating the delay requirements of the underlying applications. The simulation results show that the QLDA scheme has potential for significant energy saving in high-speed networks. Furthermore, a simulation study is carried out to compare the performance of the proposed scheme to other DVFS-based schemes described in the literature. The results show that the QLDA scheme outperforms the existing schemes for different network topologies and traffic loads, while meeting the delay performance of the supported applications.

Keywords. Energy-efficient, Delay-aware packet scheduling, DVFS, DPM, Network performance, Simulation.

I. INTRODUCTION

The potential environmental impact of high energy consumption, coupled with the rising cost of computing and networking infrastructure, has become a major concern in an increasingly IT-reliant society [1]. Today’s high-performance router LCs handle most traffic processing, buffering and forwarding, using specialized ASIC or other programmable hardware [2]. It has been reported that LCs consume about 70% of the total router power, with the NPUs consuming more than 50% of the power consumed by one LC [3], [4]. Recent advances in semiconductor technology, which enabled higher parallelism and increased clock frequencies, paved the way to a new generation of power routers. These advances, however, come at a heavy price of increased power consumption, due to higher line card speed [2]. Therefore, seeking efficient solutions to reducing power consumption, without adversely affecting network performance, becomes a critical design objective of future networks.

Currently, two approaches are frequently used to manage power in computing and networking environments. The first, referred to as Speed Scaling, uses Dynamic Voltage and Frequency Scaling (DVFS) to control execution rates and reduce power and energy consumption [5]–[10]. The second, referred to as Dynamic Power Management (DPM), uses sleep mode to control power [7], [8]. A large body of research work shows that DVFS can provide the basis for viable solutions to achieve significant power savings in computing, communications and storage devices [5]–[10]. However, excessively slowing down of the processors may lead to unacceptable level of QoS degradation of the supported applications. Dynamically adjusting processors’ execution rates to achieve energy saving, while satisfying QoS requirements, remains a challenge.

To address this challenge, we propose a novel QL-based, Delay-aware, DVFS-enabled packet scheduler, referred to as QLDA, to reduce energy consumption, while adhering to the QoS requirements of the supported applications. QLDA uses multiple queue length thresholds to model network congestion, and the exponentially weighted moving average (EWMA) algorithm to predict average queue length and average packet delay. In response to different levels of network congestion, different NPU-rate scaling strategies then are used to determine when and how NPU execution rates are adjusted based on the predicted queue-length and packet-delay. The goal is to achieve high energy savings, without degrading delay performance. A simulation study, based on a comprehensive energy model, is used to investigate the performance of the proposed packet scheduler, in different networking environments and traffic loads. The results show that QLDA achieves significant energy saving. Furthermore, the results of a comparative analysis study shows that QLDA outperforms similar QoS-aware DVFS-enabled schemes while maintaining the acceptable QoS requirements, with low scheduling overhead.

The rest of this paper is organized as follows: the related work is discussed in Section II. A Delay-aware DVFS-enabled scheduler and its workload prediction mechanisms and scaling strategies are presented in Section III. An energy model is introduced in Section IV. The performance of the proposed scheme is discussed, and compared with other schemes, under different network environments and traffic loads in Section V.
Section VI presents the conclusion of this paper.

II. RELATED WORK

Several energy-efficient schemes have been proposed for green networks [5], [6], [10]. Some of these schemes propose energy-based traffic engineering approaches designed to only keep a sufficient number of active routers, linecards and interfaces to support the network workload. The remaining network devices are either shutdown or put into sleep mode. Other research works focus on energy saving using DVFS-based power management approaches. In [7], Nedevschi, et al. present two simple power management algorithms, and explore the effect of sleep mode and DVFS-based rate adaptation on network energy saving. In [9], Mandviwalla et al. propose three load-dependent strategies, i.e. Value Predictor (VP), Moving Average Predictor (MAP) and Exponentially-Weighted Map (EWMA), to reduce energy consumption in multiprocessor-based LCs. The results show that more than 60% dynamic power savings of the maximal dynamic power consumption can be achieved in one LC. Although these proposed schemes seek to reduce dynamic energy consumption at different levels through using link utilization in DVFS-enabled processors, they do not address the impact of the entire energy savings on QoS performance under different traffic loads in a network. On the other hand, considering the lack of a comprehensive router-based energy model, in [11], Vishwanath, Arun, et al. propose a power model measurement methodology that quantifies the energy efficiency of high-capacity routing platforms at the packet- and byte-level. It focuses on network energy evaluation. The approach used to save energy, however, is not discussed and analyzed in detail. In this paper, the proposed Delay-aware DVFS-enable packet scheduler addresses these shortcomings, and seeks a balanced tradeoff between high network energy savings and acceptable levels of network QoS performance under different traffic loads, based on a derived comprehensive router-based energy model. This is achieved by controlling the NPU execution rates based on queue length.

III. DELAY-AWARE DVFS-SCHEDULER

In this section, we first present the basic Delay-aware, DVFS-enabled scheduling architecture. We then discuss QLDA, including the frequency scaling strategies it uses to adjust the NPU’s execution rates based on queue length.

A. Delay-aware DVFS-Scheduler Design and Architecture

The basic idea of DVFS-Scheduler is to dynamically adjust the processor frequency, based on the current state of the network, to reduce energy consumption. To design an effective QL-based Delay-aware DVFS-Scheduler, several issues must be addressed. First, a strategy must be in place to determine how queue length impacts scheduling decisions. Second, appropriate levels of congestion granularity must be taken into consideration when adjusting the NPU’s execution rate. Multiple queue length thresholds to model different levels of network congestion are considered in this paper. Third, a mechanism must be in place to predict traffic end-to-end delay and bind its variability so that the desired delay performance can be achieved. In the proposed scheduler, an effective method, which estimates packet delay variability to predict the deviation of packet delay from the target end-to-end delay, is used to control NPU’s execution rate adjustments. Finally, an adaptive mechanism must be in place to control the “aggressivity” of the scheduling policy to ensure energy savings without degrading QoS performance.

To address the above issues, a Delay-aware DVFS-enabled scheduling architecture, depicted in Fig. 1, is proposed. The Traffic Monitor (TM) monitors the packet queue, and gathers statistics related to its length. The estimated average queue length, $\bar{q}(\tau)$, and average packet delay, $\bar{d}(\tau)$, over a time interval $\tau$, are used to scale up or down the NPU execution rates. The NPU Rate Scaler (RS) computes a network state-dependent scaling function, $\xi()$, taking into consideration the aggressivity factor of the scheduling strategy, $\eta(\tau)$ generated by the average network traffic load $\rho(\tau)$, and the current level of network congestion. The DVFS Adjustor (DA) adjusts the NPU frequency, $f(\tau)$, based on the scaling factor.

Based on the above architecture, a Delay-aware DVFS-enabled packet scheduler is proposed, which uses predicted average queue length and average packet delay to control the frequency adjustment. The related NPU rate scaling strategies, queue-length and packet-delay prediction mechanisms are introduced in the following.

B. QL-based Delay-Aware Packet Scheduler (QLDA)

QLDA uses the exponentially weighted moving average (EWMA) scheme to periodically predict the average queue length over a given time interval, $\tau$, to adjust the NPU execution frequency dynamically. In order to reduce DVFS switching overhead, QLDA uses a coarser level of network congestion granularity in its decision to scale up or down the NPU execution rates. More specifically, QLDA uses two queue length thresholds, namely $q_l$ and $q_h$ ($0 \leq q_l < q_h \leq Q$) to define low, medium and high network congestion regions, as depicted in Fig.2.

![Fig. 1: Delay-aware DVFS-enabled Scheduling Architecture](image)

Based on the above three packet buffer occupancy regions, the frequency, $f_{\tau_k}$, over the $k^{th}$ time interval, $\tau_k$, $k \geq 1$, is defined in Eq. 1.
In order to predict the average packet delay, $\overline{d}_{\tau_k}$, the same exponential smoothing technique, defined in Eq. 6, is used. $d_{\tau_k}$ represents the $k^{th}$ average packet delay, which is computed based on the average queue length, the average arrival rate and the average departure rate over $\tau_k$.

$$d_{\tau_k} = (1 - w_{\tau_k}^d) \cdot \overline{d}_{\tau_k-1} + w_{\tau_k}^d \cdot d_{\tau_k}$$

The term $w_{\tau_k}^d$, $0 < w_{\tau_k}^d < 1$, denotes a dynamic delay weight factor and is defined as $w_{\tau_k}^d = c_d \cdot \frac{\sigma_d^2}{\overline{q}_{\tau_k}^2}$, where $0 < c_d < 1$. Similarly, $cd_{\tau_k}$ represents the delay prediction error function, defined as $cd_{\tau_k} = d_{\tau_k} - \overline{d}_{\tau_k}$, and $\sigma_{d_{\tau_k}}$ denotes the square prediction error for $\tau_k$, defined as $\sigma_{d_{\tau_k}} = cd_{\tau_k} \cdot cd_{\tau_k}^2 + (1 - c_d) \cdot \sigma_{d_{\tau_k-1}}$. The first order auto-regressive filter used to predict future packet delay, combined with the error prediction method used to adaptively compute the weight function $w_{\tau_k}^d$, guarantee that the predicted delay is not affected by small delay deviations.

Based on the delay prediction error function and a constant $\alpha_{dv}$, $0 < \alpha_{dv} < 1$, ($\alpha_{dv} = 0.25$ is recommended), the delay deviation can be estimated in Eq. 7.

$$\hat{d}v_{\tau_k} = (1 - \alpha_{dv}) \cdot \hat{d}v_{\tau_k-1} + \alpha_{dv} \cdot \left| cd_{\tau_k} \right|$$

QLDA relies exclusively on queue length to schedule packets. As such, it can easily be incorporated in packet scheduling schemes commonly used in current routers, such as FIFO, priority-based, and weighted fair queuing. Algorithm 1 describes the basic steps of the QLDA scheduling scheme.

IV. ENERGY MODEL

In this section, we consider a set of DVFS-enabled LCs and present a comprehensive energy model to determine the packet-based and router-based energy consumption, taking into consideration the frequency adjustment strategies used by the underlying scheduler.

A. Power Model

Two main components impact power consumption in network routers [6], [10], [11]. The first, referred to as static power, arises from the bias and leakage current to support control plan, environment units, and load-independent data plan. The second, referred to as dynamic power, results from the charging and discharging of the voltage saved in node capacitance of the circuit. We use $PS$ and $PD$ to denote static and dynamic power, respectively. In a router, NPUs operate in two possible states, namely “idle” and “busy”. In the “idle” state, the power consumption is load-independent and equals to the static power, $PS$. In the “busy” state, the power consumption is load-dependent and is composed of the static power $PS$ and dynamic power $PD$. Consequently, the power consumed by a router can be expressed as follows:

$$P = \begin{cases} 
PS, & \text{“idle” state} \\
PS + PD, & \text{“busy” state}
\end{cases}$$
Algorithm 1 QLDA Scheduling Scheme.

For each NPU in LC at the router 
Initialization: 
\[ \tau_{\text{in}}, d_{\text{in}}, \tau_{\text{out}} \leftarrow 0, k \leftarrow 1, f_{\text{init}} \leftarrow f_{\text{initial}} \]
Monitor queue length, \( \tau_{\text{in}} \), at the end time of \( \tau_{\text{in}} \)
Update the queue-length smooth filter, \( w_{\tau_{\text{in}}}^1 \)
Estimate the new average \( \bar{\tau}_{\text{in}} \) for \( \tau_{\text{in}} \)
\[ \bar{\tau}_{\text{in}} \leftarrow \left(1 - w_{\tau_{\text{in}}}^0 \right) \cdot \bar{\tau}_{\text{in}} + w_{\tau_{\text{in}}}^0 \cdot \tau_{\text{in}} \]
Calculate \( d_{\text{in}} \) based on \( \bar{\tau}_{\text{in}} \)
Estimate the new \( \bar{d}_{\text{in}} \) and \( \text{div}_{\text{in}} \) for \( \tau_{\text{in}} \)
Update the delay smooth filter, \( w_{\tau_{\text{in}}}^1 \)
\[ \bar{d}_{\text{in}} \leftarrow \left(1 - w_{\tau_{\text{in}}}^1 \right) \cdot \bar{d}_{\text{in}} + w_{\tau_{\text{in}}}^1 \cdot d_{\text{in}} \]
\[ \text{div}_{\text{in}} \leftarrow \left(1 - \alpha_{\text{div}} \right) \cdot \text{div}_{\text{in}} - \alpha_{\text{div}} \cdot |ed_{\text{in}}| \]
if \( \text{div}_{\text{in}} \leq q_t \) then
Scale frequency \( f_{\text{in}} \) to \( f_{\text{min}} \)
\[ f_{\text{in}} \leftarrow f_{\text{min}} \]
else if \( \bar{d}_{\text{in}} \geq q_t \) then
Scale frequency \( f_{\text{in}} \) to \( f_{\text{max}} \)
\[ f_{\text{in}} \leftarrow f_{\text{max}} \]
else
\[ \bar{\tau}_{\text{in}} \leftarrow \bar{\tau}_{\text{in}} - \frac{q_t}{\eta(\text{rho}_{\text{in}})} \]
\[ \text{rho}_{\text{in}} \leftarrow \text{rho}_{\text{in}} + \eta(\text{rho}_{\text{in}}) \]
Generate aggressivity factor, \( \xi_{\text{in}} \)
\[ \xi_{\text{in}} \leftarrow \frac{\tau_{\text{in}} - \bar{\tau}_{\text{in}}}{\text{div}_{\text{in}} \cdot \eta(\text{rho}_{\text{in}})} \]
Scale frequency, \( f_{\text{in}} \)
\[ f_{\text{in}} \leftarrow f_{\text{min}} + \left(f_{\text{max}} - f_{\text{min}}\right) \cdot \xi_{\text{in}} \]
end if
end if

Fixed Parameter:
- \( \text{max} \): the maximal service rate at NPU.

Saved Variable:
- \( A_{\tau_{\text{in}}} \): the number of the arrival packets over \( \tau_{\text{in}} \).

The dynamic power, \( P^D \), operated by the processor frequency, can be further expressed as \( P^D = \gamma \cdot f^3 \) [5], [12]. The parameter \( f \) denotes the clock frequency of the processor and \( \gamma \) is a constant parameter, expressed in units of Watts/\( \text{GHz}^3 \).

B. Packet-based Energy Model

For a given router, the dynamic power consumed by the data plane, \( P^D \), is composed of two components, namely the per-packet processing power component, \( P_P \), and the per-byte store and forward power component, \( P_{SKF} \) [11]. Both components are affected by the operational processor frequency, \( f \). \( P_P \) represents the power consumed to process a given packet, regardless of the packet payload size. \( P_{SKF} \), on the other hand, represents the power needed to receive, store, switch and transmit a packet. Contrary to \( P_P \), which only depends on the number of instructions needed to process a packet \((IPP_P)\), \( P_{SKF} \) depends on the packet length, as packets with different lengths require different storage, switching time and transmission time, thereby consuming different amounts of power.

Let \( IPP_{SKF} \) denote the number of instructions required to process, store and forward a byte worth of data. Assuming a packet length of \( L \) bytes, the number of instructions required to process the packet is \( IPP_{SKF} = L \cdot IPP_{SKF} \). Note that \( IPP_{SKF} \) is constant, as it only depends on the number of instructions to process a byte. Therefore, \( IPP_P \) can be expressed as a linear function of \( IPP_{SKF} \), namely \( IPP_P = h \cdot IPP_{SKF} \), where \( h > 0 \).

Let \( IPP \) represent the number of instructions to complete the processing, store, switch and transmission of an entire packet with length \( L \) by an NPU at a given LC. We have \( IPP = IPP_P + IPP_{SKF} = (h + L) \cdot IPP_{SKF} \). The NPU’s processing, storage, switching and transmission time of a packet, \( T_p = \frac{T_{\text{IPPS}}}{{h+L}} \), where \( IPS \) represents the number of instructions executed by the NPU per second. \( IPS \) can be further expressed as \( \frac{\text{IPPS}}{f} \), where \( f \) denotes the operational frequency of the NPU and \( CPI \) represents the number of cycles per instruction. Therefore, \( T_p = \frac{\text{IPPS} \cdot \text{CPI}}{f} = \frac{\text{IPPS} \cdot f}{2} \), where \( \text{CPI} = CPI \cdot IPP_{SKF} \).

Let \( f_{j,i} \) denote the operational frequency of the active NPU \( j \) in LC \( i \). In the proposed scheduler, the derived frequencies may be continuous. In practice, however, the NPU only allows a number of manufacturer-specified discrete operational voltage levels, \( V = \{V_1, V_2, ..., V_M\} \). These discrete levels result in a corresponding set of discrete frequencies, \( F = \{f_1, f_2, ..., f_M\} \). Consequently, \( f_{j,i} \) must be set to the smallest discrete frequency, \( f_l(1 \leq l \leq M)|f_l \geq f_{j,i} \). The dynamic energy consumed by a successful packet transmission with length \( L \) at NPU \( j \) in LC \( i \) is given by:

\[
E_{j,i}(T_p) = \gamma_{j,i} \cdot f_{j,i}^2 \cdot T_p = \gamma_{j,i} \cdot \text{CPI} \cdot f_{j,i}^2 \cdot (h_{j,i} + L) \tag{9}
\]

According to [11], the packet energy consumption can be expressed as \( E_P = E_P + E_{SKF} \cdot L \), where \( E_P \), expressed in \( \text{nJ/packet} \), denotes the per-packet processing energy, and \( E_{SKF} \), expressed in \( \text{nJ/byte} \), denotes the per-byte store and forward energy [11]. Based on Eq. 9, we can compute \( h_{j,i} = \frac{E_{P_{max,j,i}}}{E_{SKF_{max,j,i}}} \) and \( \gamma_{j,i} = \frac{E_{SKF_{max,j,i}}}{h_{j,i} \cdot f_{max,j,i}} \). Thus, the above packet-based dynamic energy consumption can be rewritten by Eq. 10.

\[
E_{j,i}(T_p) = \left( \frac{E_{P_{max,j,i}} + E_{SKF_{max,j,i}}}{f_{max,j,i}} \cdot L \right) \cdot f_{j,i}^2 \tag{10}
\]

C. Router-based Energy Model

Assume that a router is equipped with \( \Psi \) LCs, \( 1 \leq i \leq \Psi \), is equipped with \( n_i \) active NPUs. According to the power model, the energy consumption of the router, over \( T \), can be expressed as \( E(T) = E^S(T) + E^D(T) \), where \( E^S(T) \) and \( E^D(T) \) represent the energy consumed due to static power and dynamic power, repetitively, during time \( T \). These energy components can be expressed as:

\[
E^S(T) = P_S \cdot T
\]
\[
E^D(T) = \sum_{i=1}^{\Psi} \sum_{j=1}^{n_i} E_{j,i}^D(T) \tag{11}
\]

\( E^D \) represents the energy consumed by NPU \( j \) in LC \( i \) due to dynamic power, over \( T \). Let \( Z_{j,i} \) be the amount of time
interval \( \tau \) at NPU \( j \) in LC \( i \) over \( T \), and \( \tau_1, \ldots, \tau_k, \ldots, \tau_{Z_{j,i}} \), represent the frequency time slots at NPU \( j \) in LC \( i \). Assuming \( f_{j,i} \) is the initial frequency. The frequency of NPU \( j \) at LC \( i \), over the \( k^{th} \) time slot, \( \tau_k \), is \( f_{j,i} \cdot \tau_k \), where \( 1 \leq i \leq \Psi, 1 \leq j \leq n_i \), and \( 1 \leq k \leq Z_{j,i} \). Let \( D_{j,i} \) denote the number of packets serviced by NPU \( j \) in LC \( i \) over the interval \( \tau_k \), and \( E_{j,i} \) represent the average length of the packets serviced at NPU \( j \) in LC \( i \). According to Eq. 10, the total dynamic energy consumed by NPU \( j \) in LC \( i \) over \( T \) can be expressed as:

\[
E_{j,i}^{D}(T) = \sum_{k=1}^{Z_{j,i}} \left( E_{p_{\text{max}} j,i} + E_{c_{k} F_{\text{max}} j,i} \cdot \tau_{j,i} \right) \cdot D_{j,i} \cdot \tau_k
\]

Therefore, the entire energy consumption of the router over \( T \) can be derived as:

\[
E(T) = E_{D}^{S} \cdot \tau + \sum_{i=1}^{\Psi} \sum_{j=1}^{n_i} \sum_{k=1}^{Z_{j,i}} \left( E_{p_{\text{max}} j,i} + E_{c_{k} F_{\text{max}} j,i} \cdot \tau_{j,i} \right) \cdot D_{j,i} \cdot \tau_k
\]

Eq. 13 demonstrates that adjusting the frequency, as opposed to using the maximum frequency, further reduces the energy consumption. The following simulation study will be used to further explore the impact of dynamically adjusting frequencies on the energy consumption.

V. SIMULATION AND ANALYSIS FRAMEWORK

In this section, we present a simulation framework to assess the performance of the Delay-aware scheduling scheme discussed. The energy model is incorporated in a NS2-based network simulation platform to carry out an extensive performance analysis study of the proposed scheduler, focusing on: (i) sensitivity of the proposed scheme to critical design parameters; (ii) comparative analysis of the proposed scheduler’s performances to other similar schedulers, in different network environments and traffic loads.

A. Simulation Environment

In our simulation framework, we consider two network topology models: i.e. dumbbell and parking lot, as displayed in Fig.3, which are two promising models to capture the behavior and performance of a large variety of Internet applications [13]–[16]. \( S \) and \( D \) denote end-hosts, and the intermediate nodes between \( S \) and \( D \) are energy-saving routers. The capacities of links between all the routers are 10 Gbps. The routers implement FIFO scheduling and DropTail queuing. The propagation delays between the sources and the destinations are 30 ms. In each router, all LCs are configured with multiple NPU, each using a specific QoS-aware DVFS scheduler. In order to simulate real scenarios, Huawei CX600-X3 Metro Router model [11], supporting 10GE LCs, is used. We further assume that each 10GE port provides 250 ms worth of traffic buffering. This results in processor buffers of approximately 250ms \times 10Gbps, which is roughly 250000 packets, assuming the average packet size of 1250 bytes.

<table>
<thead>
<tr>
<th>TABLE I: Main simulation parameters and conditions.</th>
</tr>
</thead>
<tbody>
<tr>
<td>Items</td>
</tr>
<tr>
<td>NIC port</td>
</tr>
<tr>
<td>Operating Frequency (GHz)</td>
</tr>
<tr>
<td>EFP_{\text{max}} (nJ/pkt)</td>
</tr>
<tr>
<td>EB_{\text{Sk}} F_{\text{max}} (nJ/byte)</td>
</tr>
<tr>
<td>Packet Size</td>
</tr>
<tr>
<td>Queue</td>
</tr>
<tr>
<td>Topology Model</td>
</tr>
<tr>
<td>Network Traffic Load ( \rho )</td>
</tr>
<tr>
<td>Propagation Delay (ms)</td>
</tr>
<tr>
<td>Traffic Model</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>TABLE II: Traffic source models and specifications.</th>
</tr>
</thead>
<tbody>
<tr>
<td>Flow Type</td>
</tr>
<tr>
<td>Video</td>
</tr>
<tr>
<td>VoIP</td>
</tr>
<tr>
<td>Exp VBR</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>TABLE III: Four combinations of ( q_{t_l}, q_{t_h} ).</th>
</tr>
</thead>
<tbody>
<tr>
<td>( q_{t_l} = 60% \cdot Q )</td>
</tr>
<tr>
<td>( q_{t_l} = 4% \cdot Q )</td>
</tr>
<tr>
<td>( q_{t_l} = 10% \cdot Q )</td>
</tr>
</tbody>
</table>

TABLE I describes the main simulation parameters used in this simulation study. According to [18], [19], TABLE II specifies the traffic source models, namely one constant bit rate (CBR) and two variable bit rate (VBR) models, including Pareto or Exponential On/Off distribution traffic. TABLE IV lists four combinations of \( q_{t_l} \) and \( q_{t_h} \). In addition to the proposed QLDA scheduler and the Load-aware scheduler, EWMAP [9], we also implemented a generic scheduler, referred to as NoDVFS, that operates network devices at their maximum frequency. NoDVFS provides a baseline to compare the performance of energy-aware schedulers. Two QoS-aware DVFS-based schemes, i.e. QLDA and EWMAP, are compared in this paper.

The ITU G.114 specification recommends less than 150 ms one-way end-to-end delay for high-quality real-time traffic such as voice and video. Therefore, we consider the following QoS requirements [20], to evaluate energy saving percentage (ESP), average end-to-end delay (AED) of the all schemes in our simulation:

- 150 ms as the one-way average end-to-end delay threshold,
- 10 ms as the delay jitter bound (DJB),
- 1% as the packet loss rate (PLR) threshold.
B. Sensitivity to the main parameters of QLDA

In this section, we carried out a series of experiments to do the sensitivity analysis of the proposed QLDA scheme to different parameters.

1) Sensitivity to $\eta$: The first experiment is designed to study the sensitivity of the scheduler to the aggressivity factor $\eta$. The results show that the value of the aggressivity factor to achieve the highest energy saving depends on the network load. In order to determine the “optimum” $\eta^*$, a series of simulation experiments where carried out, whereby for a given network load, $\rho$, multiple values of $\eta$ are tested and the value which produces the highest energy saving, without violating the traffic QoS requirements, is selected. The experiments used two different network models, namely dumbbell and parking lot, assuming $f_{min} = 1.6$ GHz and $f_{max} = 2.4$ GHz. Using Matlab, the two independent variables, $\eta$ and $\rho$, are fitted by a Gaussian process regression (GPR) function of successive approximations, as illustrated Eq. 14. In this equation, $(a_1, b_1, c_1)$ and $(a_2, b_2, c_2)$ are GPR model parameters, where $(a_1, b_1, c_1) = (4.4, 0.6085, 0.1805)$ and $(a_2, b_2, c_2) = (-2.745, 0.6554, 0.1466)$. The fitted curve is depicted in Fig.4.

$$\eta(\rho) = \eta_1(\rho) + \eta_2(\rho) = a_1 \cdot e^{-\left(\frac{\rho - b_1}{c_1}\right)^2} + a_2 \cdot e^{-\left(\frac{\rho - b_2}{c_2}\right)^2}$$

(14)

Using the load-dependent values of $\eta$, generated by the GPR function, the two network topology models, dumbbell and parking lot, were used to assess the performance of QLDA and determine the levels of energy saving it achieves, under different network loads, while maintaining acceptable QoS requirements. The results of these experiments are shown in TABLE IV.

2) Sensitivity to $\tau$: The second experiment is designed to study the scheduler’s sensitivity to the rate of DVFS adjusting. The values of $\eta$ for different traffic loads refer to TABLE IV. Under a frequency range, $[f_{min}, f_{max}]$, a small frequency adjustment interval creates more opportunities for a more accurate adjustment of the frequency, based on the queue length. A small interval, however, increases the frequency adjustment overhead. A large frequency adjustment interval reduces the overhead required to adjust frequencies, but fails to capture more accurately the current level of congestion. Fig.5 (a) and (b) depict the energy-saving percentage and the corresponding average end-to-end packet delay for the different network models, using different frequency adjustment interval, $\tau$, under the range of $[0.01, 100]$ ms. The results show that QLDA, assuming $q_l = 4\%, q_h = 80\% \times Q$, is not sensitive to $\tau$ when the value of $\tau$ is under 1 ms. Therefore, $\tau = 1$ ms is selected for the rest of experiments.

3) Sensitivity to $c_q$: QLDA scheme uses EWMA based algorithm with weight, $w^q$, to predict the queue length. The constant parameter $c_q$ is used in the error prediction function to adaptively adjust $w^q$. Different values of $c_q$ in the range $[0.01, 0.50]$ are tested in the QLDA scheme. The results show that the energy saving and the average end-to-end packet delay are not sensitive to $c_q$. When the value of $c_q$ increases,
the energy saving increases slightly. Therefore, $c_q = 0.5$ is selected for the following analysis.

4) Sensitivity to $c_d$: Similarly, QLDA uses the EWMA algorithm to predict the average packet delay. The constant parameter $c_d$ is used in the error prediction function to dynamically adjust the smooth filter $w$. Different values of $c_d$ in the range $[0.01, 0.5]$ are tested in the QLDA scheme. The results show that the energy saving and the average end-to-end packet delay are not sensitive to $c_d$. When the value of $c_d$ decreases, the energy saving increases slightly. Therefore, $c_d = 0.01$ is considered.

5) Sensitivity to $q_l$ and $q_h$: In this experiment, the value setting of $\eta$ for different traffic load is based on the above GPR function, as shown in Fig.4, and the value of $\tau$ is set to 1 ms, while the thresholds, $q_l$ and $q_h$, are varied, as described in TABLE III. Four combinations of $q_l$ and $q_h$ are tested to study the impact of the queue length thresholds on energy saving and average packet delay. The results, shown in TABLE V, indicate that, although the energy saving is not highly sensitive to $q_h$, a higher value of $q_h$ leads to higher energy saving. The results also show that packet delay is highly sensitive to $q_l$, as the delay increases dramatically when the value of $q_l$ increases. Therefore, adjusting queue-length thresholds can improve the effectiveness and efficiency of the QLDA scheme. The results show that for a dumbbell topology, the ratio $q_l : q_h = 4\% : 80\%$ leads to the highest energy savings, without violating QoS requirements. A similar outcome can be observed in the case of a parking lot model.

C. Comparative analysis

In [9], Mandviwalla et al. propose three Load-aware predictors to reduce energy consumption in LCs, in which the most effective Load-aware predictor, called EWMAP, uses EWMA algorithm with a fixed load smooth filter, $\mu = 0.2$ is recommended in the EWMAP scheme [9]), to predict traffic.
load over a constant prediction interval, \( \tau \) (i.e. \( PI \) in [9]), to control the execution rates of LCs, aiming to achieve energy saving. In the Load-aware schemes, choosing a small prediction period in the Load-aware schemes could suffer the overhead impact of the back-to-back undesirable DVFS adjustment. Different values of the prediction period, \( \tau \), in the same range \([0.01, 100]\) ms as the QLDA scheme are tested to determine sensitivities of the EWMAP scheduler to DVFS adjustment. Different from QLDA, the EWMAP scheme exhibits sensitivity to \( \tau \). The results show that the EWMAP scheme can achieve the largest energy saving without QoS violence when \( \tau \) is set as 1 ms.

Using the same router-based energy model, we compare the proposed QLDA scheme with the EWMAP scheme under \( \tau = 1 \) ms in two different network models, as shown in Fig.6 (a) and (b). We found that two network models have the same trend in the energy saving and the average end-to-end packet delay in both QoS-aware schemes respectively. The results show that these two QoS-aware schemes are potential to save significant energy. Under the same QoS requirements, the QLDA scheme with \( q_l : q_h = 4% : 80% \) can provide up to 9.82% energy saving with AED of 122.30 ms and DJB of 1.73 ms, and 9.76% energy saving with AED of 116.89 ms and DJB of 1.24 ms, in the dumbbell model and the parking lot model, respectively. Although the EWMAP scheme leads to an increase in energy saving, from 2% to 7%, under different traffic loads, the results show that QLDA achieves up to 5% increase in energy saving than EWMAP, without violating QoS requirements.

VI. CONCLUSION

In this paper, we propose a Delay-aware DVFS-based scheduler, which scales frequency and achieves energy saving, while meeting QoS requirements. When and how decisions are made to adjust the router execution rates are based on the predicted queue-length and the target packet delay. A simulation framework, including a comprehensive router energy model, which accounts for both static and dynamic energy consumption, is proposed to investigate and compare the performance of the proposed scheme to other similar QoS-aware DVFS-based schemes. Different networking topologies and traffic models are used to carry out sensitivity analysis of the proposed scheme with respect to its main parameters, assess its performance in different network environments, and perform comparative analysis with other schemes. The simulation results show that the proposed QLDA scheme achieves high energy-saving, without violating the QoS requirements of the supported applications. More specifically, the results show that QLDA outperforms load-aware energy-saving schemes and can achieve up to 10% energy saving, while meeting the desired QoS performance. Overall, the results indicate that load-aware schemes, such as EWMAP, are limited when it comes to achieving energy savings without violating QoS requirements. On the other hand, queue length based, delay-aware schedulers, such as QLDA, using adequate queue length thresholds and load-dependent aggressivity factors, can control frequency scaling to achieve a balance between energy saving and QoS requirements.

ACKNOWLEDGMENT

This material is based in part upon work supported by the National Science Foundation under Grants Number CNS-011418 and CNS-1162159. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

REFERENCES