On multiprocessor temperature-aware scheduling problems

We study temperature-aware scheduling problems under the model introduced in [Chrobak et al. AAIM 2008], where unit-length jobs of given heat contributions and common release dates are to be scheduled on a set of parallel identical processors. We consider three optimization criteria: makespan, maximum temperature and (weighted) average temperature. On the positive side, we present polynomial time approximation algorithms for the minimization of the makespan and the maximum temperature, as well as optimal polynomial time algorithms for minimizing the average temperature and the weighted average temperature. On the negative side, we prove that there is no approximation algorithm of absolute ratio $\frac{4}{3}-\epsilon$ for the problem of minimizing the makespan for any $\epsilon > 0$, unless $\mathcal{P}=\mathcal{NP}$.


Introduction
The exponential increase in the processing power of recent (micro)processors has led to an analogous increase in the energy consumption of computing systems of any kind, from compact mobile devices to large scale data centers. This has also led to vast heat emissions and high temperatures affecting the processors' performance and reliability. Moreover, high temperatures reduce the lifetime of chips and may permanently damage the processors. For this reason, manufacturers have set appropriate temperature thresholds for their processors and use cooling systems to control the temperature below these thresholds. However, the energy consumption and heat emission of these cooling systems have to be added to that of the whole system.
The issues of energy and thermal management at the (micro)processor and system design levels date back to the first computer systems. During the last few years these issues have also been addressed at the operating system level, generating new interesting questions. In this context the operating system has to decide the order in which the jobs should be scheduled so that the system's temperature (and/or energy consumption) remains as low as possible, while at the same time some standard user or system oriented criterion (e.g., makespan, response time, throughput, etc.) is optimized. Clearly, the minimization of the temperature and the optimization of the scheduling criteria are typically in conflict, and several models have been proposed in the literature to analyze such conflicts and trade-offs. A first model is based on the speed-scaling technique for energy saving and Newton's law of cooling; see for example Bansal et al. (2007); Atkins et al. (2011) as well as recent reviews on speed-scaling in Irani and Pruhs (2005); Albers (2010, 2011). In another model proposed in Zhang and Chatha (2007), a thermal RC circuit is utilized to capture the temperature profile of a processor.
In this study, we adopt the simplified model for cooling and thermal management introduced by Chrobak et al. (2008), who were motivated by Yang et al. (2008). In particular, they consider a set of unit-length jobs (corresponding to slices of the processes to be scheduled), each one with a given heat contribution, and model the thermal behavior of the system as follows: if a job of heat contribution $h$ is executed on a processor within a time interval $[t-1, t)$, $t \in \mathbb{N}$, and the temperature of the processor at time $t-1$ is $\Theta$, then the processor's temperature at time $t$ is $\frac{\Theta + h}{2}$. Although in practice the heat contribution of the executed jobs and the cooling effect are spread over time (Zhou et al. 2010), the authors in Chrobak et al. (2008) consider the above simplified discrete model in which the heat contribution of the job to be executed is first added to the current temperature and then this sum is halved, in order to take into account the cooling effect.
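The update rule above can be illustrated with a minimal Python sketch. The function name `temperature_profile` and the convention of encoding an idle slot as a job of heat contribution 0 are assumptions made for this illustration only; exact rational arithmetic is used so that comparisons against a threshold are not affected by rounding.

```python
from fractions import Fraction


def temperature_profile(heats, initial=0):
    """Temperatures of one processor after each unit slot.

    heats[t] is the heat contribution of the job run in slot t + 1;
    an idle slot is encoded as heat contribution 0 (illustrative convention).
    """
    theta = Fraction(initial)
    profile = []
    for h in heats:
        # Discrete model: add the heat contribution, then halve the sum.
        theta = (theta + Fraction(h)) / 2
        profile.append(theta)
    return profile


# A hot job, an idle slot, then a cooler job.
print(temperature_profile([2, 0, 1]))
# prints [Fraction(1, 1), Fraction(1, 2), Fraction(3, 4)]
```

In this small run the job of heat contribution 2 brings the processor exactly to temperature 1, the idle slot halves it, and the last job raises it to 3/4.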
In Chrobak et al. (2008), the authors study the problem of scheduling a set of unit-length jobs with release dates and deadlines on a single processor so as to maximize the throughput, i.e., the number of jobs that meet their deadlines, without exceeding a given temperature threshold $\theta$ at any time $t \in \mathbb{N}$. Extending the well-known three-field notation for scheduling problems of Graham et al. (1979), this problem is denoted as $1 \mid r_i, p_i = 1, h_i, \theta \mid \sum U_i$. They prove that this problem is NP-hard even for the special case when all jobs are released at time 0 and their deadlines are equal, i.e., $1 \mid p_i = 1, h_i, \theta \mid \sum U_i$. Furthermore, in the presence of release dates and deadlines it is shown that a family of reasonable list scheduling algorithms, including the coolest first and earliest deadline first algorithms, have a competitive ratio of at most two. This result also implies an approximation factor of two for the off-line problem. On the negative side, they also give an instance that shows that there is no deterministic on-line algorithm with competitive ratio less than two.
The same model has also been adopted by Birks et al. (2010, 2011a), where online algorithms for several generalizations of the throughput maximization problem have been studied. In fact, in Birks et al. (2010) the cooling effect is generalized by multiplying the temperature by $1/c$, where $c > 1$, instead of one half. In Birks et al. (2011a) the weighted throughput objective is considered, while in Birks et al. (2011b) the jobs have equal (non-unit) processing times.
Our problems and results. Under the thermal model of Chrobak et al. (2008), we initiate the study of scheduling a set $J = \{J_1, J_2, \ldots, J_n\}$ of $n$ jobs on a system of $m$ identical processors, unlike the previous works that study only single processor systems. All jobs have common release dates and unit processing times, and for each one of them we are given a heat contribution $h_i$, $1 \le i \le n$. Let $h_{\max} = \max\{h_i : J_i \in J\}$ be the maximum heat contribution among all jobs. We consider each job $J_i$ executed in a time interval $[t-1, t)$, $t \in \mathbb{N}$, which we call slot $t$, on some processor. By $\Theta^j_t$ we denote the temperature of processor $j$ at time $t$. As in Chrobak et al. (2008), if we start executing job $J_i$ on processor $j$ at time $t-1$, then $\Theta^j_t = \frac{\Theta^j_{t-1} + h_i}{2}$. The initial temperature of each processor (the ambient temperature) is considered to be zero, i.e., $\Theta^j_0 = 0$, $1 \le j \le m$. In what follows, we simplify the notation using $\Theta_t$ instead of $\Theta^j_t$, when the processor is specified by the context. We consider two natural variants of the above model:

The threshold thermal model. In this model, a given threshold $\theta$ on the temperature of the processors cannot be violated at any time $t \in \mathbb{N}$. This is the case with the throughput maximization problems studied in Chrobak et al. (2008); Birks et al. (2010, 2011a). It is clear that, for a given instance in this model, a feasible schedule may exist only if $h_i \le 2\theta$ for each job $J_i$. By normalizing the values of the $h_i$'s and $\theta$ we can assume w.l.o.g. that $0 < h_i \le 2$ and $\theta = 1$, as in Chrobak et al. (2008). Moreover, if a processor at time $t-1$ has temperature $\Theta_{t-1}$ and it holds that $\frac{\Theta_{t-1} + h_i}{2} > 1$ for every job $J_i$ that has not yet been scheduled, then this processor will remain idle for the slot $[t-1, t)$ and its temperature at time $t$ will be reduced by half, i.e., $\Theta_t = \frac{\Theta_{t-1}}{2}$. Note also that once a processor has executed some job(s), its temperature will never become exactly zero.
Therefore, in this model, a feasible instance cannot contain more than $m$ jobs of heat contribution equal to 2, as there are only $m$ slots with $\Theta_0 = 0$ (the first slot of each one of the $m$ available processors). Under this model we study the makespan minimization problem, that is $P \mid p_i = 1, h_i, \theta \mid C_{\max}$.

The optimization thermal model. In this model, no explicit threshold on the processors' temperature is given. The lack of such a threshold is counterbalanced by studying the problems of minimizing the maximum and average temperature of a schedule. For any instance in this model, any schedule of length at least $\lceil n/m \rceil$ is feasible, independently of the range of the jobs' heat contributions. However, the optimum value of our objectives depends on the time available to execute the given set of jobs: the maximum or average temperature of a schedule of length equal to $\lceil n/m \rceil$ is, clearly, greater than that of a schedule of longer length, where we are allowed to introduce idle slots. In what follows, we are interested in minimizing these two objective functions with respect to a given schedule length (makespan or deadline) of $d \ge \lceil n/m \rceil$. Such a schedule will contain $md - n$ idle slots and we can consider them as executing $md - n$ fictitious jobs of heat contribution equal to zero. This length $d$ is part of our problems' instances; it denotes the time available to complete the execution of all the jobs and represents the need to complete them within a given time at the price of higher temperatures. Thus, in both problems we consider under this model (minimizing the maximum and the average temperature) we account for the temperatures at the end of each of the $md$ slots available on the $m$ processors. The problems of minimizing the maximum and average temperature we consider under this model are denoted by $P \mid p_i = 1, h_i, d \mid \Theta_{\max}$ and $P \mid p_i = 1, h_i, d \mid \sum \Theta^j_t$, respectively.

The complexity of our problems is strongly related to the complexity of the throughput maximization problem studied in Chrobak et al. (2008). It is already mentioned in Chrobak et al. (2008) that the NP-hardness of the maximum throughput problem of scheduling jobs with common release dates and deadlines on a single processor, $1 \mid p_i = 1, h_i, \theta \mid \sum U_i$, implies the NP-hardness of our makespan minimization problem $1 \mid p_i = 1, h_i, \theta \mid C_{\max}$. In fact, the decision version of the latter problem asks for the existence of a feasible schedule where all jobs complete their execution by some given deadline $d$. Moreover, the decision version of the maximum temperature problem on a single processor, $1 \mid p_i = 1, h_i, d \mid \Theta_{\max}$, asks for the existence of a schedule where all jobs complete their execution by some given deadline $d$ without exceeding a given temperature threshold $\theta$. Therefore, the same reduction gives NP-hardness for both the makespan and the maximum temperature minimization problems. The NP-hardness for our problems on an arbitrary number of parallel processors follows trivially.
Given these NP-hardness results, in this paper we focus on approximation algorithms and inapproximability results for the above-mentioned problems, under the threshold and optimization thermal models for the case of multiple processors. We start in Sect. 2 with the problem $P \mid p_i = 1, h_i, \theta \mid C_{\max}$ of minimizing the schedule length (makespan) in the threshold thermal model. We first prove that this problem cannot be approximated within an absolute ratio less than 4/3. Then, we present a generic algorithm of approximation ratio $2\rho$, where $\rho$ is the approximation ratio of an algorithm A for the classical makespan problem on parallel machines, used as a subroutine in our algorithm. This leads to a $(2+\epsilon)$-approximation ratio within a running time that is polynomial in $n$ but exponential in $1/\epsilon$ for $m$ processors (using the known PTASs for minimizing makespan), and a 2-approximation ratio for a single processor, within $O(n \log n)$ time. If in the place of algorithm A we use the standard LPT $(\frac{4}{3} - \frac{1}{3m})$-approximation algorithm, we are able to give a tighter analysis, improving the $2\rho$-approximation ratio to $\frac{7}{3} - \frac{1}{3m}$, while the overall running time is $O(n \log n)$. Then in Sects. 3 and 4, we move to the optimization thermal model. In Sect. 3, we study the problem $P \mid p_i = 1, h_i, d \mid \Theta_{\max}$ of minimizing the maximum temperature of a schedule, and we give a 4/3-approximation algorithm. In Sect. 4, we prove that the problem $P \mid p_i = 1, h_i, d \mid \sum \Theta^j_t$ of minimizing the average temperature of a schedule, as well as a time-dependent weighted version of this problem, are both solvable in polynomial time. We conclude in Sect. 5.

Makespan minimization
In this section, we study the approximability of makespan minimization under the threshold thermal model, that is, $P \mid p_i = 1, h_i, \theta \mid C_{\max}$. We start with a negative result on the approximability of our problem. The proof of the next theorem is along the same lines as the NP-hardness reduction for the throughput maximization problem under the same model in Chrobak et al. (2008).
Theorem 1 There is no polynomial time algorithm achieving an absolute approximation ratio better than 4/3 for the minimum makespan problem $P \mid p_i = 1, h_i, \theta \mid C_{\max}$, unless $\mathcal{P} = \mathcal{NP}$.
Proof We give a reduction from Numerical 3-Dimensional Matching (N3DM) where we are given three sets $A$, $B$, $C$ of $n$ integers each and an integer $\beta$, and the question is whether $A \cup B \cup C$ can be partitioned into $n$ disjoint triples $(a, b, c) \in A \times B \times C$ such that each triple contains exactly one integer from each of $A$, $B$, $C$, and $a + b + c = \beta$ for each triple. W.l.o.g., we assume that $\sum_{x \in A \cup B \cup C} x = \beta n$ and $x \le \beta$ for each $x \in A \cup B \cup C$. The N3DM problem is known to be NP-complete (see Garey and Johnson (1979)).
Given an instance $I$ of N3DM, we construct an instance $I'$ of $P \mid p_i = 1, h_i, \theta \mid C_{\max}$ consisting of $n$ processors and $3n$ jobs, one for each integer in $A \cup B \cup C$. The reduction works by showing that it is hard to decide whether the optimal schedule is of length three or not.
Claim There is a solution of N3DM for instance $I$ if and only if there is a feasible schedule of length three for the instance $I'$ of $P \mid p_i = 1, h_i, \theta \mid C_{\max}$.
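(⇒) Consider a solution for the N3DM instance $I$ consisting of the triples $(a_i, b_i, c_i)$, $1 \le i \le n$. In this solution, we schedule on the $i$-th processor the jobs corresponding to $a_i$, $b_i$ and $c_i$ in the first, second, and third slots, respectively. The temperatures $\Theta_{a_i}$, $\Theta_{b_i}$, $\Theta_{c_i}$ of the $i$-th processor after each one of those executions do not exceed the threshold $\theta = 1$, and hence there is a feasible schedule of length three.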
(⇐) Assume, now, that there is a feasible schedule of length three. In this schedule there are exactly three jobs in each processor, since there are 3n jobs in total.
If a job corresponding to an integer $a \in A$ is scheduled to the second slot of a processor, then the temperature threshold $\theta = 1$ is violated after the third slot of this processor, as the temperature after this slot is bounded from below by a value greater than 1. In a similar way, we can show that a job corresponding to an integer $a \in A$ cannot be scheduled to the third slot of a processor. Hence, each of the $n$ jobs corresponding to one of the $n$ integers $a \in A$ is scheduled to the first slot of a processor. Moreover, we can show that a job corresponding to an integer $b \in B$ cannot be scheduled to the third slot of a processor. In all, in each processor exactly three jobs are scheduled: a job $a \in A$ in the first slot, a job $b \in B$ in the second slot, and a job $c \in C$ in the third slot. Therefore, the jobs of a processor correspond to a feasible triple for N3DM.
To finish our proof, we have to show that each triple sums up to $\beta$. If this does not hold then there is a triple $(a, b, c)$ for which $a + b + c > \beta$, since $\sum_{x \in A \cup B \cup C} x = \beta n$. The temperature after the third slot of the processor on which the jobs corresponding to this triple are scheduled then exceeds the threshold $\theta = 1$, which contradicts the feasibility of the schedule. This completes the proof of Theorem 1: an optimal schedule has length either three or at least four, so an algorithm of approximation ratio better than 4/3 would return a schedule of length three whenever one exists and would thus decide the N3DM problem.
Note that the result of Theorem 1 allows the possibility of an asymptotic PTAS or even an additive constant approximation ratio.
In what follows in this section, we present an approximation algorithm for the minimum makespan problem. Note that, in order to respect the temperature threshold, a schedule may have to contain idle slots. To argue about the number of idle slots that are needed before the execution of each job, we first introduce an appropriate partition of the set of jobs according to their heat contribution. In particular, for each integer $k \ge 0$, we can argue separately for jobs whose heat contribution belongs to the interval $(2 - \frac{1}{2^{k-1}}, 2 - \frac{1}{2^k}]$; recall that $h_i \le 2$, for $1 \le i \le n$. Moreover, the interval to which a job of heat contribution $h_i$ belongs is indexed by $k_i$, that is, $2 - \frac{1}{2^{k_i - 1}} < h_i \le 2 - \frac{1}{2^{k_i}}$. Our algorithm and its analysis are based on the following proposition for the structure of any feasible schedule.

Proof (i) Consider a feasible schedule that has less than $\min\{n', m\}$ jobs in $J'$ executed in the first slot of the processors. Assume, first, that in this schedule there is a processor, $p$, in which a job $J_i \in J \setminus J'$ is executed in its first slot and there is at least one job of $J'$ executed in $p$. Let $J_j \in J'$ be the earliest of these jobs, which is executed in slot $s > 1$. By swapping the jobs $J_i$ and $J_j$, the temperature of processor $p$ after slot $s$ is decreased. Indeed, let $\Theta_s$ be the temperature of processor $p$ after slot $s$ and $\Delta$ be the contribution of the jobs executed in slots $2, 3, \ldots, s-1$ to $\Theta_s$, that is $\Theta_s = \frac{h_i}{2^s} + \Delta + \frac{h_j}{2}$. After the swap it holds that $\Theta'_s = \frac{h_j}{2^s} + \Delta + \frac{h_i}{2} < \Theta_s$, since $h_i < h_j$ and $s > 1$. Thus, the temperature of any slot $s' \ge s$ in $p$ is decreased. Moreover, by assumption, each slot $s'$, $2 \le s' \le s-1$, of $p$ executes a job in $J \setminus J'$. Hence, no new idle slots are required for these jobs, although the temperature before their execution is increased. Therefore, the new schedule is feasible and it has the same length.
If there is no such processor, then let $J_i \in J \setminus J'$ be a job executed in the first slot of some processor $p$ and $J_j \in J'$ be a job executed in the $s$-th slot, $s > 1$, of a processor $q$. By swapping the jobs $J_i$ and $J_j$ the temperature of any slot $s' \ge s$ of processor $q$ is decreased, as $h_i < h_j$. Moreover, by assumption, the processor $p$ contains only jobs in $J \setminus J'$, and, as in the previous case, no new idle slots are required for these jobs. Therefore, after the swap we get a feasible schedule of the same length.
(ii) Consider a schedule that is feasible up until the execution of the job preceding $J_i$. Let $x$ be the number of idle slots before the execution of job $J_i$ and let $\Theta$ be the temperature of the processor before the first of these $x$ slots. Since the schedule is feasible before $J_i$, we have that $\Theta \le 1$. The temperature will become $\frac{\Theta}{2^x}$ after the last idle slot, and $\frac{1}{2}\left(\frac{\Theta}{2^x} + h_i\right)$ after the execution of job $J_i$. For such a schedule to be feasible we need that $\frac{1}{2}\left(\frac{\Theta}{2^x} + h_i\right) \le 1$. As $\Theta \le 1$, it suffices that $\frac{1}{2^x} \le 2 - h_i$, which holds for every $x \ge k_i$ because $h_i \le 2 - \frac{1}{2^{k_i}}$. This means that with at least $k_i$ idle slots, feasibility is ensured.

(iii) Let $\Theta_t$ be the temperature of the processor before executing $J_j$. After the execution of $J_j$ we have $\Theta_{t+1} = \frac{\Theta_t + h_j}{2}$. Then, after $x$ slots (idle or executing jobs of heat contribution $h \le 1$) we get a temperature $\Theta_{t+x+1} \ge \frac{\Theta_t + h_j}{2} \cdot \frac{1}{2^x}$. In order for $J_i$ to be executed in the next slot, it should hold that $\frac{\Theta_{t+x+1} + h_i}{2} \le 1$.

In what follows we consider instances with $n > m$, for otherwise the problem becomes trivial. By Proposition 1(i), we also assume that the number of jobs of heat contribution $h_i > 1$ is greater than $m$. If this is not the case, all jobs can be executed without any idle slot before them and the length of an optimal schedule is exactly $\lceil n/m \rceil$. We consider the jobs in non-increasing order of their heat contributions, i.e., $h_1 \ge h_2 \ge \ldots \ge h_n$, and we define $A = \{J_1, J_2, \ldots, J_m\}$ and $B = \{J_{m+1}, J_{m+2}, \ldots, J_n\}$. Our algorithm first schedules the jobs in $A$ to the first slot of each processor. Each one of the jobs in $B$ is scheduled by leaving before its execution exactly $k_i$ idle slots, according to Proposition 1(ii). In this way, our problem, for the jobs in $B$, is transformed to an instance of the classical makespan problem on parallel machines, $P||C_{\max}$, where the processing time of each job is $p_i = k_i + 1$, that is, $k_i$ idle slots plus its original unit processing time. Then, these jobs are scheduled using any known approximation algorithm A for $P||C_{\max}$.
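The procedure just described can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `idle_index`, `max_c_lpt` and `is_feasible` are hypothetical, the subroutine A is instantiated with the LPT rule, idle slots are encoded as jobs of heat contribution 0, and the threshold is normalized to 1.

```python
import heapq
from fractions import Fraction


def idle_index(h):
    """Index k_i of the text: smallest k >= 0 with h <= 2 - 1/2**k (needs h < 2)."""
    h = Fraction(h)
    k = 0
    while h > 2 - Fraction(1, 2 ** k):
        k += 1
    return k


def max_c_lpt(heats, m):
    """Per-processor slot lists; a 0 entry marks an idle slot (illustrative encoding)."""
    jobs = sorted((Fraction(h) for h in heats), reverse=True)
    a, b = jobs[:m], jobs[m:]
    if (jobs and jobs[0] > 2) or any(h >= 2 for h in b):
        raise ValueError("no feasible schedule exists for threshold 1")
    # The m hottest jobs (set A) occupy the first slot of each processor.
    schedule = [[h] for h in a] + [[] for _ in range(m - len(a))]
    # Jobs in B get processing time k_i + 1. Since k_i is monotone in h_i,
    # the non-increasing heat order is already an LPT order.
    loads = [(1, j) for j in range(m)]
    heapq.heapify(loads)
    for h in b:
        k = idle_index(h)
        load, j = heapq.heappop(loads)  # least loaded processor
        schedule[j].extend([Fraction(0)] * k + [h])
        heapq.heappush(loads, (load + k + 1, j))
    return schedule


def is_feasible(schedule, threshold=1):
    """Check that no processor ever exceeds the temperature threshold."""
    for slots in schedule:
        theta = Fraction(0)
        for h in slots:
            theta = (theta + h) / 2
            if theta > threshold:
                return False
    return True


sched = max_c_lpt([2, "1.75", "1.5", "1.5", 1, "0.5"], m=2)
print(max(len(slots) for slots in sched), is_feasible(sched))  # prints 4 True
```

In the small run at the end, the two jobs of heat contribution 1.5 have index 1, so each is preceded by one idle slot, and the resulting schedule has length four.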
From now on we fix an instance of our problem and we denote by $SOL$ the length of the schedule $S$ provided by Algorithm MAX_C and by $OPT$ the length of an optimal schedule $S^*$ for our original scheduling problem. For the presentation and the analysis of our algorithm, we denote by $I_B$ and $I_B^+$ the instances of $P||C_{\max}$ consisting only of jobs in $B$ with processing times $p_i = k_i$ and $p_i = k_i + 1$, respectively, for each $J_i \in B$. For an instance $I$ of $P||C_{\max}$, we denote by $S(I)$ the schedule found by an algorithm A and by $C(I)$ the length of this schedule. In a similar way, we denote by $S^*(I)$ and $C^*(I)$ an optimal schedule for $P||C_{\max}$ and the length of this optimal schedule, respectively.
Clearly, $SOL = 1 + C(I_B^+)$. To analyze our Algorithm MAX_C, we need a lower bound on the optimal makespan. To derive this bound we will utilize an optimal schedule $S^*(I_B)$. Note that for jobs with $h_i \in (0, 1]$, $k_i = 0$, hence the schedule $S^*(I_B)$ involves only jobs for which $h_i > 1$.

Lemma 1 For the optimal makespan it holds that $OPT \ge \max\left\{\lceil n/m \rceil,\ 1 + C^*(I_B)\right\}$.
The first bound on the optimal makespan follows trivially by considering all jobs requiring a single slot for their execution.
For the second bound, let $A^*$, $|A^*| = m$, be the set of jobs executed in the first slot of the $m$ processors in an optimal solution and $B^* = J \setminus A^*$.
Consider, first, an auxiliary schedule of length $OPT^-$, identical to the optimal apart from the fact that each job in $B^* \cap A$ has been replaced by a different job in $A^* \cap B$. Observe that in this schedule, the jobs executed in the first slot of the processors remain $A^*$ while the jobs executed in the remaining slots are the jobs in $B$. Since each job in $B$ has smaller or equal heat contribution than any job in $A$, it follows that $OPT \ge OPT^-$. Consider, next, the schedule $S^*(I_B)$. For this schedule it holds that $OPT^- \ge 1 + C^*(I_B)$, since by Proposition 1(i),(iii) each job in $B$ requires at least $k_i$ slots to be executed; recall that we consider instances where the number of jobs of heat contribution $h_i > 1$ is greater than $m$ and that jobs in $B$ with $h_i \le 1$, and hence $k_i = 0$, do not appear in the schedule $S^*(I_B)$.
It is well-known that the P||C max problem is strongly NP-hard and a series of constant approximation algorithms and PTASs have been proposed. Our main result in this section is that in step 4 of Algorithm MAX_C we can use any algorithm A for P||C max to obtain twice the approximation ratio of A for our problem.
Theorem 2 Algorithm MAX_C achieves a $2\rho$ approximation ratio for $P \mid p_i = 1, h_i, \theta \mid C_{\max}$, where $\rho$ is the approximation ratio of the algorithm A for $P||C_{\max}$.
Proof A $\rho$-approximation algorithm A implies that $C(I_B^+) \le \rho \cdot C^*(I_B^+)$. To obtain an upper bound on $C^*(I_B^+)$ we start from the schedule $S^*(I_B)$. The processing times of jobs in the latter schedule are reduced by one with respect to the former one, and the jobs in $B$ with $h \le 1$ do not appear in schedule $S^*(I_B)$. Let $B' \subseteq B$ be this set of jobs.
We transform the schedule $S^*(I_B)$ to a new schedule $S'(I_B^+)$ in two successive steps: (i) we increase the processing time of jobs in $B \setminus B'$ from $k_i$ to $k_i + 1$, and (ii) we introduce the jobs in $B'$ with unit processing time, at the end of the resulting schedule in a first-fit manner. Clearly, for the length, $C'(I_B^+)$, of this new schedule it holds that $C^*(I_B^+) \le C'(I_B^+)$, as both of them refer to the same instance $I_B^+$. Let us now bound $C'(I_B^+)$ in terms of $C^*(I_B)$.
, then we consider the construction of $S'(I_B^+)$ and we argue about the completion time of a critical processor in $S^*(I_B)$, i.e., the processor that finishes last. By step (i), the length of schedule $S^*(I_B)$ at most doubles, since each job in $B \setminus B'$ has processing time at least one and this is increased by 1.

For the case of a single processor the $1||C_{\max}$ problem is trivially polynomial, whereas for multiple processors there are well-known PTASs, e.g., Hochbaum and Shmoys (1987); Alon et al. (1998). Hence, the main implication of Theorem 2 is

Corollary 1 For any $\epsilon > 0$, there is a $(2+\epsilon)$-approximation algorithm for $P \mid p_i = 1, h_i, \theta \mid C_{\max}$. For the case of a single processor, there is an algorithm that achieves an approximation ratio of 2.
To obtain the ratio of $2 + \epsilon$, as stated above, one needs to use a PTAS for the classical makespan problem in step 4 of Algorithm MAX_C, resulting in a running time that is exponential in $1/\epsilon$. To achieve more practical running times, we can investigate the use of other algorithms for step 4. In particular, if the standard Longest Processing Time (LPT) algorithm is used, then Theorem 2 leads to a $2(\frac{4}{3} - \frac{1}{3m})$ approximation ratio within $O(n \log n)$ time. Recall that the LPT algorithm greedily assigns the next job (in non-increasing order of processing times) to the first available processor (Graham 1969). In the next theorem we are able to improve this ratio to 7/3, based on an LPT oriented analysis of Algorithm MAX_C.

Theorem 3 Algorithm MAX_C using the LPT rule in step 4 achieves an approximation ratio of $\frac{7}{3} - \frac{1}{3m}$.
Proof Our proof follows the standard analysis given in Graham (1969) for the classical multiprocessor scheduling problem. For the lower bound on the length of an optimal schedule we use Lemma 1 and the fact that $C^*(I_B) \ge \frac{1}{m}\sum_{J_i \in B} k_i$. To upper bound the length $SOL$ of the schedule $S$ returned by Algorithm MAX_C we consider the job $J_\ell$ which finishes last in $S$. Clearly $\ell > m$, for otherwise there are at most $m$ jobs to be scheduled and the problem becomes trivial.
The job $J_\ell$ will start being executed not later than time $1 + \frac{1}{m}\sum_{J_i \in B \setminus \{J_\ell\}} (k_i + 1)$, and hence, it holds that $SOL \le 1 + \frac{1}{m}\sum_{J_i \in B \setminus \{J_\ell\}} (k_i + 1) + (k_\ell + 1)$. Thus, we get $SOL \le 2\,OPT - 1 + \left(1 - \frac{1}{m}\right)(k_\ell + 1)$. If $k_\ell \le OPT/3$, then the theorem follows directly. If $k_\ell > OPT/3$, then we consider the subinstance, $I'$, of the original problem that contains only the jobs of heat contribution at least $h_\ell$, i.e., $J' = \{J_1, J_2, \ldots, J_\ell\}$. Obviously, $k_\ell \ge 1$, as $k_\ell$ is an integer. Moreover, for the length of an optimal schedule, $C^*(I')$, of the subinstance $I'$ it holds that $C^*(I') \le OPT$. As $\ell > m$, the lengths of the schedules returned by Algorithm MAX_C for instances $I$ and $I'$ are equal, i.e., $C(I') = SOL$. Hence, $\frac{SOL}{OPT} \le \frac{C(I')}{C^*(I')}$. In an optimal schedule of $I'$ there are at most three jobs in each processor, for otherwise, if there is a processor with four assigned jobs, the length of that schedule will be, by Proposition 1(iii), at least $1 + 3k_\ell > OPT$, a contradiction. Hence, $\ell \le 3m$.
Algorithm MAX_C schedules the jobs of $I'$ as follows: the job $J_i$, $1 \le i \le m$, is scheduled to the first slot of processor $i$, the job $J_{m+i}$, $1 \le i \le m$, to the $(1 + (k_{m+i} + 1))$-th slot of processor $i$, and the job $J_{2m+i}$, $1 \le i \le m$, according to the LPT rule.
If $m < \ell \le 2m$, then the length of the above schedule is $C(I') = 1 + (k_{m+1} + 1) = 2 + k_{m+1}$. By Lemma 1 it follows that $C^*(I') \ge 1 + k_{m+1}$, since there is a processor executing at least two jobs in $\{J_1, J_2, \ldots, J_{m+1}\}$. Hence, $\frac{SOL}{OPT} \le \frac{C(I')}{C^*(I')} \le \frac{2 + k_{m+1}}{1 + k_{m+1}} \le \frac{3}{2}$, as $k_{m+1} \ge k_\ell \ge 1$.
If $2m < \ell \le 3m$, then Algorithm MAX_C schedules in the first processor either the jobs $J_1$ and $J_{m+1}$ or the jobs $J_1$, $J_{m+1}$ and $J_\ell$. In the first case, the job $J_\ell$ starts its execution not later than the slot $1 + (k_{m+1} + 1)$, for otherwise $J_\ell$ would have been scheduled by Algorithm MAX_C in processor 1, that is $C(I') \le 1 + (k_{m+1} + 1) + (k_\ell + 1)$. In the second case, $J_\ell$ is the job that finishes last, that is $C(I') = 1 + (k_{m+1} + 1) + (k_\ell + 1)$. Thus, in both cases it holds that $C(I') \le 3 + k_{m+1} + k_\ell$.
For an optimal schedule for $I'$, Lemma 1 implies as before that $C^*(I') \ge 1 + k_{m+1}$. Moreover, in such a schedule there is a processor with at least three jobs, and hence $C^*(I') \ge 1 + 2k_\ell$. Combining these two bounds we get $C^*(I') \ge 1 + \frac{k_{m+1}}{2} + k_\ell$. Therefore, we get $\frac{SOL}{OPT} \le \frac{C(I')}{C^*(I')} \le \frac{6 + 2k_{m+1} + 2k_\ell}{2 + k_{m+1} + 2k_\ell}$. This ratio is decreasing with $k_\ell$ and as $k_\ell \ge 1$ we get $\frac{SOL}{OPT} \le \frac{8 + 2k_{m+1}}{4 + k_{m+1}} = 2$, and the proof is completed.
Note that the $(\frac{4}{3} - \frac{1}{3m})$-approximation ratio of the LPT algorithm for the classical makespan problem on parallel machines is tight. Concerning the tightness of our algorithm, we are able to give an instance where its approximation ratio tends to 2. This instance consists of $m(k + 2)$ jobs: a set $J_1$ of $m$ jobs of heat contribution $h_i = 2$, a set $J_2$ of $m$ jobs of heat contribution $h_i = 2 - \frac{3}{2^{k+1}}$, and a set $J_3$ of $mk$ jobs of heat contribution $h_i = \frac{1}{2(2^k - 1)}$. An optimal solution for this instance is to schedule the jobs in the following way: every processor executes a job of $J_1$ in the first slot, $k$ jobs of $J_3$ in slots $2, 3, \ldots, k + 1$, and a job of $J_2$ in slot $k + 2$. The temperature of every processor after slot $k + 1$ is $\frac{1}{2^k} + \frac{1}{2(2^k - 1)} \cdot \frac{2^k - 1}{2^k} = \frac{3}{2^{k+1}}$, and hence a job of $J_2$ can be executed in slot $k + 2$. Moreover, as the jobs of $J_3$ have heat contribution $h_i \le 1$, this schedule is feasible. On the other hand, our algorithm schedules in every processor a job of $J_1$ in the first slot, a job of $J_2$ in the slot $k + 2$, and $k$ jobs of $J_3$ in slots $k + 3, k + 4, \ldots, 2k + 2$. Therefore, the ratio achieved by our algorithm is $\frac{2k+2}{k+2}$, which tends to 2 as $k$ grows.
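The arithmetic of this instance can be checked numerically. The following Python sketch builds, for a single processor, the two slot sequences described above (all processors are identical) and verifies that both respect the threshold 1. The function names are hypothetical, idle slots are encoded as heat contribution 0, and the parameter $k \ge 1$ is assumed.

```python
from fractions import Fraction


def temperatures(slots):
    """Temperature after each slot, starting from temperature 0."""
    theta, out = Fraction(0), []
    for h in slots:
        theta = (theta + h) / 2
        out.append(theta)
    return out


def tight_instance_schedules(k):
    """Per-processor slot sequences of the optimal and of the algorithm's schedule."""
    h1 = Fraction(2)                        # jobs of the set J_1
    h2 = 2 - Fraction(3, 2 ** (k + 1))      # jobs of the set J_2
    h3 = Fraction(1, 2 * (2 ** k - 1))      # jobs of the set J_3
    optimal = [h1] + [h3] * k + [h2]                        # length k + 2
    algorithm = [h1] + [Fraction(0)] * k + [h2] + [h3] * k  # length 2k + 2
    return optimal, algorithm


for k in (1, 2, 5, 10):
    opt, alg = tight_instance_schedules(k)
    # Both schedules must respect the normalized threshold 1.
    assert max(temperatures(opt)) <= 1 and max(temperatures(alg)) <= 1
    # After slot k + 1 the optimal schedule is at temperature 3 / 2^(k+1).
    assert temperatures(opt)[k] == Fraction(3, 2 ** (k + 1))
    print(k, len(opt), len(alg), Fraction(len(alg), len(opt)))
```

The printed ratios $(2k+2)/(k+2)$ grow towards 2 as $k$ increases.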

Maximum temperature minimization
Now, we turn our attention to the optimization thermal model and to the problem of minimizing the maximum temperature, i.e., $P \mid p_i = 1, h_i, d \mid \Theta_{\max}$. Recall that, as we discussed in the Introduction, we consider a schedule length $d \ge \lceil n/m \rceil$ and we assume that $n = m \cdot d$, by adding the appropriate number of fictitious jobs. Recall also that the maximum is taken over the temperatures at the end of each of the $md$ slots available on the $m$ processors. In the sequel, we will denote by $\Theta^*_{\max}$ the maximum temperature of an optimal schedule. We start with the observation that any algorithm for this problem achieves a 2 approximation ratio. Indeed, it holds that $\Theta^*_{\max} \ge h_{\max}/2$, no matter how we schedule the job of maximum heat contribution. It also holds that for any algorithm, $\Theta_{\max} \le h_{\max}$, with $\Theta_{\max}$ being the maximum temperature of the algorithm's schedule. Therefore, $\Theta_{\max} \le 2 \cdot \Theta^*_{\max}$. To improve this trivial ratio we propose the Algorithm MAX_T below, which is based on the intuitive idea of alternating the execution of hot and cool jobs.
To elaborate a little more on how the algorithm works, note that processor 1 will be assigned the job J 1 , followed by J n , then followed by J m+1 , and then by J n−m and this alternation of hot and cool jobs will continue till the end of the schedule. Similarly processor 2 will be assigned the jobs J 2 , J n−1 , J m+2 , J n−m−1 , and so on. The schedule is illustrated further in Table 1.
To analyze the Algorithm MAX_T, we start with the proposition below, which is implied by the Round-Robin scheduling of jobs in Steps 2 and 3 of the algorithm.

Algorithm MAX_T
1: Sort the jobs in non-increasing order of their heat contributions: $h_1 \ge h_2 \ge \ldots \ge h_n$;
2: Using the order of Step 1, schedule the $\lceil d/2 \rceil m$ hottest jobs to the odd slots of the processors using Round-Robin;
3: Using the reverse order of Step 1, schedule the $\lfloor d/2 \rfloor m$ coolest jobs to the even slots of the processors using Round-Robin;
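The three steps of Algorithm MAX_T can be sketched in Python as follows. This is an illustrative reading of the steps, not a reference implementation: the function names are hypothetical, missing jobs are padded with fictitious jobs of heat contribution 0 so that $n = md$, and slot $t$ of the text is stored at list index $t - 1$.

```python
from fractions import Fraction


def max_t(heats, m, d):
    """Alternate hot jobs (odd slots) and cool jobs (even slots), Round-Robin."""
    if len(heats) > m * d:
        raise ValueError("more jobs than available slots")
    # Step 1: non-increasing order, padded with fictitious zero-heat jobs.
    jobs = sorted(list(heats) + [0] * (m * d - len(heats)), reverse=True)
    odd_slots = (d + 1) // 2                 # ceil(d / 2) odd slots per processor
    hot = jobs[: odd_slots * m]
    cool = jobs[odd_slots * m:][::-1]        # reverse order: coolest job first
    schedule = [[None] * d for _ in range(m)]
    # Step 2: hottest jobs to slots 1, 3, 5, ... (indices 0, 2, 4, ...).
    for idx, h in enumerate(hot):
        schedule[idx % m][2 * (idx // m)] = h
    # Step 3: coolest jobs to slots 2, 4, 6, ... (indices 1, 3, 5, ...).
    for idx, h in enumerate(cool):
        schedule[idx % m][2 * (idx // m) + 1] = h
    return schedule


def max_temperature(schedule):
    """Maximum temperature over the end of every slot of every processor."""
    best = Fraction(0)
    for slots in schedule:
        theta = Fraction(0)
        for h in slots:
            theta = (theta + Fraction(h)) / 2
            best = max(best, theta)
    return best


sched = max_t([8, 7, 6, 5, 4, 3, 2, 1], m=2, d=4)
print(sched)                   # prints [[8, 1, 6, 3], [7, 2, 5, 4]]
print(max_temperature(sched))  # prints 17/4
```

In this run processor 1 receives $J_1, J_n, J_{m+1}, J_{n-m}$ and processor 2 receives $J_2, J_{n-1}, J_{m+2}, J_{n-m-1}$, matching the alternation described in the text.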

Average temperature minimization
In this section, we look at the problem of minimizing the average temperature, $P \mid p_i = 1, h_i, d \mid \sum \Theta^j_t$, instead of the maximum temperature. We will again consider a schedule length $d$ and assume that the number of jobs is $n = md$.
Contrary to the maximum temperature, we show that minimizing the average temperature of a schedule is solvable in polynomial time. Our algorithm is based on the following lemma.

Lemma 4
In any optimal solution for the average temperature, jobs are scheduled in a coolest first order, i.e., for any pair of jobs $J_i$, $J_j$ such that $h_i > h_j$ scheduled at slots $t$ and $t'$, respectively, it holds that $t' \le t$, regardless of the processor they are assigned to.
Proof Consider the job $J_i$ to be scheduled at slot $t$ of some processor $p$ in a schedule $S$. The contribution of job $J_i$ to the temperature of the $s$-th slot of processor $p$ (with $t \le s \le d$) is $\frac{h_i}{2^{s-t+1}}$, while this job does not affect the temperature of any other slot in any processor. Hence, the contribution of job $J_i$ to the objective function is $\sum_{s=t}^{d} \frac{h_i}{2^{s-t+1}} = h_i \cdot \frac{2^{d+1} - 2^t}{2^{d+1}}$. Therefore, the later job $J_i$ is scheduled, the smaller its contribution to the objective function becomes.
Assume, now, that in an optimal schedule $S^*$ the job $J_i$ is scheduled at slot $t$ of some processor, while the job $J_j$, with $h_i > h_j$, is scheduled at slot $t' > t$ in any processor. By swapping the execution of this pair of jobs the contribution of the job $J_i$ to the objective function decreases by $h_i \cdot \frac{2^{t'} - 2^t}{2^{d+1}}$ and the contribution of job $J_j$ increases by $h_j \cdot \frac{2^{t'} - 2^t}{2^{d+1}}$. As $h_i > h_j$, it follows that the resulting schedule contradicts the optimality of the schedule $S^*$ and this completes the proof of the lemma.
The previous lemma leads directly to the next simple algorithm.
Algorithm AVR_T
1: Sort the jobs in non-decreasing order of their heat contributions: $h_1 \le h_2 \le \ldots \le h_n$;
2: According to this order schedule the jobs to processors using Round-Robin;
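A minimal Python sketch of Algorithm AVR_T is given below, together with an exhaustive search that can be used to sanity-check it on very small instances. The function names are hypothetical, missing jobs are padded with fictitious jobs of heat contribution 0, and the objective is computed as the sum of the temperatures over all $md$ slots (which has the same minimizers as the average).

```python
from fractions import Fraction
from itertools import permutations


def avr_t(heats, m, d):
    """Coolest-first order, assigned to the processors in Round-Robin."""
    if len(heats) > m * d:
        raise ValueError("more jobs than available slots")
    jobs = sorted(list(heats) + [0] * (m * d - len(heats)))
    schedule = [[] for _ in range(m)]
    for idx, h in enumerate(jobs):
        schedule[idx % m].append(h)
    return schedule


def total_temperature(schedule):
    """Sum of the temperatures at the end of every slot of every processor."""
    total = Fraction(0)
    for slots in schedule:
        theta = Fraction(0)
        for h in slots:
            theta = (theta + Fraction(h)) / 2
            total += theta
    return total


def brute_force_minimum(heats, m, d):
    """Exhaustive minimum of the objective; only usable for tiny instances."""
    jobs = list(heats) + [0] * (m * d - len(heats))
    return min(
        total_temperature([perm[j * d:(j + 1) * d] for j in range(m)])
        for perm in permutations(jobs)
    )


heats = [3, 1, 2, 5]
sched = avr_t(heats, m=2, d=2)
print(sched)                                                # prints [[1, 3], [2, 5]]
print(total_temperature(sched), brute_force_minimum(heats, 2, 2))  # prints 25/4 25/4
```

On this instance the two coolest jobs occupy the first slots and the two hottest jobs the last slots, in agreement with Lemma 4.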

Conclusions
We have provided algorithms as well as negative results for various optimization criteria in scheduling under thermal management models. There are many interesting open questions remaining. The most important is to improve the approximation ratio both for the problem of minimizing the makespan and for minimizing the maximum temperature. It would also be interesting to generalize our results to the case where the cooling effect is different from one half, as in Birks et al. (2010, 2011a). In a different direction, one can also consider other objectives under the threshold thermal model, in line with the objectives that have been studied in the more traditional models of job scheduling. Resolving these questions seems technically more challenging than the classic scheduling problems due to the different nature of the constraints that are introduced by temperature management models. Note that scheduling problems under the threshold thermal model can be seen as scheduling problems with sequence-dependent setup times; such a setup time for a job corresponds to the idle slots required to respect the temperature threshold. In scheduling problems with setup times (see for example Pinedo (1995)), the setup time of a job usually depends only on the job itself and the previous job in the schedule. However, in our case, the number of idle slots required before executing a job depends on all the jobs scheduled before it as well as on their order. Hence existing results from the literature cannot be applied.