1 DS|A / day: Greedy scheduling heuristics
Wed Nov 15 2023
- Time complexity:
- N/A
- Space complexity:
- N/A
- Uses
- Simple greedy heuristics for solving interval/task scheduling problems.
- Downsides
- Make sure they fit the pattern before applying a greedy heuristic. Many (most) are not solvable greedily.
- Resources
- ~wayne
- Scheduling Heuristics (Rulz, 2015)
- Heuristics Techniques for Scheduling Problems with Reducing Waiting Time Variance (Mahapatra et al., 2017)
- Wikipedia: Interval scheduling
- Wikipedia: Optimal job scheduling
- Wikipedia: Single-machine scheduling
- Wikipedia: Identical-machine scheduling
- (too many subpages to list)
- The Algorithm Design Manual
- book.pdf
- Hochbaum
A general optimal job scheduling problem is: given n machines/processors/workers and a list J of jobs, produce some schedule, i.e. an assignment of jobs to machines. The schedule should optimize some objective function. For example:
- maximize the total number of jobs we can fit in the allotted time
- minimize the total waiting time
- minimize the total time spent for all jobs by spreading the jobs over some number of machines
- minimize the maximum lateness
Jobs may have single or multiple stages. We'll keep the scope to single-stage jobs here, where each job j consists of a single execution phase with a given processing time p_j.
There are four main categories of machine environments:
- 1 - Single-machine scheduling: There is a single machine.
- P - Identical-machines scheduling: There are m identical parallel machines. Job j takes time p_j on any machine.
- Q - Uniform-machines scheduling: There are m parallel machines with different speeds. Job j on machine i takes p_j / s_i, where s_i is machine i's speed.
- R - Unrelated-machines scheduling: There are m parallel machines, and they are unrelated. Job j on machine i takes p_ij.
There are certain characteristics of jobs we might have:
- Processing time is equal for all jobs
- Processing time is 1 for all jobs
- Jobs have a release time r_j before which we can't schedule the job
- Jobs have a strict due date d_j jobs must complete by
- Jobs have a relaxed due date d_j we are penalized for missing (e.g. 1 cost for every late time period)
- Jobs have a size_j, the number of machines they must be scheduled on at the same time. This occurs in parallel task scheduling problems.
Jobs might have precedence relations:
- Job i -> j means job j must start after i finishes
In these cases we can think about the jobs as a directed graph of precedence constraints.
There might be transportation delays between jobs, e.g.:
- Between the completion of operation O_aj of job j on machine a and the start of operation O_bk of job k on machine b, there is a transportation delay of t_aj.
- Between the completion of operation O_aj of job j on machine a and the start of operation O_ak of job k on machine a, there is a transportation delay of t_a.
And a lot of other constraints.
Very, very often, these problems take exponential time as we must enumerate every case. However, for some specific problems, greedy heuristics exist.
Find maximum subset of non-overlapping jobs
Very common. 1 machine only. We're given a set of intervals representing start and end times of jobs, and we need to find the maximum subset of intervals that don't overlap.
Heuristic: Sort by earliest finish time.
Why: Say the best set contains some job j_1 (ending at e_1) that is the first to finish within that set, but does not have the earliest end time among all jobs. Then the job j_2 with the earliest end time e_2 <= e_1 can replace j_1 without overlapping the rest of the set, while leaving e_1 - e_2 more time for later jobs. So j_2 is at least as good as j_1 for being in the best set, and greedily taking the earliest finisher never hurts.
Here are some counterexamples for heuristics like:
- earliest start time
- shortest interval
- fewest conflicts.
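A minimal Python sketch of the earliest-finish-time rule (the function name and the (start, end) tuple representation are my own):

```python
def max_non_overlapping(jobs):
    """Greedily take the compatible job that finishes earliest.

    jobs: list of (start, end) tuples. Returns the chosen subset in order.
    """
    chosen = []
    last_end = float("-inf")
    for start, end in sorted(jobs, key=lambda j: j[1]):  # earliest finish first
        if start >= last_end:  # compatible with everything chosen so far
            chosen.append((start, end))
            last_end = end
    return chosen

print(max_non_overlapping([(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9), (6, 10), (8, 11)]))
# → [(1, 4), (5, 7), (8, 11)]
```

The sort dominates, so the whole thing is O(n log n).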
If the intervals are weighted, then we are finding a maximum-weight independent set in an interval graph. The greedy heuristic no longer works, but this can still be solved in polynomial time (with dynamic programming over intervals sorted by finish time).
Find minimum machines to schedule jobs
Given some jobs with start and end times, find the minimum number of machines to schedule all jobs so that no two use the same machine at the same time.
The number needed is going to be the maximum depth (the maximum parallel scheduled at one time).
Heuristic: Process the jobs earliest start time first, assigning each job to any free machine (or spinning up a new one if none are free).
Why: First, this never puts two overlapping jobs on the same machine.
Second, if we ever open a new machine for a job j, that means j overlaps the jobs currently running on the other m machines.
Because we sorted by start time, all of those jobs start at or before s_j, and because they overlap with j, they all end after s_j.
So there are d = m + 1 jobs overlapping at that moment, and any schedule needs at least d machines. The greedy schedule therefore uses exactly the minimum.
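One way to implement this is with a min-heap of machine finish times; a sketch (function and variable names are my own):

```python
import heapq

def min_machines(jobs):
    """Process jobs by start time, reusing a machine whose current job has
    finished, else opening a new one. Returns the machine count, which
    equals the maximum depth of overlapping jobs.

    jobs: list of (start, end) tuples.
    """
    ends = []  # min-heap of finish times, one entry per open machine
    for start, end in sorted(jobs):  # earliest start time first
        if ends and ends[0] <= start:  # some machine is free by `start`
            heapq.heapreplace(ends, end)  # reuse it with the new job
        else:  # every machine is busy: spin up a new one
            heapq.heappush(ends, end)
    return len(ends)
```

The heap makes the free-machine check O(log m) per job, so the whole routine is O(n log n).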
Minimize maximum lateness
Given 1 machine, and each job requiring p_j time and due at time d_j, we want to minimize the maximum lateness of all jobs.
Heuristic: Earliest deadline first.
Why: There exists a best schedule with no idle time. Given any schedule S with two adjacent jobs ordered j, i where job i's deadline is earlier than job j's deadline, swapping them to i, j never increases the maximum lateness. Repeatedly swapping such inversions turns S into the earliest-deadline-first order without making it worse, so EDF is optimal.
Here are some counterexamples for heuristics like:
- shortest processing time first
- smallest slack time first
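A sketch of earliest-deadline-first in Python (names and the (p, d) pair representation are my own):

```python
def min_max_lateness(jobs):
    """Schedule jobs in earliest-deadline-first order on one machine.

    jobs: list of (p, d) pairs (processing time, due date).
    Returns (ordered schedule, maximum lateness).
    """
    t, max_late, order = 0, float("-inf"), []
    for p, d in sorted(jobs, key=lambda j: j[1]):  # earliest deadline first
        t += p                          # job runs in [t - p, t), no idle time
        max_late = max(max_late, t - d)  # lateness is negative if early
        order.append((p, d))
    return order, max_late
```

Note lateness can be negative when every job finishes early; clamp with max(0, ...) if you only care about tardiness.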
Minimize total/average lateness
Given 1 machine and some jobs, minimize the total/average lateness.
This can also be formulated as: we score d - x points per job, where d is the job's deadline and x is the time we finish the job, and we want to maximize the total score.
Heuristic: Shortest processing time first.
Why: If we ever perform two adjacent tasks where the first, X, takes a time units and the second, Y, takes b < a, we can swap them: X finishes b units later and Y finishes a units earlier. Since job X gives b points less and Y gives a points more, the total score increases by a - b > 0.
For a 1 machine system, shortest processing time will also minimize the:
- mean waiting time of jobs from time of arrival to start of processing
- maximum waiting time
- mean number of jobs in the system.
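The SPT rule is a short sort-and-scan; a sketch (names mine), returning the total lateness sum over jobs of (finish time - deadline):

```python
def total_lateness_spt(jobs):
    """Run jobs shortest-processing-time first on one machine.

    Minimizing the sum of finish times also minimizes total lateness,
    since sum(finish - d) differs from sum(finish) by the constant sum(d).

    jobs: list of (p, d) pairs. Returns the total lateness (may be negative).
    """
    t, total = 0, 0
    for p, d in sorted(jobs, key=lambda j: j[0]):  # shortest job first
        t += p
        total += t - d  # this job's lateness (negative if early)
    return total
```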
For a 1-machine system, you can use the modified due-date scheduling heuristic to minimize the total weighted lateness.