1 DS|A / day: Greedy scheduling heuristics

Wed Nov 15 2023

A general optimal job scheduling problem is: given n machines/processors/workers, and a list J of jobs, produce some schedule - an assignment of jobs to machines. The schedule should optimize some objective function. For example:

  • maximize the total number of jobs we can fit in the allotted time
  • minimize the total waiting time
  • minimize the total time spent for all jobs by spreading the jobs over some number of machines
  • minimize the maximum lateness

Jobs may have single or multiple stages. We'll keep the scope to single stage jobs here, where each job j consists of a single execution phase, with a given processing time p_j.

There are four main categories of machine environments:

  • 1 - Single-machine scheduling: There is a single machine.
  • P - Identical-machines scheduling: There are m identical parallel machines. Job j takes time p_j on any machine.
  • Q - Uniform-machines scheduling: There are m parallel machines, with different speeds. Job j on machine i takes p_j / s_i.
  • R - Unrelated-machines scheduling: There are m parallel machines, and they are unrelated. Job j on machine i takes p_ij.

There are certain characteristics of jobs we might have:

  • Processing time is equal for all jobs
  • Processing time is 1 for all jobs
  • Jobs have a release time r_j before which we can't schedule the job
  • Jobs have a strict due date d_j by which they must complete
  • Jobs have a relaxed due date d_j, after which we are penalized (e.g. a cost of 1 for every late time period)
  • Jobs have a size size_j, meaning each job must run on size_j machines at the same time. This occurs in parallel task scheduling problems.

Jobs might have precedence relations:

  • A relation i -> j means job j must start after job i finishes

In these cases we can think about jobs as a graph, e.g.

small-pic Scheduling heuristics

There might be transportation delays between jobs, e.g.

  • Between the completion of operation O_aj of job j on machine a and the start of operation O_bk of job k on machine b, there is a transportation delay of t_aj.
  • Between the completion of operation O_aj of job j on machine a and the start of operation O_ak of job k on the same machine a, there is a transportation delay of t_a.

And a lot of other constraints.

Very, very often, these problems are NP-hard: solving them exactly means enumerating essentially every case, which takes exponential time. However, for some specific problems, simple greedy heuristics produce optimal schedules.

Find maximum subset of non-overlapping jobs

Very common. 1 machine only. We're given a set of intervals representing start and end times of jobs, and we need to find the maximum subset of intervals that don't overlap.

small-pic Scheduling heuristics

Heuristic: Sort by earliest finish time.

Why: Take an optimal set, and let j_1 be the job in it that finishes earliest, at time e_1. Suppose j_1 is not the job with the earliest finish time overall, and let j_2 be that job, finishing at e_2 <= e_1. Every other job in the optimal set starts at or after e_1, so j_2 can replace j_1 without overlapping anything, while leaving e_1 - e_2 more room for later jobs. So choosing the earliest finish time first is never worse than any other choice.
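
A minimal sketch of this rule in Python (the function name and the (start, end) tuple representation of jobs are my own assumptions):

    def max_non_overlapping(jobs):
        # Greedy: scan jobs by earliest finish time, keep any job that
        # starts at or after the finish of the last job we kept.
        chosen = []
        last_end = float("-inf")
        for start, end in sorted(jobs, key=lambda j: j[1]):
            if start >= last_end:
                chosen.append((start, end))
                last_end = end
        return chosen

    print(max_non_overlapping([(0, 6), (1, 4), (3, 5), (5, 7), (3, 8), (8, 9)]))
    # -> [(1, 4), (5, 7), (8, 9)]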

Here are some counterexamples for heuristics like:

  • earliest start time
  • shortest interval
  • fewest conflicts.

small-pic Scheduling heuristics

If the intervals are weighted, then we are finding a maximum-weight independent set in an interval graph. The greedy rule above no longer works, but this can still be solved in polynomial time with dynamic programming.
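
For reference, here is a sketch of the standard dynamic program for the weighted case (assuming jobs are (start, end, weight) tuples; the names are mine):

    import bisect

    def max_weight_non_overlapping(jobs):
        # Sort by finish time; best[i] = best total weight using only the first i jobs.
        jobs = sorted(jobs, key=lambda j: j[1])
        ends = [end for _, end, _ in jobs]
        best = [0] * (len(jobs) + 1)
        for i, (start, end, weight) in enumerate(jobs):
            # p = how many earlier-finishing jobs end at or before this job starts
            p = bisect.bisect_right(ends, start, 0, i)
            best[i + 1] = max(best[i],            # skip job i
                              best[p] + weight)   # take job i plus the best compatible prefix
        return best[-1]

    print(max_weight_non_overlapping([(0, 6, 10), (1, 4, 3), (5, 7, 3), (3, 8, 2)]))
    # -> 10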

Find minimum machines to schedule jobs

Given some jobs with start and end times, find the minimum number of machines to schedule all jobs so that no two use the same machine at the same time.

small-pic Scheduling heuristics

The number needed is going to be the maximum depth (the maximum number of jobs running in parallel at any one time).

Heuristic: Schedule the jobs earliest start time first, assigning jobs to any free machine (or spinning up a new one if there are none free)

Why: First, this will never put two overlapping jobs on the same machine. Second, suppose opening a new machine for a job j brings the machine count to d. That means j overlaps with a job currently running on each of the other d - 1 machines. Because we sorted by start time, all of those jobs start at or before s_j, and because they overlap with j, they all end after s_j. So d jobs are all running at time s_j, i.e. the depth is at least d, and any valid schedule needs at least d machines.

small-pic Scheduling heuristics
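
A sketch of this heuristic, assuming jobs are (start, end) tuples and using a min-heap of machine finish times:

    import heapq

    def min_machines(jobs):
        machines = []                             # min-heap of finish times, one per open machine
        for start, end in sorted(jobs):           # earliest start time first
            if machines and machines[0] <= start:
                heapq.heapreplace(machines, end)  # reuse the machine that frees up earliest
            else:
                heapq.heappush(machines, end)     # all open machines are busy: spin up a new one
        return len(machines)

    print(min_machines([(0, 3), (1, 4), (2, 5), (4, 7), (5, 8)]))
    # -> 3 (three jobs overlap around time 2, so the depth is 3)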

Minimize maximum lateness

Given 1 machine, and each job requiring p_j time and due at time d_j, we want to minimize the maximum lateness over all jobs (a job's lateness is its finish time minus its due date).

small-pic Scheduling heuristics

Heuristic: Earliest deadline first.

Why: There exists an optimal schedule with no idle time. Given any schedule S containing an inversion - two adjacent jobs scheduled as j, i where job i's deadline is earlier than job j's - swapping them never increases the maximum lateness: i only moves earlier, and j now finishes exactly when i used to, but with a later deadline, so its lateness is at most i's old lateness. Repeatedly removing inversions turns any optimal schedule into the earliest-deadline-first order, so that order is optimal.

small-pic Scheduling heuristics
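
A minimal sketch, assuming each job is a (processing time, due date) pair and all jobs are released at time 0:

    def min_max_lateness(jobs):
        t, max_late = 0, float("-inf")
        order = []
        for p, d in sorted(jobs, key=lambda j: j[1]):  # earliest due date first
            t += p                                     # this job finishes at time t
            max_late = max(max_late, t - d)            # lateness = finish time - due date
            order.append((p, d))
        return order, max_late

    order, max_late = min_max_lateness([(3, 6), (2, 8), (1, 9), (4, 9), (3, 14), (2, 15)])
    print(max_late)
    # -> 1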

Here are some counterexamples for heuristics like:

  • shortest processing time first
  • smallest slack time first

pic Scheduling heuristics

Minimize total/average lateness

Given 1 machine and some jobs, minimize the total/average lateness.

This can also be formulated as maximizing a score: each job earns d - x points, where d is the job's deadline and x is the time we finish the job. Since the deadlines are fixed, minimizing total lateness and maximizing this score are the same problem.

small-pic Scheduling heuristics

Heuristic: Shortest processing time first.

Why: If we ever perform two tasks one after the other such that the first task takes longer than the second, we can swap the tasks:

small-pic Scheduling heuristics

After the swap, job X finishes b time units later (scoring b points less) and job Y finishes a time units earlier (scoring a points more), so the total score increases by a - b > 0.
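
A sketch of shortest-processing-time-first with the same assumed (processing time, due date) representation; since the sum of deadlines is fixed, only the sum of finish times depends on the order, and SPT minimizes it:

    def total_lateness_spt(jobs):
        t, total = 0, 0
        for p, d in sorted(jobs, key=lambda j: j[0]):  # shortest processing time first
            t += p                                     # finish time of this job
            total += t - d                             # accumulate lateness (can be negative)
        return total

    print(total_lateness_spt([(3, 4), (1, 2), (2, 6)]))
    # -> -2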

For a 1 machine system, shortest processing time will also minimize the:

  • mean waiting time of jobs from time of arrival to start of processing
  • maximum waiting time
  • mean number of jobs in the system.

For a 1 machine system, you can use the modified due-date scheduling heuristic to minimize the total weighted lateness.
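
A rough sketch of one common (unweighted) form of that rule - treat the exact priority, max(d_j, now + p_j), as an assumption here, and note that weights are not handled:

    def mdd_schedule(jobs):                        # jobs: list of (p, d)
        now, order = 0, []
        remaining = list(jobs)
        while remaining:
            # Run the job with the smallest "modified due date" max(d, now + p).
            nxt = min(remaining, key=lambda j: max(j[1], now + j[0]))
            remaining.remove(nxt)
            now += nxt[0]
            order.append(nxt)
        return order

    print(mdd_schedule([(4, 5), (3, 4), (2, 10)]))
    # -> [(3, 4), (4, 5), (2, 10)]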

Sample questions