Maximum area under a histogram
Suppose you have an 0-indexed array h
of nonnegative integers with
length \(n\) and want to maximise the quantity (j-i) min(h[i:j])
over
all \(0\leq i < j \leq n\). This is an old chestnut coding exercise
found on all the algobashing sites, often called the “maximum area under
a histogram” problem because if you consider the histogram
which has a bar of height h[0]
above the interval \([0, 1)\), a bar of
height h[1]
above \([1, 2)\), and so on,
then this (j-i) min(h[i:j])
is
the area of the largest rectangle sitting on the \(x\)-axis
whose left-hand edge is part of the line \(x = i\) and whose right-hand
edge is part of the line \(x=j\) and whose height is the minimum value
in h
between index i
(inclusive) and j
(exclusive), so that the
rectangle is completely contained within the histogram.
If j
is an index
in the array h
, its little sibling \(\operatorname{ls}(j)\) to be the maximal \(i<j\) such that h[i] < h[j]
. Some people are only children, of course. The little sibling
sequence starting at j
is \(j, \operatorname{ls}(j),
\operatorname{ls}(\operatorname{ls}(j)), \ldots\), continuing for as
long as possible.
To solve the problem, let’s fix the value of j
. In looking for the
largest rectangle
we’re going to consider all rectangles below the histogram whose
right-hand edge is part of the line \(x=j\) (so the j
th bar is not
included). For these to have a chance of being the maximum area
rectangle their height must be greater than h[j]
, otherwise they could
be extended to the right. This height must also be at most h[j-1]
if
the rectangle is to stay beneath the histogram. Again, if these
rectangles are to be as large as possible they should extend as far left
as possible, which means the \(x\)-coordinate of the left-hand edge
should be one more than the little sibling of the index of the height.
So both the heights and the left-hand limits we need to consider arise
from the little sibling sequence starting at j-1
.
Suppose we keep a stack s
, initially empty, and do the
following:
for 0 <= i < n:
while s is not empty and h[s.peek()] >= h[i]:
s.pop()
s.push(i)
In other words, for each i
, pop every index on the stack
whose height is bigger than that of i
or equal to that of i
, then push
i
.
I claim that each time the for-loop body completes, the stack consists
of the little sibling sequence starting at i
. By induction, it’s
enough to show that this holds at the top, that is, that after
completing the loop body, the thing below i
is the little sibling of
i
.
The little sibling m
of i
entered the stack when it was pushed at
the end of iteration m
of the loop. It can’t have been popped in the
intervening time (steps m+1
up to i-1
inclusive) as the indices
between m+1
and i-1
have height at least that of i
, so bigger than
that of m
. At step i
any indices coming after m
get popped in the
while loop as their height is at least that if i
, so m
really is
next on the stack after i
.
Notice that running this loop is \(O(n)\) because every index enters and
leaves the stack at most once. By updating the best rectangle area every
time something is popped we therefore solve the problem in \(O(n)\). The
only annoying edge case is that we should consider i=n
, but you can
avoid this by adding a small element to the end of the array.