Appearance
Extracting a 12-colour palette from a photograph, with octrees
I spent most of the winter writing a small command-line tool that takes a photograph and gives me back a twelve-colour Aseprite palette I can dither against. Behind the tool is one of the prettier algorithms in graphics: octree colour quantisation, first described by Michael Gervautz and Werner Purgathofer in 1988. It is fast, it is deterministic, and it is short enough to fit in a single Rust file.
This post walks through what an octree is, why it works for colour quantisation, the tightest implementation I have arrived at, and how it compares to the more famous k-means approach.
What an octree is, in two sentences
An octree is a tree where every internal node has exactly eight children. It is to 3D space what a quadtree is to 2D space — a way of recursively subdividing a cube into octants. For colour quantisation, the "cube" is the RGB cube (0..1 in each axis), and each pixel of the input image is a point inside it.
Why an octree fits colour quantisation
The key observation is that RGB colours that are close together in the cube are colours that look similar. If we drop every pixel of the input image into the octree, sibling leaves represent perceptually adjacent colours. To get a palette of twelve colours we merge the smallest-population leaves with their siblings until only twelve are left. Each surviving leaf is then summarised as the average of all the pixels it contains.
This is fast for two reasons. First, each pixel touches at most eight nodes (one per depth level). Second, merging is local: collapsing a node into its parent means combining at most eight small running sums.
The Rust implementation
Here is the algorithm in about 90 lines of Rust. It depends only on image for PNG/JPG loading.
rust
use image::Rgb;
const MAX_DEPTH: usize = 6;
#[derive(Default)]
struct OctreeNode {
is_leaf: bool,
pixel_count: u64,
r_sum: u64, g_sum: u64, b_sum: u64,
children: [Option<Box<OctreeNode>>; 8],
}
impl OctreeNode {
fn new() -> Self { Self::default() }
fn insert(&mut self, c: Rgb<u8>, depth: usize, leaves: &mut Vec<*mut OctreeNode>) {
if depth == MAX_DEPTH {
if !self.is_leaf {
self.is_leaf = true;
leaves.push(self as *mut OctreeNode);
}
self.pixel_count += 1;
self.r_sum += c[0] as u64;
self.g_sum += c[1] as u64;
self.b_sum += c[2] as u64;
return;
}
let idx = child_index(c, depth);
let child = self.children[idx].get_or_insert_with(|| Box::new(OctreeNode::new()));
child.insert(c, depth + 1, leaves);
}
fn merge_children(&mut self) {
for child_opt in self.children.iter_mut() {
if let Some(child) = child_opt.take() {
self.pixel_count += child.pixel_count;
self.r_sum += child.r_sum;
self.g_sum += child.g_sum;
self.b_sum += child.b_sum;
}
}
self.is_leaf = true;
}
fn colour(&self) -> Rgb<u8> {
let n = self.pixel_count.max(1);
Rgb([(self.r_sum / n) as u8, (self.g_sum / n) as u8, (self.b_sum / n) as u8])
}
}
fn child_index(c: Rgb<u8>, depth: usize) -> usize {
let shift = 7 - depth;
let r_bit = (c[0] as usize >> shift) & 1;
let g_bit = (c[1] as usize >> shift) & 1;
let b_bit = (c[2] as usize >> shift) & 1;
(r_bit << 2) | (g_bit << 1) | b_bit
}
pub fn extract_palette(pixels: &[Rgb<u8>], target: usize) -> Vec<Rgb<u8>> {
let mut root = OctreeNode::new();
let mut leaves: Vec<*mut OctreeNode> = Vec::new();
for p in pixels { root.insert(*p, 0, &mut leaves); }
// Repeatedly merge the smallest leaf with its parent until target reached.
while leaves.len() > target {
// Find the deepest, smallest node to merge.
// (Walk the tree; depth-first; pick the deepest with the lowest count.)
// Implementation of `find_smallest` left as an exercise — about 20 lines.
let candidate = find_smallest(&mut root);
unsafe { (*candidate).merge_children(); }
leaves.retain(|&p| p != candidate);
}
leaves.iter().map(|&p| unsafe { (*p).colour() }).collect()
}The unsafe pointer-juggling for the leaves vector is the one place this becomes annoying to write in safe Rust without trading clarity for type gymnastics; in production I keep a Vec of paths-from-root instead, which is a little uglier but compiles clean.
You can find the full, safe-Rust version on my GitHub, along with a working CLI that reads a JPEG and writes an Aseprite .pal file.
What about k-means?
K-means is the textbook answer to "give me N colours that represent this image". You pick N seed colours, assign every pixel to its closest seed, move each seed to the mean of its cluster, and repeat until things stop moving. It produces a slightly better palette than octree in the perceptual sense — the final centroids really are the best summary of the pixel cloud, not just an artefact of cube-subdivision.
The problem with k-means is that it is much slower (typically 50× for a 4-megapixel input), it is non-deterministic (different seed choices give different palettes), and it is sensitive to outliers (one bright pixel can pull a centroid away from its proper home).
For my Sunday dither series I want the same palette every time I run the tool on the same photo, and I want the result in under a second. Octree gives me both. The output is a touch less perceptually optimal than k-means but, dithered through a 4-colour NES sub-palette anyway, you cannot tell the difference.
Six photos, both algorithms
I ran both algorithms on six photos from a trip to Hokkaidō last winter and dithered the result. The headline numbers:
| photo | octree (ms) | k-means (ms) | visually distinguishable? |
|---|---|---|---|
| snow on cedar | 84 | 4,212 | no |
| Sapporo neon street | 81 | 4,008 | yes (k-means cleaner) |
| Otaru canal at dusk | 92 | 4,580 | no |
| Lake Tōya pano | 76 | 3,910 | no |
| ramen counter | 88 | 4,330 | yes (octree more chroma) |
| Mount Yōtei | 73 | 3,840 | no |
Four of the six were indistinguishable. Of the two that differed, one favoured each algorithm.
For practical pixel-art work — "I want a small palette that captures the mood of this photograph" — octree is the right tool. For research, illustration, or anything destined for print, the extra runtime of k-means buys you a meaningful improvement.
Where to go from here
If you want to go deeper, the canonical references are still Heckbert's "Colour Image Quantisation for Frame Buffer Display" (1982), Wu's 1991 paper on principal-axis-aligned binary splits, and the chapter in Foley/van Dam's Computer Graphics: Principles and Practice which covers all the major families.
For pixel-art use specifically, Lospec has hundreds of hand-curated palettes that will almost always look better than anything an algorithm produces — humans are good at this. But the algorithmic approach is invaluable when you need to dither from a real photograph to a palette you already love, which is exactly the bayer-dither shader scenario the series is built around.