The maths of Bayer dithering, in one short page
This is the short companion piece to the 80-line shader post, for people who want to understand why the threshold matrix is in the specific order it is. It does not need any maths beyond high-school algebra.
The problem ordered dithering solves
Quantising a continuous gradient to a small palette creates banding. The bands are the literal step function: every input value in a range maps to the same output colour, so the boundaries between ranges appear as hard lines.
What we want is for those step boundaries to break up into a perceptual texture that the eye reads as a smooth transition. The simplest way to do this is to add a small per-pixel offset to the input before the quantisation, so that pixels near a step boundary get rounded different ways depending on their position. The pattern of rounded-up and rounded-down pixels does the perceptual smoothing.
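To make the "offset before quantisation" idea concrete, here is a minimal Python sketch. The names `quantise` and `quantise_with_offset` are mine, not from the shader post, and I am assuming a single greyscale channel where `step` is the spacing between adjacent palette levels.

```python
import math

def quantise(value, step):
    """Snap value to the nearest multiple of step (ties round up)."""
    return math.floor(value / step + 0.5) * step

def quantise_with_offset(value, offset, step):
    """Nudge value by a fraction of one step, then quantise.

    offset is expressed in units of one palette step,
    so +0.25 means "a quarter of a step upwards".
    """
    return quantise(value + offset * step, step)

# A value just below the boundary between two levels rounds down on
# its own, but a small positive offset tips it over the boundary.
plain = quantise(0.45, 1.0)
nudged_up = quantise_with_offset(0.45, 0.25, 1.0)
nudged_down = quantise_with_offset(0.45, -0.25, 1.0)
```

Everything that follows is about choosing `offset` per pixel so that the mix of rounded-up and rounded-down pixels looks good.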
The question is: what pattern of offsets?
The naive option: random noise
The first thing anyone tries is to add white noise. This works, but it is ugly. White noise has energy at every spatial frequency, including the low frequencies that the eye is most sensitive to, so the dither pattern is visible as a kind of static crawl over the image.
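As a rough illustration of the white-noise option, the sketch below draws a fresh random offset for every pixel. The function name `white_noise_dither` and the fixed seed are assumptions of mine, chosen only so that the example is repeatable.

```python
import math
import random

def white_noise_dither(value, step, rng):
    """Quantise value after adding a uniform random offset of up to half a step."""
    offset = rng.random() - 0.5          # uniform in [-0.5, 0.5)
    return math.floor(value / step + offset + 0.5) * step

# A flat region at 0.3, quantised to just the two levels 0.0 and 1.0.
rng = random.Random(1)                   # seeded so the run is repeatable
samples = [white_noise_dither(0.3, 1.0, rng) for _ in range(10000)]
mean = sum(samples) / len(samples)       # lands close to 0.3
```

The average comes out right, which is why this works at all. What the code cannot show is where the rounded-up pixels fall: they have no spatial structure, and that is what reads as static.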
What you want is offsets that are deterministic and predictable per pixel position, and whose pattern has most of its energy at high spatial frequencies the eye does not resolve well — "blue noise". Bayer dithering is an early, cheap approximation to blue noise that you can compute with zero state.
What the Bayer matrix actually is
The Bayer matrix is the result of a recursive construction designed to keep consecutive threshold values as far apart as possible in a tiled plane. If you label every pixel in the dither pattern with its threshold value (the offset that pixel adds), you want pixels with similar thresholds to be as far apart as possible. That pushes the energy of the dither pattern towards the high spatial frequencies the eye resolves poorly.
The construction is recursive. The base case is the 2×2 matrix:
[ 0 2 ]
[ 3 1 ]
Read the values: 0 is in the top-left, 2 is in the top-right, 3 is in the bottom-left, 1 is in the bottom-right. The "nearest neighbours" of 0 (its four orthogonal neighbours in a tiled plane) are 2, 3, and the 2 and 3 from the wrap-around. The matrix is constructed so that the value 1 is as far from 0 as it can be in this 2×2 grid.
To get the 4×4 matrix, you take the 2×2 matrix B2 and build a 4×4 matrix B4 according to the rule:
B4 = [ 4*B2 + 0 , 4*B2 + 2 ]
     [ 4*B2 + 3 , 4*B2 + 1 ]
Each cell of the 2×2 matrix expands into a 2×2 block of the 4×4 matrix, and the value of the original cell determines the offset applied to the inner block. Reading that out:
[  0  8  2 10 ]
[ 12  4 14  6 ]
[  3 11  1  9 ]
[ 15  7 13  5 ]
The 8×8 matrix is constructed the same way, expanding the 4×4 with the 2×2 rule. Each level of recursion doubles the dimension and quadruples the range of values, preserving the "neighbours are far apart" property at every scale. That is the whole construction.
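The recursion can be sketched in a few lines of Python. The function name `bayer` and the list-of-rows layout are my own choices for illustration. The rule itself is the one given above: each quadrant is four times the smaller matrix plus the offset from the matching cell of the 2×2 base.

```python
B2 = [[0, 2],
      [3, 1]]

def bayer(n):
    """Return the n x n threshold matrix as a list of rows (n a power of two, n >= 2)."""
    if n == 2:
        return [row[:] for row in B2]
    half = n // 2
    smaller = bayer(half)
    matrix = [[0] * n for _ in range(n)]
    for qy in range(2):                  # quadrant row
        for qx in range(2):              # quadrant column
            offset = B2[qy][qx]          # base cell picks the offset
            for y in range(half):
                for x in range(half):
                    matrix[qy * half + y][qx * half + x] = 4 * smaller[y][x] + offset
    return matrix

for row in bayer(4):
    print(row)
```

Calling `bayer(4)` reproduces the 4×4 matrix written out above, and `bayer(8)` gives the 64-entry version, with every value from 0 to 63 appearing exactly once.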
Why this is the right shape
The clever thing about the recursive construction is that the maximum-distance property is preserved at every scale. At the 2×2 scale, the value 1 is diagonally opposite the value 0. At the 4×4 scale, the values 0 to 3 sit on a coarse grid two pixels apart, and the value 4, the next one placed, lands diagonally next to the 0, equidistant from the four values already placed around it. At the 8×8 scale, the value 16 lands in that same cell, with 32 and 48 in the cells beside and below the 0.
This means the threshold pattern has energy at the highest spatial frequencies the matrix can support, which is exactly what we want for a dither pattern. The eye smooths over the highest-frequency variations and reads the macroscopic shading.
There is a closed-form expression for the Bayer matrix value at position (x, y) that avoids the recursion, useful if you want to compute the matrix on the fly without storing it:
B(x, y) = bit_reverse(interleave(x XOR y, y))
Here x is the column, y is the row, interleave weaves the bits of two numbers (the first argument fills the even bit positions, the second the odd ones), and bit_reverse reverses the bit order of the result. For an 8×8 matrix you reverse 6 bits; for a 16×16 matrix you reverse 8 bits. I find the recursive construction easier to remember, but the closed form is what you reach for if you are writing a fragment shader without a uniform array.
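Here is a hedged Python sketch of the closed form, written with plain loops rather than bit tricks so that each step is visible. The helper signatures are my own. In particular I am assuming that `interleave` puts its first argument in the even bit positions and that `x` is the column and `y` the row, which is the convention that reproduces the matrices on this page.

```python
def interleave(a, b, bits):
    """Weave the low `bits` bits of a and b: a fills even positions, b odd ones."""
    out = 0
    for k in range(bits):
        out |= ((a >> k) & 1) << (2 * k)
        out |= ((b >> k) & 1) << (2 * k + 1)
    return out

def bit_reverse(value, width):
    """Reverse the order of the low `width` bits of value."""
    out = 0
    for k in range(width):
        out = (out << 1) | ((value >> k) & 1)
    return out

def bayer_value(x, y, bits):
    """Threshold at column x, row y of the (2**bits) x (2**bits) matrix."""
    return bit_reverse(interleave(x ^ y, y, bits), 2 * bits)

# bits=2 gives the 4x4 matrix, bits=3 the 8x8 one.
b4 = [[bayer_value(x, y, 2) for x in range(4)] for y in range(4)]
```

With `bits=2` this rebuilds the 4×4 matrix from the recursive construction, which is a handy sanity check if you port it to a shader and are unsure which way round your interleave goes.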
How it lands in the shader
In the 80-line shader piece you can see the 8×8 matrix normalised: each value divided by 64, with 0.5 subtracted, so the entire matrix lies in the range roughly (-0.5, +0.5). Strictly, it runs from -0.5 up to 31/64, just short of +0.5. Then we scale it by one step of palette resolution, 1.0 / palette_size, and add it to each pixel's value before quantising.
The intuition is this: each pixel gets a deterministic per-position offset that nudges its colour value either up or down by, at most, half a palette step. When a region of the image happens to sit at a value that is exactly between two palette colours, half the pixels in that region get nudged up and half get nudged down, in the spatial pattern the Bayer matrix prescribes. That pattern is high-frequency, so from a normal viewing distance the eye averages it back to the intermediate colour and reads the smooth gradient. Up close you see the dither, the way you see the halftone dots in a newspaper photograph up close.
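The half-up, half-down claim is easy to check numerically. The sketch below is a CPU-side stand-in for the shader, not the shader itself. The names `threshold` and `dither` are hypothetical, and `step` stands for the spacing between adjacent palette levels (the 1.0 / palette_size of the text).

```python
import math

def bayer(n):
    """Build the n x n Bayer matrix recursively (n a power of two)."""
    if n == 1:
        return [[0]]
    h = n // 2
    s = bayer(h)
    base = [[0, 2], [3, 1]]
    return [[4 * s[y % h][x % h] + base[y // h][x // h] for x in range(n)]
            for y in range(n)]

BAYER8 = bayer(8)

def threshold(x, y):
    """Normalised offset for pixel (x, y), from -0.5 up to just under +0.5."""
    return BAYER8[y % 8][x % 8] / 64.0 - 0.5

def dither(value, x, y, step):
    """Nudge value by the per-pixel offset, then snap to the nearest level."""
    nudged = value + threshold(x, y) * step
    return math.floor(nudged / step + 0.5) * step

# A flat 8x8 region sitting exactly between the levels 0.25 and 0.5.
tile = [dither(0.375, x, y, 0.25) for y in range(8) for x in range(8)]
rounded_up = sum(1 for v in tile if v == 0.5)
average = sum(tile) / len(tile)
```

In this tile, 32 of the 64 pixels round up and 32 round down, so the average comes back to 0.375, the intermediate value the region started at.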
When you would not use Bayer
Bayer dithering is the cheapest possible dither, and you can see its pattern. It is the visible regular cross-hatch that gives early Mac graphics and old print materials their look. If you want the look, that is great. If you do not want the look — if you want a dither that vanishes from view — you want Floyd-Steinberg error diffusion, which produces a much more organic pattern at the cost of needing previous pixels to be processed before the current pixel.
For real-time pixel art and post-processing shaders, Bayer wins because it is stateless. Each pixel can be computed in parallel on the GPU with no information about its neighbours. Floyd-Steinberg requires either a CPU pass or a clever multi-pass approach on the GPU, neither of which is worth it unless you have a specific aesthetic in mind.
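To see why error diffusion resists parallelisation, here is a deliberately simplified one-dimensional sketch. It is not Floyd-Steinberg proper, which spreads the error over four neighbours in two rows. It only carries the error forward along a single row, and the name `diffuse_row` is mine.

```python
import math

def diffuse_row(row, step):
    """Quantise a row left to right, pushing each pixel's rounding error onto the next."""
    out = []
    error = 0.0
    for value in row:
        wanted = value + error               # depends on every pixel before this one
        chosen = math.floor(wanted / step + 0.5) * step
        error = wanted - chosen              # carried into the next iteration
        out.append(chosen)
    return out

result = diffuse_row([0.25] * 8, 1.0)
```

The `error` variable is the problem: pixel N cannot be computed until pixel N-1 has finished. The Bayer version has no such variable, which is why it maps so cleanly onto a fragment shader.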
If you want a higher-quality dither that is still stateless, precomputed blue-noise textures by Christoph Peters are the next step up. They are bigger than a Bayer matrix (typically 64×64 or 128×128) and they look closer to true blue noise, at the cost of having to keep a texture around. For the NES sub-palette series, I deliberately chose Bayer for the visible regular pattern, because the goal was the retro look. For modern work where you do not want to see the dither, a blue-noise texture is what I reach for.
Further reading
The classic reference is the Bryce Bayer 1973 paper that gives the technique its name. It is short, mostly tables, and worth reading once if you want the original framing. For a working tour of all the major dithering algorithms, Tanner Helland's article has source code for each one in VB6, which is hilarious but legible. For the blue-noise alternative, Christoph Peters' explanation of his sampling method is the right next step.
That is the whole maths. There is nothing else to it. The matrix exists because it was designed to have certain spatial-frequency properties, and the shader uses it because those properties are exactly what we want.