What if instead of evolving weights randomly, we could calculate exactly how to fix them? That's backprop — the algorithm that made deep learning possible.
Neuroevolution solved XOR by guessing — spawning thousands of random networks and keeping the lucky ones. It works, but it's slow. What if, after every wrong answer, we could trace the error backward through the network and tell each weight exactly how much to change?
That's backpropagation. Three steps, repeated thousands of times:
1. Forward pass — feed inputs through the network, get an output.
2. Measure error — how far off was the output from the right answer?
3. Backward pass — send the error backward, adjusting each weight by its share of the blame.
The key insight is the chain rule from calculus. Each layer passes blame to the layer before it, like a chain of "because" statements: the output was wrong because the last layer's weights were off, because the hidden layer fed it bad values, because the first layer's weights were off. Follow the chain, fix each link.
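The chain of "because" statements can be written out in miniature. Below is a hedged sketch with made-up values (one weight, two nested steps, not the article's demo network): the loss depends on the weight only through intermediate values, so each backward line multiplies in one local derivative.

```python
# Chain rule "blame passing" in miniature. The loss depends on w only
# through h and y, so the gradient is the product of the local derivatives.
# All names and numbers here are illustrative.

def forward(w, x):
    h = w * x          # "hidden" value
    y = h ** 2         # "output"
    return h, y

x, w, target = 2.0, 0.5, 3.0
h, y = forward(w, x)               # h = 1.0, y = 1.0
loss = (y - target) ** 2           # squared error = 4.0

# Backward pass: each line is one "because" link in the chain.
dloss_dy = 2 * (y - target)        # loss was wrong because y was off
dy_dh = 2 * h                      # y was off because h was off
dh_dw = x                          # h was off because w was off
dloss_dw = dloss_dy * dy_dh * dh_dw  # follow the chain: multiply the links
```

The product `dloss_dw` tells us exactly which way, and how strongly, to nudge `w` to shrink the loss.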
Imagine you're blindfolded on a hilly landscape. Your goal is to reach the lowest valley. You can't see, but you can feel which way the ground slopes under your feet. So you take a step downhill. Then another. Eventually you reach a low point. That's gradient descent.
The "landscape" is the loss function — a measure of how wrong the network is. The "slope" is the gradient — it tells us which direction to adjust each weight to reduce the error. The size of each step is the learning rate.
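The blindfolded walk fits in a few lines. A minimal sketch on a one-variable landscape (the quadratic and starting point are illustrative): feel the slope, step against it, repeat.

```python
# Gradient descent on a one-variable landscape: loss(w) = (w - 3)^2.
# The gradient is the slope underfoot; the learning rate is the step size.

def grad(w):
    return 2 * (w - 3)   # derivative of (w - 3)^2

w = 0.0                  # start somewhere on the hillside
lr = 0.1                 # learning rate
for _ in range(100):
    w -= lr * grad(w)    # take a step downhill

print(round(w, 4))       # ends near w = 3, the bottom of the valley
```

Real networks do exactly this, just with millions of `w`s updated at once.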
Try a very high learning rate and the ball overshoots, bouncing past the minimum. Try one that's too low and it barely moves. The sweet spot is a rate that descends quickly without overshooting. This tradeoff is one of the most important decisions in training any neural network.
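The same tradeoff shows up even on a one-variable bowl. A small sketch (rates and step counts are illustrative): run gradient descent on `loss(w) = w²` with three learning rates and compare where each lands.

```python
# Three learning rates on loss(w) = w^2, minimum at w = 0.
# Each step multiplies the error by (1 - 2 * lr), so the rate decides
# whether we crawl, converge, or bounce outward.

def run(lr, steps=20, w=1.0):
    for _ in range(steps):
        w -= lr * 2 * w     # gradient of w^2 is 2w
    return w

print(run(0.01))   # too low: after 20 steps, still most of the way from 0
print(run(0.4))    # just right: essentially at the minimum
print(run(1.1))    # too high: every step overshoots, error grows
```

With `lr = 1.1` each step lands farther from the minimum than it started, which is exactly the chaotic divergence the demo shows.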
Below is a tiny network: 2 inputs, 2 hidden neurons, 1 output. Step through the forward pass to see values flow left-to-right, then step through the backward pass to see gradients flow right-to-left. Watch how each weight receives its "blame" for the error.
| Node | Value | Gradient |
|---|---|---|

| Weight | Value | δ | New |
|---|---|---|---|
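Here is the same 2→2→1 network as straight-line code, a sketch with illustrative weights (not the demo's actual numbers): the forward pass flows left to right, the backward pass flows right to left, and every weight collects its share of the blame.

```python
import math

# A hand-sized 2→2→1 network: sigmoid hidden layer and output.
# Weights, input, and target below are illustrative.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x = [1.0, 0.0]                     # inputs
W1 = [[0.5, -0.3], [0.2, 0.8]]     # W1[j][i]: input i → hidden j
W2 = [0.7, -0.4]                   # hidden j → output
target = 1.0

# Forward pass: values flow left to right.
h = [sigmoid(sum(W1[j][i] * x[i] for i in range(2))) for j in range(2)]
y = sigmoid(sum(W2[j] * h[j] for j in range(2)))
loss = 0.5 * (y - target) ** 2

# Backward pass: gradients flow right to left.
delta_out = (y - target) * y * (1 - y)            # output's blame
grad_W2 = [delta_out * h[j] for j in range(2)]
delta_hid = [delta_out * W2[j] * h[j] * (1 - h[j]) for j in range(2)]
grad_W1 = [[delta_hid[j] * x[i] for i in range(2)] for j in range(2)]

# One gradient-descent step: each weight moves against its gradient.
lr = 0.5
W2 = [W2[j] - lr * grad_W2[j] for j in range(2)]
W1 = [[W1[j][i] - lr * grad_W1[j][i] for i in range(2)] for j in range(2)]
```

Run the forward pass again after the update and the loss is smaller: that single step downhill is all training is.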
The learning rate controls how big each weight update is. Below, three networks train on XOR simultaneously with different learning rates. Watch their loss curves — too small is slow, too large is chaotic, just right converges fast.
The perceptron couldn't solve XOR. Neuroevolution solved it by trial and error over many generations. Backpropagation solves it surgically — computing the exact gradient for every weight, every step. Watch a 2→4→1 network learn XOR from scratch, typically in under 1000 epochs.
| A | B | Expected | Output |
|---|---|---|---|
| 0 | 0 | 0 | — |
| 0 | 1 | 1 | — |
| 1 | 0 | 1 | — |
| 1 | 1 | 0 | — |
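The full training loop fits in a page. This is a sketch of backprop on the same 2→4→1 sigmoid shape as the demo; the seed, learning rate, and epoch count are illustrative choices, not the demo's settings.

```python
import math
import random

# Backprop on XOR with a 2→4→1 sigmoid network.
# Initialization, learning rate, and epoch count are illustrative.

random.seed(0)
H = 4
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(H)]
b1 = [random.uniform(-1, 1) for _ in range(H)]
W2 = [random.uniform(-1, 1) for _ in range(H)]
b2 = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x):
    h = [sigmoid(W1[j][0] * x[0] + W1[j][1] * x[1] + b1[j]) for j in range(H)]
    return sigmoid(sum(W2[j] * h[j] for j in range(H)) + b2), h

data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
err_before = sum((predict(x)[0] - t) ** 2 for x, t in data)

lr = 0.5
for epoch in range(5000):
    for x, t in data:
        y, h = predict(x)                 # forward pass
        d_out = (y - t) * y * (1 - y)     # output's blame
        for j in range(H):                # backward pass + weight update
            d_hid = d_out * W2[j] * h[j] * (1 - h[j])
            W2[j] -= lr * d_out * h[j]
            b1[j] -= lr * d_hid
            W1[j][0] -= lr * d_hid * x[0]
            W1[j][1] -= lr * d_hid * x[1]
        b2 -= lr * d_out

err_after = sum((predict(x)[0] - t) ** 2 for x, t in data)
for x, t in data:
    print(x, t, round(predict(x)[0], 3))
```

The total squared error drops as training proceeds; with a lucky initialization the outputs settle near 0 and 1 on the truth table above.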
Compute the gradient of the loss with respect to every weight, then adjust each weight to reduce the error. Uses the chain rule to pass blame backward through layers.
An optimization algorithm: repeatedly step in the direction that reduces the loss. The gradient tells you which way is "downhill" for every parameter simultaneously.
Measures how wrong the network is. Mean squared error, cross-entropy, etc. Training means minimizing this number. The landscape of the loss function determines how hard the problem is.
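Two common choices side by side, with made-up prediction values: mean squared error for regression-style targets, binary cross-entropy for probabilities.

```python
import math

# Two loss functions on the same (illustrative) prediction.
y_true = 1.0
y_pred = 0.8

mse = (y_pred - y_true) ** 2                     # squared error: 0.04
ce = -(y_true * math.log(y_pred)
       + (1 - y_true) * math.log(1 - y_pred))    # binary cross-entropy

print(mse, round(ce, 4))
```

Cross-entropy punishes confident wrong answers much more harshly than squared error, which is why it dominates in classification.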
The step size for gradient descent. Too large and you overshoot. Too small and you crawl. Modern optimizers like Adam adapt the rate automatically per-weight.