
Computation Graph

Optional Content
  • Computation graphs are a key idea in deep learning
  • They explain how frameworks like TensorFlow compute derivatives automatically (a short sketch follows the Simplified Model list below)
  • Enable efficient backpropagation for neural networks
Simplified Model
  • One-layer neural network with a single output unit
  • Linear activation function: a = wx + b
  • Cost function: J = 1/2(a - y)²
  • Single training example: x = -2, y = 2
  • Parameters: w = 2, b = 8
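As a quick, concrete illustration of the point above that frameworks compute these derivatives automatically, here is a minimal sketch assuming TensorFlow 2.x (the notes mention TensorFlow only by name; the code is not from the lecture). It builds this exact one-unit model and asks GradientTape for the derivatives that the rest of this note derives by hand.

```python
import tensorflow as tf

# Same values as the simplified model above.
w = tf.Variable(2.0)
b = tf.Variable(8.0)
x, y = -2.0, 2.0

# GradientTape records the forward computation graph, then replays it
# right-to-left (backprop) to produce the derivatives.
with tf.GradientTape() as tape:
    a = w * x + b             # linear activation: a = wx + b
    J = 0.5 * (a - y) ** 2    # cost: J = 1/2 (a - y)^2

dJ_dw, dJ_db = tape.gradient(J, [w, b])
print(J.numpy(), dJ_dw.numpy(), dJ_db.numpy())   # 2.0 -4.0 2.0
```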

Forward Propagation with Computation Graph

  1. Compute w × x = c
  • w = 2, x = -2
  • c = -4
  2. Compute a = wx + b = c + b
  • c = -4, b = 8
  • a = 4
  3. Compute a - y = d
  • a = 4, y = 2
  • d = 2
  4. Compute J = 1/2(a-y)² = 1/2d²
  • d = 2
  • J = 2
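The four steps above map one-to-one onto code; this plain-Python sketch (illustrative, not from the lecture) simply evaluates each node of the graph from left to right:

```python
w, b = 2.0, 8.0
x, y = -2.0, 2.0

c = w * x          # step 1: c = w * x      -> -4.0
a = c + b          # step 2: a = c + b      ->  4.0
d = a - y          # step 3: d = a - y      ->  2.0
J = 0.5 * d ** 2   # step 4: J = 1/2 * d^2  ->  2.0
print(c, a, d, J)  # -4.0 4.0 2.0 2.0
```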
Right-to-Left Calculation
  • While forward prop moves left-to-right, backprop moves right-to-left
  • Computes derivatives by asking: “If input to this node changes by ε, how much does J change?”
  1. Derivative of J with respect to d:
  • If d increases from 2 to 2.001, J increases from 2 to 2.002
  • J increases by ~2 × ε
  • ∂J/∂d = 2
  2. Derivative of J with respect to a:
  • If a increases by ε, d increases by ε (since d = a - y)
  • We know when d increases by ε, J increases by 2 × ε
  • Therefore, when a increases by ε, J increases by 2 × ε
  • ∂J/∂a = 2
  3. Derivative of J with respect to c:
  • If c increases by ε, a increases by ε (since a = c + b)
  • We know when a increases by ε, J increases by 2 × ε
  • Therefore, when c increases by ε, J increases by 2 × ε
  • ∂J/∂c = 2
  4. Derivative of J with respect to b:
  • If b increases by ε, a increases by ε (since a = c + b)
  • We know when a increases by ε, J increases by 2 × ε
  • Therefore, when b increases by ε, J increases by 2 × ε
  • ∂J/∂b = 2
  5. Derivative of J with respect to w:
  • If w increases by ε, c changes by x × ε = -2 × ε (decreases)
  • Since ∂J/∂c = 2, a change of -2 × ε in c changes J by 2 × (-2 × ε) = -4 × ε
  • Therefore, when w increases by ε, J changes by -4 × ε
  • ∂J/∂w = -4
  • We can verify our derivative calculations numerically (see the sketch after this list):
  • If w increases from 2 to 2.001, J decreases from 2 to ~1.996
  • J decreases by ~0.004 (4 × 0.001)
  • Confirms ∂J/∂w = -4
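The five derivative steps and the numerical check above can be written out the same way. In this sketch (variable names are illustrative) each backward line reuses the derivative computed just before it, which is exactly the reuse that makes backprop efficient:

```python
# Forward pass, same values as above.
w, b, x, y = 2.0, 8.0, -2.0, 2.0
c = w * x
a = c + b
d = a - y
J = 0.5 * d ** 2

# Backward pass, right to left; each line reuses the previous derivative.
dJ_dd = d             # J = 1/2 d^2 -> dJ/dd = d = 2
dJ_da = dJ_dd * 1.0   # d = a - y   -> dd/da = 1, so dJ/da = 2
dJ_dc = dJ_da * 1.0   # a = c + b   -> da/dc = 1, so dJ/dc = 2
dJ_db = dJ_da * 1.0   # a = c + b   -> da/db = 1, so dJ/db = 2
dJ_dw = dJ_dc * x     # c = w * x   -> dc/dw = x, so dJ/dw = -4
print(dJ_dw, dJ_db)   # -4.0 2.0

# Numerical check from the bullets above: bump w by 0.001 and recompute J.
eps = 1e-3
J_bumped = 0.5 * ((w + eps) * x + b - y) ** 2
print((J_bumped - J) / eps)   # roughly -4, matching dJ/dw
```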
Computational Advantage
  • Backprop is sequenced right-to-left for computational efficiency
  • Each derivative calculation reuses previously computed derivatives
  • For a computation graph with n nodes and p parameters:
  • Backprop computes all derivatives in roughly O(n+p) steps
  • Naive approach would require O(n×p) steps
  • Computation graphs break down complex calculations into simple operations
  • Forward propagation is a left-to-right calculation through the graph
  • Backpropagation is a right-to-left calculation that efficiently computes derivatives
  • This approach enables training of large neural networks with millions of parameters
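One way to see the O(n+p) versus O(n×p) contrast: a naive numerical approach needs an extra forward pass through the whole graph for every parameter, while the single backward sweep above already produced both ∂J/∂w and ∂J/∂b. A rough sketch (illustrative only, not a benchmark):

```python
def forward(w, b, x, y):
    # One full left-to-right pass through the graph (about n node operations).
    return 0.5 * (w * x + b - y) ** 2

w, b, x, y, eps = 2.0, 8.0, -2.0, 2.0, 1e-3

# Naive: one extra forward pass per parameter -> roughly O(n * p) work in total.
grad_w = (forward(w + eps, b, x, y) - forward(w, b, x, y)) / eps
grad_b = (forward(w, b + eps, x, y) - forward(w, b, x, y)) / eps
print(grad_w, grad_b)   # roughly -4 and 2; backprop gets both from one reverse sweep
```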

Computation graphs provide the foundation for how neural networks learn through efficient derivative calculations. By breaking complex functions into small, manageable operations and computing derivatives in reverse order, backpropagation enables the training of large networks that would otherwise be computationally infeasible.