Computation Graph
Computation Graph (Optional)
Introduction to Computation Graphs
- Computation graphs are a key idea in deep learning
- They explain how frameworks like TensorFlow automatically compute derivatives
- Enable efficient backpropagation for neural networks
Example: Simple Neural Network
- One-layer neural network with a single output unit
- Linear activation function: a = wx + b
- Cost function: J = 1/2(a - y)²
- Single training example: x = -2, y = 2
- Parameters: w = 2, b = 8
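This setup is small enough to write out directly. A minimal sketch in plain Python (the function names `forward` and `cost` are chosen here for illustration, not taken from the lecture):

```python
def forward(w, b, x):
    # Linear activation of the one-layer, single-output network: a = wx + b
    return w * x + b

def cost(a, y):
    # Squared-error cost from the example: J = 1/2 (a - y)^2
    return 0.5 * (a - y) ** 2

a = forward(w=2.0, b=8.0, x=-2.0)  # a = 4.0
J = cost(a, y=2.0)                 # J = 2.0
```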
Forward Propagation with Computation Graph
- Compute c = w × x
  - w = 2, x = -2
  - c = -4
- Compute a = wx + b = c + b
  - c = -4, b = 8
  - a = 4
- Compute d = a - y
  - a = 4, y = 2
  - d = 2
- Compute J = 1/2(a - y)² = 1/2d²
  - d = 2
  - J = 2
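The same forward pass, written one graph node at a time so each intermediate value (c, a, d, J) is explicit; a sketch, assuming the variable names used in the notes:

```python
# Forward pass through the computation graph, left to right, one node per line
w, b = 2.0, 8.0    # parameters
x, y = -2.0, 2.0   # training example

c = w * x          # c = -4
a = c + b          # a = 4
d = a - y          # d = 2
J = 0.5 * d ** 2   # J = 2
```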
Backward Propagation (Backprop)
- While forward prop moves left-to-right, backprop moves right-to-left
- Computes derivatives by asking: “If input to this node changes by ε, how much does J change?”
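A tiny numerical illustration of that question for the final node, using ε = 0.001 (a small bump chosen just for this sketch):

```python
eps = 0.001
d = 2.0                            # value of d from the forward pass

J_before = 0.5 * d ** 2            # 2.0
J_after = 0.5 * (d + eps) ** 2     # ~2.002

# Change in J per unit change in d: approximately 2, i.e. dJ/dd ≈ 2
print((J_after - J_before) / eps)
```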
Computing Derivatives Step-by-Step:
- Derivative of J with respect to d:
  - If d increases from 2 to 2.001, J increases from 2 to ~2.002
  - J increases by ~2 × ε
  - ∂J/∂d = 2
- Derivative of J with respect to a:
  - If a increases by ε, d increases by ε (since d = a - y)
  - We know that when d increases by ε, J increases by 2 × ε
  - Therefore, when a increases by ε, J increases by 2 × ε
  - ∂J/∂a = 2
- Derivative of J with respect to c:
  - If c increases by ε, a increases by ε (since a = c + b)
  - We know that when a increases by ε, J increases by 2 × ε
  - Therefore, when c increases by ε, J increases by 2 × ε
  - ∂J/∂c = 2
- Derivative of J with respect to b:
  - If b increases by ε, a increases by ε (since a = c + b)
  - We know that when a increases by ε, J increases by 2 × ε
  - Therefore, when b increases by ε, J increases by 2 × ε
  - ∂J/∂b = 2
- Derivative of J with respect to w:
  - If w increases by ε, c changes by x × ε = -2 × ε (it decreases)
  - We know that when c changes by -2 × ε, J changes by 2 × (-2 × ε) = -4 × ε
  - Therefore, when w increases by ε, J changes by -4 × ε
  - ∂J/∂w = -4
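The same reasoning, written as a sketch of a manual right-to-left backward pass (this mirrors the steps above and is not code from the lecture; each line multiplies a local derivative by the derivative already computed for the node to its right):

```python
x, d = -2.0, 2.0       # x from the example, d from the forward pass

dJ_dd = d              # J = 1/2 d^2  ->  dJ/dd = d = 2
dJ_da = dJ_dd * 1.0    # d = a - y    ->  dd/da = 1
dJ_dc = dJ_da * 1.0    # a = c + b    ->  da/dc = 1
dJ_db = dJ_da * 1.0    # a = c + b    ->  da/db = 1
dJ_dw = dJ_dc * x      # c = w * x    ->  dc/dw = x = -2

print(dJ_dd, dJ_da, dJ_dc, dJ_db, dJ_dw)  # 2.0 2.0 2.0 2.0 -4.0
```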
Verification of Results
- We can verify our derivative calculations:
  - If w increases from 2 to 2.001, J decreases from 2 to ~1.996
  - J decreases by ~0.004, i.e. 4 × 0.001
  - This confirms ∂J/∂w = -4
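The same check as a short finite-difference sketch (ε = 0.001; `J_of` is a helper defined here just for this check):

```python
def J_of(w, b=8.0, x=-2.0, y=2.0):
    # End-to-end cost as a function of w, with the other values fixed
    a = w * x + b
    return 0.5 * (a - y) ** 2

eps = 0.001
print(J_of(2.0))                             # 2.0
print(J_of(2.0 + eps))                       # ~1.996
print((J_of(2.0 + eps) - J_of(2.0)) / eps)   # ~-4.0, consistent with dJ/dw = -4
```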
Efficiency of Backpropagation
- Backprop is sequenced right-to-left for computational efficiency
- Each derivative calculation reuses previously computed derivatives
- For a computation graph with n nodes and p parameters:
  - Backprop computes all derivatives in roughly O(n + p) steps
  - A naive approach would require O(n × p) steps
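Because the notes mention TensorFlow, here is a hedged sketch of how a framework performs this single right-to-left sweep automatically; it uses standard `tf.GradientTape` usage rather than any code from the lecture:

```python
import tensorflow as tf

w = tf.Variable(2.0)
b = tf.Variable(8.0)
x, y = -2.0, 2.0

with tf.GradientTape() as tape:
    a = w * x + b            # forward pass is recorded on the tape
    J = 0.5 * (a - y) ** 2

# One backward sweep returns every requested derivative at once
dJ_dw, dJ_db = tape.gradient(J, [w, b])
print(dJ_dw.numpy(), dJ_db.numpy())  # -4.0 2.0
```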
Key Takeaways
- Computation graphs break down complex calculations into simple operations
- Forward propagation is a left-to-right calculation through the graph
- Backpropagation is a right-to-left calculation that efficiently computes derivatives
- This approach enables training of large neural networks with millions of parameters
Computation graphs provide the foundation for how neural networks learn through efficient derivative calculations. By breaking complex functions into small, manageable operations and computing derivatives in reverse order, backpropagation enables the training of large networks that would otherwise be computationally infeasible.