
Computation Graph

Optional Content
  • Computation graphs are a key idea in deep learning
  • They explain how frameworks like TensorFlow compute derivatives automatically (a short sketch follows the Simplified Model list below)
  • Enable efficient backpropagation for neural networks
Simplified Model
  • One-layer neural network with a single output unit
  • Linear activation function: a = wx + b
  • Cost function: J = 1/2(a - y)²
  • Single training example: x = -2, y = 2
  • Parameters: w = 2, b = 8
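As a quick, concrete illustration of the point above that frameworks compute these derivatives automatically, here is a minimal sketch assuming TensorFlow 2.x (the notes mention TensorFlow only by name; the code is not from the lecture). It builds this exact one-unit model and asks GradientTape for the derivatives that the rest of this note derives by hand.

```python
import tensorflow as tf

# Same values as the simplified model above.
w = tf.Variable(2.0)
b = tf.Variable(8.0)
x, y = -2.0, 2.0

# GradientTape records the forward computation graph, then replays it
# right-to-left (backprop) to produce the derivatives.
with tf.GradientTape() as tape:
    a = w * x + b             # linear activation: a = wx + b
    J = 0.5 * (a - y) ** 2    # cost: J = 1/2 (a - y)^2

dJ_dw, dJ_db = tape.gradient(J, [w, b])
print(J.numpy(), dJ_dw.numpy(), dJ_db.numpy())   # 2.0 -4.0 2.0
```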

Forward Propagation with Computation Graph

  1. Compute w × x = c
  • w = 2, x = -2
  • c = -4
  2. Compute a = wx + b = c + b
  • c = -4, b = 8
  • a = 4
  3. Compute a - y = d
  • a = 4, y = 2
  • d = 2
  4. Compute J = 1/2(a-y)² = 1/2d²
  • d = 2
  • J = 2
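The four steps above map one-to-one onto code; this plain-Python sketch (illustrative, not from the lecture) simply evaluates each node of the graph from left to right:

```python
w, b = 2.0, 8.0
x, y = -2.0, 2.0

c = w * x          # step 1: c = w * x      -> -4.0
a = c + b          # step 2: a = c + b      ->  4.0
d = a - y          # step 3: d = a - y      ->  2.0
J = 0.5 * d ** 2   # step 4: J = 1/2 * d^2  ->  2.0
print(c, a, d, J)  # -4.0 4.0 2.0 2.0
```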
Right-to-Left Calculation
  • While forward prop moves left-to-right, backprop moves right-to-left
  • Computes derivatives by asking: “If input to this node changes by ε, how much does J change?”
  1. Derivative of J with respect to d:
  • If d increases from 2 to 2.001, J increases from 2 to 2.002
  • J increases by ~2 × ε
  • ∂J/∂d = 2
  2. Derivative of J with respect to a:
  • If a increases by ε, d increases by ε (since d = a - y)
  • We know when d increases by ε, J increases by 2 × ε
  • Therefore, when a increases by ε, J increases by 2 × ε
  • ∂J/∂a = 2
  3. Derivative of J with respect to c:
  • If c increases by ε, a increases by ε (since a = c + b)
  • We know when a increases by ε, J increases by 2 × ε
  • Therefore, when c increases by ε, J increases by 2 × ε
  • ∂J/∂c = 2
  4. Derivative of J with respect to b:
  • If b increases by ε, a increases by ε (since a = c + b)
  • We know when a increases by ε, J increases by 2 × ε
  • Therefore, when b increases by ε, J increases by 2 × ε
  • ∂J/∂b = 2
  5. Derivative of J with respect to w:
  • If w increases by ε, c changes by x × ε = -2 × ε (decreases)
  • Since ∂J/∂c = 2, a change of -2 × ε in c changes J by 2 × (-2 × ε) = -4 × ε
  • Therefore, when w increases by ε, J changes by -4 × ε
  • ∂J/∂w = -4
  • We can verify our derivative calculations numerically (see the sketch after this list):
  • If w increases from 2 to 2.001, J decreases from 2 to ~1.996
  • J decreases by ~0.004 (4 × 0.001)
  • Confirms ∂J/∂w = -4
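The five derivative steps and the numerical check above can be written out the same way. In this sketch (variable names are illustrative) each backward line reuses the derivative computed just before it, which is exactly the reuse that makes backprop efficient:

```python
# Forward pass, same values as above.
w, b, x, y = 2.0, 8.0, -2.0, 2.0
c = w * x
a = c + b
d = a - y
J = 0.5 * d ** 2

# Backward pass, right to left; each line reuses the previous derivative.
dJ_dd = d             # J = 1/2 d^2 -> dJ/dd = d = 2
dJ_da = dJ_dd * 1.0   # d = a - y   -> dd/da = 1, so dJ/da = 2
dJ_dc = dJ_da * 1.0   # a = c + b   -> da/dc = 1, so dJ/dc = 2
dJ_db = dJ_da * 1.0   # a = c + b   -> da/db = 1, so dJ/db = 2
dJ_dw = dJ_dc * x     # c = w * x   -> dc/dw = x, so dJ/dw = -4
print(dJ_dw, dJ_db)   # -4.0 2.0

# Numerical check from the bullets above: bump w by 0.001 and recompute J.
eps = 1e-3
J_bumped = 0.5 * ((w + eps) * x + b - y) ** 2
print((J_bumped - J) / eps)   # roughly -4, matching dJ/dw
```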
Computational Advantage
  • Backprop is sequenced right-to-left for computational efficiency
  • Each derivative calculation reuses previously computed derivatives
  • For a computation graph with n nodes and p parameters:
  • Backprop computes all derivatives in roughly O(n+p) steps
  • Naive approach would require O(n×p) steps
  • Computation graphs break down complex calculations into simple operations
  • Forward propagation is a left-to-right calculation through the graph
  • Backpropagation is a right-to-left calculation that efficiently computes derivatives
  • This approach enables training of large neural networks with millions of parameters
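One way to see the O(n+p) versus O(n×p) contrast: a naive numerical approach needs an extra forward pass through the whole graph for every parameter, while the single backward sweep above already produced both ∂J/∂w and ∂J/∂b. A rough sketch (illustrative only, not a benchmark):

```python
def forward(w, b, x, y):
    # One full left-to-right pass through the graph (about n node operations).
    return 0.5 * (w * x + b - y) ** 2

w, b, x, y, eps = 2.0, 8.0, -2.0, 2.0, 1e-3

# Naive: one extra forward pass per parameter -> roughly O(n * p) work in total.
grad_w = (forward(w + eps, b, x, y) - forward(w, b, x, y)) / eps
grad_b = (forward(w, b + eps, x, y) - forward(w, b, x, y)) / eps
print(grad_w, grad_b)   # roughly -4 and 2; backprop gets both from one reverse sweep
```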

Computation graphs provide the foundation for how neural networks learn through efficient derivative calculations. By breaking complex functions into small, manageable operations and computing derivatives in reverse order, backpropagation enables the training of large networks that would otherwise be computationally infeasible.