
Why It's Needed

Why Neural Networks Need Activation Functions

Critical Concept
  • If we used linear activation functions for all neurons in a neural network:
    • The network becomes no different from linear regression
    • “This would defeat the entire purpose of using a neural network”
    • It cannot fit anything more complex than a basic linear regression model
  • To see why, consider the simplest possible case (worked through below, with a numeric sketch after the derivation):
    • Input: x (a scalar)
    • Hidden layer: one unit with parameters w₁ and b₁, outputs a₁
    • Output layer: one unit with parameters w₂ and b₂, outputs a₂ (which equals f(x))
  1. Calculate hidden layer output:

    • a₁ = g(w₁x + b₁)
    • Since g(z) = z (linear activation)
    • a₁ = w₁x + b₁
  2. Calculate final output:

    • a₂ = g(w₂a₁ + b₂)
    • Since g(z) = z
    • a₂ = w₂a₁ + b₂
  3. Substitute a₁ into the equation:

    • a₂ = w₂(w₁x + b₁) + b₂
    • a₂ = w₂w₁x + w₂b₁ + b₂
  4. Simplify by setting:

    • w = w₂w₁
    • b = w₂b₁ + b₂
  5. Result:

    • a₂ = wx + b
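
The same collapse can be verified numerically. Below is a minimal sketch, assuming NumPy; the parameter values are arbitrary placeholders chosen for illustration.

```python
import numpy as np

def g(z):
    return z  # linear ("identity") activation

w1, b1 = 2.0, -1.0  # hidden-layer parameters (hypothetical values)
w2, b2 = 3.0, 0.5   # output-layer parameters (hypothetical values)

x = np.linspace(-5, 5, 11)

a1 = g(w1 * x + b1)   # hidden layer output
a2 = g(w2 * a1 + b2)  # final output

# Collapse both layers into a single linear function wx + b
w = w2 * w1       # w = 6.0
b = w2 * b1 + b2  # b = -2.5

assert np.allclose(a2, w * x + b)  # the whole network is just wx + b
```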
  • “A linear function of a linear function is itself a linear function”
  • Multiple layers with linear activations don’t enable the network to compute:
    • More complex features
    • Non-linear relationships
    • Complex patterns in data
  • A neural network with multiple hidden layers, all using linear activations:
    • The output will always be equivalent to linear regression
    • It can be expressed as wx + b (for some values of w and b)
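
The collapse holds for any depth. Here is a short sketch, assuming NumPy, that folds an arbitrary stack of linear layers into one equivalent (w, b) pair; the randomly drawn parameters are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
layers = [(rng.normal(), rng.normal()) for _ in range(5)]  # (w_i, b_i) per layer

x = np.linspace(-3, 3, 7)

# Forward pass through every layer with linear activation g(z) = z
a = x
for w_i, b_i in layers:
    a = w_i * a + b_i

# Fold the same layers into a single (w, b), starting from the identity
w, b = 1.0, 0.0
for w_i, b_i in layers:
    w, b = w_i * w, w_i * b + b_i

assert np.allclose(a, w * x + b)  # five layers, still just wx + b
```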

  • If using linear activations in the hidden layers but sigmoid in the output layer:
    • The network becomes equivalent to logistic regression
    • The output can be expressed as 1/(1+e^(-(wx+b))) (for some values of w and b)
    • “This big neural network doesn’t do anything that you can’t also do with logistic regression”
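
This, too, can be checked directly: a sigmoid output on top of a linear hidden layer matches plain logistic regression. A minimal sketch, assuming NumPy, with arbitrary illustrative parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w1, b1 = 2.0, -1.0  # hidden layer, linear activation (hypothetical values)
w2, b2 = 3.0, 0.5   # output layer, sigmoid activation (hypothetical values)

x = np.linspace(-2, 2, 9)

a1 = w1 * x + b1            # linear hidden layer
a2 = sigmoid(w2 * a1 + b2)  # sigmoid output

# Equivalent logistic regression parameters
w = w2 * w1
b = w2 * b1 + b2

assert np.allclose(a2, sigmoid(w * x + b))  # just logistic regression
```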

Key Recommendation
  • “Don’t use the linear activation function in the hidden layers of the neural network”
  • Recommendation: use the ReLU activation function for hidden layers
  • This enables the network to learn non-linear relationships (see the sketch below)
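
For contrast, here is a minimal sketch, assuming NumPy, of why ReLU escapes the collapse: combining just two ReLU units already produces a bend that no single wx + b can reproduce. The unit parameters are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)  # g(z) = max(0, z), non-linear at z = 0

x = np.linspace(-3, 3, 7)

a1 = relu(1.0 * x + 0.0)   # hypothetical hidden unit 1
a2 = relu(-1.0 * x + 1.0)  # hypothetical hidden unit 2
output = a1 + a2           # piecewise-linear, not linear

print(output)  # [4. 3. 2. 1. 1. 2. 3.]: a bend that no wx + b can match
```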
What’s Next
  • So far, we’ve covered neural networks for:
    • Binary classification (y is 0 or 1)
    • Regression (y can take on positive or negative values)
  • Next: multi-class classification
    • When y can take on 3, 4, 10, or more categorical values

Non-linear activation functions are essential for neural networks to work effectively. Without them, a neural network simply collapses into linear or logistic regression, regardless of how many layers it has, making it impossible to learn complex patterns in data.