
Improvements

Floating-Point Precision
  • The previous implementation of neural networks with softmax output layers works, but it is vulnerable to numerical round-off errors
  • The problem stems from the limited precision of floating-point arithmetic; consider two ways of computing the same quantity (compared in the snippet below)
  1. Option 1: Calculate x = 2/10,000 directly
  2. Option 2: Calculate x = (1 + 1/10,000) - (1 - 1/10,000)
    • Mathematically equivalent to 2/10,000
    • Computationally, the intermediate rounding produces a slightly different result (round-off error)
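The difference is easy to see directly in Python (a minimal sketch; the exact digits depend on IEEE-754 double precision, but the two results differ):

x1 = 2.0 / 10_000                              # Option 1: direct computation
x2 = (1 + 1.0 / 10_000) - (1 - 1.0 / 10_000)   # Option 2: mathematically identical
print(f"{x1:.20f}")   # very close to 0.0002
print(f"{x2:.20f}")   # slightly off, because the intermediate sums were rounded
print(x1 == x2)       # False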
Conceptual Bridge
  1. Compute the activation: a = g(z) = 1/(1 + e^(-z))
  2. Compute the loss: loss = -y*log(a) - (1-y)*log(1-a) (sketched numerically below)
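A small NumPy sketch of this two-step computation shows where it breaks down (the logit value 40 is just an illustrative example): once the sigmoid output rounds to exactly 1.0, the loss can no longer be computed accurately.

import numpy as np

z = 40.0    # a large logit (illustrative value)
y = 0.0     # true label

a = 1.0 / (1.0 + np.exp(-z))    # step 1: activation rounds to exactly 1.0 in float64
loss = -y * np.log(a) - (1 - y) * np.log(1 - a)   # step 2: log(1 - a) = log(0)
print(a, loss)   # 1.0 inf  (the true loss is roughly 40; NumPy warns about log(0))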
Original Implementation
model = tf.keras.Sequential([
    # Previous layers...
    tf.keras.layers.Dense(1, activation='sigmoid')   # output layer computes a = g(z)
])
model.compile(
    loss=tf.keras.losses.BinaryCrossentropy(),
    optimizer='adam',
    metrics=['accuracy']
)
  • Instead of having the model compute the activation a separately, output the raw z values and give the loss function the complete expression to evaluate
  • This lets TensorFlow rearrange and optimize the combined computation internally
Improved Implementation
model = tf.keras.Sequential([
    # Previous layers...
    tf.keras.layers.Dense(1, activation='linear')    # output layer now emits the logit z
])
model.compile(
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    optimizer='adam',
    metrics=['accuracy']
)
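Because the output layer is now linear, the model returns logits rather than probabilities, so predictions need an explicit sigmoid (a brief sketch that assumes the model above is trained; X_new is a placeholder for an input batch):

logits = model(X_new)                    # raw z values from the linear output layer
probabilities = tf.nn.sigmoid(logits)    # convert logits to P(y = 1 | x)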
Extending to Softmax
  1. Compute activations: aⱼ = e^(zⱼ) / Σₖ e^(zₖ)
  2. Compute loss: loss = -log(aⱼ), where j is the correct class (see the NumPy sketch below)
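A minimal NumPy version of these two steps (the logits and class index are illustrative values):

import numpy as np

z = np.array([2.0, 1.0, 0.1])        # logits for 3 classes (illustrative)
j = 0                                 # index of the correct class

a = np.exp(z) / np.sum(np.exp(z))     # step 1: softmax activations
loss = -np.log(a[j])                  # step 2: sparse categorical cross-entropy
print(a, loss)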
Original Softmax Implementation
model = tf.keras.Sequential([
    # Previous layers...
    tf.keras.layers.Dense(10, activation='softmax')   # output layer computes a₁ ... a₁₀
])
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    optimizer='adam',
    metrics=['accuracy']
)
  • Output layer uses linear activation (outputs z values directly)
  • Loss function handles both softmax activation and cross-entropy calculation
Improved Softmax Implementation
model = tf.keras.Sequential([
    # Previous layers...
    tf.keras.layers.Dense(10, activation='linear')    # output layer now emits z₁ ... z₁₀ (logits)
])
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer='adam',
    metrics=['accuracy']
)
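As in the binary case, the model now outputs logits, so a softmax must be applied to read the predictions as class probabilities (a brief sketch that assumes the model above is trained; X_new is a placeholder for an input batch):

logits = model(X_new)                    # raw z values, one per class
probabilities = tf.nn.softmax(logits)    # convert logits to class probabilities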

Numerical Stability

  • Avoids explicitly forming extremely small intermediate values such as e^(-large number), which underflow to zero
  • Avoids explicitly forming extremely large intermediate values such as e^(large number), which overflow to infinity
  • Working directly from the logits lets TensorFlow rearrange terms into a more numerically accurate computation (illustrated below)
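The following NumPy sketch illustrates the kind of rearrangement involved: naively exponentiating large logits overflows, while subtracting the maximum logit first (one standard rearrangement; TensorFlow's from_logits losses use an equivalent numerically stable formulation internally) gives the correct probabilities.

import numpy as np

z = np.array([1000.0, 999.0, 998.0])       # large logits (illustrative values)

naive = np.exp(z) / np.sum(np.exp(z))       # exp(1000) overflows to inf, result is nan
shifted = z - np.max(z)                     # same softmax, computed from shifted logits
stable = np.exp(shifted) / np.sum(np.exp(shifted))
print(naive)    # [nan nan nan]  (NumPy warns about overflow)
print(stable)   # approximately [0.665 0.245 0.090]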

Trade-offs

  • More numerically accurate computation of the loss
  • Slightly less intuitive/readable code, since the output layer no longer produces the final activations
  • Conceptually equivalent to the original implementation
  • You now know how to implement multiclass classification with a softmax output layer
  • You also know how to do it in a numerically stable way
  • Next topic: multi-label classification (a different type of classification problem)

The improved implementation of softmax neural networks achieves numerical stability by folding the softmax activation into the loss function (from_logits=True), even though conceptually the activation and the loss remain two separate steps. Handing TensorFlow the whole expression lets it rearrange the calculation internally, reducing the rounding errors that occur when very large or very small intermediate values are computed explicitly.