
Extending Neural Networks for Multiclass Classification

Architecture Update
  • Previously: Neural network for binary classification (e.g., digits 0 and 1)
  • Now: Modifying the network for multiclass classification (e.g., digits 0-9)
  • Key change: Replace the output layer with a softmax output layer that has one unit per class (e.g., 10 units for digits 0-9)
  • Input layer to first hidden layer (a¹): Computed exactly the same as before
  • First hidden layer to second hidden layer (a²): Computed exactly the same as before
  • Second hidden layer to output layer (a³): Computed with softmax instead of sigmoid, in the two steps below
  1. For 10 output classes, compute z values:

    • z₁³ = w₁³·a² + b₁³
    • z₂³ = w₂³·a² + b₂³
    • …
    • z₁₀³ = w₁₀³·a² + b₁₀³
  2. Apply softmax activation:

    • a₁³ = e^(z₁³) / (e^(z₁³) + e^(z₂³) + … + e^(z₁₀³))
    • a₂³ = e^(z₂³) / (e^(z₁³) + e^(z₂³) + … + e^(z₁₀³))
    • …
    • a₁₀³ = e^(z₁₀³) / (e^(z₁³) + e^(z₂³) + … + e^(z₁₀³))
  3. Interpretation: a₁³, a₂³, …, a₁₀³ are the network’s estimated probabilities for each of the 10 classes (digits 0 through 9), and they sum to 1
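
To make the two steps concrete, here is a minimal NumPy sketch of the output-layer computation. The sizes and values (a2, W3, b3) are illustrative assumptions, not parameters from a trained network.

import numpy as np

# Illustrative sizes: 15 units in the second hidden layer, 10 output classes
rng = np.random.default_rng(0)
a2 = rng.random(15)                  # activations from the second hidden layer
W3 = rng.standard_normal((10, 15))   # one weight vector per output unit
b3 = rng.standard_normal(10)

# Step 1: compute the z values for all 10 output units
z3 = W3 @ a2 + b3

# Step 2: softmax (subtracting max(z3) first is a standard numerical-stability trick)
exp_z = np.exp(z3 - z3.max())
a3 = exp_z / exp_z.sum()

print(a3.sum())  # 1.0 -- the outputs form a probability distribution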

Code Structure
Neural Network with Softmax
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(25, activation='relu'),
    tf.keras.layers.Dense(15, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')  # one output unit per class
])
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),  # integer labels, probability outputs
    metrics=['accuracy']
)
model.fit(X, y, epochs=100)
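
Once trained, the model maps each input to a vector of 10 probabilities. A short sketch of reading off predictions; X_new is an assumed array of new examples with the same feature layout as X:

import numpy as np

probabilities = model.predict(X_new)                 # shape: (num_examples, 10)
predicted_class = np.argmax(probabilities, axis=1)   # index of the most probable class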

SparseCategoricalCrossentropy

  • Replaces the BinaryCrossentropy loss used in binary classification
  • “Categorical” because we’re classifying into categories (here, the 10 digit classes)
  • “Sparse” because each example belongs to exactly one category, so labels are given as single integers (class indices) rather than one-hot vectors
  • Example: A digit image is either 0, 1, 2, …, or 9, but not multiple digits simultaneously
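
To illustrate what this loss computes, the sketch below (with made-up probabilities, four classes for brevity) checks that the loss for one example equals -log of the probability the model assigned to the true class:

import numpy as np
import tensorflow as tf

y_true = np.array([2])                          # integer label: the true class index
y_pred = np.array([[0.05, 0.15, 0.70, 0.10]])   # predicted probabilities over 4 classes

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
print(float(loss_fn(y_true, y_pred)))   # ≈ 0.357
print(-np.log(0.70))                    # ≈ 0.357, i.e. -log(p_true)
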
Key Points

  • Neural networks for multiclass classification use a softmax output layer
  • The number of output units equals the number of possible classes
  • Softmax activation produces a probability distribution across all classes (the outputs sum to 1)
  • Unlike sigmoid or ReLU, which act on each z value independently, softmax operates on all z values collectively
  • The TensorFlow implementation uses the 'softmax' activation with the SparseCategoricalCrossentropy loss

Neural networks with softmax output layers extend the power of neural networks beyond binary classification to handle any number of classes. By replacing the single sigmoid output unit with multiple softmax units, the network can estimate the probability distribution across all possible classes, making it suitable for a wide range of real-world classification tasks.