
Extending Neural Networks for Multiclass Classification

Architecture Update
  • Previously: Neural network for binary classification (e.g., digits 0 and 1)
  • Now: Modifying the network for multiclass classification (e.g., digits 0-9)
  • Key change: Replace the output layer with a softmax output layer that has one unit per class (e.g., 10 units for digits 0-9)
  • Input layer to first hidden layer (a¹): Computed exactly the same as before
  • First hidden layer to second hidden layer (a²): Computed exactly the same as before
  • Second hidden layer to output layer (a³): Computed with softmax instead of sigmoid, in the two steps below
  1. For 10 output classes, compute z values:

    • z₁³ = w₁³·a² + b₁³
    • z₂³ = w₂³·a² + b₂³
    • …
    • z₁₀³ = w₁₀³·a² + b₁₀³
  2. Apply softmax activation:

    • a₁³ = e^(z₁³) / (e^(z₁³) + e^(z₂³) + … + e^(z₁₀³))
    • a₂³ = e^(z₂³) / (e^(z₁³) + e^(z₂³) + … + e^(z₁₀³))
    • …
    • a₁₀³ = e^(z₁₀³) / (e^(z₁³) + e^(z₂³) + … + e^(z₁₀³))
  3. Interpretation: a₁³, a₂³, …, a₁₀³ are the network’s estimated probabilities for each of the 10 classes (digits 0 through 9), and they sum to 1
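
To make the two steps concrete, here is a minimal NumPy sketch of the output-layer computation. The sizes and values (a2, W3, b3) are illustrative assumptions, not parameters from a trained network.

import numpy as np

# Illustrative sizes: 15 units in the second hidden layer, 10 output classes
rng = np.random.default_rng(0)
a2 = rng.random(15)                  # activations from the second hidden layer
W3 = rng.standard_normal((10, 15))   # one weight vector per output unit
b3 = rng.standard_normal(10)

# Step 1: compute the z values for all 10 output units
z3 = W3 @ a2 + b3

# Step 2: softmax (subtracting max(z3) first is a standard numerical-stability trick)
exp_z = np.exp(z3 - z3.max())
a3 = exp_z / exp_z.sum()

print(a3.sum())  # 1.0 -- the outputs form a probability distribution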

Code Structure
Neural Network with Softmax
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(25, activation='relu'),
    tf.keras.layers.Dense(15, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')  # one output unit per class
])
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),  # integer labels, probability outputs
    metrics=['accuracy']
)
model.fit(X, y, epochs=100)
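
Once trained, the model maps each input to a vector of 10 probabilities. A short sketch of reading off predictions; X_new is an assumed array of new examples with the same feature layout as X:

import numpy as np

probabilities = model.predict(X_new)                 # shape: (num_examples, 10)
predicted_class = np.argmax(probabilities, axis=1)   # index of the most probable class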

SparseCategoricalCrossentropy

  • Replaces the BinaryCrossentropy loss used in binary classification
  • “Categorical” because we’re classifying into categories (here, the 10 digit classes)
  • “Sparse” because each example belongs to exactly one category, so labels are given as single integers (class indices) rather than one-hot vectors
  • Example: A digit image is either 0, 1, 2, …, or 9, but not multiple digits simultaneously
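
To illustrate what this loss computes, the sketch below (with made-up probabilities, four classes for brevity) checks that the loss for one example equals -log of the probability the model assigned to the true class:

import numpy as np
import tensorflow as tf

y_true = np.array([2])                          # integer label: the true class index
y_pred = np.array([[0.05, 0.15, 0.70, 0.10]])   # predicted probabilities over 4 classes

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
print(float(loss_fn(y_true, y_pred)))   # ≈ 0.357
print(-np.log(0.70))                    # ≈ 0.357, i.e. -log(p_true)
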
Key Points

  • Neural networks for multiclass classification use a softmax output layer
  • The number of output units equals the number of possible classes
  • Softmax activation produces a probability distribution across all classes (the outputs sum to 1)
  • Unlike sigmoid or ReLU, which act on each z value independently, softmax operates on all z values collectively
  • The TensorFlow implementation uses the 'softmax' activation with the SparseCategoricalCrossentropy loss

Neural networks with softmax output layers extend the power of neural networks beyond binary classification to handle any number of classes. By replacing the single sigmoid output unit with multiple softmax units, the network can estimate the probability distribution across all possible classes, making it suitable for a wide range of real-world classification tasks.