Numerical Stability
- Avoids computing extremely small values (e^(-large_number))
- Avoids computing extremely large values (e^(large_number))
- TensorFlow can rearrange terms for more accurate computation
Example: computing the same number two ways
- Directly: x = 2/10,000
- Indirectly: x = (1 + 1/10,000) - (1 - 1/10,000), which is mathematically identical but picks up a small round-off error
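A minimal sketch (plain Python with IEEE-754 doubles, not from the course materials) that reproduces this round-off effect:

x1 = 2.0 / 10_000.0                                   # computed directly
x2 = (1.0 + 1.0 / 10_000.0) - (1.0 - 1.0 / 10_000.0)  # mathematically the same value
print(x1)        # 0.0002
print(x2)        # a slightly different value, revealing round-off error
print(x1 == x2)  # typically False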
Sigmoid activation: a = g(z) = 1/(1 + e^(-z))
Binary cross-entropy loss: -y*log(a) - (1-y)*log(1-a)
model = tf.keras.Sequential([
    # Previous layers...
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(
    loss=tf.keras.losses.BinaryCrossentropy(),
    optimizer='adam',
    metrics=['accuracy']
)
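To see why the literal formula can misbehave, here is a NumPy sketch (illustrative only, not the course's code) that computes the loss exactly as written above:

import numpy as np

# Sigmoid and binary cross-entropy written exactly as in the formulas above.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_naive(y, z):
    a = sigmoid(z)
    return -y * np.log(a) - (1 - y) * np.log(1 - a)

print(bce_naive(0.0, 5.0))    # ~5.007, fine for moderate z
print(bce_naive(0.0, 40.0))   # inf: sigmoid(40) rounds to exactly 1.0, so log(1 - a) blows up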
More numerically accurate version: instead of computing a separately, give TensorFlow the complete loss expression by using a linear output layer and from_logits=True:

model = tf.keras.Sequential([
    # Previous layers...
    tf.keras.layers.Dense(1, activation='linear')
])
model.compile(
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    optimizer='adam',
    metrics=['accuracy']
)
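With from_logits=True, TensorFlow is free to rewrite the loss directly in terms of z rather than forming a first. A NumPy sketch of one such rearrangement (illustrative, not TensorFlow's exact internal formula): since -y*log(a) - (1-y)*log(1-a) = log(1 + e^(-z)) + (1-y)*z, the loss can be evaluated with a stable log-sum-exp:

import numpy as np

def bce_from_logits(y, z):
    # Equivalent rearrangement of the loss in terms of z only:
    #   -y*log(a) - (1-y)*log(1-a)  =  log(1 + e^(-z)) + (1 - y)*z
    # np.logaddexp(0, -z) evaluates log(1 + e^(-z)) without overflow or log(0).
    return np.logaddexp(0.0, -z) + (1.0 - y) * z

print(bce_from_logits(0.0, 5.0))    # ~5.007, matches the naive computation
print(bce_from_logits(0.0, 40.0))   # 40.0, finite where the naive version returned inf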
Softmax activation: aⱼ = e^(zⱼ) / Σᵢ e^(zᵢ)
Cross-entropy loss: -log(aⱼ), where j is the correct class

model = tf.keras.Sequential([
    # Previous layers...
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    optimizer='adam',
    metrics=['accuracy']
)
Again, the more numerically accurate version uses a linear output layer and from_logits=True:

model = tf.keras.Sequential([
    # Previous layers...
    tf.keras.layers.Dense(10, activation='linear')
])
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer='adam',
    metrics=['accuracy']
)
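The benefit is the same as in the binary case. A NumPy sketch (illustrative, not TensorFlow's internal code) contrasting the literal softmax with the standard max-subtraction rearrangement that from_logits=True allows:

import numpy as np

def softmax_xent_naive(z, j):
    a = np.exp(z) / np.sum(np.exp(z))     # e^(z_i) overflows for large z_i
    return -np.log(a[j])

def softmax_xent_from_logits(z, j):
    # -log(a_j) = log(sum_i e^(z_i)) - z_j; subtracting max(z) keeps every exponent <= 0
    m = np.max(z)
    return (m + np.log(np.sum(np.exp(z - m)))) - z[j]

z = np.array([1000.0, 2.0, -1.0])         # one very large logit
print(softmax_xent_naive(z, 0))           # nan: e^1000 overflows to inf
print(softmax_xent_from_logits(z, 0))     # 0.0: class 0 dominates, so the loss is ~0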
Numerical Stability Trade-offs
The improved implementation gains numerical stability by letting the loss function handle the output activation (from_logits=True) instead of applying it in the output layer, even though conceptually the activation and loss remain the same. This gives TensorFlow the freedom to rearrange the combined calculation internally, reducing the rounding errors that occur with very large or very small intermediate values. The trade-off is that the model now outputs logits rather than probabilities, so the activation must be applied explicitly when making predictions, as sketched below.
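For example (assuming import tensorflow as tf, the trained model defined above, and a hypothetical input batch X):

logits = model(X)                  # raw z values from the linear output layer
probs = tf.nn.softmax(logits)      # probabilities for the 10-class softmax model
# For the binary (sigmoid) model, use tf.nn.sigmoid(logits) instead.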