Too Small Learning Rate
- When starting far from the minimum
- Steps are small and point in similar directions
- Convergence is unnecessarily slow
- Ideally would increase the learning rate
Too Large Learning Rate
- Steps overshoot the minimum
- Successive steps oscillate back and forth
- Cost may fail to decrease, or may even diverge
- Ideally would decrease the learning rate (both cases are sketched in code after this list)
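To make the two failure modes concrete, here is a minimal toy sketch (not from the original material) of gradient descent on the one-variable cost J(w) = w**2, whose gradient is 2*w; the starting point, learning rates, and step counts are illustrative assumptions.

# Toy example: gradient descent on J(w) = w**2, minimum at w = 0, gradient dJ/dw = 2*w.
def gradient_descent(w0, learning_rate, steps):
    w = w0
    history = [round(w, 3)]
    for _ in range(steps):
        w = w - learning_rate * 2 * w   # w := w - alpha * dJ/dw
        history.append(round(w, 3))
    return history

# Too small: every step moves in the same direction, but progress is slow.
print(gradient_descent(w0=10.0, learning_rate=0.01, steps=5))
# [10.0, 9.8, 9.604, 9.412, 9.224, 9.039]

# Too large: each step overshoots the minimum, so w oscillates (and here diverges).
print(gradient_descent(w0=10.0, learning_rate=1.1, steps=5))
# [10.0, -12.0, 14.4, -17.28, 20.736, -24.883]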
Adam Algorithm Intuition
If a parameter w_j consistently moves in the same direction:
- Increase the learning rate for that specific parameter
- Accelerates progress toward the minimum
If a parameter oscillates back and forth:
- Decrease the learning rate for that specific parameter
- Prevents bouncing and allows convergence
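This rule can be written as a toy per-parameter update. The sketch below captures the intuition only, not the actual Adam algorithm; the function name adaptive_update and the grow/shrink factors are assumptions made for illustration.

import numpy as np

def adaptive_update(w, grad, lr, prev_grad, grow=1.2, shrink=0.5):
    # Where the gradient keeps the same sign, the parameter is moving consistently
    # in one direction, so raise its learning rate; where the sign flipped, it is
    # oscillating, so lower its learning rate. grow/shrink values are illustrative.
    same_direction = np.sign(grad) == np.sign(prev_grad)
    lr = np.where(same_direction, lr * grow, lr * shrink)
    w = w - lr * grad   # each w_j gets its own step size lr_j
    return w, lr

A caller would keep w, lr, and the previous gradient between steps, passing the gradient from the last step as prev_grad. In practice Adam achieves the same effect through moment estimates rather than explicit grow/shrink factors, as sketched at the end of this section.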
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(25, activation='relu'),
    Dense(15, activation='relu'),
    Dense(1, activation='sigmoid')
])

model.compile(
    loss=tf.keras.losses.BinaryCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # initial learning rate of 0.001
)
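Training then proceeds with model.fit exactly as before; only the optimizer argument changed. The arrays below are random placeholders so the snippet runs end to end; substitute your own training data and epoch count.

import numpy as np

# Placeholder data purely for illustration: 1000 examples, 10 features, binary labels.
X = np.random.randn(1000, 10).astype('float32')
y = np.random.randint(0, 2, size=(1000, 1)).astype('float32')

model.fit(X, y, epochs=10)  # epoch count is illustrative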
The Adam optimization algorithm provides significant training speedups by adaptively adjusting learning rates for each parameter. Instead of using a single global learning rate, it increases rates for parameters moving consistently in one direction and decreases rates for oscillating parameters. This adaptive approach has made Adam the preferred choice for most modern neural network training.
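For reference, here is a compact NumPy sketch of the standard Adam update (Kingma & Ba). Rather than literally raising and lowering a stored learning rate, Adam keeps exponentially decaying averages of each parameter's gradient (m) and squared gradient (v); the resulting step alpha * m_hat / (sqrt(v_hat) + eps) stays close to alpha where gradients consistently point the same way and shrinks where they oscillate. The hyperparameter defaults shown are the commonly used ones; the function name is an assumption for illustration.

import numpy as np

def adam_update(w, grad, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponentially decaying averages of the gradient and of its elementwise square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction for the zero-initialized averages (t is the step number, starting at 1).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter step: roughly alpha where gradients agree in sign, smaller where they oscillate.
    w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v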