Pablo Rodriguez

Evaluation

Performance Assessment
  • Evaluating model performance is essential for building effective ML systems
  • A systematic evaluation approach helps create a clearer path for improvement
  • Proper evaluation distinguishes between models that memorize training data and models that generalize well
Limitations

Example: Housing price prediction with a 4th-order polynomial

  • When using a single feature (like house size), you can plot the model
  • Visual inspection shows a “wiggly” curve that fits training data but likely won’t generalize
  • With multiple features (size, bedrooms, floors, age), plotting becomes impractical: how do you visualize a function of four or more inputs?
  • Need a systematic approach beyond visualization
Core Technique

Instead of using all data for training, split your dataset into:

  1. Training Set: Used to train the model (typically 70-80% of data)
  2. Test Set: Used to evaluate the model (typically 20-30% of data)
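The split above can be sketched in a few lines of NumPy. This is a minimal illustration, not the notes' own code; the function name, the 30% test fraction, and the fixed seed are all choices made here for the example:

```python
import numpy as np

def train_test_split(X, y, test_fraction=0.3, seed=0):
    """Shuffle the data, then carve off a test set of the given fraction."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))       # random order, reproducible via seed
    m_test = int(len(X) * test_fraction)    # size of the test set
    test_idx, train_idx = indices[:m_test], indices[m_test:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]
```

Shuffling before splitting matters: if the data is ordered (say, by price), a naive head/tail split would give the model a biased view of the problem.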

Training Set Notation

  • Examples: (x^(1), y^(1)) to (x^(m_train), y^(m_train))
  • m_train: number of training examples

Test Set Notation

  • Examples: (x_test^(1), y_test^(1)) to (x_test^(m_test), y_test^(m_test))
  • m_test: number of test examples
Squared Error Cost
  • Fit parameters by minimizing the cost function:
  • J(w,b) = (1/2m)Σ(f(x^(i)) - y^(i))² + (λ/2m)Σ(w_j²)
  • J_test(w,b) = (1/2m_test)Σ(f(x_test^(i)) - y_test^(i))²
  • Note: Test error does not include the regularization term

Step 3: Calculate training error (optional but useful)

  • J_train(w,b) = (1/2m_train)Σ(f(x^(i)) - y^(i))²
  • Also excludes regularization term
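Both J_train and J_test are plain average squared errors with no regularization term, since regularization only shapes the fit, not the measurement of it. A minimal sketch for a linear model f(x) = w·x + b (the helper name is chosen here for illustration):

```python
import numpy as np

def squared_error(w, b, X, y):
    """Average squared error J = (1/2m) * sum((f(x^(i)) - y^(i))^2).

    No regularization term: this measures fit only, so the same
    function serves for both J_train and J_test.
    """
    m = len(y)
    predictions = X @ w + b          # linear model f(x) = w . x + b
    return np.sum((predictions - y) ** 2) / (2 * m)
```

Calling it once with the training set gives J_train and once with the test set gives J_test; a large gap between the two is the signal to look for.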
Two Approaches (Classification)

Approach 1: Logistic Loss

  • Calculate the average logistic loss on the test set (J_test)
  • Calculate the average logistic loss on the training set (J_train)

Approach 2: Misclassification Rate (More Common)

  • J_test = Fraction of test examples misclassified
  • Count test examples where predicted y_hat ≠ actual y
  • (Predict 1 when f(x) ≥ 0.5, predict 0 when f(x) < 0.5)
  • J_train = Fraction of training examples misclassified
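The misclassification rate can be computed directly from the model's predicted probabilities. A short sketch under the 0.5 threshold described above (the function name is illustrative):

```python
import numpy as np

def misclassification_rate(probabilities, y):
    """Fraction of examples where the thresholded prediction differs from the label."""
    y_hat = (probabilities >= 0.5).astype(int)  # predict 1 when f(x) >= 0.5, else 0
    return np.mean(y_hat != y)                  # count mismatches, divide by m
```

The same function gives J_train or J_test depending on which set's probabilities and labels you pass in.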
Key Advantages
  • Provides a systematic way to evaluate performance, especially with many features
  • Helps assess how well your model generalizes to new data
  • Comparing J_train vs J_test helps diagnose overfitting vs. underfitting
  • Forms the foundation for model selection (e.g., choosing between linear, quadratic, or higher-order models)
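As one illustration of the J_train vs. J_test comparison, the sketch below fits a low-degree and a high-degree polynomial to the same small, roughly linear dataset and reports both errors. The data, seed, and degrees are arbitrary choices for this demonstration; the point is that the high-degree fit can only lower the training error, while its test error will typically reveal the overfitting:

```python
import numpy as np

def polynomial_errors(degree, x_train, y_train, x_test, y_test):
    """Fit a polynomial of the given degree; return (J_train, J_test)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    def j(x, y):
        residuals = np.polyval(coeffs, x) - y
        return np.sum(residuals ** 2) / (2 * len(y))
    return j(x_train, y_train), j(x_test, y_test)

# Roughly linear data with noise; a high-degree fit can chase the noise.
rng = np.random.default_rng(1)
x = rng.uniform(0, 3, size=20)
y = 2 * x + rng.normal(0, 0.3, size=20)
x_train, y_train, x_test, y_test = x[:14], y[:14], x[14:], y[14:]

for degree in (1, 9):
    j_train, j_test = polynomial_errors(degree, x_train, y_train, x_test, y_test)
    print(f"degree {degree}: J_train={j_train:.4f}, J_test={j_test:.4f}")
```

A low J_train together with a much higher J_test is the overfitting signature; when both are high, the model is likely underfitting.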

Train-test splitting is a fundamental technique for evaluating machine learning models. By measuring performance on unseen data, you can assess whether your model is truly learning patterns or simply memorizing the training examples. This approach serves as the foundation for more advanced model selection techniques that help you automatically choose the right model complexity for your problem.