Pablo Rodriguez

Evaluation

Performance Assessment
  • Evaluating model performance is essential for building effective ML systems
  • A systematic evaluation approach helps create a clearer path for improvement
  • Proper evaluation distinguishes between models that memorize training data and models that generalize well
Limitations

Example: Housing price prediction with a 4th-order polynomial

  • When using a single feature (like house size), you can plot the model
  • Visual inspection shows a “wiggly” curve that fits training data but likely won’t generalize
  • With multiple features (size, bedrooms, floors, age), plotting becomes impractical: how do you visualize a function of four or more inputs?
  • Need a systematic approach beyond visualization
Core Technique

Instead of using all data for training, split your dataset into:

  1. Training Set: Used to train the model (typically 70-80% of data)
  2. Test Set: Used to evaluate the model (typically 20-30% of data)
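The split above can be sketched in a few lines of NumPy. This is a minimal illustration, not the notes' own code; the function name, the 30% test fraction, and the fixed seed are all choices made here for the example:

```python
import numpy as np

def train_test_split(X, y, test_fraction=0.3, seed=0):
    """Shuffle the data, then carve off a test set of the given fraction."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))       # random order, reproducible via seed
    m_test = int(len(X) * test_fraction)    # size of the test set
    test_idx, train_idx = indices[:m_test], indices[m_test:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]
```

Shuffling before splitting matters: if the data is ordered (say, by price), a naive head/tail split would give the model a biased view of the problem.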

Training Set Notation

  • Examples: (x^(1), y^(1)) to (x^(m_train), y^(m_train))
  • m_train: number of training examples

Test Set Notation

  • Examples: (x_test^(1), y_test^(1)) to (x_test^(m_test), y_test^(m_test))
  • m_test: number of test examples
Squared Error Cost
  • Fit parameters by minimizing the cost function:
  • J(w,b) = (1/2m)Σ(f(x^(i)) - y^(i))² + (λ/2m)Σ(w_j²)
  • J_test(w,b) = (1/2m_test)Σ(f(x_test^(i)) - y_test^(i))²
  • Note: Test error does not include the regularization term

Step 3: Calculate training error (optional but useful)

  • J_train(w,b) = (1/2m_train)Σ(f(x^(i)) - y^(i))²
  • Also excludes regularization term
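Both J_train and J_test are plain average squared errors with no regularization term, since regularization only shapes the fit, not the measurement of it. A minimal sketch for a linear model f(x) = w·x + b (the helper name is chosen here for illustration):

```python
import numpy as np

def squared_error(w, b, X, y):
    """Average squared error J = (1/2m) * sum((f(x^(i)) - y^(i))^2).

    No regularization term: this measures fit only, so the same
    function serves for both J_train and J_test.
    """
    m = len(y)
    predictions = X @ w + b          # linear model f(x) = w . x + b
    return np.sum((predictions - y) ** 2) / (2 * m)
```

Calling it once with the training set gives J_train and once with the test set gives J_test; a large gap between the two is the signal to look for.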
Two Approaches (Classification)

Approach 1: Logistic Loss

  • Calculate the average logistic loss on the test set (J_test)
  • Calculate the average logistic loss on the training set (J_train)

Approach 2: Misclassification Rate (More Common)

  • J_test = Fraction of test examples misclassified
  • Count test examples where predicted y_hat ≠ actual y
  • (Predict 1 when f(x) ≥ 0.5, predict 0 when f(x) < 0.5)
  • J_train = Fraction of training examples misclassified
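The misclassification rate can be computed directly from the model's predicted probabilities. A short sketch under the 0.5 threshold described above (the function name is illustrative):

```python
import numpy as np

def misclassification_rate(probabilities, y):
    """Fraction of examples where the thresholded prediction differs from the label."""
    y_hat = (probabilities >= 0.5).astype(int)  # predict 1 when f(x) >= 0.5, else 0
    return np.mean(y_hat != y)                  # count mismatches, divide by m
```

The same function gives J_train or J_test depending on which set's probabilities and labels you pass in.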
Key Advantages
  • Provides a systematic way to evaluate performance, especially with many features
  • Helps assess how well your model generalizes to new data
  • Comparing J_train vs J_test helps diagnose overfitting vs. underfitting
  • Forms the foundation for model selection (e.g., choosing between linear, quadratic, or higher-order models)
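As one illustration of the J_train vs. J_test comparison, the sketch below fits a low-degree and a high-degree polynomial to the same small, roughly linear dataset and reports both errors. The data, seed, and degrees are arbitrary choices for this demonstration; the point is that the high-degree fit can only lower the training error, while its test error will typically reveal the overfitting:

```python
import numpy as np

def polynomial_errors(degree, x_train, y_train, x_test, y_test):
    """Fit a polynomial of the given degree; return (J_train, J_test)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    def j(x, y):
        residuals = np.polyval(coeffs, x) - y
        return np.sum(residuals ** 2) / (2 * len(y))
    return j(x_train, y_train), j(x_test, y_test)

# Roughly linear data with noise; a high-degree fit can chase the noise.
rng = np.random.default_rng(1)
x = rng.uniform(0, 3, size=20)
y = 2 * x + rng.normal(0, 0.3, size=20)
x_train, y_train, x_test, y_test = x[:14], y[:14], x[14:], y[14:]

for degree in (1, 9):
    j_train, j_test = polynomial_errors(degree, x_train, y_train, x_test, y_test)
    print(f"degree {degree}: J_train={j_train:.4f}, J_test={j_test:.4f}")
```

A low J_train together with a much higher J_test is the overfitting signature; when both are high, the model is likely underfitting.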

Train-test splitting is a fundamental technique for evaluating machine learning models. By measuring performance on unseen data, you can assess whether your model is truly learning patterns or simply memorizing the training examples. This approach serves as the foundation for more advanced model selection techniques that help you automatically choose the right model complexity for your problem.