ML Models for Bias Prediction

WIFA-UQ supports multiple machine learning models for predicting residual bias after calibration. Each model offers different trade-offs between accuracy, interpretability, and uncertainty quantification.

Overview

The bias predictor learns residual model error as a function of atmospheric features:

residual_bias = f(ABL_height, wind_veer, lapse_rate, turbulence_intensity, ...)
┌─────────────────────────────────────────────────────────────────────────────┐
│                         Bias Prediction Pipeline                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  Input Features              ML Model                  Output               │
│  ┌─────────────────┐        ┌─────────────────┐       ┌─────────────────┐   │
│  │ ABL_height      │        │                 │       │ predicted_bias  │   │
│  │ wind_veer       │   ──►  │  XGB / PCE /    │  ──►  │                 │   │
│  │ lapse_rate      │        │  SIR / Linear   │       │ (± uncertainty) │   │
│  │ TI              │        │                 │       │                 │   │
│  │ ...             │        └─────────────────┘       └─────────────────┘   │
│  └─────────────────┘                                                        │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Model Comparison

Model           regressor      Interpretability     UQ Method          Best For
XGBoost         XGB            SHAP values          Ensemble variance  Best accuracy, feature importance
PCE             PCE            Sobol indices        Analytical         Smooth responses, global sensitivity
SIR Polynomial  SIRPolynomial  Coefficients         Analytical         Dimension reduction, interpretability
Ridge           Linear         Coefficients         Analytical         Baseline, linear relationships
Lasso           Linear         Sparse coefficients  Analytical         Feature selection
ElasticNet      Linear         Sparse coefficients  Analytical         Mixed regularization

Quick Start

error_prediction:
  calibrator: MinBiasCalibrator
  regressor: XGB                    # Model choice
  regressor_params:
    max_depth: 3
    n_estimators: 100
  features:
    - ABL_height
    - wind_veer
    - lapse_rate
    - turbulence_intensity

XGBoost (XGB)

Gradient-boosted decision trees. XGBoost generally provides the best predictive accuracy of the available models, plus robust feature importance via SHAP values.

When to Use

  • Best accuracy is the priority
  • You want SHAP-based feature importance
  • Non-linear relationships exist between features and bias
  • You have sufficient training data (> 50-100 cases)

Configuration

error_prediction:
  regressor: XGB
  regressor_params:
    max_depth: 3              # Tree depth (default: 3)
    n_estimators: 100         # Number of trees (default: 100)
    learning_rate: 0.1        # Step size shrinkage (default: 0.1)
    min_child_weight: 1       # Minimum sum of instance weight in child
    subsample: 1.0            # Fraction of samples per tree
    colsample_bytree: 1.0     # Fraction of features per tree
    random_state: 42          # Reproducibility

Parameter Tuning Guide

Parameter         Low Value Effect    High Value Effect            Recommended
max_depth         Underfitting        Overfitting                  2-5
n_estimators      Underfitting        Slower, diminishing returns  50-200
learning_rate     Needs more trees    Overfitting risk             0.05-0.3
min_child_weight  More complex trees  More regularization          1-10
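
These ranges can be checked empirically with a cross-validated grid search. A minimal sketch using xgboost's scikit-learn interface directly (X and y are hypothetical arrays of features and residual biases; this bypasses the BiasPredictor wrapper):

from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

# Hypothetical X (features) and y (residual biases) arrays
param_grid = {
    "max_depth": [2, 3, 4, 5],
    "n_estimators": [50, 100, 200],
    "learning_rate": [0.05, 0.1, 0.3],
}
search = GridSearchCV(
    XGBRegressor(random_state=42),
    param_grid,
    scoring="neg_root_mean_squared_error",
    cv=5,
)
search.fit(X, y)
print("Best parameters:", search.best_params_)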

SHAP Feature Importance

XGBoost automatically generates SHAP (SHapley Additive exPlanations) analysis:

Output files:
  shap_summary.png           # Beeswarm plot of feature impacts
  shap_feature_importance.png # Bar chart of mean |SHAP|

Interpreting SHAP plots:

  • Beeswarm plot: Each point is a prediction; x-axis shows impact on output; color shows feature value
  • Feature importance: Mean absolute SHAP value per feature

API Usage

from wifa_uq.postprocessing.error_prediction import BiasPredictor
import xarray as xr

database = xr.load_dataset("results_stacked_hh.nc")

predictor = BiasPredictor(
    database,
    regressor_name="XGB",
    regressor_params={"max_depth": 3, "n_estimators": 100},
    feature_names=["ABL_height", "wind_veer", "lapse_rate"]
)

# Fit the predictor on the calibrated training samples
predictor.fit(calibrated_sample_indices)

# Predict bias
predictions = predictor.predict(X_test)

# Get SHAP values
shap_values = predictor.get_shap_values()
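
The returned SHAP values can also be plotted directly with the shap library. A sketch, assuming get_shap_values returns an (n_samples, n_features) array aligned with a training feature matrix X_train (hypothetical name):

import shap

# Recreate the beeswarm summary plot from the returned SHAP values
# (assumes an (n_samples, n_features) array aligned with X_train's columns)
shap.summary_plot(shap_values, X_train, feature_names=predictor.feature_names)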

Strengths & Limitations

Strengths:

  • Best out-of-the-box accuracy
  • Handles non-linear relationships
  • Robust to outliers
  • SHAP provides trustworthy feature importance

Limitations:

  • Less interpretable than linear models
  • Can overfit with small datasets
  • No analytical uncertainty (uses ensemble variance)


Polynomial Chaos Expansion (PCE)

Represents the bias function as a polynomial expansion in the input features. Provides analytical sensitivity indices (Sobol) and smooth predictions.

When to Use

  • You need Sobol sensitivity indices for variance decomposition
  • The response is expected to be smooth and polynomial-like
  • You want analytical uncertainty quantification
  • Interpretability of polynomial coefficients is valuable

Configuration

error_prediction:
  regressor: PCE
  regressor_params:
    degree: 3                 # Maximum polynomial degree (default: 3)
    q_norm: 0.75              # Hyperbolic truncation parameter (default: 0.75)
    fit_type: LeastSquares    # Fitting method: LeastSquares or LARS

Parameters Explained

Parameter  Description                                             Typical Range
degree     Maximum total polynomial degree                         2-5
q_norm     Controls interaction term truncation (lower = sparser)  0.5-1.0
fit_type   LeastSquares (dense) or LARS (sparse)                   -

Polynomial basis:

For features $(x_1, x_2)$ with degree=2: $$f(x) = c_0 + c_1 x_1 + c_2 x_2 + c_{11} x_1^2 + c_{12} x_1 x_2 + c_{22} x_2^2$$
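
The hyperbolic truncation can be made concrete by counting terms. A small illustrative sketch (plain Python; assumes the standard q-norm rule that keeps a multi-index $\alpha$ when $(\sum_i \alpha_i^q)^{1/q} \le$ degree):

from itertools import product

def n_pce_terms(n_features: int, degree: int, q_norm: float) -> int:
    """Count multi-indices alpha with (sum_i alpha_i**q)**(1/q) <= degree."""
    kept = 0
    for alpha in product(range(degree + 1), repeat=n_features):
        if sum(a ** q_norm for a in alpha) ** (1.0 / q_norm) <= degree:
            kept += 1
    return kept

# Fewer interaction terms survive as q_norm decreases
for q in (1.0, 0.75, 0.5):
    print(f"q_norm={q}: {n_pce_terms(n_features=4, degree=3, q_norm=q)} terms")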

Sobol Sensitivity Indices

PCE automatically computes Sobol indices for global sensitivity analysis:

Index                   Meaning
First-order ($S_i$)     Variance from feature $i$ alone
Total-order ($S_{Ti}$)  Variance from $i$ including all interactions

Output files:
  sobol_indices.csv          # Numerical values
  sobol_barplot.png          # Visualization

API Usage

from wifa_uq.postprocessing.error_prediction import BiasPredictor

predictor = BiasPredictor(
    database,
    regressor_name="PCE",
    regressor_params={"degree": 3, "q_norm": 0.75},
    feature_names=["ABL_height", "wind_veer", "lapse_rate"]
)

predictor.fit(calibrated_sample_indices)

# Get Sobol indices
sobol_first, sobol_total = predictor.get_sobol_indices()
print("First-order Sobol indices:")
for feat, val in zip(predictor.feature_names, sobol_first):
    print(f"  {feat}: {val:.3f}")

Strengths & Limitations

Strengths:

  • Analytical Sobol indices for sensitivity analysis
  • Smooth, continuous predictions
  • Well-suited for UQ workflows
  • Interpretable polynomial coefficients

Limitations:

  • Assumes polynomial response structure
  • Can struggle with discontinuities or sharp transitions
  • Number of terms grows with degree and features
  • Requires careful degree selection


Sliced Inverse Regression Polynomial (SIRPolynomial)

Combines Sliced Inverse Regression (SIR) for dimension reduction with polynomial regression. Finds low-dimensional projections of features that best explain the response.

When to Use

  • You have many features and suspect lower-dimensional structure
  • You want dimension reduction combined with prediction
  • Interpretability of the projection directions is valuable
  • Linear combinations of features drive the response

Configuration

error_prediction:
  regressor: SIRPolynomial
  regressor_params:
    n_directions: 2           # Number of SIR directions to keep (default: 2)
    n_slices: 10              # Number of slices for SIR (default: 10)
    degree: 2                 # Polynomial degree in reduced space (default: 2)

How It Works

┌─────────────────────────────────────────────────────────────────────────────┐
│                         SIR Polynomial Pipeline                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  Original Features (p dims)        SIR Directions (d dims)     Polynomial   │
│  ┌─────────────────────┐          ┌─────────────────────┐     ┌─────────┐   │
│  │ ABL_height          │          │ z₁ = w₁·X           │     │         │   │
│  │ wind_veer           │   SIR    │ z₂ = w₂·X           │ ──► │ ŷ=P(z)  │   │
│  │ lapse_rate          │   ───►   │                     │     │         │   │
│  │ TI                  │          │ (d << p)            │     └─────────┘   │
│  │ ...                 │          └─────────────────────┘                   │
│  └─────────────────────┘                                                    │
│       (p=10)                            (d=2)                               │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
  1. SIR finds directions $w_i$ such that $z_i = w_i \cdot X$ captures response variation
  2. Polynomial fits $\hat{y} = P(z_1, z_2, ...)$ in the reduced space
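
For intuition, here is a minimal NumPy sketch of the SIR step itself (illustrative only, not the WIFA-UQ implementation):

import numpy as np

def sir_directions(X, y, n_directions=2, n_slices=10):
    """Minimal SIR sketch: projection directions in the original feature space."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)

    # Whiten the features: Z = Xc @ Sigma^(-1/2)
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ inv_sqrt

    # Sort samples by the response and average Z within each slice
    slices = np.array_split(np.argsort(y), n_slices)
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)

    # Leading eigenvectors of M, mapped back to the original feature scale
    _, vecs = np.linalg.eigh(M)
    return inv_sqrt @ vecs[:, ::-1][:, :n_directions]  # columns w_i, z_i = X @ w_i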

Parameters Explained

Parameter     Description                                Guidance
n_directions  Dimensionality of reduced space            Start with 1-2, increase if underfitting
n_slices      Slices for estimating inverse regression   5-20, more for larger datasets
degree        Polynomial degree in reduced space         1-3

Interpreting Directions

The SIR directions reveal which feature combinations matter:

predictor = BiasPredictor(
    database,
    regressor_name="SIRPolynomial",
    regressor_params={"n_directions": 2},
    feature_names=features
)
predictor.fit(calibrated_indices)

# Get SIR directions
directions = predictor.get_sir_directions()
print("SIR Direction 1:")
for feat, weight in zip(features, directions[0]):
    print(f"  {feat}: {weight:.3f}")

Example output:

SIR Direction 1:
  ABL_height: 0.72
  wind_veer: 0.45
  lapse_rate: 0.12
  TI: 0.52

This indicates the first direction is dominated by ABL_height and TI.
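
The directions can also be used to project samples into the reduced space, for example to plot bias against the first projected coordinate (a sketch; X and y_bias are hypothetical arrays whose columns match the order of features):

import numpy as np
import matplotlib.pyplot as plt

# z1 = w1 · x for every sample (X is a hypothetical (n_samples, n_features) array)
z1 = np.asarray(X) @ np.asarray(directions[0])

plt.scatter(z1, y_bias)
plt.xlabel("z1 (first SIR direction)")
plt.ylabel("residual bias")
plt.show()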

Strengths & Limitations

Strengths:

  • Effective dimension reduction
  • Interpretable projection directions
  • Handles many features gracefully
  • Can reveal unexpected feature combinations

Limitations:

  • Assumes linear projections capture response structure
  • Sensitive to the n_slices parameter
  • Less common, fewer reference implementations
  • May miss complex non-linear interactions


Linear Regressors

Simple and interpretable models for baseline comparisons and when linear relationships are expected.

Available Variants

Variant     regressor_params.linear_type  Regularization  Best For
Ridge       ridge (default)               L2 (shrinkage)  Collinear features, default
Lasso       lasso                         L1 (sparsity)   Feature selection
ElasticNet  elasticnet                    L1 + L2         Mixed benefits
OLS         ols                           None            Simple baseline

Configuration

Ridge (default):

error_prediction:
  regressor: Linear
  regressor_params:
    linear_type: ridge
    alpha: 1.0                # Regularization strength

Lasso:

error_prediction:
  regressor: Linear
  regressor_params:
    linear_type: lasso
    alpha: 0.1

ElasticNet:

error_prediction:
  regressor: Linear
  regressor_params:
    linear_type: elasticnet
    alpha: 0.5
    l1_ratio: 0.5            # Balance between L1 and L2

Interpreting Coefficients

Linear models provide direct coefficient interpretation:

predictor = BiasPredictor(
    database,
    regressor_name="Linear",
    regressor_params={"linear_type": "ridge", "alpha": 1.0},
    feature_names=features
)
predictor.fit(calibrated_indices)

# Get coefficients
coefs = predictor.get_coefficients()
print("Feature coefficients:")
for feat, coef in zip(features, coefs):
    print(f"  {feat}: {coef:.4f}")

Interpretation: A coefficient of 0.002 for ABL_height means a 1-unit increase in ABL_height increases predicted bias by 0.002 (normalized units).
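
Because raw coefficients depend on feature units, comparing them after scaling by each feature's standard deviation is often more informative (a sketch; X is a hypothetical feature matrix with columns ordered as in features):

import numpy as np

# Effect of a one-standard-deviation change in each feature on predicted bias
std_effect = np.asarray(coefs) * np.asarray(X).std(axis=0)
for feat, eff in zip(features, std_effect):
    print(f"  {feat}: {eff:+.4f}")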

When to Use Each

Scenario                              Recommended
Baseline comparison                   Ridge or OLS
Many correlated features              Ridge
Want automatic feature selection      Lasso
Uncertain about regularization type   ElasticNet
Very few features, lots of data       OLS

Strengths & Limitations

Strengths:

  • Highly interpretable coefficients
  • Fast training and prediction
  • Analytical uncertainty estimates
  • Good baseline for comparison

Limitations:

  • Cannot capture non-linear relationships
  • May underfit complex responses
  • Feature engineering required for interactions


Model Selection Guide

Decision Tree

                        Start Here
                             │
                             ▼
              ┌─────────────────────────────┐
              │  Need Sobol sensitivity     │
              │  indices?                   │
              └─────────────────────────────┘
                      │           │
                     Yes          No
                      │           │
                      ▼           ▼
                    PCE    ┌─────────────────────────────┐
                           │  Many features (>5) with    │
                            │ suspected low-dim structure?│
                           └─────────────────────────────┘
                                  │           │
                                 Yes          No
                                  │           │
                                  ▼           ▼
                           SIRPolynomial  ┌─────────────────────────────┐
                                           │  Need best accuracy or      │
                                           │  SHAP importance?           │
                                          └─────────────────────────────┘
                                                  │           │
                                                 Yes          No
                                                  │           │
                                                  ▼           ▼
                                                XGB      ┌─────────────────────┐
                                                         │  Simple baseline or │
                                                         │  linear expected?   │
                                                         └─────────────────────┘
                                                                  │
                                                                  ▼
                                                               Linear

Comparison by Criterion

Criterion              Best Choice    Second Choice
Predictive accuracy    XGB            PCE
Interpretability       Linear         SIRPolynomial
Sensitivity analysis   PCE (Sobol)    XGB (SHAP)
Small dataset (<50)    Linear         PCE (low degree)
Many features          SIRPolynomial  XGB
Production deployment  XGB or Linear  PCE

Common Configuration Patterns

High-Accuracy Setup

error_prediction:
  regressor: XGB
  regressor_params:
    max_depth: 4
    n_estimators: 200
    learning_rate: 0.05
    min_child_weight: 3
  features:
    - ABL_height
    - wind_veer
    - lapse_rate
    - turbulence_intensity
    - wind_speed

Sensitivity Analysis Setup

error_prediction:
  regressor: PCE
  regressor_params:
    degree: 3
    q_norm: 0.75
  features:
    - ABL_height
    - wind_veer
    - lapse_rate
    - turbulence_intensity

sensitivity_analysis:
  output_sobol: true

Interpretable Baseline

error_prediction:
  regressor: Linear
  regressor_params:
    linear_type: ridge
    alpha: 1.0
  features:
    - ABL_height
    - wind_veer
    - lapse_rate

Dimension Reduction Setup

error_prediction:
  regressor: SIRPolynomial
  regressor_params:
    n_directions: 2
    n_slices: 10
    degree: 2
  features:
    - ABL_height
    - wind_veer
    - lapse_rate
    - turbulence_intensity
    - capping_inversion_strength
    - capping_inversion_thickness
    - wind_speed
    - wind_direction

Troubleshooting

"XGBoost is overfitting"

Symptoms: Training RMSE much lower than test RMSE.

Solutions:

  • Reduce max_depth (try 2-3)
  • Reduce n_estimators
  • Increase min_child_weight
  • Add regularization: reg_alpha, reg_lambda
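
A quick check to confirm the diagnosis (a sketch; X_train/y_train and X_test/y_test are hypothetical feature and residual-bias arrays):

import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

print("train RMSE:", rmse(y_train, predictor.predict(X_train)))
print("test  RMSE:", rmse(y_test, predictor.predict(X_test)))
# A large gap (e.g. test RMSE several times the train RMSE) points to overfitting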

"PCE predictions are unstable"

Symptoms: Large prediction variance, coefficients vary across CV folds.

Solutions:

  • Reduce degree
  • Increase q_norm closer to 1.0
  • Use fit_type: LARS for a sparse solution
  • Ensure features are scaled
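
To probe stability directly, one can refit on random subsets of the calibrated indices and measure the spread of the resulting predictions (a sketch; calibrated_indices and X_check are hypothetical, and this assumes fit can be called repeatedly):

import numpy as np

rng = np.random.default_rng(0)
preds = []
for _ in range(5):
    subset = rng.choice(calibrated_indices,
                        size=int(0.8 * len(calibrated_indices)),
                        replace=False)
    predictor.fit(subset)
    preds.append(predictor.predict(X_check))

# Large spread across refits signals an unstable (over-parameterized) PCE
spread = np.std(np.stack(preds), axis=0)
print("mean prediction std across refits:", spread.mean())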

"Linear model has high bias"

Symptoms: Both training and test errors are high.

Solutions:

  • Add polynomial features manually
  • Add interaction terms
  • Switch to XGB or PCE to capture non-linearity
  • Check that features are properly normalized
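
Interaction and polynomial terms can be generated with scikit-learn before fitting (a sketch on a hypothetical feature matrix X):

from sklearn.preprocessing import PolynomialFeatures

# Adds squared and pairwise interaction columns to the feature matrix
poly = PolynomialFeatures(degree=2, include_bias=False)
X_expanded = poly.fit_transform(X)
print(poly.get_feature_names_out())  # names of the engineered columns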

"SIRPolynomial gives poor results"

Symptoms: Predictions don't track actual values.

Solutions:

  • Increase n_directions
  • Adjust n_slices (try 5-20)
  • Increase degree in the reduced space
  • Verify there is sufficient data for dimension reduction


See Also