Local Calibration
Local calibration predicts optimal parameters for each flow case based on atmospheric conditions. Instead of selecting a single parameter set for all cases, an ML model learns how the optimal parameters vary with features such as ABL height, wind veer, and stability.
Overview
Local calibration answers the question: "What parameters work best for this specific atmospheric condition?"
```
Training phase
  For each flow case, find the optimal sample:

    Case 1: ABL=500m, veer=0.01 → optimal k_b=0.035, α=0.88
    Case 2: ABL=800m, veer=0.02 → optimal k_b=0.052, α=0.92
    Case 3: ABL=300m, veer=0.00 → optimal k_b=0.028, α=0.85
    ...

  Then train an ML model mapping conditions to optimal parameters:

    optimal_params = f(ABL_height, wind_veer, lapse_rate, ...)

    using Ridge, RandomForest, XGBoost, etc.

Prediction phase
  New case: ABL=650m, veer=0.015
    → ML model predicts: k_b=0.044, α=0.90
    → Find the closest sample in the database and use it for bias prediction
```
When to Use Local Calibration
Recommended when:
- Atmospheric conditions vary significantly across your dataset
- Global calibration leaves systematic patterns in residuals
- You have sufficient data to train a reliable parameter predictor (> 100 cases recommended)
- Physical reasoning suggests parameters should vary with conditions
Stay with global calibration when:
- Limited reference data (< 50-100 cases)
- Conditions are relatively homogeneous
- Deployment simplicity is critical
- You're establishing a baseline
How It Works
Step 1: Find Per-Case Optimal Parameters
For each flow case in the training set, identify the parameter sample that minimizes the absolute bias:
```python
import numpy as np

for case_idx in range(n_cases):
    # Bias across all parameter samples for this case
    bias_values = database["model_bias_cap"].isel(case_index=case_idx)

    # Sample with the minimum absolute bias
    best_sample_idx = int(np.abs(bias_values).argmin())

    # Record the optimal parameters for this case
    optimal_k_b[case_idx] = database["k_b"].isel(sample=best_sample_idx)
    optimal_ss_alpha[case_idx] = database["ss_alpha"].isel(sample=best_sample_idx)
```
Step 2: Train Parameter Predictor
Train an ML model to predict optimal parameters from atmospheric features:
```python
import numpy as np

# Features (identical across samples: they are physical conditions)
X = database[["ABL_height", "wind_veer", "lapse_rate"]].isel(sample=0).to_dataframe()

# Targets: the optimal parameter values found in step 1
y = np.column_stack([optimal_k_b, optimal_ss_alpha])

# Train a multi-output regressor
regressor.fit(X, y)
```
Step 3: Predict for New Cases
For test/new cases, predict optimal parameters and find the closest sample:
```python
# Predict optimal parameters for the new cases
predicted_params = regressor.predict(X_new)

# Find the database sample closest to the predicted values
closest_sample_idx = find_closest_sample(database, predicted_params)

# Use that sample's bias for the prediction
test_bias = database["model_bias_cap"].isel(sample=closest_sample_idx)
```
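`find_closest_sample` above is shorthand, not a documented API. A minimal sketch of such a helper, assuming the swept parameters (e.g. `k_b`, `ss_alpha`) are stored as 1-D variables along the `sample` dimension and `predicted` maps parameter names to predicted values (such as a row of the DataFrame returned by `predict`):

```python
import numpy as np

def find_closest_sample(database, predicted):
    """Index of the sample nearest to the predicted parameter values.

    Illustrative sketch: distance is Euclidean after normalizing each
    parameter by its swept range, so parameters on different scales
    contribute comparably.
    """
    distances = np.zeros(database.sizes["sample"])
    for name, value in predicted.items():
        values = database[name].values
        scale = float(values.max() - values.min()) or 1.0  # avoid division by zero
        distances += ((values - value) / scale) ** 2
    return int(np.argmin(distances))
```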
Configuration
```yaml
error_prediction:
  calibrator: LocalParameterPredictor

  # ML regressor for parameter prediction
  local_regressor: Ridge  # Options: Linear, Ridge, Lasso, ElasticNet, RandomForest, XGB
  local_regressor_params:
    alpha: 1.0  # Regularization strength

  # Features used for parameter prediction (same as bias prediction)
  features:
    - ABL_height
    - wind_veer
    - lapse_rate
```
Available Regressors
| Regressor | `local_regressor` | Best For | Key Parameters |
|---|---|---|---|
| Linear Regression | `Linear` | Baseline, interpretability | None |
| Ridge Regression | `Ridge` | Collinear features, default choice | `alpha` |
| Lasso Regression | `Lasso` | Feature selection | `alpha` |
| ElasticNet | `ElasticNet` | Mixed L1/L2 regularization | `alpha`, `l1_ratio` |
| Random Forest | `RandomForest` | Non-linear relationships | `n_estimators`, `max_depth` |
| XGBoost | `XGB` | Complex patterns | `max_depth`, `learning_rate` |
Regressor Configuration Examples
Ridge (recommended default):

```yaml
local_regressor: Ridge
local_regressor_params:
  alpha: 1.0
```

Random Forest:

```yaml
local_regressor: RandomForest
local_regressor_params:
  n_estimators: 100
  max_depth: 5
  random_state: 42
```

XGBoost:

```yaml
local_regressor: XGB
local_regressor_params:
  max_depth: 3
  n_estimators: 100
  learning_rate: 0.1
```

ElasticNet:

```yaml
local_regressor: ElasticNet
local_regressor_params:
  alpha: 0.5
  l1_ratio: 0.5
```
API Usage
Basic Usage
```python
import pandas as pd
import xarray as xr

from wifa_uq.postprocessing.calibration import LocalParameterPredictor

# Load database
database = xr.load_dataset("results_stacked_hh.nc")

# Initialize with features
calibrator = LocalParameterPredictor(
    database,
    feature_names=["ABL_height", "wind_veer", "lapse_rate"],
    regressor_name="Ridge",
    regressor_params={"alpha": 1.0},
)

# Fit the parameter predictor
calibrator.fit()

# Get optimal sample indices for the training data
optimal_indices = calibrator.get_optimal_indices()
print(f"Optimal indices shape: {optimal_indices.shape}")  # (n_cases,)

# Predict optimal parameters for new data
new_features = pd.DataFrame({
    "ABL_height": [500, 700, 900],
    "wind_veer": [0.01, 0.02, 0.015],
    "lapse_rate": [0.003, 0.005, 0.004],
})
predicted_params = calibrator.predict(new_features)
print(predicted_params)
#      k_b  ss_alpha
# 0  0.038      0.89
# 1  0.048      0.91
# 2  0.043      0.90
```
Properties After Fitting
| Property | Type | Description |
|---|---|---|
| `optimal_indices_` | ndarray | Per-case optimal sample indices |
| `optimal_params_` | dict | Per-case optimal parameter values |
| `swept_params` | list | Names of swept parameters |
| `regressor` | estimator | Fitted ML regressor |
| `is_fitted` | bool | Whether `fit()` has been called |
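For example, continuing from the fitted `calibrator` above:

```python
# Inspect the fitted calibrator (continuing from the example above)
print(calibrator.is_fitted)             # True
print(calibrator.swept_params)          # e.g. ["k_b", "ss_alpha"]
print(calibrator.optimal_indices_[:5])  # first five per-case optimal sample indices
print(calibrator.optimal_params_["k_b"][:5])
```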
Methods
| Method | Description |
|---|---|
| `fit()` | Train the parameter predictor |
| `predict(X)` | Predict optimal parameters for new features |
| `get_optimal_indices()` | Get per-case optimal sample indices |
Diagnostics
Parameter Prediction Quality Plot
When using local calibration with cross-validation, WIFA-UQ automatically generates a diagnostic plot, `local_parameter_prediction.png`, showing how well parameters are predicted. It plots predicted vs. actual optimal parameters for each swept parameter, with R² scores indicating prediction quality. You can also reproduce the R² check manually, as sketched after the list below.
Interpreting the plot:
- High R² (> 0.7): Parameter predictor captures the relationship well
- Low R² (< 0.3): Parameters may not vary systematically with features, or features are insufficient
- Points along 1:1 line: Good predictions
- Systematic offset: Bias in parameter prediction
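A sketch of that manual check, assuming you hold out validation features `X_val` and the matching per-case optima `k_b_val` (both names illustrative, taken from your own train/validation split):

```python
from sklearn.metrics import r2_score

# Predicted vs. actual optimal k_b on held-out cases
# (X_val and k_b_val come from your own train/validation split)
pred = calibrator.predict(X_val)
print("k_b R²:", r2_score(k_b_val, pred["k_b"]))
```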
Manual Diagnostics
```python
import matplotlib.pyplot as plt

# After fitting
calibrator = LocalParameterPredictor(database, feature_names=[...])
calibrator.fit()

# Get training features (conditions are identical across parameter samples)
X_train = database.isel(sample=0).to_dataframe().reset_index()[calibrator.feature_names]
optimal_k_b = calibrator.optimal_params_["k_b"]

# Visualize how the optimal k_b varies with each feature
plt.figure(figsize=(12, 4))
for i, feature in enumerate(calibrator.feature_names):
    plt.subplot(1, len(calibrator.feature_names), i + 1)
    plt.scatter(X_train[feature], optimal_k_b, alpha=0.5)
    plt.xlabel(feature)
    plt.ylabel("Optimal k_b")
    plt.title(f"k_b vs {feature}")
plt.tight_layout()
plt.savefig("parameter_relationships.png")
```
Comparison: Global vs Local
Running Both
```python
import numpy as np
import xarray as xr
from sklearn.model_selection import KFold

from wifa_uq.postprocessing.calibration import MinBiasCalibrator, LocalParameterPredictor

database = xr.load_dataset("results_stacked_hh.nc")
features = ["ABL_height", "wind_veer", "lapse_rate"]

cv = KFold(n_splits=5, shuffle=True, random_state=42)
global_rmse = []
local_rmse = []

for train_idx, test_idx in cv.split(database.case_index):
    train_data = database.isel(case_index=train_idx)
    test_data = database.isel(case_index=test_idx)

    # Global calibration: one parameter sample for all cases
    global_cal = MinBiasCalibrator(train_data)
    global_cal.fit()
    global_bias = test_data["model_bias_cap"].isel(sample=global_cal.best_idx_)
    global_rmse.append(np.sqrt(np.mean(global_bias.values**2)))

    # Local calibration: condition-dependent parameter samples
    local_cal = LocalParameterPredictor(train_data, feature_names=features)
    local_cal.fit()

    # Predict optimal parameters for the test cases
    X_test = test_data.isel(sample=0).to_dataframe().reset_index()[features]
    pred_params = local_cal.predict(X_test)

    # Find the closest database sample for each test case
    local_bias = []
    for i, row in pred_params.iterrows():
        closest_sample = find_closest_sample(test_data, row)
        local_bias.append(
            test_data["model_bias_cap"].isel(case_index=i, sample=closest_sample).values
        )
    local_rmse.append(np.sqrt(np.mean(np.array(local_bias) ** 2)))

print(f"Global RMSE: {np.mean(global_rmse):.4f} ± {np.std(global_rmse):.4f}")
print(f"Local RMSE:  {np.mean(local_rmse):.4f} ± {np.std(local_rmse):.4f}")
```
Expected Results
| Scenario | Global RMSE | Local RMSE | Recommendation |
|---|---|---|---|
| Homogeneous conditions | 0.045 | 0.044 | Use global (simpler) |
| Varying stability | 0.055 | 0.038 | Use local |
| Limited data (n<50) | 0.050 | 0.065 | Use global (local overfits) |
Best Practices
1. Start with Global Calibration
Always establish a global baseline first:
```yaml
# First run
error_prediction:
  calibrator: MinBiasCalibrator
```

```yaml
# Then compare with
error_prediction:
  calibrator: LocalParameterPredictor
```
2. Use Regularization
Local calibration can overfit with limited data. Start with regularized models:
```yaml
# Good default
local_regressor: Ridge
local_regressor_params:
  alpha: 1.0
```

```yaml
# If underfitting, reduce regularization
local_regressor_params:
  alpha: 0.1
```
3. Choose Appropriate Features
Features should have a physical connection to the wake model parameters (a screening snippet follows the table):
| Feature | Affects | Physical Reasoning |
|---|---|---|
| ABL height | Wake expansion (k_b) | Deeper boundary layers allow more wake spreading |
| Wind veer | Wake deflection | Directional shear affects wake trajectory |
| Turbulence intensity | Wake recovery | Higher TI → faster wake recovery → different k_b |
| Stability (lapse rate) | Overall wake behavior | Stable conditions suppress mixing |
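One quick screen is to correlate candidate features with the per-case optima. A sketch, assuming `calibrator` has already been fitted on `database`:

```python
import pandas as pd

# Correlate each candidate feature with the per-case optimal k_b
X = database.isel(sample=0).to_dataframe().reset_index()[calibrator.feature_names]
optimal_k_b = pd.Series(calibrator.optimal_params_["k_b"], index=X.index)
print(X.corrwith(optimal_k_b).sort_values(key=abs, ascending=False))
```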
4. Check for Sufficient Variation
Parameters can only be predicted if they actually vary with features:
```python
# Check the variation in per-case optimal parameters
optimal_k_b = calibrator.optimal_params_["k_b"]
print(f"k_b range: {optimal_k_b.min():.3f} - {optimal_k_b.max():.3f}")
print(f"k_b std: {optimal_k_b.std():.4f}")

# If the std is very small, the parameters barely vary with conditions
# → global calibration may be sufficient
```
5. Validate with Cross-Validation
Always use cross-validation to assess generalization:
```yaml
cross_validation:
  splitting_mode: kfold_shuffled
  n_splits: 5
```
Compare CV metrics between global and local calibration.
Troubleshooting
"Local calibration is worse than global"
Causes:
- Insufficient training data → overfitting
- Features that don't predict the optimal parameters well
- An overly flexible regressor (e.g., a deep Random Forest)
Solutions:
- Increase regularization (alpha for Ridge)
- Use simpler regressor (Ridge instead of RandomForest)
- Add more relevant features
- Increase training data if possible
"Parameter predictions are constant"
Causes:
- Features don't vary enough in your dataset
- Optimal parameters don't actually depend on the features
- The regressor is underfitting

Solutions:
- Check the feature variance, e.g. `database["ABL_height"].std()` (see the snippet below)
- Examine the relationships manually (scatter plots)
- Try less regularization or a more flexible model
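A minimal variance check across all features (using the feature names from this page):

```python
# Quick check: how much does each feature actually vary across cases?
for name in ["ABL_height", "wind_veer", "lapse_rate"]:
    da = database[name].isel(sample=0)  # conditions are constant across samples
    print(f"{name}: mean={float(da.mean()):.4g}, std={float(da.std()):.4g}")
```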
"Predicted parameters outside valid range"
Causes:
- Extrapolation beyond the training data range
- The regressor predicting unrealistic values

Solutions:
- The pipeline uses the closest sample in the database, so extreme predictions are automatically bounded
- For deployment, consider adding explicit parameter bounds (see the sketch below)
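A sketch of such explicit bounds, clipping predictions to the range actually swept in the database (`pred_params` as returned by `predict`):

```python
import numpy as np

# Illustrative: clip predicted parameters to the swept range
for name in pred_params.columns:
    lo, hi = float(database[name].min()), float(database[name].max())
    pred_params[name] = np.clip(pred_params[name], lo, hi)
```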
"High variance across CV folds"
Causes:
- Insufficient data in each fold
- Unstable parameter-feature relationships
- Noisy reference data

Solutions:
- Use fewer CV splits (e.g., 3 instead of 5)
- Increase regularization
- Consider global calibration for more stable results
Integration with Bias Prediction
Local calibration feeds into the bias prediction pipeline:
```
Stage 1: Local calibration
    For each case: θ*(case) = LocalParameterPredictor(features)

Stage 2: Bias extraction
    For each case: residual_bias = database["model_bias_cap"][θ*(case)]

Stage 3: Bias prediction (ML)
    Learn: residual_bias = BiasPredictor(features)

Final: corrected_power = model(θ*(features)) - predicted_residual_bias
```
The key insight: local calibration already reduces bias by using condition-appropriate parameters. The bias predictor then learns the remaining residual patterns.
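In code, the final correction looks roughly like this (all names illustrative: `bias_predictor` stands in for the fitted bias-prediction model, `model_power` for a wake-model evaluation at the chosen parameters):

```python
# Conceptual sketch of the prediction-time stages (names illustrative)
theta_star = local_cal.predict(X_new)                  # stage 1: condition-specific parameters
residual = bias_predictor.predict(X_new)               # stage 3: predicted residual bias
corrected_power = model_power(theta_star) - residual   # final corrected prediction
```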
See Also
- Global Calibration — Simpler single-parameter approach
- Calibration Theory — Mathematical foundations
- Cross-Validation — Validation strategies
- Configuration Reference — Full YAML options