Configuration Reference
WIFA-UQ workflows are driven by YAML configuration files. This page provides a complete reference for all available options.
Overview
A configuration file controls the entire workflow:
description: "Human-readable description of this workflow"
paths:
  # Input and output file locations
preprocessing:
  # Data preparation options
database_gen:
  # Model error database generation
error_prediction:
  # ML-based bias prediction and cross-validation
sensitivity_analysis:
  # Feature importance analysis
Minimal Configuration
WIFA-UQ uses smart path inference from windIO structures. The minimal configuration requires only:
paths:
  system_config: path/to/wind_energy_system.yaml
  output_dir: results/
preprocessing:
  run: true
  steps: [recalculate_params]
database_gen:
  run: true
  flow_model: pywake
  n_samples: 100
  param_config:
    attributes.analysis.wind_deficit_model.wake_expansion_coefficient.k_b:
      range: [0.01, 0.07]
      default: 0.04
      short_name: k_b
error_prediction:
  run: true
  features: [ABL_height, wind_veer, lapse_rate]
  model: XGB
  calibrator: MinBiasCalibrator
  bias_predictor: BiasPredictor
  cross_validation:
    splitting_mode: kfold_shuffled
    n_splits: 5
Other paths (reference_power, reference_resource, wind_farm_layout) are automatically inferred from the windIO !include chain.
Section Reference
paths
Specifies input data and output locations. All paths are relative to the config file's directory.
| Key | Required | Description |
|---|---|---|
| `system_config` | Yes | Path to windIO system YAML (wind_energy_system.yaml) |
| `reference_power` | No* | NetCDF with observed/LES power data |
| `reference_resource` | No* | NetCDF with atmospheric profiles |
| `wind_farm_layout` | No* | YAML with turbine positions and specs |
| `output_dir` | No | Output directory (default: wifa_uq_results/) |
| `processed_resource_file` | No | Preprocessed resource filename (default: processed_physical_inputs.nc) |
| `database_file` | No | Database filename (default: results_stacked_hh.nc) |
*These paths are automatically inferred from the windIO system config if not specified.
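As a rough illustration of the rules in this table, the sketch below fills in the documented defaults and anchors relative paths at the config file's directory. The helper name `resolve_paths` is hypothetical and not part of WIFA-UQ, and placing the two filename entries inside `output_dir` is an assumption based on the output file listing later on this page.

```python
from pathlib import Path

# Defaults copied from the table above.
PATH_DEFAULTS = {
    "output_dir": "wifa_uq_results/",
    "processed_resource_file": "processed_physical_inputs.nc",
    "database_file": "results_stacked_hh.nc",
}
# Keys that are inferred from the windIO system config when omitted.
INFERRED_KEYS = ("reference_power", "reference_resource", "wind_farm_layout")
# Keys documented as filenames rather than paths.
FILENAME_KEYS = ("processed_resource_file", "database_file")


def resolve_paths(paths_cfg, config_file):
    """Hypothetical helper: apply defaults and resolve paths against the config directory."""
    if "system_config" not in paths_cfg:
        raise KeyError("paths.system_config is required")
    base = Path(config_file).parent
    merged = {**PATH_DEFAULTS, **paths_cfg}

    resolved = {}
    for key, value in merged.items():
        if key not in FILENAME_KEYS:
            # An absolute value overrides the base directory automatically.
            resolved[key] = base / value
    for key in FILENAME_KEYS:
        # Assumption: these files are written inside output_dir.
        resolved[key] = resolved["output_dir"] / merged[key]
    for key in INFERRED_KEYS:
        # None marks "not given explicitly, to be inferred later".
        resolved.setdefault(key, None)
    return resolved
```

Calling `resolve_paths({"system_config": "system.yaml"}, Path("configs/my_config.yaml"))` would place every entry under `configs/` and leave the three inferable keys as `None`.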
Path Inference
WIFA-UQ follows the windIO !include chain to discover files:
system.yaml
├── site: !include energy_site.yaml
│   └── energy_resource: !include resource.nc → reference_resource
├── wind_farm: !include wind_farm.yaml → wind_farm_layout
└── simulation_outputs: !include outputs.yaml
    └── turbine_data: !include power.nc → reference_power
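The traversal shown in this tree can be sketched with the standard library alone. This is a simplified illustration, not the WIFA-UQ implementation: the function `infer_paths` is hypothetical, it matches `key: !include file` lines with a regular expression instead of using a YAML loader, and the key-to-path mapping is taken only from the tree above.

```python
import re
from pathlib import Path

# Matches lines such as "site: !include energy_site.yaml" (optional trailing comment).
INCLUDE_RE = re.compile(
    r"^\s*(?P<key>[\w.]+)\s*:\s*!include\s+(?P<target>\S+)\s*(?:#.*)?$"
)

# windIO key -> WIFA-UQ path name, following the tree above.
KEY_TO_PATH = {
    "energy_resource": "reference_resource",
    "wind_farm": "wind_farm_layout",
    "turbine_data": "reference_power",
}


def infer_paths(yaml_file, found=None):
    """Hypothetical sketch: follow !include lines and collect the inferred paths."""
    found = {} if found is None else found
    yaml_file = Path(yaml_file)
    for line in yaml_file.read_text().splitlines():
        match = INCLUDE_RE.match(line)
        if match is None:
            continue
        # Included files are resolved relative to the including file.
        target = yaml_file.parent / match["target"]
        name = KEY_TO_PATH.get(match["key"])
        if name is not None:
            found.setdefault(name, target)
        # Only YAML files can contain further includes.
        if target.suffix in (".yaml", ".yml") and target.exists():
            infer_paths(target, found)
    return found
```

A real implementation would more likely register an `!include` constructor with the YAML loader, which also handles includes nested inside mappings and lists.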
Explicit Path Override
You can always specify paths explicitly to override inference:
paths:
  system_config: system.yaml
  reference_power: custom/path/to/power.nc  # Overrides inferred path
  reference_resource: custom/path/to/resource.nc
  wind_farm_layout: custom/path/to/layout.yaml
  output_dir: my_results/
preprocessing
Controls data preparation before database generation.
| Key | Type | Default | Description |
|---|---|---|---|
| `run` | bool | `false` | Whether to run preprocessing |
| `steps` | list | `[]` | Preprocessing steps to apply |
Available Steps
| Step | Description |
|---|---|
| `recalculate_params` | Calculate derived quantities from vertical profiles |

preprocessing:
  run: true
  steps:
    - recalculate_params
See Preprocessing for detailed documentation on each step.
database_gen
Controls the model error database generation via parameter sweeps.
| Key | Type | Default | Description |
|---|---|---|---|
| `run` | bool | `false` | Whether to generate the database |
| `flow_model` | string | `"pywake"` | Wake model to use ("pywake") |
| `n_samples` | int | `100` | Number of parameter samples |
| `param_config` | dict | — | Parameters to sweep (see below) |
param_config Format
Parameters can be specified in two formats:
Short format (range only):
param_config:
  attributes.analysis.wind_deficit_model.wake_expansion_coefficient.k_b: [0.01, 0.07]
Full format (with metadata):
param_config:
  attributes.analysis.wind_deficit_model.wake_expansion_coefficient.k_b:
    range: [0.01, 0.07]  # [min, max] sampling bounds
    default: 0.04        # Default value (first sample uses this)
    short_name: k_b      # Name used in database coordinates
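To make the relationship between the two formats concrete, the sketch below converts either one into the full form. The function `normalize_param_config` is hypothetical, and the fallbacks it applies for the short format (midpoint of the range as `default`, last path component as `short_name`) are assumptions for illustration; this page does not state what WIFA-UQ itself uses.

```python
def normalize_param_config(param_config):
    """Hypothetical helper: expand short-format entries into the full format."""
    normalized = {}
    for path, spec in param_config.items():
        if isinstance(spec, dict):
            entry = dict(spec)               # full format: copy as given
        else:
            entry = {"range": list(spec)}    # short format: bare [min, max]

        low, high = entry["range"]
        if low >= high:
            raise ValueError(f"{path}: range must satisfy min < max")

        # Assumed fallbacks, used only when the key is absent.
        entry.setdefault("default", (low + high) / 2)
        entry.setdefault("short_name", path.rsplit(".", 1)[-1])

        if not low <= entry["default"] <= high:
            raise ValueError(f"{path}: default lies outside the range")
        normalized[path] = entry
    return normalized
```

With this sketch, the short-format entry for `k_b` above expands to a dict with `range`, `default`, and `short_name` keys, so downstream code only has to handle one shape.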
Common Swept Parameters
| Parameter Path | Short Name | Typical Range | Description |
|---|---|---|---|
| `attributes.analysis.wind_deficit_model.wake_expansion_coefficient.k_b` | `k_b` | [0.01, 0.07] | Wake expansion coefficient |
| `attributes.analysis.wind_deficit_model.ceps` | `ceps` | [0.15, 0.3] | Bastankhah epsilon coefficient |
| `attributes.analysis.blockage_model.ss_alpha` | `ss_alpha` | [0.75, 1.0] | Self-similarity blockage alpha |
See Database Generation for complete parameter reference.
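For intuition about how `n_samples` and the ranges interact, here is a minimal sampling sketch. It assumes full-format `param_config` entries and independent uniform draws; the actual sampling scheme used by WIFA-UQ is not described on this page, and the function `draw_samples` is hypothetical. The only behaviour taken from the text is that the first sample uses each parameter's default.

```python
import random


def draw_samples(param_config, n_samples, seed=0):
    """Hypothetical sketch: one dict of parameter values per sample."""
    rng = random.Random(seed)  # seeded for reproducibility
    samples = []
    for i in range(n_samples):
        sample = {}
        for spec in param_config.values():
            low, high = spec["range"]
            if i == 0:
                # Documented behaviour: the first sample uses the defaults.
                value = spec["default"]
            else:
                # Assumption: independent uniform draws within [min, max].
                value = rng.uniform(low, high)
            sample[spec["short_name"]] = value
        samples.append(sample)
    return samples
```

Each returned dict is keyed by `short_name`, mirroring how those names are described as database coordinates.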
error_prediction
Configures the ML-based bias prediction pipeline.
| Key | Type | Default | Description |
|---|---|---|---|
| `run` | bool | `false` | Whether to run error prediction |
| `features` | list | — | Feature names for ML model (required) |
| `model` | string | `"XGB"` | ML model type |
| `model_params` | dict | `{}` | Model-specific hyperparameters |
| `calibrator` | string | — | Calibrator class name (required) |
| `local_regressor` | string | `"Ridge"` | Regressor for local calibration |
| `local_regressor_params` | dict | `{}` | Local regressor hyperparameters |
| `bias_predictor` | string | `"BiasPredictor"` | Predictor class name |
| `cross_validation` | dict | — | CV configuration (see below) |
Available Features
Features from preprocessing:
| Feature | Units | Description |
|---|---|---|
| `ABL_height` | m | Atmospheric boundary layer height |
| `wind_veer` | deg/m | Wind direction change with height |
| `lapse_rate` | K/m | Potential temperature gradient |
| `turbulence_intensity` | — | TI from TKE profile |
| `wind_speed` | m/s | Wind speed at hub height |
| `wind_direction` | deg | Wind direction at hub height |
Features from database generation:
| Feature | Units | Description |
|---|---|---|
| `Blockage_Ratio` | — | Fraction of rotor blocked [0-1] |
| `Blocking_Distance` | — | Normalized distance to blockers [0-1] |
| `Farm_Length` | D | Farm extent in wind direction |
| `Farm_Width` | D | Farm extent perpendicular to wind |
Available Models
| Model | `model` value | Description |
|---|---|---|
| XGBoost | `"XGB"` | Gradient boosting (default, uses SHAP) |
| SIR + Polynomial | `"SIRPolynomial"` | Dimension reduction + polynomial |
| PCE | `"PCE"` | Polynomial Chaos Expansion |
| Linear | `"Linear"` | OLS/Ridge/Lasso/ElasticNet |
XGBoost parameters:
model: XGB
model_params:
  max_depth: 4
  n_estimators: 200
  learning_rate: 0.1
  random_state: 42
PCE parameters:
model: PCE
model_params:
  degree: 5               # Polynomial degree
  marginals: kernel       # "kernel", "uniform", "normal"
  copula: independent     # "independent" or "normal"
  q: 0.5                  # Hyperbolic truncation parameter
  max_features: 5         # Safety limit on input dimension
  allow_high_dim: false   # Allow > max_features inputs
SIR+Polynomial parameters:
model: SIRPolynomial
model_params:
  n_directions: 1   # Number of SIR directions
  degree: 2         # Polynomial degree
Linear parameters:
model: Linear
model_params:
  method: ridge   # "ols", "ridge", "lasso", "elasticnet"
  alpha: 1.0      # Regularization strength
  l1_ratio: 0.5   # ElasticNet mixing (only for elasticnet)
Available Calibrators
| Calibrator | Mode | Description |
|---|---|---|
| `MinBiasCalibrator` | Global | Single parameter set minimizing total bias |
| `DefaultParams` | Global | Use default parameter values |
| `LocalParameterPredictor` | Local | ML-predicted params per flow case |
| `BayesianCalibration` | Global | Bayesian inference (requires UMBRA) |
For local calibration, specify the regressor:
calibrator: LocalParameterPredictor
local_regressor: Ridge  # Linear, Ridge, Lasso, ElasticNet, RandomForest, XGB
local_regressor_params:
  alpha: 1.0
Cross-Validation Configuration
cross_validation:
  run: true
  splitting_mode: kfold_shuffled  # or LeaveOneGroupOut
  n_splits: 5                     # For KFold only
  metrics:
    - rmse
    - r2
    - mae
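The `kfold_shuffled` mode with `n_splits: 5` can be pictured as follows. This is a standard-library sketch of shuffled k-fold splitting in general, not the code WIFA-UQ runs; the function name and the `seed` argument are illustrative assumptions.

```python
import random


def kfold_shuffled(n_cases, n_splits=5, seed=42):
    """Yield (train_indices, test_indices) for a shuffled k-fold split."""
    if not 2 <= n_splits <= n_cases:
        raise ValueError("n_splits must be between 2 and the number of cases")
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_splits folds of near-equal size.
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for k, test_idx in enumerate(folds):
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, test_idx
```

Every flow case appears in exactly one test fold, so each of the listed metrics would be computed once per fold on data that the model did not see during training.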
For multi-farm LeaveOneGroupOut:
cross_validation:
  splitting_mode: LeaveOneGroupOut
  groups:
    Offshore:
      - Farm1
      - Farm2
    Onshore:
      - Farm3
      - Farm4
Group names must match the name field in your farms list (multi-farm config) or the wind_farm coordinate in your database.
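The sketch below shows how a `groups` mapping like the one above could translate into leave-one-group-out folds. It is an illustration under assumptions: `farm_of_case` stands in for the per-case farm labels (for example, values of a `wind_farm` coordinate), and the function itself is hypothetical rather than part of WIFA-UQ.

```python
def leave_one_group_out(groups, farm_of_case):
    """Yield (held_out_group, train_indices, test_indices) per group."""
    # Invert the config mapping: farm name -> group name.
    farm_to_group = {}
    for group, farms in groups.items():
        for farm in farms:
            if farm in farm_to_group:
                raise ValueError(f"{farm} is listed in more than one group")
            farm_to_group[farm] = group

    unknown = sorted(set(farm_of_case) - set(farm_to_group))
    if unknown:
        raise ValueError(f"farms without a group: {unknown}")

    case_groups = [farm_to_group[farm] for farm in farm_of_case]
    for held_out in groups:
        test_idx = [i for i, g in enumerate(case_groups) if g == held_out]
        train_idx = [i for i, g in enumerate(case_groups) if g != held_out]
        yield held_out, train_idx, test_idx
```

With the two groups above, this produces two folds: one that trains on the onshore farms and tests on the offshore ones, and one the other way around.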
sensitivity_analysis
Controls feature importance and sensitivity analysis.
| Key | Type | Default | Description |
|---|---|---|---|
| `run_observation_sensitivity` | bool | `false` | Run SA on observations |
| `run_bias_sensitivity` | bool | `false` | Run SA on bias predictions |
| `method` | string | `"auto"` | SA method ("auto", "shap", "sir", "pce_sobol") |
| `pce_config` | dict | `{}` | PCE config for Sobol indices |
sensitivity_analysis:
  run_observation_sensitivity: true
  run_bias_sensitivity: true
  method: auto  # Uses SHAP for XGB, SIR for SIRPolynomial
  pce_config:   # Only for method: pce_sobol
    degree: 5
    marginals: kernel
    copula: independent
    q: 0.5
    model_coeff_name: None
    plot_options:
      scatter: False
      distribution: False
      metrics: ["RMSE", "R2", "Wasserstein", "KS", "KL"]
Method selection:
- auto: SHAP for tree models, SIR directions for SIR models, Sobol for PCE
- shap: Force SHAP TreeExplainer (requires tree model)
- sir: Force SIR direction coefficients
- pce_sobol: Force PCE-based Sobol indices
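The selection rules in this list can be written as a small lookup. This is a sketch of the documented behaviour, not WIFA-UQ's own code: the function `resolve_sa_method` is hypothetical, and raising an error for `auto` with a model that has no listed method (such as `Linear`) is an assumption, since the page does not say what happens in that case.

```python
# Method chosen by "auto" for each model value, per the list above.
AUTO_METHOD = {"XGB": "shap", "SIRPolynomial": "sir", "PCE": "pce_sobol"}
VALID_METHODS = {"auto", "shap", "sir", "pce_sobol"}
TREE_MODELS = {"XGB"}  # the only tree model among the documented model values


def resolve_sa_method(method, model):
    """Hypothetical helper: return the concrete SA method for a model."""
    if method not in VALID_METHODS:
        raise ValueError(f"unknown sensitivity method: {method!r}")
    if method == "auto":
        if model not in AUTO_METHOD:
            # Assumption: no automatic choice exists for this model.
            raise ValueError(f"no automatic SA method for model {model!r}")
        return AUTO_METHOD[model]
    if method == "shap" and model not in TREE_MODELS:
        raise ValueError("shap requires a tree model")
    return method
```

For example, `resolve_sa_method("auto", "PCE")` returns `"pce_sobol"`, while forcing `"shap"` on a PCE model is rejected.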
Multi-Farm Configuration
For workflows spanning multiple wind farms, use the farms key:
paths:
  output_dir: results/multi_farm/
  database_file: combined_database.nc
farms:
  - name: Farm1  # Required: unique identifier
    system_config: data/farm1/wind_energy_system.yaml  # Required
    # Optional explicit paths (otherwise inferred):
    # reference_power: data/farm1/power.nc
    # reference_resource: data/farm1/resource.nc
    # wind_farm_layout: data/farm1/wind_farm.yaml
  - name: Farm2
    system_config: data/farm2/wind_energy_system.yaml
  - name: Farm3
    system_config: data/farm3/wind_energy_system.yaml
preprocessing:
  run: true
  steps: [recalculate_params]
database_gen:
  run: true
  flow_model: pywake
  n_samples: 100
  param_config:
    # Shared across all farms
    attributes.analysis.wind_deficit_model.wake_expansion_coefficient.k_b:
      range: [0.01, 0.07]
      default: 0.04
      short_name: k_b
error_prediction:
  run: true
  features: [ABL_height, wind_veer, lapse_rate]
  model: XGB
  calibrator: MinBiasCalibrator
  bias_predictor: BiasPredictor
  cross_validation:
    splitting_mode: LeaveOneGroupOut
    groups:
      Group1:
        - Farm1
        - Farm2
      Group2:
        - Farm3
Each farm requires:
- name: Unique identifier (used in CV grouping)
- system_config: Path to windIO system YAML
Other paths are auto-inferred per farm using the same logic as single-farm mode.
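A pre-flight check along these lines can catch the most common multi-farm mistakes before a long run starts. The sketch assumes the configuration has already been loaded into a plain dict with `farms` at the top level; `validate_farms` is a hypothetical helper, not a WIFA-UQ function.

```python
def validate_farms(config):
    """Hypothetical helper: return a list of problems found in a multi-farm config."""
    errors = []
    names = []
    for i, farm in enumerate(config.get("farms", [])):
        # Each farm entry needs both required keys.
        for key in ("name", "system_config"):
            if key not in farm:
                errors.append(f"farms[{i}] is missing required key '{key}'")
        if "name" in farm:
            names.append(farm["name"])

    # Names are used for CV grouping, so they must be unique.
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        errors.append(f"duplicate farm names: {duplicates}")

    # Every farm listed in a CV group must exist in the farms list.
    cv = config.get("error_prediction", {}).get("cross_validation", {})
    for group, members in cv.get("groups", {}).items():
        for member in members:
            if member not in names:
                errors.append(f"group '{group}' references unknown farm '{member}'")
    return errors
```

An empty list means the farm entries and CV groups are consistent with each other; it says nothing about whether the referenced files exist.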
Complete Example
Here's a fully specified configuration that sets every section explicitly, using XGBoost with local calibration:
description: "Complete WIFA-UQ configuration example"
paths:
  # Required
  system_config: wind_energy_system/system_pywake.yaml
  # Optional (auto-inferred if omitted)
  reference_power: observed_output/observedPower.nc
  reference_resource: plant_energy_resource/originalData.nc
  wind_farm_layout: plant_wind_farm/wind_farm.yaml
  # Output paths
  output_dir: wifa_uq_results/
  processed_resource_file: processed_physical_inputs.nc
  database_file: results_stacked_hh.nc
preprocessing:
  run: true
  steps:
    - recalculate_params
database_gen:
  run: true
  flow_model: pywake
  n_samples: 100
  param_config:
    attributes.analysis.wind_deficit_model.wake_expansion_coefficient.k_b:
      range: [0.01, 0.07]
      default: 0.04
      short_name: k_b
    attributes.analysis.wind_deficit_model.ceps:
      range: [0.15, 0.3]
      default: 0.2154
      short_name: ceps
    attributes.analysis.blockage_model.ss_alpha:
      range: [0.75, 1.0]
      default: 0.875
      short_name: ss_alpha
error_prediction:
  run: true
  features:
    - ABL_height
    - wind_veer
    - lapse_rate
    - Blockage_Ratio
    - Blocking_Distance
    - Farm_Length
    - Farm_Width
  model: XGB
  model_params:
    max_depth: 4
    n_estimators: 200
    learning_rate: 0.1
    random_state: 42
  calibrator: LocalParameterPredictor
  local_regressor: Ridge
  local_regressor_params:
    alpha: 1.0
  bias_predictor: BiasPredictor
  cross_validation:
    run: true
    splitting_mode: kfold_shuffled
    n_splits: 5
    metrics:
      - rmse
      - r2
      - mae
sensitivity_analysis:
  run_observation_sensitivity: true
  run_bias_sensitivity: true
  method: auto
  pce_config:
    degree: 5
    marginals: kernel
    copula: independent
    q: 0.5
    model_coeff_name: 'k_b'
    plot_options:
      scatter: False
      distribution: False
      metrics: ["RMSE", "R2", "Wasserstein", "KS", "KL"]
Workflow Execution
Configurations are executed via the run.py script:
cd examples
python run.py my_config.yaml
Or programmatically:
from wifa_uq.workflow import run_workflow
from pathlib import Path
cv_results, y_preds, y_tests = run_workflow(Path("my_config.yaml"))
Output Files
After a successful run, the output directory contains:
| File | Description |
|---|---|
| `processed_physical_inputs.nc` | Preprocessed atmospheric data |
| `results_stacked_hh.nc` | Model error database |
| `cv_results.csv` | Cross-validation metrics per fold |
| `predictions.npz` | Raw predictions array |
| `correction_results.png` | Before/after correction scatter plots |
| `bias_prediction_shap.png` | SHAP beeswarm plot (XGBoost) |
| `bias_prediction_shap_importance.png` | SHAP bar chart (XGBoost) |
| `bias_prediction_sir_importance.png` | SIR importance (SIRPolynomial) |
| `pce_sobol_indices.png` | Sobol indices (PCE) |
| `local_parameter_prediction.png` | Parameter prediction quality (local calibration) |
For multi-farm workflows, additional plots are generated:
- cv_fold_metrics.png
- cv_fold_heatmap.png
- cv_predictions_by_fold.png
- cv_generalization_summary.png
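A quick way to confirm that a run produced what you expected is to compare the output directory against the file lists above. The sketch below is illustrative only: the helper names are hypothetical, the default filenames are assumed (custom `processed_resource_file` or `database_file` values would change them), and it assumes every stage, including sensitivity analysis, was enabled.

```python
from pathlib import Path

# Files listed above without a model or calibrator qualifier.
CORE_FILES = [
    "processed_physical_inputs.nc",
    "results_stacked_hh.nc",
    "cv_results.csv",
    "predictions.npz",
    "correction_results.png",
]
# Plots tied to a specific model in the table above.
MODEL_PLOTS = {
    "XGB": ["bias_prediction_shap.png", "bias_prediction_shap_importance.png"],
    "SIRPolynomial": ["bias_prediction_sir_importance.png"],
    "PCE": ["pce_sobol_indices.png"],
}
MULTI_FARM_PLOTS = [
    "cv_fold_metrics.png",
    "cv_fold_heatmap.png",
    "cv_predictions_by_fold.png",
    "cv_generalization_summary.png",
]


def expected_outputs(model="XGB", calibrator="MinBiasCalibrator", multi_farm=False):
    """Hypothetical helper: filenames a fully enabled run is expected to write."""
    files = list(CORE_FILES) + MODEL_PLOTS.get(model, [])
    if calibrator == "LocalParameterPredictor":
        files.append("local_parameter_prediction.png")
    if multi_farm:
        files.extend(MULTI_FARM_PLOTS)
    return files


def missing_outputs(output_dir, **kwargs):
    """Return the expected filenames that are absent from output_dir."""
    output_dir = Path(output_dir)
    return [name for name in expected_outputs(**kwargs)
            if not (output_dir / name).is_file()]
```

For instance, `missing_outputs("wifa_uq_results", model="XGB")` returns an empty list when all core files and both SHAP plots are present.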
See Also
- Preprocessing — Details on preprocessing steps
- Database Generation — Parameter sweep mechanics
- Multi-Farm Workflows — Cross-farm studies
- windIO Integration — Data format standards