Configuration Reference

WIFA-UQ workflows are driven by YAML configuration files. This page provides a complete reference for all available options.

Overview

A configuration file controls the entire workflow:

description: "Human-readable description of this workflow"

paths:
  # Input and output file locations

preprocessing:
  # Data preparation options

database_gen:
  # Model error database generation

error_prediction:
  # ML-based bias prediction and cross-validation

sensitivity_analysis:
  # Feature importance analysis

Minimal Configuration

WIFA-UQ infers most input paths from the windIO file structure, so a minimal configuration needs only the following:

paths:
  system_config: path/to/wind_energy_system.yaml
  output_dir: results/

preprocessing:
  run: true
  steps: [recalculate_params]

database_gen:
  run: true
  flow_model: pywake
  n_samples: 100
  param_config:
    attributes.analysis.wind_deficit_model.wake_expansion_coefficient.k_b:
      range: [0.01, 0.07]
      default: 0.04
      short_name: k_b

error_prediction:
  run: true
  features: [ABL_height, wind_veer, lapse_rate]
  model: XGB
  calibrator: MinBiasCalibrator
  bias_predictor: BiasPredictor
  cross_validation:
    splitting_mode: kfold_shuffled
    n_splits: 5

Other paths (reference_power, reference_resource, wind_farm_layout) are automatically inferred from the windIO !include chain.
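
As a rough illustration of what "minimal" means here, the sketch below checks a parsed single-farm configuration (a plain Python dict) for the keys this page marks as required. The function name and the return format are hypothetical and not part of WIFA-UQ.

```python
# Keys marked as required on this page (single-farm layout).
REQUIRED_KEYS = [
    ("paths", "system_config"),
    ("error_prediction", "features"),
    ("error_prediction", "calibrator"),
]


def missing_required_keys(config: dict) -> list:
    """Return dotted names of required keys absent from a parsed config dict."""
    missing = []
    for section, key in REQUIRED_KEYS:
        # A missing or empty section counts as missing all of its keys.
        if key not in (config.get(section) or {}):
            missing.append(f"{section}.{key}")
    return missing


minimal = {
    "paths": {"system_config": "path/to/wind_energy_system.yaml", "output_dir": "results/"},
    "error_prediction": {
        "run": True,
        "features": ["ABL_height", "wind_veer", "lapse_rate"],
        "calibrator": "MinBiasCalibrator",
    },
}
print(missing_required_keys(minimal))  # prints []
```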

Section Reference

paths

Specifies input data and output locations. All paths are relative to the config file's directory.

| Key | Required | Description |
|---|---|---|
| system_config | Yes | Path to windIO system YAML (wind_energy_system.yaml) |
| reference_power | No* | NetCDF with observed/LES power data |
| reference_resource | No* | NetCDF with atmospheric profiles |
| wind_farm_layout | No* | YAML with turbine positions and specs |
| output_dir | No | Output directory (default: wifa_uq_results/) |
| processed_resource_file | No | Preprocessed resource filename (default: processed_physical_inputs.nc) |
| database_file | No | Database filename (default: results_stacked_hh.nc) |

*These paths are automatically inferred from the windIO system config if not specified.
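
The relative-path rule and the documented defaults can be sketched as follows. This is only an illustration: resolve_paths is a hypothetical helper, and placing the two filename entries inside output_dir is an assumption based on the Output Files section rather than confirmed behaviour.

```python
from pathlib import Path

# Defaults as documented for the paths section.
PATH_DEFAULTS = {
    "output_dir": "wifa_uq_results/",
    "processed_resource_file": "processed_physical_inputs.nc",
    "database_file": "results_stacked_hh.nc",
}
FILENAME_KEYS = ("processed_resource_file", "database_file")


def resolve_paths(config_file: str, paths: dict) -> dict:
    """Apply defaults, then anchor relative paths at the config file's directory."""
    base = Path(config_file).parent
    merged = {**PATH_DEFAULTS, **paths}
    resolved = {
        key: base / value
        for key, value in merged.items()
        if key not in FILENAME_KEYS
    }
    # Assumption: these two entries are filenames created inside output_dir.
    for key in FILENAME_KEYS:
        resolved[key] = resolved["output_dir"] / merged[key]
    return resolved


resolved = resolve_paths("/proj/cfg/my_config.yaml", {"system_config": "sys/system.yaml"})
for key, value in resolved.items():
    print(f"{key}: {value}")
```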

Path Inference

WIFA-UQ follows the windIO !include chain to discover files:

system.yaml
  ├── site: !include energy_site.yaml
  │     └── energy_resource: !include resource.nc  → reference_resource
  ├── wind_farm: !include wind_farm.yaml           → wind_farm_layout
  └── simulation_outputs: !include outputs.yaml
        └── turbine_data: !include power.nc        → reference_power

Explicit Path Override

You can always specify paths explicitly to override inference:

paths:
  system_config: system.yaml
  reference_power: custom/path/to/power.nc      # Overrides inferred path
  reference_resource: custom/path/to/resource.nc
  wind_farm_layout: custom/path/to/layout.yaml
  output_dir: my_results/
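
The interplay between inference and explicit overrides can be pictured with a small sketch. It does not parse YAML. The INCLUDES dict is a hypothetical stand-in for the parsed windIO files from the Path Inference diagram, and directory handling is left out for brevity.

```python
# Hypothetical stand-in for parsed windIO files: for each file, the keys
# whose values are !include references (mirrors the diagram above).
INCLUDES = {
    "system.yaml": {
        "site": "energy_site.yaml",
        "wind_farm": "wind_farm.yaml",
        "simulation_outputs": "outputs.yaml",
    },
    "energy_site.yaml": {"energy_resource": "resource.nc"},
    "outputs.yaml": {"turbine_data": "power.nc"},
}

# Chain of keys followed from the system file for each inferred path.
KEY_CHAINS = {
    "reference_resource": ["site", "energy_resource"],
    "wind_farm_layout": ["wind_farm"],
    "reference_power": ["simulation_outputs", "turbine_data"],
}


def infer_paths(paths: dict, includes: dict = INCLUDES) -> dict:
    """Follow each key chain from system_config; explicit entries in paths win."""
    result = dict(paths)
    for name, chain in KEY_CHAINS.items():
        if name in result:
            continue  # explicitly configured, so inference is skipped
        current = paths["system_config"]
        for key in chain:
            current = includes[current][key]
        result[name] = current
    return result


print(infer_paths({"system_config": "system.yaml"}))
print(infer_paths({"system_config": "system.yaml",
                   "reference_power": "custom/path/to/power.nc"}))
```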

preprocessing

Controls data preparation before database generation.

| Key | Type | Default | Description |
|---|---|---|---|
| run | bool | false | Whether to run preprocessing |
| steps | list | [] | Preprocessing steps to apply |

Available Steps

| Step | Description |
|---|---|
| recalculate_params | Calculate derived quantities from vertical profiles |

preprocessing:
  run: true
  steps:
    - recalculate_params

See Preprocessing for detailed documentation on each step.


database_gen

Controls the model error database generation via parameter sweeps.

| Key | Type | Default | Description |
|---|---|---|---|
| run | bool | false | Whether to generate the database |
| flow_model | string | "pywake" | Wake model to use ("pywake") |
| n_samples | int | 100 | Number of parameter samples |
| param_config | dict | | Parameters to sweep (see below) |

param_config Format

Parameters can be specified in two formats:

Short format (range only):

param_config:
  attributes.analysis.wind_deficit_model.wake_expansion_coefficient.k_b: [0.01, 0.07]

Full format (with metadata):

param_config:
  attributes.analysis.wind_deficit_model.wake_expansion_coefficient.k_b:
    range: [0.01, 0.07]    # [min, max] sampling bounds
    default: 0.04          # Default value (first sample uses this)
    short_name: k_b        # Name used in database coordinates
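
To make the relationship between the two formats concrete, here is a minimal sketch that expands short-format entries into the full format and then draws parameter samples. Several choices are assumptions rather than documented behaviour: the short format is given no default, a missing short_name falls back to the last component of the dotted path, and sampling is uniform within the range. Both function names are hypothetical.

```python
import random


def normalize_param_config(param_config: dict) -> dict:
    """Expand short-format entries ([min, max]) into the full format."""
    full = {}
    for path, spec in param_config.items():
        if isinstance(spec, (list, tuple)):
            spec = {"range": list(spec)}
        low, high = spec["range"]
        if low > high:
            raise ValueError(f"{path}: range must be [min, max], got {spec['range']}")
        full[path] = {
            "range": [low, high],
            # Assumption: the short format implies no default value.
            "default": spec.get("default"),
            # Assumption: fall back to the last component of the dotted path.
            "short_name": spec.get("short_name", path.rsplit(".", 1)[-1]),
        }
    return full


def draw_samples(param_config: dict, n_samples: int, seed: int = 0) -> list:
    """Draw n_samples parameter sets; the first uses the defaults where given."""
    rng = random.Random(seed)
    specs = list(normalize_param_config(param_config).values())
    samples = []
    for i in range(n_samples):
        sample = {}
        for spec in specs:
            low, high = spec["range"]
            if i == 0 and spec["default"] is not None:
                sample[spec["short_name"]] = spec["default"]
            else:
                # Uniform sampling is an assumption of this sketch.
                sample[spec["short_name"]] = rng.uniform(low, high)
        samples.append(sample)
    return samples


k_b_path = "attributes.analysis.wind_deficit_model.wake_expansion_coefficient.k_b"
print(normalize_param_config({k_b_path: [0.01, 0.07]}))
```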

Common Swept Parameters

| Parameter Path | Short Name | Typical Range | Description |
|---|---|---|---|
| attributes.analysis.wind_deficit_model.wake_expansion_coefficient.k_b | k_b | [0.01, 0.07] | Wake expansion coefficient |
| attributes.analysis.wind_deficit_model.ceps | ceps | [0.15, 0.3] | Bastankhah epsilon coefficient |
| attributes.analysis.blockage_model.ss_alpha | ss_alpha | [0.75, 1.0] | Self-similarity blockage alpha |

See Database Generation for complete parameter reference.


error_prediction

Configures the ML-based bias prediction pipeline.

| Key | Type | Default | Description |
|---|---|---|---|
| run | bool | false | Whether to run error prediction |
| features | list | | Feature names for ML model (required) |
| model | string | "XGB" | ML model type |
| model_params | dict | {} | Model-specific hyperparameters |
| calibrator | string | | Calibrator class name (required) |
| local_regressor | string | "Ridge" | Regressor for local calibration |
| local_regressor_params | dict | {} | Local regressor hyperparameters |
| bias_predictor | string | "BiasPredictor" | Predictor class name |
| cross_validation | dict | | CV configuration (see below) |

Available Features

Features from preprocessing:

| Feature | Units | Description |
|---|---|---|
| ABL_height | m | Atmospheric boundary layer height |
| wind_veer | deg/m | Wind direction change with height |
| lapse_rate | K/m | Potential temperature gradient |
| turbulence_intensity | | TI from TKE profile |
| wind_speed | m/s | Wind speed at hub height |
| wind_direction | deg | Wind direction at hub height |

Features from database generation:

| Feature | Units | Description |
|---|---|---|
| Blockage_Ratio | | Fraction of rotor blocked [0-1] |
| Blocking_Distance | | Normalized distance to blockers [0-1] |
| Farm_Length | D | Farm extent in wind direction |
| Farm_Width | D | Farm extent perpendicular to wind |

Available Models

| Model | model value | Description |
|---|---|---|
| XGBoost | "XGB" | Gradient boosting (default, uses SHAP) |
| SIR + Polynomial | "SIRPolynomial" | Dimension reduction + polynomial |
| PCE | "PCE" | Polynomial Chaos Expansion |
| Linear | "Linear" | OLS/Ridge/Lasso/ElasticNet |

XGBoost parameters:

model: XGB
model_params:
  max_depth: 4
  n_estimators: 200
  learning_rate: 0.1
  random_state: 42

PCE parameters:

model: PCE
model_params:
  degree: 5              # Polynomial degree
  marginals: kernel      # "kernel", "uniform", "normal"
  copula: independent    # "independent" or "normal"
  q: 0.5                 # Hyperbolic truncation parameter
  max_features: 5        # Safety limit on input dimension
  allow_high_dim: false  # Allow > max_features inputs

SIR+Polynomial parameters:

model: SIRPolynomial
model_params:
  n_directions: 1   # Number of SIR directions
  degree: 2         # Polynomial degree

Linear parameters:

model: Linear
model_params:
  method: ridge     # "ols", "ridge", "lasso", "elasticnet"
  alpha: 1.0        # Regularization strength
  l1_ratio: 0.5     # ElasticNet mixing (only for elasticnet)
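
The effect of the PCE q parameter is easier to see with numbers. The sketch below assumes the standard hyperbolic (q-quasi-norm) truncation, which keeps a polynomial multi-index alpha only if (sum of alpha_i^q)^(1/q) is at most degree. Whether WIFA-UQ's PCE backend enumerates its basis exactly this way is not confirmed by this page, and the function name is hypothetical.

```python
from itertools import product


def hyperbolic_multi_indices(n_features: int, degree: int, q: float) -> list:
    """Multi-indices alpha with (sum_i alpha_i**q)**(1/q) <= degree."""
    kept = []
    for alpha in product(range(degree + 1), repeat=n_features):
        q_norm = sum(a**q for a in alpha) ** (1.0 / q)
        if q_norm <= degree + 1e-9:  # small tolerance for floating-point round-off
            kept.append(alpha)
    return kept


# Compare the basis size for total-degree truncation (q = 1) and q = 0.5.
for q in (1.0, 0.5):
    n_terms = len(hyperbolic_multi_indices(n_features=3, degree=5, q=q))
    print(f"q={q}: {n_terms} basis terms")
```

With q = 1 this reduces to the usual total-degree basis. Lower q removes high-order interaction terms first, which keeps the basis small as the number of features grows. That growth in basis size is the kind of cost a limit like max_features guards against.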

Available Calibrators

| Calibrator | Mode | Description |
|---|---|---|
| MinBiasCalibrator | Global | Single parameter set minimizing total bias |
| DefaultParams | Global | Use default parameter values |
| LocalParameterPredictor | Local | ML-predicted params per flow case |
| BayesianCalibration | Global | Bayesian inference (requires UMBRA) |
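
The difference between global and local calibration can be illustrated on a toy bias table. This is a sketch under stated assumptions, not the calibrator classes themselves: "total bias" is taken to mean the summed absolute bias over flow cases, and the per-case optimum is shown only as the kind of target a local predictor might learn. Both function names are hypothetical.

```python
def min_bias_sample(bias: list) -> int:
    """bias[s][c] is the model bias for parameter sample s and flow case c.

    Returns the index of the single sample with the smallest summed absolute bias.
    """
    totals = [sum(abs(b) for b in row) for row in bias]
    return min(range(len(totals)), key=totals.__getitem__)


def best_sample_per_case(bias: list) -> list:
    """For each flow case, the sample index with the smallest absolute bias."""
    n_cases = len(bias[0])
    return [
        min(range(len(bias)), key=lambda s: abs(bias[s][c]))
        for c in range(n_cases)
    ]


# Three parameter samples (rows) evaluated on three flow cases (columns).
bias = [
    [0.30, -0.10, 0.20],
    [0.10, 0.20, -0.10],
    [-0.20, 0.05, 0.40],
]
print(min_bias_sample(bias))       # prints 1
print(best_sample_per_case(bias))  # prints [1, 2, 1]
```

A global calibrator commits to one row for every flow case, while a local one may pick a different row per case, which is why it needs features and a regressor to predict that choice for unseen conditions.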

For local calibration, specify the regressor:

calibrator: LocalParameterPredictor
local_regressor: Ridge          # Linear, Ridge, Lasso, ElasticNet, RandomForest, XGB
local_regressor_params:
  alpha: 1.0

Cross-Validation Configuration

cross_validation:
  run: true
  splitting_mode: kfold_shuffled    # or LeaveOneGroupOut
  n_splits: 5                       # For KFold only
  metrics:
    - rmse
    - r2
    - mae
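
For reference, the three metric names correspond to the following textbook definitions. The sketch is written in plain Python for clarity. It is not taken from the package, which may compute the metrics with a library instead.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]
print(rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```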

For multi-farm LeaveOneGroupOut:

cross_validation:
  splitting_mode: LeaveOneGroupOut
  groups:
    Offshore:
      - Farm1
      - Farm2
    Onshore:
      - Farm3
      - Farm4

The farm names listed under each group must match the name field in your farms list (multi-farm config) or the wind_farm coordinate in your database.
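
The two splitting modes can be sketched in a few lines of plain Python. This is an illustration of the general techniques (shuffled k-fold and leave-one-group-out), not the package's implementation. The function names, the seed argument, and the farm_of_case list are hypothetical.

```python
import random


def kfold_shuffled(n_cases: int, n_splits: int, seed: int = 0):
    """Yield (train, test) index lists after shuffling the case indices once."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


def leave_one_group_out(farm_of_case: list, groups: dict):
    """Yield (group_name, train, test); each group of farms is held out once."""
    for name, farms in groups.items():
        test = [i for i, farm in enumerate(farm_of_case) if farm in farms]
        train = [i for i, farm in enumerate(farm_of_case) if farm not in farms]
        yield name, train, test


# Hypothetical farm label for each flow case in the database.
farm_of_case = ["Farm1", "Farm2", "Farm3", "Farm4", "Farm1"]
groups = {"Offshore": ["Farm1", "Farm2"], "Onshore": ["Farm3", "Farm4"]}
for name, train, test in leave_one_group_out(farm_of_case, groups):
    print(name, "train:", train, "test:", test)
```

Leave-one-group-out tests generalization to farms the model has never seen, whereas shuffled k-fold mixes cases from all farms into every fold.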


sensitivity_analysis

Controls feature importance and sensitivity analysis.

| Key | Type | Default | Description |
|---|---|---|---|
| run_observation_sensitivity | bool | false | Run SA on observations |
| run_bias_sensitivity | bool | false | Run SA on bias predictions |
| method | string | "auto" | SA method ("auto", "shap", "sir", "pce_sobol") |
| pce_config | dict | {} | PCE config for Sobol indices |

sensitivity_analysis:
  run_observation_sensitivity: true
  run_bias_sensitivity: true
  method: auto              # Uses SHAP for XGB, SIR for SIRPolynomial
  pce_config:               # Only for method: pce_sobol
    degree: 5
    marginals: kernel
    copula: independent
    q: 0.5
    model_coeff_name: None
    plot_options:
      scatter: False
      distribution: False
      metrics: ["RMSE", "R2", "Wasserstein", "KS", "KL"]

Method selection:

- auto: SHAP for tree models, SIR directions for SIR models, Sobol for PCE
- shap: Force SHAP TreeExplainer (requires tree model)
- sir: Force SIR direction coefficients
- pce_sobol: Force PCE-based Sobol indices
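
The selection rules for the method key can be written down as a small lookup. This is a sketch of the documented rules only. The function name is hypothetical, and raising an error for a model with no documented auto choice (such as Linear) is an assumption.

```python
# Documented "auto" choices per model value.
AUTO_METHOD = {"XGB": "shap", "SIRPolynomial": "sir", "PCE": "pce_sobol"}
VALID_METHODS = {"auto", "shap", "sir", "pce_sobol"}


def resolve_sa_method(method: str, model: str) -> str:
    """Return the concrete sensitivity-analysis method for a given model."""
    if method not in VALID_METHODS:
        raise ValueError(f"unknown method {method!r}")
    if method != "auto":
        # SHAP TreeExplainer needs a tree model; XGB is the only one listed here.
        if method == "shap" and model != "XGB":
            raise ValueError("method 'shap' requires a tree model such as XGB")
        return method
    if model not in AUTO_METHOD:
        # Assumption: this page documents no auto choice for other models.
        raise ValueError(f"no documented auto choice for model {model!r}")
    return AUTO_METHOD[model]


print(resolve_sa_method("auto", "XGB"))            # prints shap
print(resolve_sa_method("auto", "SIRPolynomial"))  # prints sir
```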


Multi-Farm Configuration

For workflows spanning multiple wind farms, use the farms key:

paths:
  output_dir: results/multi_farm/
  database_file: combined_database.nc

farms:
  - name: Farm1                                    # Required: unique identifier
    system_config: data/farm1/wind_energy_system.yaml  # Required
    # Optional explicit paths (otherwise inferred):
    # reference_power: data/farm1/power.nc
    # reference_resource: data/farm1/resource.nc
    # wind_farm_layout: data/farm1/wind_farm.yaml

  - name: Farm2
    system_config: data/farm2/wind_energy_system.yaml

  - name: Farm3
    system_config: data/farm3/wind_energy_system.yaml

preprocessing:
  run: true
  steps: [recalculate_params]

database_gen:
  run: true
  flow_model: pywake
  n_samples: 100
  param_config:
    # Shared across all farms
    attributes.analysis.wind_deficit_model.wake_expansion_coefficient.k_b:
      range: [0.01, 0.07]
      default: 0.04
      short_name: k_b

error_prediction:
  run: true
  features: [ABL_height, wind_veer, lapse_rate]
  model: XGB
  calibrator: MinBiasCalibrator
  bias_predictor: BiasPredictor
  cross_validation:
    splitting_mode: LeaveOneGroupOut
    groups:
      Group1:
        - Farm1
        - Farm2
      Group2:
        - Farm3

Each farm requires:

- name: Unique identifier (used in CV grouping)
- system_config: Path to windIO system YAML

Other paths are auto-inferred per farm using the same logic as single-farm mode.
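
A quick consistency check for the farms list and the CV groups might look like the sketch below. It works on already-parsed Python structures. The function name and message wording are hypothetical and not part of WIFA-UQ.

```python
from typing import Optional


def check_farms(farms: list, groups: Optional[dict] = None) -> list:
    """Return a list of problems found in a multi-farm config (empty if none)."""
    problems = []
    names = []
    for i, farm in enumerate(farms):
        for key in ("name", "system_config"):
            if key not in farm:
                problems.append(f"farms[{i}] is missing '{key}'")
        if "name" in farm:
            if farm["name"] in names:
                problems.append(f"duplicate farm name '{farm['name']}'")
            names.append(farm["name"])
    # Every farm referenced in a CV group must exist in the farms list.
    for group, members in (groups or {}).items():
        for member in members:
            if member not in names:
                problems.append(f"group '{group}' lists unknown farm '{member}'")
    return problems


farms = [
    {"name": "Farm1", "system_config": "data/farm1/wind_energy_system.yaml"},
    {"name": "Farm2", "system_config": "data/farm2/wind_energy_system.yaml"},
    {"name": "Farm3", "system_config": "data/farm3/wind_energy_system.yaml"},
]
groups = {"Group1": ["Farm1", "Farm2"], "Group2": ["Farm3"]}
print(check_farms(farms, groups))  # prints []
```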


Complete Example

Here's a fully specified configuration showing the available options:

description: "Complete WIFA-UQ configuration example"

paths:
  # Required
  system_config: wind_energy_system/system_pywake.yaml

  # Optional (auto-inferred if omitted)
  reference_power: observed_output/observedPower.nc
  reference_resource: plant_energy_resource/originalData.nc
  wind_farm_layout: plant_wind_farm/wind_farm.yaml

  # Output paths
  output_dir: wifa_uq_results/
  processed_resource_file: processed_physical_inputs.nc
  database_file: results_stacked_hh.nc

preprocessing:
  run: true
  steps:
    - recalculate_params

database_gen:
  run: true
  flow_model: pywake
  n_samples: 100
  param_config:
    attributes.analysis.wind_deficit_model.wake_expansion_coefficient.k_b:
      range: [0.01, 0.07]
      default: 0.04
      short_name: k_b
    attributes.analysis.wind_deficit_model.ceps:
      range: [0.15, 0.3]
      default: 0.2154
      short_name: ceps
    attributes.analysis.blockage_model.ss_alpha:
      range: [0.75, 1.0]
      default: 0.875
      short_name: ss_alpha

error_prediction:
  run: true

  features:
    - ABL_height
    - wind_veer
    - lapse_rate
    - Blockage_Ratio
    - Blocking_Distance
    - Farm_Length
    - Farm_Width

  model: XGB
  model_params:
    max_depth: 4
    n_estimators: 200
    learning_rate: 0.1
    random_state: 42

  calibrator: LocalParameterPredictor
  local_regressor: Ridge
  local_regressor_params:
    alpha: 1.0

  bias_predictor: BiasPredictor

  cross_validation:
    run: true
    splitting_mode: kfold_shuffled
    n_splits: 5
    metrics:
      - rmse
      - r2
      - mae

sensitivity_analysis:
  run_observation_sensitivity: true
  run_bias_sensitivity: true
  method: auto
  pce_config:
    degree: 5
    marginals: kernel
    copula: independent
    q: 0.5
    model_coeff_name: 'k_b'
    plot_options:
      scatter: False
      distribution: False
      metrics: ["RMSE", "R2", "Wasserstein", "KS", "KL"]

Workflow Execution

Configurations are executed via the run.py script:

cd examples
python run.py my_config.yaml

Or programmatically:

from wifa_uq.workflow import run_workflow
from pathlib import Path

cv_results, y_preds, y_tests = run_workflow(Path("my_config.yaml"))

Output Files

After a successful run, the output directory contains:

| File | Description |
|---|---|
| processed_physical_inputs.nc | Preprocessed atmospheric data |
| results_stacked_hh.nc | Model error database |
| cv_results.csv | Cross-validation metrics per fold |
| predictions.npz | Raw predictions array |
| correction_results.png | Before/after correction scatter plots |
| bias_prediction_shap.png | SHAP beeswarm plot (XGBoost) |
| bias_prediction_shap_importance.png | SHAP bar chart (XGBoost) |
| bias_prediction_sir_importance.png | SIR importance (SIRPolynomial) |
| pce_sobol_indices.png | Sobol indices (PCE) |
| local_parameter_prediction.png | Parameter prediction quality (local calibration) |

For multi-farm workflows, additional plots are generated:

- cv_fold_metrics.png
- cv_fold_heatmap.png
- cv_predictions_by_fold.png
- cv_generalization_summary.png
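
After a run, a short script can report which of the documented files are present. The sketch below uses only the file names listed in this section. The helper name is hypothetical. Several files depend on the chosen model or calibrator, so a missing entry does not necessarily indicate a failure.

```python
from pathlib import Path

# File names as listed in this section (single-farm outputs).
DOCUMENTED_OUTPUTS = [
    "processed_physical_inputs.nc",
    "results_stacked_hh.nc",
    "cv_results.csv",
    "predictions.npz",
    "correction_results.png",
    "bias_prediction_shap.png",
    "bias_prediction_shap_importance.png",
    "bias_prediction_sir_importance.png",
    "pce_sobol_indices.png",
    "local_parameter_prediction.png",
]


def list_outputs(output_dir: str) -> dict:
    """Map each documented output file name to whether it exists in output_dir."""
    root = Path(output_dir)
    return {name: (root / name).is_file() for name in DOCUMENTED_OUTPUTS}


# wifa_uq_results/ is the documented default output directory.
for name, present in list_outputs("wifa_uq_results").items():
    print(f"{'found  ' if present else 'missing'}  {name}")
```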


See Also