# Quickstart

This guide will get you running your first WIFA-UQ workflow in about 5 minutes. By the end, you'll have:
- Run a complete calibration and bias prediction pipeline
- Generated cross-validation metrics
- Produced sensitivity analysis plots
## Prerequisites

Make sure you've completed the installation steps. You should be able to run:

```bash
python -c "import wifa_uq; print('Ready!')"
```
## The Workflow Runner

WIFA-UQ uses a simple command-line interface. The entry point is `examples/run.py`, which takes a YAML configuration file as input:

```bash
python examples/run.py <path-to-config.yaml>
```
## Step 1: Explore the Example Data

The `examples/` directory contains everything you need:

```
examples/
├── run.py                              # Workflow entry point
├── kul_les_example.yaml                # PCE-based workflow config
├── kul_single_farm_xgb_example.yaml    # XGBoost workflow config
├── edf_single_farm_example.yaml        # EDF dataset config
├── multi_farm_example.yaml             # Multi-farm config
└── data/
    ├── KUL_LES/         # KU Leuven LES dataset
    ├── EDF_datasets/    # EDF LES datasets
    └── ...
```
## Step 2: Run Your First Workflow

Let's start with the KUL LES dataset using PCE-based bias prediction:

```bash
cd examples
python run.py kul_les_example.yaml
```
You'll see output like:

```
--- Starting WIFA-UQ Workflow ---
Using config file: kul_les_example.yaml
Preprocessor initialized for originalData.nc.
Applying steps: ['recalculate_params']
Running 'recalculate_params'...
Calculating wind veer...
Running 'ci_fitting' for thermal parameters...
Preprocessing complete.
--- Running Database Generation ---
Case: KUL_LES, 9 turbines, Rated Power: 3.6 MW, Hub Height: 90.0 m
Parameter sweep complete. Processing physical inputs...
...
--- Cross-Validation Results (mean) ---
rmse    0.023456
r2      0.876543
mae     0.018234
--- Workflow complete ---
```
## Step 3: Examine the Results

After the workflow completes, check the output directory:

```bash
ls data/KUL_LES/wifa_uq_results/
```

You'll find:

| File | Description |
|---|---|
| `processed_physical_inputs.nc` | Preprocessed atmospheric data |
| `results_stacked_hh.nc` | Model error database |
| `cv_results.csv` | Cross-validation metrics per fold |
| `predictions.npz` | Raw predictions for analysis |
| `correction_results.png` | Scatter plots of model correction |
| `bias_prediction_*.png` | Sensitivity analysis plots |
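If you prefer to work with the numbers directly, the tabular and array outputs can be opened from Python. The sketch below uses `pandas`, `numpy`, and `xarray`; the exact column names in `cv_results.csv` and the array keys inside `predictions.npz` are not assumed here, so inspect them on your own run before relying on them.

```python
from pathlib import Path

import numpy as np
import pandas as pd
import xarray as xr

results_dir = Path("data/KUL_LES/wifa_uq_results")

# Per-fold cross-validation metrics
cv = pd.read_csv(results_dir / "cv_results.csv")
print(cv)
print(cv.mean(numeric_only=True))  # should match the summary printed by the workflow

# Raw predictions: list the stored arrays before assuming their names
preds = np.load(results_dir / "predictions.npz")
print(preds.files)

# Preprocessed atmospheric inputs and the model error database
inputs = xr.open_dataset(results_dir / "processed_physical_inputs.nc")
errors = xr.open_dataset(results_dir / "results_stacked_hh.nc")
print(inputs)
print(errors)
```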
## Step 4: View the Results

Open `correction_results.png` to see three panels:
- ML Model Performance — Predicted vs true bias (should cluster around 1:1 line)
- Uncorrected Model — Raw PyWake power vs reference
- Corrected Model — PyWake power after bias correction (should be tighter)
The sensitivity plots show which features most influence the bias:

- `bias_prediction_shap.png` — SHAP beeswarm plot (for XGBoost)
- `bias_prediction_sir_importance.png` — SIR direction coefficients
- `pce_sobol_indices.png` — PCE-based Sobol indices
## Understanding the Config File

Let's look at the key sections of `kul_les_example.yaml`:

```yaml
# Input/output paths (relative to the YAML file)
paths:
  system_config: data/KUL_LES/wind_energy_system/system_pywake.yaml
  reference_power: data/KUL_LES/observed_output/observedPowerKUL.nc
  reference_resource: data/KUL_LES/plant_energy_resource/originalData.nc
  wind_farm_layout: data/KUL_LES/plant_wind_farm/FLOW_UQ_vnv_toy_study_wind_farm.yaml
  output_dir: data/KUL_LES/wifa_uq_results

# Preprocessing: recalculate derived quantities
preprocessing:
  run: true
  steps: [recalculate_params]

# Database generation: sweep uncertain parameters
database_gen:
  run: true
  flow_model: pywake
  n_samples: 100
  param_config:
    attributes.analysis.wind_deficit_model.wake_expansion_coefficient.k_b:
      range: [0.01, 0.07]
      default: 0.04
      short_name: "k_b"

# Error prediction: ML model and cross-validation
error_prediction:
  run: true
  features: [ABL_height, wind_veer, lapse_rate]
  model: "PCE"
  calibrator: LocalParameterPredictor
  cross_validation:
    splitting_mode: kfold_shuffled
    n_splits: 6
```
## Try Different Configurations

### XGBoost Instead of PCE

Run the XGBoost-based workflow:

```bash
python run.py kul_single_farm_xgb_example.yaml
```

This uses:

- XGBoost gradient boosting for bias prediction
- SHAP values for sensitivity analysis
- More physical features (blockage metrics, farm geometry)
### EDF Datasets

Try a different wind farm dataset:

```bash
python run.py edf_single_farm_example.yaml
```
The EDF datasets include various virtual and real wind farm configurations.
## Common Workflow Patterns

### Skip Preprocessing (Use Existing Data)

If you've already preprocessed the data:

```yaml
preprocessing:
  run: false
```
### Skip Database Generation (Use Existing Database)

If you've already generated the error database:

```yaml
database_gen:
  run: false
```
### Change the Number of Parameter Samples

More samples give better coverage of the parameter range, but a slower sweep:

```yaml
database_gen:
  n_samples: 200  # Default is 100
```
### Change Cross-Validation Strategy

For leave-one-group-out CV (useful with multiple farms):

```yaml
cross_validation:
  splitting_mode: LeaveOneGroupOut
  groups:
    Group1: [Farm1, Farm2]
    Group2: [Farm3, Farm4]
```
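If the splitting mode is unfamiliar: every farm in a group is held out together, so the model is always evaluated on farms it never saw during training. The sketch below illustrates the idea with scikit-learn's `LeaveOneGroupOut` splitter on toy data; it shows the concept only, not WIFA-UQ's internal implementation.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

# Toy data: 8 samples from 4 farms, mapped into 2 groups as in the YAML above
X = np.arange(16).reshape(8, 2)   # feature matrix (placeholder values)
y = np.zeros(8)                   # target (placeholder values)
farms = ["Farm1", "Farm1", "Farm2", "Farm2", "Farm3", "Farm3", "Farm4", "Farm4"]
group_of = {"Farm1": "Group1", "Farm2": "Group1", "Farm3": "Group2", "Farm4": "Group2"}
groups = [group_of[f] for f in farms]

# Each fold trains on one group and tests on the other
logo = LeaveOneGroupOut()
for train_idx, test_idx in logo.split(X, y, groups=groups):
    held_out = sorted({groups[i] for i in test_idx})
    print(f"train on {len(train_idx)} samples, test on group {held_out}")
```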
## What's Happening Under the Hood?

1. Preprocessing (`PreprocessingInputs`)
   - Loads raw atmospheric data
   - Calculates derived quantities: ABL height, wind veer, lapse rate, TI
2. Database Generation (`DatabaseGenerator`)
   - Samples uncertain wake model parameters (`k_b`, `ss_alpha`, etc.)
   - Runs PyWake for each sample
   - Computes bias = (model - reference) / rated_power (see the sketch after this list)
   - Adds layout features (blockage ratio, farm dimensions)
3. Calibration (`MinBiasCalibrator` or `LocalParameterPredictor`)
   - Global: find a single best parameter set
   - Local: predict optimal parameters as f(atmospheric state)
4. Bias Prediction (`BiasPredictor` with XGB/SIR/PCE)
   - Train an ML model: bias = f(features)
   - Cross-validate to assess generalization
5. Sensitivity Analysis
   - SHAP/SIR/Sobol to identify important features
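To make step 2 concrete, here is a stripped-down illustration of what one entry of the error database looks like: for each sampled value of a wake parameter, the flow model is run and the power error is normalized by rated power. This is illustrative only, not the real `DatabaseGenerator`; the `run_flow_model` stand-in, the reference power value, and the uniform sampling are assumptions (the 9 × 3.6 MW rated power comes from the example log above).

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a PyWake run: farm power as a function of the wake expansion
# coefficient k_b (NOT the real model, just a made-up placeholder)
def run_flow_model(k_b: float) -> float:
    return 28.0 + 40.0 * k_b  # total farm power in MW

reference_power = 30.0   # MW, from the reference (LES/observed) dataset -- made up
rated_power = 9 * 3.6    # MW: 9 turbines at 3.6 MW each, as in the KUL_LES case

# Sweep the uncertain parameter over its configured range, as in param_config
k_b_samples = rng.uniform(0.01, 0.07, size=100)
bias = np.array([(run_flow_model(k) - reference_power) / rated_power for k in k_b_samples])

print(f"bias range: {bias.min():+.3f} to {bias.max():+.3f} (fraction of rated power)")
```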
## Next Steps
- Project Structure — Understand the codebase organization
- Configuration Reference — All YAML options explained
- Tutorials — Step-by-step guides for specific use cases
## Troubleshooting

### "FileNotFoundError: System YAML not found"

Check that the paths in your config are relative to the YAML file's location, not to the current working directory.

### "ValueError: Could not find or infer 'rated_power'"

Your wind farm definition is missing turbine power information. Add `performance.rated_power` to your turbine YAML. See the Metadata Note.
### Workflow is Very Slow

- Reduce `n_samples` in `database_gen`
- Ensure you're not re-running preprocessing/database_gen unnecessarily
- Check that numba JIT compilation is working (the first run is slower)