API reference

risk_tools package

risk_tools

Package for bootstrapped wind/load risk calculations.

This package provides functions to:

  • generate bootstrap resampling indices,

  • fit Weibull distributions to wind-speed data,

  • fit lognormal distributions to normalized load data conditioned on wind-speed bins,

  • calculate exceedance risk for operating-envelope boundaries, and

  • split timestamp-indexed data into seasonal subsets.

Modules

bootstrap

Bootstrap index generation.

fv

Weibull fitting for wind-speed data.

fp

Lognormal fitting for load conditioned on wind speed.

risk

Risk calculation from fitted wind-speed and load distributions.

utils

Utility functions for data preparation and formatting.

risk_tools.bootstrap(data: int | Series | DataFrame | ndarray, n_boot: int, random_state: int | None = None) -> list[ndarray]

Generate bootstrap index arrays for an input dataset.

Parameters:
data : int or pandas.Series or pandas.DataFrame or numpy.ndarray

Input data or number of observations. If an integer is supplied, it is interpreted directly as the number of observations. If a pandas or NumPy object is supplied, the number of observations is inferred using len(data).

n_boot : int

Number of bootstrap samples to generate.

random_state : int or None, optional

Seed for the random number generator. If provided, the bootstrap samples are reproducible. The default is None.

Returns:
list of numpy.ndarray

List of bootstrap index arrays. Each array has length equal to the number of observations in the original dataset and contains indices sampled with replacement.

Raises:
ValueError

If the input dataset is empty.

ValueError

If n_boot is negative.

Notes

Bootstrap resampling is performed by drawing indices with replacement from the range 0 to n_obs - 1. The returned indices can then be applied to the relevant pandas objects using .iloc or to NumPy arrays using standard indexing.
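The resampling step described above can be sketched as follows. This is a minimal illustration, not the package's own code: the helper name bootstrap_indices and the use of numpy.random.default_rng are assumptions, since the reference does not say which random number generator is used.

```python
import numpy as np


def bootstrap_indices(n_obs, n_boot, random_state=None):
    """Illustrative stand-in for bootstrap(); not the package's own code."""
    if n_obs <= 0:
        raise ValueError("input dataset is empty")
    if n_boot < 0:
        raise ValueError("n_boot must be non-negative")
    # Assumption: a seeded NumPy Generator provides the reproducibility.
    rng = np.random.default_rng(random_state)
    # Each array holds n_obs integers drawn with replacement from 0..n_obs-1.
    return [rng.integers(0, n_obs, size=n_obs) for _ in range(n_boot)]


values = np.array([10.0, 20.0, 30.0, 40.0])
idx = bootstrap_indices(len(values), n_boot=3, random_state=42)

# Apply the indices with standard indexing; use obj.iloc[i] for pandas objects.
resamples = [values[i] for i in idx]
```

Because only index arrays are stored, the same indices can be applied to several aligned objects without copying them up front.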

Examples

>>> import pandas as pd
>>> from risk_tools.bootstrap import bootstrap
>>> df = pd.DataFrame({"x": [1, 2, 3, 4]})
>>> idx = bootstrap(df, n_boot=2, random_state=42)
>>> len(idx)
2
>>> len(idx[0])
4

risk_tools.coerce_v_char_input(obj: Series | DataFrame) -> DataFrame

Normalize wind-speed input to a one-column DataFrame named "v_char".

Parameters:
obj : pandas.Series or pandas.DataFrame

Wind-speed data loaded from file. It may already be a one-column DataFrame or it may be a Series without a column name.

Returns:
pandas.DataFrame

One-column DataFrame with the column name "v_char".

Raises:
TypeError

If the input is neither a pandas Series nor a one-column DataFrame.

Examples

>>> import pandas as pd
>>> from risk_tools.utils import coerce_v_char_input
>>> s = pd.Series([1.0, 2.0, 3.0])
>>> df = coerce_v_char_input(s)
>>> list(df.columns)
['v_char']

risk_tools.format_mean_pm_std(mean_val: float, std_val: float, decimals: int = 6) -> str

Format a mean and standard deviation as mean ± std.

Parameters:
mean_val : float

Mean value.

std_val : float

Standard deviation.

decimals : int, optional

Number of decimal places to display. The default is 6.

Returns:
str

Formatted string of the form "0.123456 ± 0.012345". If either value is missing, returns "NaN".
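The formatting rule above can be illustrated with a short sketch. The function name format_mean_pm_std_sketch is hypothetical, and treating None and float NaN as "missing" is an assumption; the package may define missing values differently.

```python
import math


def format_mean_pm_std_sketch(mean_val, std_val, decimals=6):
    """Hypothetical re-implementation for illustration only."""
    # Assumption: None and float NaN both count as missing.
    for value in (mean_val, std_val):
        if value is None or math.isnan(value):
            return "NaN"
    # Nested format spec: fixed-point with the requested number of decimals.
    return f"{mean_val:.{decimals}f} ± {std_val:.{decimals}f}"


text = format_mean_pm_std_sketch(0.123456, 0.012345)
short = format_mean_pm_std_sketch(1.5, 0.25, decimals=2)
missing = format_mean_pm_std_sketch(float("nan"), 0.1)
```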

risk_tools.fp_params(df_fp: DataFrame, n_boot: int, random_state: int | None = None, bin_width: float = 1.0, bin_min: float = 0.0, bin_max: float = 32.0) -> DataFrame

Fit lognormal distributions to load data within wind-speed bins.

Parameters:
df_fp : pandas.DataFrame

DataFrame containing columns "p_load" and "v_char". p_load is the normalized load variable and v_char is the associated wind speed.

n_boot : int

Number of bootstrap samples to generate and fit.

random_state : int or None, optional

Seed for reproducible bootstrap resampling. The default is None.

bin_width : float, optional

Width of the wind-speed bins. The default is 1.0.

bin_min : float, optional

Lower bound of the wind-speed binning range. The default is 0.0.

bin_max : float, optional

Upper bound of the wind-speed binning range. The default is 32.0.

Returns:
pandas.DataFrame

DataFrame indexed by ["sample", "bin"] with columns:

  • shape : fitted lognormal shape parameter

  • scale : fitted lognormal scale parameter

  • n : number of valid observations used in the fit

The sample level contains "original" plus the bootstrap samples "boot_0", "boot_1", and so on.

Raises:
ValueError

If the input DataFrame does not contain both "p_load" and "v_char".

ValueError

If bin_width is not positive.

ValueError

If bin_max is not greater than bin_min.

Notes

The lognormal fit is performed with location fixed at zero by passing floc=0 to scipy.stats.lognorm.fit.

Only finite positive p_load values are retained before fitting.
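The two notes above (filtering, then a zero-location fit) can be sketched for a single bin as follows. The data are synthetic and the variable names are illustrative; only the call scipy.stats.lognorm.fit(..., floc=0) is taken from the reference. In SciPy's parameterization, shape is the standard deviation of ln(p_load) and scale is exp of its mean.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
p_load = rng.lognormal(mean=-1.0, sigma=0.4, size=500)  # synthetic loads
p_load = np.append(p_load, [np.nan, -0.1, 0.0])  # entries the filter should drop

# Keep only finite, strictly positive values before fitting.
valid = p_load[np.isfinite(p_load) & (p_load > 0)]

# With floc=0 the location is held at zero; only shape and scale are estimated.
shape, loc, scale = stats.lognorm.fit(valid, floc=0)
n = valid.size  # number of observations used in the fit
```

With the location fixed at zero, the maximum-likelihood estimates have a closed form in terms of ln(p_load), which gives a convenient cross-check on the numerical fit.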

risk_tools.fv_params(df_v: DataFrame, n_boot: int, random_state: int | None = None) -> DataFrame

Fit Weibull distributions to original and bootstrapped wind-speed data.

Parameters:
df_v : pandas.DataFrame

DataFrame containing a column named "v_char". The values are assumed to represent wind speed. The index is typically a timestamp, but only the data column is used in the fit.

n_boot : int

Number of bootstrap samples to generate and fit.

random_state : int or None, optional

Seed for reproducible bootstrap resampling. The default is None.

Returns:
pandas.DataFrame

DataFrame indexed by sample name. The first row corresponds to the original dataset and is labeled "original". Subsequent rows correspond to bootstrap samples labeled "boot_0", "boot_1", and so on. Columns are:

  • shape : fitted Weibull shape parameter

  • scale : fitted Weibull scale parameter

Raises:
ValueError

If the input DataFrame does not contain a column named "v_char".

ValueError

If the wind-speed data contain no finite positive values.

Notes

The fit is a two-parameter Weibull fit with location fixed at zero by passing floc=0 to scipy.stats.weibull_min.fit.

Only finite positive values are retained before fitting.

Internal profiling is performed with aggregated timing categories so that the main cost centers can be identified without printing one line per bootstrap iteration.

Examples

>>> import pandas as pd
>>> from risk_tools.fv import fv_params
>>> df = pd.DataFrame({"v_char": [5.0, 6.0, 7.0, 8.0]})
>>> out = fv_params(df, n_boot=2, random_state=1)
>>> "original" in out.index
True

risk_tools.risk_calculation(df_fp: DataFrame, df_v: DataFrame, df_plimits: DataFrame, n_boot: int, random_state: int | None = None, overall_v_lower: float = 0.0, overall_v_upper: float = 32.0, low_v_lower: float = 0.0, low_v_upper: float = 15.0, high_v_lower: float = 15.0, high_v_upper: float = 32.0, p_upper: float = 1.0, bin_width: float = 1.0, fp_bin_min: float = 0.0, fp_bin_max: float = 32.0) -> dict[str, DataFrame]

Calculate original and bootstrap risk metrics.

Parameters:
df_fp : pandas.DataFrame

DataFrame containing columns "p_load" and "v_char" for the paired load-wind dataset.

df_v : pandas.DataFrame

DataFrame containing a column named "v_char" for the wind time series used to fit the wind-speed distribution.

df_plimits : pandas.DataFrame

DataFrame indexed by wind-speed bin with columns "danger" and "limit". These columns define the lower integration limits of the risk calculation.

n_boot : int

Number of bootstrap samples to generate.

random_state : int or None, optional

Seed for reproducible bootstrap resampling. The default is None.

overall_v_lower, overall_v_upper : float, optional

Wind-speed bounds for the overall risk calculation. The defaults are 0.0 and 32.0.

low_v_lower, low_v_upper : float, optional

Wind-speed bounds for the low-wind regime. The defaults are 0.0 and 15.0.

high_v_lower, high_v_upper : float, optional

Wind-speed bounds for the high-wind regime. The defaults are 15.0 and 32.0.

p_upper : float, optional

Upper integration limit in normalized load space. The default is 1.0.

bin_width : float, optional

Width of the wind-speed bins. The default is 1.0.

fp_bin_min, fp_bin_max : float, optional

Wind-speed binning range used when fitting the lognormal conditional load distributions. The defaults are 0.0 and 32.0.

Returns:
dict of pandas.DataFrame

Dictionary containing the following DataFrames:

  • original_overall : overall risk for original data

  • original_by_regime : low- and high-wind risks for original data

  • bootstrap_overall : overall bootstrap risks

  • bootstrap_by_regime : low- and high-wind bootstrap risks

  • stats_overall : mean and standard deviation of overall bootstrap risks

  • stats_by_regime : mean and standard deviation of regime bootstrap risks

Raises:
ValueError

If required input columns are missing.

Notes

The risk calculation approximates

\[\int_{v_{lower}}^{v_{upper}} f_v(v) \int_{p_{lower}(v)}^{p_{upper}} f_p(p \mid v) \, dp \, dv\]

using integer wind-speed bins and fitted distribution functions.

For each wind-speed bin, the contribution is the product of:

  • the Weibull probability mass of the wind-speed bin, and

  • the lognormal probability mass between the boundary and p_upper.

Bins whose lower integration limit is greater than or equal to p_upper contribute zero risk.

risk_tools.split_by_season(df: DataFrame) -> dict[str, DataFrame]

Split a timestamp-indexed DataFrame into seasonal subsets.

Parameters:
df : pandas.DataFrame

Input DataFrame with a DatetimeIndex.

Returns:
dict of pandas.DataFrame

Dictionary with keys "spring", "summer", "fall", and "winter". Each value is the subset of the input DataFrame corresponding to that season.

Raises:
TypeError

If the input DataFrame does not have a DatetimeIndex.

Notes

Seasons are defined by calendar month as follows:

  • winter: December, January, February

  • spring: March, April, May

  • summer: June, July, August

  • fall: September, October, November

risk_tools.timed(func: Callable) -> Callable

Decorate a function so that its total execution time is printed.

Parameters:
func : callable

Function to decorate.

Returns:
callable

Wrapped function that prints elapsed runtime when profiling is enabled.

Notes

This decorator is best suited to top-level functions whose total runtime is of interest. For finer-grained timing inside a function, use timed_block or ProfileAccumulator.
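A decorator of this kind can be sketched with the standard library alone. The names timed_sketch and PROFILING_ENABLED are hypothetical; the reference does not show how profiling is switched on in the package, so the module-level flag below is only an assumption.

```python
import functools
import time

PROFILING_ENABLED = True  # hypothetical switch; the package's mechanism is not documented here


def timed_sketch(func):
    """Print the total wall-clock time of each call when profiling is on."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs on normal return and on exceptions alike.
            if PROFILING_ENABLED:
                elapsed = time.perf_counter() - start
                print(f"{func.__name__}: {elapsed:.6f} s")
    return wrapper


@timed_sketch
def add(a, b):
    return a + b


result = add(2, 3)
```

Timing the whole call in one place is what makes this pattern a poor fit for per-iteration measurements inside a bootstrap loop, where it would print one line per call.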

Examples

>>> from risk_tools import timed
>>> @timed
... def add(a, b):
...     return a + b

bootstrap module

Bootstrap tools for resampling indexed data.

This module contains the bootstrap function used throughout the risk-calculation workflow. The function generates bootstrap index arrays that can be applied to pandas objects or NumPy arrays without copying the full datasets in advance.

The design keeps the resampling step separate from the fitting steps so that bootstrap logic can be tested independently and reused across the different parts of the workflow.

risk_tools.bootstrap.bootstrap(data: int | Series | DataFrame | ndarray, n_boot: int, random_state: int | None = None) -> list[ndarray]

Generate bootstrap index arrays for an input dataset.

Parameters:
data : int or pandas.Series or pandas.DataFrame or numpy.ndarray

Input data or number of observations. If an integer is supplied, it is interpreted directly as the number of observations. If a pandas or NumPy object is supplied, the number of observations is inferred using len(data).

n_boot : int

Number of bootstrap samples to generate.

random_state : int or None, optional

Seed for the random number generator. If provided, the bootstrap samples are reproducible. The default is None.

Returns:
list of numpy.ndarray

List of bootstrap index arrays. Each array has length equal to the number of observations in the original dataset and contains indices sampled with replacement.

Raises:
ValueError

If the input dataset is empty.

ValueError

If n_boot is negative.

Notes

Bootstrap resampling is performed by drawing indices with replacement from the range 0 to n_obs - 1. The returned indices can then be applied to the relevant pandas objects using .iloc or to NumPy arrays using standard indexing.

Examples

>>> import pandas as pd
>>> from risk_tools.bootstrap import bootstrap
>>> df = pd.DataFrame({"x": [1, 2, 3, 4]})
>>> idx = bootstrap(df, n_boot=2, random_state=42)
>>> len(idx)
2
>>> len(idx[0])
4

fv module

Wind-speed distribution fitting.

This module contains functions for fitting Weibull distributions to wind-speed time series. The main public function, fv_params, fits a two-parameter Weibull distribution to the original dataset and to a set of bootstrap resamples.

The output is intended for later use in the risk calculation, where the fitted Weibull distribution describes the probability of wind speed falling within each integer wind-speed bin.
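The per-bin probability mentioned above is the difference of the Weibull cumulative distribution function across the bin edges. A minimal sketch, assuming half-open bins [v, v + 1) from 0 to 32 and made-up parameters (not fitted values):

```python
import math


def weibull_cdf(v, shape, scale):
    """CDF of a two-parameter Weibull distribution (location fixed at zero)."""
    if v <= 0:
        return 0.0
    return 1.0 - math.exp(-((v / scale) ** shape))


shape, scale = 2.0, 8.0  # made-up parameters for illustration
edges = list(range(0, 33))  # integer bin edges 0, 1, ..., 32

# Probability mass of bin [lo, hi) is the CDF difference across the bin.
bin_mass = {
    lo: weibull_cdf(hi, shape, scale) - weibull_cdf(lo, shape, scale)
    for lo, hi in zip(edges[:-1], edges[1:])
}

# The masses sum to the CDF at the top edge, slightly below 1.
total = sum(bin_mass.values())
```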

This module also includes internal profiling hooks so that the runtime of the major steps in the fitting workflow can be inspected. Timing is aggregated across repeated operations to produce a concise summary rather than one line of output per bootstrap iteration.

risk_tools.fv.fv_params(df_v: DataFrame, n_boot: int, random_state: int | None = None) -> DataFrame

Fit Weibull distributions to original and bootstrapped wind-speed data.

Parameters:
df_v : pandas.DataFrame

DataFrame containing a column named "v_char". The values are assumed to represent wind speed. The index is typically a timestamp, but only the data column is used in the fit.

n_boot : int

Number of bootstrap samples to generate and fit.

random_state : int or None, optional

Seed for reproducible bootstrap resampling. The default is None.

Returns:
pandas.DataFrame

DataFrame indexed by sample name. The first row corresponds to the original dataset and is labeled "original". Subsequent rows correspond to bootstrap samples labeled "boot_0", "boot_1", and so on. Columns are:

  • shape : fitted Weibull shape parameter

  • scale : fitted Weibull scale parameter

Raises:
ValueError

If the input DataFrame does not contain a column named "v_char".

ValueError

If the wind-speed data contain no finite positive values.

Notes

The fit is a two-parameter Weibull fit with location fixed at zero by passing floc=0 to scipy.stats.weibull_min.fit.

Only finite positive values are retained before fitting.

Internal profiling is performed with aggregated timing categories so that the main cost centers can be identified without printing one line per bootstrap iteration.

Examples

>>> import pandas as pd
>>> from risk_tools.fv import fv_params
>>> df = pd.DataFrame({"v_char": [5.0, 6.0, 7.0, 8.0]})
>>> out = fv_params(df, n_boot=2, random_state=1)
>>> "original" in out.index
True

fp module

Conditional load-distribution fitting.

This module contains functions for fitting lognormal distributions to normalized load data conditioned on wind-speed bins. The main public function, fp_params, bins the paired p_load and v_char data into integer wind-speed bins and fits a lognormal distribution to the load values in each bin.

The output includes the original dataset and a set of bootstrap resamples. The fitted parameters are intended for later use in the risk integration.
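The binning step can be sketched with pandas.cut. The half-open bin convention [lo, hi), the use of lower edges as bin labels, and the dropping of out-of-range rows are all assumptions made for illustration; the reference does not state how fp_params labels its bins or treats values outside the range.

```python
import numpy as np
import pandas as pd

df_fp = pd.DataFrame({
    "v_char": [0.4, 3.2, 3.9, 7.0, 31.99, 40.0],
    "p_load": [0.10, 0.22, 0.25, 0.40, 0.90, 0.95],
})

bin_width, bin_min, bin_max = 1.0, 0.0, 32.0
edges = np.arange(bin_min, bin_max + bin_width, bin_width)

# right=False gives half-open bins [lo, hi); labels are the lower edges (an assumption).
df_fp["bin"] = pd.cut(df_fp["v_char"], bins=edges, right=False, labels=edges[:-1])

# Rows outside [bin_min, bin_max) receive NaN and are dropped before grouping.
counts = (
    df_fp.dropna(subset=["bin"])
    .groupby("bin", observed=True)["p_load"]
    .count()
)
```

Each group of p_load values would then be passed to the per-bin lognormal fit.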

risk_tools.fp.fp_params(df_fp: DataFrame, n_boot: int, random_state: int | None = None, bin_width: float = 1.0, bin_min: float = 0.0, bin_max: float = 32.0) -> DataFrame

Fit lognormal distributions to load data within wind-speed bins.

Parameters:
df_fp : pandas.DataFrame

DataFrame containing columns "p_load" and "v_char". p_load is the normalized load variable and v_char is the associated wind speed.

n_boot : int

Number of bootstrap samples to generate and fit.

random_state : int or None, optional

Seed for reproducible bootstrap resampling. The default is None.

bin_width : float, optional

Width of the wind-speed bins. The default is 1.0.

bin_min : float, optional

Lower bound of the wind-speed binning range. The default is 0.0.

bin_max : float, optional

Upper bound of the wind-speed binning range. The default is 32.0.

Returns:
pandas.DataFrame

DataFrame indexed by ["sample", "bin"] with columns:

  • shape : fitted lognormal shape parameter

  • scale : fitted lognormal scale parameter

  • n : number of valid observations used in the fit

The sample level contains "original" plus the bootstrap samples "boot_0", "boot_1", and so on.

Raises:
ValueError

If the input DataFrame does not contain both "p_load" and "v_char".

ValueError

If bin_width is not positive.

ValueError

If bin_max is not greater than bin_min.

Notes

The lognormal fit is performed with location fixed at zero by passing floc=0 to scipy.stats.lognorm.fit.

Only finite positive p_load values are retained before fitting.

risk module

Risk integration functions.

This module contains the main risk-calculation logic. It combines Weibull fits for wind speed and lognormal fits for load conditioned on wind speed to approximate exceedance risk above externally defined operating-envelope boundaries.

The module calculates:

  • overall risk,

  • low-wind risk,

  • high-wind risk,

  • bootstrap realizations of each, and

  • summary statistics of the bootstrap results.

risk_tools.risk.risk_calculation(df_fp: DataFrame, df_v: DataFrame, df_plimits: DataFrame, n_boot: int, random_state: int | None = None, overall_v_lower: float = 0.0, overall_v_upper: float = 32.0, low_v_lower: float = 0.0, low_v_upper: float = 15.0, high_v_lower: float = 15.0, high_v_upper: float = 32.0, p_upper: float = 1.0, bin_width: float = 1.0, fp_bin_min: float = 0.0, fp_bin_max: float = 32.0) -> dict[str, DataFrame]

Calculate original and bootstrap risk metrics.

Parameters:
df_fp : pandas.DataFrame

DataFrame containing columns "p_load" and "v_char" for the paired load-wind dataset.

df_v : pandas.DataFrame

DataFrame containing a column named "v_char" for the wind time series used to fit the wind-speed distribution.

df_plimits : pandas.DataFrame

DataFrame indexed by wind-speed bin with columns "danger" and "limit". These columns define the lower integration limits of the risk calculation.

n_boot : int

Number of bootstrap samples to generate.

random_state : int or None, optional

Seed for reproducible bootstrap resampling. The default is None.

overall_v_lower, overall_v_upper : float, optional

Wind-speed bounds for the overall risk calculation. The defaults are 0.0 and 32.0.

low_v_lower, low_v_upper : float, optional

Wind-speed bounds for the low-wind regime. The defaults are 0.0 and 15.0.

high_v_lower, high_v_upper : float, optional

Wind-speed bounds for the high-wind regime. The defaults are 15.0 and 32.0.

p_upper : float, optional

Upper integration limit in normalized load space. The default is 1.0.

bin_width : float, optional

Width of the wind-speed bins. The default is 1.0.

fp_bin_min, fp_bin_max : float, optional

Wind-speed binning range used when fitting the lognormal conditional load distributions. The defaults are 0.0 and 32.0.

Returns:
dict of pandas.DataFrame

Dictionary containing the following DataFrames:

  • original_overall : overall risk for original data

  • original_by_regime : low- and high-wind risks for original data

  • bootstrap_overall : overall bootstrap risks

  • bootstrap_by_regime : low- and high-wind bootstrap risks

  • stats_overall : mean and standard deviation of overall bootstrap risks

  • stats_by_regime : mean and standard deviation of regime bootstrap risks

Raises:
ValueError

If required input columns are missing.

Notes

The risk calculation approximates

\[\int_{v_{lower}}^{v_{upper}} f_v(v) \int_{p_{lower}(v)}^{p_{upper}} f_p(p \mid v) \, dp \, dv\]

using integer wind-speed bins and fitted distribution functions.

For each wind-speed bin, the contribution is the product of:

  • the Weibull probability mass of the wind-speed bin, and

  • the lognormal probability mass between the boundary and p_upper.

Bins whose lower integration limit is greater than or equal to p_upper contribute zero risk.
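The binned approximation described above can be sketched with closed-form distribution functions from the standard library. All names and parameter values below are illustrative assumptions, not the package's code or fitted results; p_lower_by_bin stands in for one boundary column of df_plimits ("danger" or "limit"), and bins lacking a fit or a boundary are simply skipped here.

```python
import math


def weibull_cdf(v, shape, scale):
    """Zero-location Weibull CDF."""
    return 1.0 - math.exp(-((v / scale) ** shape)) if v > 0 else 0.0


def lognorm_cdf(p, shape, scale):
    """Zero-location lognormal CDF: ln(p) is normal with mean ln(scale), std shape."""
    if p <= 0:
        return 0.0
    z = (math.log(p) - math.log(scale)) / shape
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binned_risk(fv, fp_by_bin, p_lower_by_bin, v_lower, v_upper, p_upper=1.0):
    """Sum per-bin contributions over integer bins [v, v + 1)."""
    risk = 0.0
    for v in range(int(v_lower), int(v_upper)):
        if v not in fp_by_bin or v not in p_lower_by_bin:
            continue  # assumption: bins without a fit or boundary are skipped
        p_lower = p_lower_by_bin[v]
        if p_lower >= p_upper:
            continue  # boundary at or above p_upper: zero contribution
        wind_mass = weibull_cdf(v + 1, *fv) - weibull_cdf(v, *fv)
        load_mass = lognorm_cdf(p_upper, *fp_by_bin[v]) - lognorm_cdf(p_lower, *fp_by_bin[v])
        risk += wind_mass * load_mass
    return risk


# Made-up parameters for three bins; not fitted values.
fv = (2.0, 8.0)  # Weibull (shape, scale)
fp_by_bin = {5: (0.3, 0.20), 6: (0.3, 0.25), 7: (0.3, 0.30)}  # lognormal (shape, scale)
p_lower_by_bin = {5: 0.5, 6: 0.5, 7: 1.2}  # bin 7 boundary exceeds p_upper

overall = binned_risk(fv, fp_by_bin, p_lower_by_bin, 0.0, 32.0)
low = binned_risk(fv, fp_by_bin, p_lower_by_bin, 0.0, 6.0)
high = binned_risk(fv, fp_by_bin, p_lower_by_bin, 6.0, 32.0)
```

When the low and high regimes share a boundary and together cover the overall range, as in this sketch, the two regime risks add up to the overall risk.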

utils module

Utility functions for the risk_tools package.

This module contains helper functions for:

  • normalizing wind-speed input loaded from pickle files,

  • splitting timestamp-indexed data into seasonal subsets, and

  • formatting summary statistics for display.

risk_tools.utils.coerce_v_char_input(obj: Series | DataFrame) -> DataFrame

Normalize wind-speed input to a one-column DataFrame named "v_char".

Parameters:
obj : pandas.Series or pandas.DataFrame

Wind-speed data loaded from file. It may already be a one-column DataFrame or it may be a Series without a column name.

Returns:
pandas.DataFrame

One-column DataFrame with the column name "v_char".

Raises:
TypeError

If the input is neither a pandas Series nor a one-column DataFrame.
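The documented behavior can be approximated with a few lines of pandas. The function name coerce_v_char_sketch is hypothetical, and returning a renamed copy for DataFrame input is an assumption; the package may modify or return its input differently.

```python
import pandas as pd


def coerce_v_char_sketch(obj):
    """Hypothetical re-implementation for illustration only."""
    if isinstance(obj, pd.Series):
        # A Series (named or unnamed) becomes a one-column frame.
        return obj.to_frame(name="v_char")
    if isinstance(obj, pd.DataFrame) and obj.shape[1] == 1:
        out = obj.copy()  # assumption: leave the caller's frame untouched
        out.columns = ["v_char"]
        return out
    raise TypeError("expected a pandas Series or a one-column DataFrame")


from_series = coerce_v_char_sketch(pd.Series([1.0, 2.0, 3.0]))
from_frame = coerce_v_char_sketch(pd.DataFrame({"speed": [4.0, 5.0]}))
```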

Examples

>>> import pandas as pd
>>> from risk_tools.utils import coerce_v_char_input
>>> s = pd.Series([1.0, 2.0, 3.0])
>>> df = coerce_v_char_input(s)
>>> list(df.columns)
['v_char']

risk_tools.utils.format_mean_pm_std(mean_val: float, std_val: float, decimals: int = 6) -> str

Format a mean and standard deviation as mean ± std.

Parameters:
mean_val : float

Mean value.

std_val : float

Standard deviation.

decimals : int, optional

Number of decimal places to display. The default is 6.

Returns:
str

Formatted string of the form "0.123456 ± 0.012345". If either value is missing, returns "NaN".

risk_tools.utils.split_by_season(df: DataFrame) -> dict[str, DataFrame]

Split a timestamp-indexed DataFrame into seasonal subsets.

Parameters:
df : pandas.DataFrame

Input DataFrame with a DatetimeIndex.

Returns:
dict of pandas.DataFrame

Dictionary with keys "spring", "summer", "fall", and "winter". Each value is the subset of the input DataFrame corresponding to that season.

Raises:
TypeError

If the input DataFrame does not have a DatetimeIndex.

Notes

Seasons are defined by calendar month as follows:

  • winter: December, January, February

  • spring: March, April, May

  • summer: June, July, August

  • fall: September, October, November
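The month-to-season mapping listed above can be sketched as follows. The function name split_by_season_sketch and the MONTH_TO_SEASON lookup table are hypothetical; only the month groupings, the dictionary keys, and the TypeError condition come from the reference.

```python
import pandas as pd

# Calendar-month definition of the seasons, as listed in the Notes.
MONTH_TO_SEASON = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}


def split_by_season_sketch(df):
    """Hypothetical re-implementation for illustration only."""
    if not isinstance(df.index, pd.DatetimeIndex):
        raise TypeError("df must have a DatetimeIndex")
    # Map each row's month number to its season name.
    seasons = df.index.month.map(MONTH_TO_SEASON)
    return {
        name: df[seasons == name]
        for name in ("spring", "summer", "fall", "winter")
    }


index = pd.date_range("2023-01-01", periods=12, freq="MS")  # one row per month
df = pd.DataFrame({"v_char": range(12)}, index=index)
parts = split_by_season_sketch(df)
```

Note that under this definition a winter subset of a single calendar year combines January and February with the following December.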