API reference

risk_tools package

risk_tools

Package for bootstrapped wind/load risk calculations.

This package provides functions to:

generate bootstrap resampling indices,
fit Weibull distributions to wind-speed data,
fit lognormal distributions to normalized load data conditioned on wind-speed bins,
calculate exceedance risk for operating-envelope boundaries, and
split timestamp-indexed data into seasonal subsets.

Modules

bootstrap: Bootstrap index generation.
fv: Weibull fitting for wind-speed data.
fp: Lognormal fitting for load conditioned on wind speed.
risk: Risk calculation from fitted wind-speed and load distributions.
utils: Utility functions for data preparation and formatting.

risk_tools.bootstrap(data: int | Series | DataFrame | ndarray, n_boot: int, random_state: int | None = None) → list[ndarray]

Generate bootstrap index arrays for an input dataset.

Parameters:

data (int or pandas.Series or pandas.DataFrame or numpy.ndarray): Input data or number of observations. If an integer is supplied, it is interpreted directly as the number of observations. If a pandas or NumPy object is supplied, the number of observations is inferred using len(data).
n_boot (int): Number of bootstrap samples to generate.
random_state (int or None, optional): Seed for the random number generator. If provided, the bootstrap samples are reproducible. The default is None.

Returns:

list of numpy.ndarray: List of bootstrap index arrays. Each array has length equal to the number of observations in the original dataset and contains indices sampled with replacement.

Raises:

ValueError: If the input dataset is empty.
ValueError: If n_boot is negative.

Notes

Bootstrap resampling is performed by drawing indices with replacement from the range 0 to n_obs - 1. The returned indices can then be applied to the relevant pandas objects using .iloc or to NumPy arrays using standard indexing.

Examples

>>> import pandas as pd
>>> from risk_tools.bootstrap import bootstrap
>>> df = pd.DataFrame({"x": [1, 2, 3, 4]})
>>> idx = bootstrap(df, n_boot=2, random_state=42)
>>> len(idx)
2
>>> len(idx[0])
4

risk_tools.coerce_v_char_input(obj: Series | DataFrame) → DataFrame

Normalize wind-speed input to a one-column DataFrame named "v_char".

Parameters:

obj (pandas.Series or pandas.DataFrame): Wind-speed data loaded from file. It may already be a one-column DataFrame, or it may be a Series without a column name.

Returns:

pandas.DataFrame: One-column DataFrame with the column name "v_char".

Raises:

TypeError: If the input is neither a pandas Series nor a one-column DataFrame.

Examples

>>> import pandas as pd
>>> from risk_tools.utils import coerce_v_char_input
>>> s = pd.Series([1.0, 2.0, 3.0])
>>> df = coerce_v_char_input(s)
>>> list(df.columns)
['v_char']

risk_tools.format_mean_pm_std(mean_val: float, std_val: float, decimals: int = 6) → str

Format a mean and standard deviation as mean ± std.

Parameters:

mean_val (float): Mean value.
std_val (float): Standard deviation.
decimals (int, optional): Number of decimal places to display. The default is 6.

Returns:

str: Formatted string of the form "0.123456 ± 0.012345". If either value is missing, returns "NaN".

risk_tools.fp_params(df_fp: DataFrame, n_boot: int, random_state: int | None = None, bin_width: float = 1.0, bin_min: float = 0.0, bin_max: float = 32.0) → DataFrame

Fit lognormal distributions to load data within wind-speed bins.

Parameters:

df_fp (pandas.DataFrame): DataFrame containing columns "p_load" and "v_char". p_load is the normalized load variable and v_char is the associated wind speed.
n_boot (int): Number of bootstrap samples to generate and fit.
random_state (int or None, optional): Seed for reproducible bootstrap resampling. The default is None.
bin_width (float, optional): Width of the wind-speed bins. The default is 1.0.
bin_min (float, optional): Lower bound of the wind-speed binning range. The default is 0.0.
bin_max (float, optional): Upper bound of the wind-speed binning range. The default is 32.0.

Returns:

pandas.DataFrame

DataFrame indexed by ["sample", "bin"] with columns:

shape : fitted lognormal shape parameter
scale : fitted lognormal scale parameter
n : number of valid observations used in the fit

The sample level contains "original" plus the bootstrap samples "boot_0", "boot_1", and so on.

Raises:

ValueError: If the input DataFrame does not contain both "p_load" and "v_char".
ValueError: If bin_width is not positive.
ValueError: If bin_max is not greater than bin_min.

Notes

The lognormal fit is performed with location fixed at zero by passing floc=0 to scipy.stats.lognorm.fit.

Only finite positive p_load values are retained before fitting.

risk_tools.fv_params(df_v: DataFrame, n_boot: int, random_state: int | None = None) → DataFrame

Fit Weibull distributions to original and bootstrapped wind-speed data.

Parameters:

df_v (pandas.DataFrame): DataFrame containing a column named "v_char". The values are assumed to represent wind speed. The index is typically a timestamp, but only the data column is used in the fit.
n_boot (int): Number of bootstrap samples to generate and fit.
random_state (int or None, optional): Seed for reproducible bootstrap resampling. The default is None.

Returns:

pandas.DataFrame

DataFrame indexed by sample name. The first row corresponds to the original dataset and is labeled "original". Subsequent rows correspond to bootstrap samples labeled "boot_0", "boot_1", and so on. Columns are:

shape : fitted Weibull shape parameter
scale : fitted Weibull scale parameter

Raises:

ValueError: If the input DataFrame does not contain a column named "v_char".
ValueError: If the wind-speed data contain no finite positive values.

Notes

The fit is a two-parameter Weibull fit with location fixed at zero by passing floc=0 to scipy.stats.weibull_min.fit.

Only finite positive values are retained before fitting.

Internal profiling is performed with aggregated timing categories so that the main cost centers can be identified without printing one line per bootstrap iteration.

Examples

>>> import pandas as pd
>>> from risk_tools.fv import fv_params
>>> df = pd.DataFrame({"v_char": [5.0, 6.0, 7.0, 8.0]})
>>> out = fv_params(df, n_boot=2, random_state=1)
>>> "original" in out.index
True

risk_tools.risk_calculation(df_fp: DataFrame, df_v: DataFrame, df_plimits: DataFrame, n_boot: int, random_state: int | None = None, overall_v_lower: float = 0.0, overall_v_upper: float = 32.0, low_v_lower: float = 0.0, low_v_upper: float = 15.0, high_v_lower: float = 15.0, high_v_upper: float = 32.0, p_upper: float = 1.0, bin_width: float = 1.0, fp_bin_min: float = 0.0, fp_bin_max: float = 32.0) → dict[str, DataFrame]

Calculate original and bootstrap risk metrics.

Parameters:

df_fp (pandas.DataFrame): DataFrame containing columns "p_load" and "v_char" for the paired load-wind dataset.
df_v (pandas.DataFrame): DataFrame containing a column named "v_char" for the wind time series used to fit the wind-speed distribution.
df_plimits (pandas.DataFrame): DataFrame indexed by wind-speed bin with columns "danger" and "limit". These columns define the lower integration limits of the risk calculation.
n_boot (int): Number of bootstrap samples to generate.
random_state (int or None, optional): Seed for reproducible bootstrap resampling. The default is None.
overall_v_lower, overall_v_upper (float, optional): Wind-speed bounds for the overall risk calculation. The defaults are 0.0 and 32.0.
low_v_lower, low_v_upper (float, optional): Wind-speed bounds for the low-wind regime. The defaults are 0.0 and 15.0.
high_v_lower, high_v_upper (float, optional): Wind-speed bounds for the high-wind regime. The defaults are 15.0 and 32.0.
p_upper (float, optional): Upper integration limit in normalized load space. The default is 1.0.
bin_width (float, optional): Width of the wind-speed bins. The default is 1.0.
fp_bin_min, fp_bin_max (float, optional): Wind-speed binning range used when fitting the lognormal conditional load distributions. The defaults are 0.0 and 32.0.

Returns:

dict of pandas.DataFrame

Dictionary containing the following DataFrames:

original_overall : overall risk for original data
original_by_regime : low- and high-wind risks for original data
bootstrap_overall : overall bootstrap risks
bootstrap_by_regime : low- and high-wind bootstrap risks
stats_overall : mean and standard deviation of overall bootstrap risks
stats_by_regime : mean and standard deviation of regime bootstrap risks

Raises:

ValueError: If required input columns are missing.

Notes

The risk calculation approximates

\[\int_{v_{lower}}^{v_{upper}} f_v(v) \int_{p_{lower}(v)}^{p_{upper}} f_p(p \mid v) \, dp \, dv\]

using discrete wind-speed bins of width bin_width (unit-width bins by default) and the fitted distribution functions.

For each wind-speed bin, the contribution is the product of:

the Weibull probability mass of the wind-speed bin, and
the lognormal probability mass between the boundary and p_upper.

Bins whose lower integration limit is greater than or equal to p_upper contribute zero risk.

risk_tools.split_by_season(df: DataFrame) → dict[str, DataFrame]

Split a timestamp-indexed DataFrame into seasonal subsets.

Parameters:

df (pandas.DataFrame): Input DataFrame with a DatetimeIndex.

Returns:

dict of pandas.DataFrame: Dictionary with keys "spring", "summer", "fall", and "winter". Each value is the subset of the input DataFrame corresponding to that season.

Raises:

TypeError: If the input DataFrame does not have a DatetimeIndex.

Notes

Seasons are defined by calendar month as follows:

winter: December, January, February
spring: March, April, May
summer: June, July, August
fall: September, October, November

risk_tools.timed(func: Callable) → Callable

Decorate a function so that its total execution time is printed when profiling is enabled.

Parameters:

func (callable): Function to decorate.

Returns:

callable: Wrapped function that prints elapsed runtime when profiling is enabled.

Notes

This decorator is best suited to top-level functions whose total runtime is of interest. For finer-grained timing inside a function, use timed_block or ProfileAccumulator.

Examples

>>> from risk_tools import timed
>>> @timed
... def add(a, b):
...     return a + b
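The behavior described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the name timed_sketch and the module-level PROFILING_ENABLED switch are hypothetical stand-ins for however risk_tools actually turns profiling on.

```python
import functools
import time
from typing import Any, Callable

# Hypothetical switch; how risk_tools really enables profiling is not documented here.
PROFILING_ENABLED = True


def timed_sketch(func: Callable) -> Callable:
    """Print the wall-clock runtime of each call when profiling is enabled."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        if not PROFILING_ENABLED:
            return func(*args, **kwargs)
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed calls are timed too.
            elapsed = time.perf_counter() - start
            print(f"{func.__qualname__} took {elapsed:.4f} s")

    return wrapper


@timed_sketch
def add(a: int, b: int) -> int:
    return a + b


add(1, 2)  # returns 3 and prints a line such as "add took 0.0000 s"
```

Using functools.wraps keeps the decorated function's name and docstring, which matters if the decorated functions are themselves documented by Sphinx.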

bootstrap module

Bootstrap tools for resampling indexed data.

This module contains the bootstrap function used throughout the risk-calculation workflow. The function generates bootstrap index arrays that can be applied to pandas objects or NumPy arrays without copying the full datasets in advance.

The design keeps the resampling step separate from the fitting steps so that bootstrap logic can be tested independently and reused across the different parts of the workflow.

risk_tools.bootstrap.bootstrap(data: int | Series | DataFrame | ndarray, n_boot: int, random_state: int | None = None) → list[ndarray]

Generate bootstrap index arrays for an input dataset.

Parameters:

data (int or pandas.Series or pandas.DataFrame or numpy.ndarray): Input data or number of observations. If an integer is supplied, it is interpreted directly as the number of observations. If a pandas or NumPy object is supplied, the number of observations is inferred using len(data).
n_boot (int): Number of bootstrap samples to generate.
random_state (int or None, optional): Seed for the random number generator. If provided, the bootstrap samples are reproducible. The default is None.

Returns:

list of numpy.ndarray: List of bootstrap index arrays. Each array has length equal to the number of observations in the original dataset and contains indices sampled with replacement.

Raises:

ValueError: If the input dataset is empty.
ValueError: If n_boot is negative.

Notes

Bootstrap resampling is performed by drawing indices with replacement from the range 0 to n_obs - 1. The returned indices can then be applied to the relevant pandas objects using .iloc or to NumPy arrays using standard indexing.
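A minimal sketch of the index-drawing step described above, assuming NumPy's Generator API. The name bootstrap_indices_sketch is hypothetical, and because the package may use a different random number generator, the indices it returns for a given seed will generally differ from those produced here.

```python
import numpy as np
import pandas as pd


def bootstrap_indices_sketch(data, n_boot, random_state=None):
    """Hypothetical re-implementation of the documented behavior."""
    # An integer is taken as the number of observations; otherwise use len(data).
    n_obs = int(data) if isinstance(data, (int, np.integer)) else len(data)
    if n_obs <= 0:
        raise ValueError("input dataset is empty")
    if n_boot < 0:
        raise ValueError("n_boot must be non-negative")
    rng = np.random.default_rng(random_state)
    # Each array holds n_obs positions drawn with replacement from 0 .. n_obs - 1.
    return [rng.integers(0, n_obs, size=n_obs) for _ in range(n_boot)]


df = pd.DataFrame(
    {"x": [1, 2, 3, 4]},
    index=pd.date_range("2020-01-01", periods=4, freq="D"),
)
idx = bootstrap_indices_sketch(df, n_boot=2, random_state=42)

# The arrays are positions, so use .iloc even when the index holds timestamps.
resampled = [df.iloc[i] for i in idx]
# The same positions index a NumPy array directly.
resampled_x = [df["x"].to_numpy()[i] for i in idx]
```

Returning positions rather than resampled copies keeps memory use proportional to n_boot × n_obs integers, and the same index arrays can be reused across several aligned objects.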

Examples

>>> import pandas as pd
>>> from risk_tools.bootstrap import bootstrap
>>> df = pd.DataFrame({"x": [1, 2, 3, 4]})
>>> idx = bootstrap(df, n_boot=2, random_state=42)
>>> len(idx)
2
>>> len(idx[0])
4

fv module

Wind-speed distribution fitting.

This module contains functions for fitting Weibull distributions to wind-speed time series. The main public function, fv_params, fits a two-parameter Weibull distribution to the original dataset and to a set of bootstrap resamples.

The output is intended for later use in the risk calculation, where the fitted Weibull distribution describes the probability of wind speed falling within each wind-speed bin.

This module also includes internal profiling hooks so that the runtime of the major steps in the fitting workflow can be inspected. Timing is aggregated across repeated operations to produce a concise summary rather than one line of output per bootstrap iteration.
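Aggregated timing of this kind can be illustrated with a small accumulator that sums elapsed time per named category and reports one line per category. TimingAccumulatorSketch and its block and summary methods are hypothetical; the package's own helpers (timed_block and ProfileAccumulator, mentioned under risk_tools.timed) are not documented here and may work differently.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class TimingAccumulatorSketch:
    """Accumulate elapsed time and call counts per named category."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def block(self, category):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Add this block's duration to the running total for its category.
            self.totals[category] += time.perf_counter() - start
            self.counts[category] += 1

    def summary(self):
        # One line per category instead of one line per iteration.
        return "\n".join(
            f"{name}: {self.totals[name]:.4f} s over {self.counts[name]} calls"
            for name in sorted(self.totals)
        )


prof = TimingAccumulatorSketch()
for _ in range(100):  # stands in for a loop over bootstrap samples
    with prof.block("resample"):
        sum(range(1000))
    with prof.block("fit"):
        sorted(range(1000))
print(prof.summary())
```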

risk_tools.fv.fv_params(df_v: DataFrame, n_boot: int, random_state: int | None = None) → DataFrame

Fit Weibull distributions to original and bootstrapped wind-speed data.

Parameters:

df_v (pandas.DataFrame): DataFrame containing a column named "v_char". The values are assumed to represent wind speed. The index is typically a timestamp, but only the data column is used in the fit.
n_boot (int): Number of bootstrap samples to generate and fit.
random_state (int or None, optional): Seed for reproducible bootstrap resampling. The default is None.

Returns:

pandas.DataFrame

DataFrame indexed by sample name. The first row corresponds to the original dataset and is labeled "original". Subsequent rows correspond to bootstrap samples labeled "boot_0", "boot_1", and so on. Columns are:

shape : fitted Weibull shape parameter
scale : fitted Weibull scale parameter

Raises:

ValueError: If the input DataFrame does not contain a column named "v_char".
ValueError: If the wind-speed data contain no finite positive values.

Notes

The fit is a two-parameter Weibull fit with location fixed at zero by passing floc=0 to scipy.stats.weibull_min.fit.

Only finite positive values are retained before fitting.
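A sketch of the fitting loop under stated assumptions. It calls scipy.stats.weibull_min.fit with floc=0 as described above, but the name fv_params_sketch is hypothetical, and it filters the data first and then resamples the filtered values, whereas the package may resample the raw rows before filtering.

```python
import numpy as np
import pandas as pd
from scipy import stats


def fv_params_sketch(df_v, n_boot, random_state=None):
    """Hypothetical sketch: Weibull fits for the original data and bootstrap resamples."""
    if "v_char" not in df_v.columns:
        raise ValueError('df_v must contain a "v_char" column')
    v = df_v["v_char"].to_numpy(dtype=float)
    # Keep only finite, strictly positive wind speeds.
    v = v[np.isfinite(v) & (v > 0)]
    if v.size == 0:
        raise ValueError("no finite positive wind speeds")

    rng = np.random.default_rng(random_state)
    samples = {"original": v}
    for b in range(n_boot):
        # Resample the filtered values with replacement.
        samples[f"boot_{b}"] = v[rng.integers(0, v.size, size=v.size)]

    rows = {}
    for name, x in samples.items():
        # floc=0 fixes the location, leaving a two-parameter (shape, scale) fit.
        shape, _loc, scale = stats.weibull_min.fit(x, floc=0)
        rows[name] = {"shape": shape, "scale": scale}
    return pd.DataFrame.from_dict(rows, orient="index")


# Synthetic wind speeds (shape 2, scale 8) plus values the filter should drop.
wind = stats.weibull_min.rvs(2.0, scale=8.0, size=2000, random_state=0)
df_v = pd.DataFrame({"v_char": np.concatenate([wind, [np.nan, -1.0, 0.0]])})
out = fv_params_sketch(df_v, n_boot=3, random_state=1)
```

The spread of the shape and scale columns across the boot_* rows is what later feeds the bootstrap uncertainty of the risk estimates.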

Internal profiling is performed with aggregated timing categories so that the main cost centers can be identified without printing one line per bootstrap iteration.

Examples

>>> import pandas as pd
>>> from risk_tools.fv import fv_params
>>> df = pd.DataFrame({"v_char": [5.0, 6.0, 7.0, 8.0]})
>>> out = fv_params(df, n_boot=2, random_state=1)
>>> "original" in out.index
True

fp module

Conditional load-distribution fitting.

This module contains functions for fitting lognormal distributions to normalized load data conditioned on wind-speed bins. The main public function, fp_params, bins the paired p_load and v_char data by wind speed (bins of width bin_width, 1.0 by default) and fits a lognormal distribution to the load values in each bin.

The output includes the original dataset and a set of bootstrap resamples. The fitted parameters are intended for later use in the risk integration.

risk_tools.fp.fp_params(df_fp: DataFrame, n_boot: int, random_state: int | None = None, bin_width: float = 1.0, bin_min: float = 0.0, bin_max: float = 32.0) → DataFrame

Fit lognormal distributions to load data within wind-speed bins.

Parameters:

df_fp (pandas.DataFrame): DataFrame containing columns "p_load" and "v_char". p_load is the normalized load variable and v_char is the associated wind speed.
n_boot (int): Number of bootstrap samples to generate and fit.
random_state (int or None, optional): Seed for reproducible bootstrap resampling. The default is None.
bin_width (float, optional): Width of the wind-speed bins. The default is 1.0.
bin_min (float, optional): Lower bound of the wind-speed binning range. The default is 0.0.
bin_max (float, optional): Upper bound of the wind-speed binning range. The default is 32.0.

Returns:

pandas.DataFrame

DataFrame indexed by ["sample", "bin"] with columns:

shape : fitted lognormal shape parameter
scale : fitted lognormal scale parameter
n : number of valid observations used in the fit

The sample level contains "original" plus the bootstrap samples "boot_0", "boot_1", and so on.

Raises:

ValueError: If the input DataFrame does not contain both "p_load" and "v_char".
ValueError: If bin_width is not positive.
ValueError: If bin_max is not greater than bin_min.

Notes

The lognormal fit is performed with location fixed at zero by passing floc=0 to scipy.stats.lognorm.fit.

Only finite positive p_load values are retained before fitting.
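The binning and per-bin fitting can be sketched as follows. Assumptions, none of which come from the package: fp_params_sketch is a hypothetical name; bins are right-open intervals labeled by their lower edge; bins with fewer than two valid points get NaN parameters; and instead of calling scipy.stats.lognorm.fit(x, floc=0), the sketch uses the closed-form maximum-likelihood estimate for a zero-location lognormal (shape = standard deviation of ln x, scale = exp of the mean of ln x), which the numerical fit should reproduce up to optimizer tolerance. Load and wind-speed values are resampled together as pairs.

```python
import numpy as np
import pandas as pd


def lognormal_mle(x):
    """Closed-form lognormal MLE with location fixed at zero: (shape, scale)."""
    logs = np.log(x)
    return float(np.std(logs)), float(np.exp(np.mean(logs)))


def fp_params_sketch(df_fp, n_boot, random_state=None,
                     bin_width=1.0, bin_min=0.0, bin_max=32.0):
    """Hypothetical sketch of per-bin lognormal fits with bootstrap resampling."""
    if not {"p_load", "v_char"}.issubset(df_fp.columns):
        raise ValueError('df_fp must contain "p_load" and "v_char"')
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    if bin_max <= bin_min:
        raise ValueError("bin_max must be greater than bin_min")

    p = df_fp["p_load"].to_numpy(dtype=float)
    v = df_fp["v_char"].to_numpy(dtype=float)
    n_bins = int(np.ceil((bin_max - bin_min) / bin_width))
    rng = np.random.default_rng(random_state)

    # Resample row positions so each load stays paired with its wind speed.
    samples = {"original": np.arange(len(p))}
    for b in range(n_boot):
        samples[f"boot_{b}"] = rng.integers(0, len(p), size=len(p))

    records = []
    for name, idx in samples.items():
        ps, vs = p[idx], v[idx]
        # Bin i covers [bin_min + i * bin_width, bin_min + (i + 1) * bin_width).
        bin_idx = np.floor((vs - bin_min) / bin_width)
        valid = np.isfinite(ps) & (ps > 0)
        for i in range(n_bins):
            x = ps[valid & (bin_idx == i)]
            shape, scale = lognormal_mle(x) if x.size >= 2 else (np.nan, np.nan)
            records.append((name, bin_min + i * bin_width, shape, scale, x.size))

    out = pd.DataFrame(records, columns=["sample", "bin", "shape", "scale", "n"])
    return out.set_index(["sample", "bin"])


# Synthetic pairs: in bin i the load is lognormal with shape 0.3, scale 0.1 * (i + 1).
rng = np.random.default_rng(0)
v = rng.uniform(0.0, 4.0, size=4000)
p = np.exp(rng.normal(np.log(0.1 * (np.floor(v) + 1)), 0.3))
df_fp = pd.DataFrame({"p_load": p, "v_char": v})
params = fp_params_sketch(df_fp, n_boot=2, random_state=1, bin_max=4.0)
```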

risk module

Risk integration functions.

This module contains the main risk-calculation logic. It combines Weibull fits for wind speed and lognormal fits for load conditioned on wind speed to approximate exceedance risk above externally defined operating-envelope boundaries.

The module calculates:

overall risk,
low-wind risk,
high-wind risk,
bootstrap realizations of each, and
summary statistics of the bootstrap results.

risk_tools.risk.risk_calculation(df_fp: DataFrame, df_v: DataFrame, df_plimits: DataFrame, n_boot: int, random_state: int | None = None, overall_v_lower: float = 0.0, overall_v_upper: float = 32.0, low_v_lower: float = 0.0, low_v_upper: float = 15.0, high_v_lower: float = 15.0, high_v_upper: float = 32.0, p_upper: float = 1.0, bin_width: float = 1.0, fp_bin_min: float = 0.0, fp_bin_max: float = 32.0) → dict[str, DataFrame]

Calculate original and bootstrap risk metrics.

Parameters:

df_fp (pandas.DataFrame): DataFrame containing columns "p_load" and "v_char" for the paired load-wind dataset.
df_v (pandas.DataFrame): DataFrame containing a column named "v_char" for the wind time series used to fit the wind-speed distribution.
df_plimits (pandas.DataFrame): DataFrame indexed by wind-speed bin with columns "danger" and "limit". These columns define the lower integration limits of the risk calculation.
n_boot (int): Number of bootstrap samples to generate.
random_state (int or None, optional): Seed for reproducible bootstrap resampling. The default is None.
overall_v_lower, overall_v_upper (float, optional): Wind-speed bounds for the overall risk calculation. The defaults are 0.0 and 32.0.
low_v_lower, low_v_upper (float, optional): Wind-speed bounds for the low-wind regime. The defaults are 0.0 and 15.0.
high_v_lower, high_v_upper (float, optional): Wind-speed bounds for the high-wind regime. The defaults are 15.0 and 32.0.
p_upper (float, optional): Upper integration limit in normalized load space. The default is 1.0.
bin_width (float, optional): Width of the wind-speed bins. The default is 1.0.
fp_bin_min, fp_bin_max (float, optional): Wind-speed binning range used when fitting the lognormal conditional load distributions. The defaults are 0.0 and 32.0.

Returns:

dict of pandas.DataFrame

Dictionary containing the following DataFrames:

original_overall : overall risk for original data
original_by_regime : low- and high-wind risks for original data
bootstrap_overall : overall bootstrap risks
bootstrap_by_regime : low- and high-wind bootstrap risks
stats_overall : mean and standard deviation of overall bootstrap risks
stats_by_regime : mean and standard deviation of regime bootstrap risks

Raises:

ValueError: If required input columns are missing.

Notes

The risk calculation approximates

\[\int_{v_{lower}}^{v_{upper}} f_v(v) \int_{p_{lower}(v)}^{p_{upper}} f_p(p \mid v) \, dp \, dv\]

using discrete wind-speed bins of width bin_width (unit-width bins by default) and the fitted distribution functions.
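Written out with the fitted distributions, the binned approximation can be sketched as below. The symbols are introduced here for illustration and assume bins [v_i, v_{i+1}) lying inside [v_{lower}, v_{upper}], a Weibull wind-speed CDF F_v with shape k and scale λ, a lognormal CDF F_{p∣i} with shape σ_i and scale s_i fitted to bin i, and one boundary value p_{lower,i} per bin:

\[R \approx \sum_{i} \bigl[F_v(v_{i+1}) - F_v(v_i)\bigr] \, \max\bigl\{0,\; F_{p \mid i}(p_{upper}) - F_{p \mid i}(p_{lower,i})\bigr\}\]

\[F_v(v) = 1 - \exp\bigl[-(v/\lambda)^{k}\bigr], \qquad F_{p \mid i}(p) = \Phi\!\left(\frac{\ln(p / s_i)}{\sigma_i}\right)\]

where Φ is the standard normal CDF and the max{0, ·} factor handles bins whose lower limit is at or above p_upper.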

For each wind-speed bin, the contribution is the product of:

the Weibull probability mass of the wind-speed bin, and
the lognormal probability mass between the boundary and p_upper.

Bins whose lower integration limit is greater than or equal to p_upper contribute zero risk.
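The per-bin product can be sketched in plain Python and NumPy as below. Assumptions: a single boundary array p_lower (one value per bin, standing in for one column of df_plimits), one fitted lognormal (shape, scale) pair per bin, and a regime that includes a bin only if the bin lies entirely within [v_lower, v_upper]. The names binned_risk_sketch, weibull_cdf, and lognorm_cdf are hypothetical; the two CDFs follow SciPy's zero-location parametrizations, so they should agree with scipy.stats.weibull_min.cdf and scipy.stats.lognorm.cdf.

```python
import math

import numpy as np


def weibull_cdf(v, shape, scale):
    """Two-parameter Weibull CDF with location zero."""
    v = max(float(v), 0.0)
    return 1.0 - math.exp(-((v / scale) ** shape))


def lognorm_cdf(p, shape, scale):
    """Lognormal CDF with location zero, in SciPy's (shape, scale) parametrization."""
    if p <= 0:
        return 0.0
    z = math.log(p / scale) / shape
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binned_risk_sketch(edges, wb_shape, wb_scale, ln_params, p_lower, p_upper,
                       v_lower, v_upper):
    """Sum per-bin contributions over bins lying inside [v_lower, v_upper].

    ln_params[i] is the fitted lognormal (shape, scale) for bin i and
    p_lower[i] is that bin's lower integration limit (boundary).
    """
    risk = 0.0
    for i in range(len(edges) - 1):
        lo, hi = edges[i], edges[i + 1]
        if lo < v_lower or hi > v_upper:
            continue  # bin lies (partly) outside this regime
        if p_lower[i] >= p_upper:
            continue  # boundary at or above p_upper: zero contribution
        shape, scale = ln_params[i]
        wind_mass = weibull_cdf(hi, wb_shape, wb_scale) - weibull_cdf(lo, wb_shape, wb_scale)
        load_mass = lognorm_cdf(p_upper, shape, scale) - lognorm_cdf(p_lower[i], shape, scale)
        risk += wind_mass * load_mass
    return risk


# Illustrative inputs only (not fitted results): 32 unit-width bins on [0, 32].
edges = np.arange(0.0, 33.0, 1.0)
ln_params = [(0.5, 0.3)] * 32
p_lower = np.full(32, 0.8)
kw = dict(edges=edges, wb_shape=2.0, wb_scale=8.0, ln_params=ln_params,
          p_lower=p_lower, p_upper=1.0)
overall = binned_risk_sketch(v_lower=0.0, v_upper=32.0, **kw)
low = binned_risk_sketch(v_lower=0.0, v_upper=15.0, **kw)
high = binned_risk_sketch(v_lower=15.0, v_upper=32.0, **kw)
```

In this sketch the default low- and high-wind regimes meet at 15.0 and together span the overall range, so low + high equals overall up to rounding.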

utils module

Utility functions for the risk_tools package.

This module contains helper functions for:

normalizing wind-speed input loaded from pickle files,
splitting timestamp-indexed data into seasonal subsets, and
formatting summary statistics for display.

risk_tools.utils.coerce_v_char_input(obj: Series | DataFrame) → DataFrame

Normalize wind-speed input to a one-column DataFrame named "v_char".

Parameters:

obj (pandas.Series or pandas.DataFrame): Wind-speed data loaded from file. It may already be a one-column DataFrame, or it may be a Series without a column name.

Returns:

pandas.DataFrame: One-column DataFrame with the column name "v_char".

Raises:

TypeError: If the input is neither a pandas Series nor a one-column DataFrame.

Examples

>>> import pandas as pd
>>> from risk_tools.utils import coerce_v_char_input
>>> s = pd.Series([1.0, 2.0, 3.0])
>>> df = coerce_v_char_input(s)
>>> list(df.columns)
['v_char']
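A minimal sketch of the normalization rules listed above; coerce_v_char_sketch is a hypothetical name, and the real function may differ in details.

```python
import pandas as pd


def coerce_v_char_sketch(obj):
    """Hypothetical sketch: return a one-column DataFrame named "v_char"."""
    if isinstance(obj, pd.Series):
        # Named or unnamed Series: convert to a frame, keeping the index.
        return obj.to_frame(name="v_char")
    if isinstance(obj, pd.DataFrame) and obj.shape[1] == 1:
        # One-column frame: rename its only column, whatever it is called.
        return obj.rename(columns={obj.columns[0]: "v_char"})
    raise TypeError("expected a pandas Series or a one-column DataFrame")
```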

risk_tools.utils.format_mean_pm_std(mean_val: float, std_val: float, decimals: int = 6) → str

Format a mean and standard deviation as mean ± std.

Parameters:

mean_val (float): Mean value.
std_val (float): Standard deviation.
decimals (int, optional): Number of decimal places to display. The default is 6.

Returns:

str: Formatted string of the form "0.123456 ± 0.012345". If either value is missing, returns "NaN".
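One way to implement this formatting rule, assuming "missing" means None or NaN (the reference does not define it); format_mean_pm_std_sketch is a hypothetical name.

```python
import math


def format_mean_pm_std_sketch(mean_val, std_val, decimals=6):
    """Hypothetical sketch: format as "mean ± std", or "NaN" if a value is missing."""
    for value in (mean_val, std_val):
        # Assumption: "missing" means None or a floating-point NaN.
        if value is None or (isinstance(value, float) and math.isnan(value)):
            return "NaN"
    return f"{mean_val:.{decimals}f} ± {std_val:.{decimals}f}"
```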

risk_tools.utils.split_by_season(df: DataFrame) → dict[str, DataFrame]

Split a timestamp-indexed DataFrame into seasonal subsets.

Parameters:

df (pandas.DataFrame): Input DataFrame with a DatetimeIndex.

Returns:

dict of pandas.DataFrame: Dictionary with keys "spring", "summer", "fall", and "winter". Each value is the subset of the input DataFrame corresponding to that season.

Raises:

TypeError: If the input DataFrame does not have a DatetimeIndex.

Notes

Seasons are defined by calendar month as follows:

winter: December, January, February
spring: March, April, May
summer: June, July, August
fall: September, October, November
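The month-to-season mapping above can be sketched with DatetimeIndex.month and Index.isin; split_by_season_sketch is a hypothetical name, not the package's implementation.

```python
import pandas as pd

# Calendar months that make up each season, as listed above.
SEASON_MONTHS = {
    "spring": [3, 4, 5],
    "summer": [6, 7, 8],
    "fall": [9, 10, 11],
    "winter": [12, 1, 2],
}


def split_by_season_sketch(df):
    """Hypothetical sketch: split a DatetimeIndex-ed DataFrame by calendar month."""
    if not isinstance(df.index, pd.DatetimeIndex):
        raise TypeError("df must have a DatetimeIndex")
    months = df.index.month
    # isin returns a boolean array that selects the rows of each season.
    return {season: df[months.isin(m)] for season, m in SEASON_MONTHS.items()}


df = pd.DataFrame({"x": range(12)},
                  index=pd.date_range("2021-01-01", periods=12, freq="MS"))
seasons = split_by_season_sketch(df)
```

Because only the calendar month is used, winter in this sketch collects December together with January and February regardless of year, rather than forming contiguous December-to-February periods; in the example, the winter subset holds January, February, and December 2021.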
