API reference
risk_tools package
risk_tools
Package for bootstrapped wind/load risk calculations.
This package provides functions to:
- generate bootstrap resampling indices,
- fit Weibull distributions to wind-speed data,
- fit lognormal distributions to normalized load data conditioned on wind-speed bins,
- calculate exceedance risk for operating-envelope boundaries, and
- split timestamp-indexed data into seasonal subsets.
Modules
- bootstrap
Bootstrap index generation.
- fv
Weibull fitting for wind-speed data.
- fp
Lognormal fitting for load conditioned on wind speed.
- risk
Risk calculation from fitted wind-speed and load distributions.
- utils
Utility functions for data preparation and formatting.
- risk_tools.bootstrap(data: int | Series | DataFrame | ndarray, n_boot: int, random_state: int | None = None) -> list[ndarray]
Generate bootstrap index arrays for an input dataset.
- Parameters:
- data : int or pandas.Series or pandas.DataFrame or numpy.ndarray
Input data or number of observations. If an integer is supplied, it is interpreted directly as the number of observations. If a pandas or NumPy object is supplied, the number of observations is inferred using len(data).
- n_boot : int
Number of bootstrap samples to generate.
- random_state : int or None, optional
Seed for the random number generator. If provided, the bootstrap samples are reproducible. The default is None.
- Returns:
- list of numpy.ndarray
List of bootstrap index arrays. Each array has length equal to the number of observations in the original dataset and contains indices sampled with replacement.
- Raises:
- ValueError
If the input dataset is empty.
- ValueError
If n_boot is negative.
Notes
Bootstrap resampling is performed by drawing indices with replacement from the range 0 to n_obs - 1. The returned indices can then be applied to the relevant pandas objects using .iloc or to NumPy arrays using standard indexing.
Examples
>>> import pandas as pd
>>> from risk_tools.bootstrap import bootstrap
>>> df = pd.DataFrame({"x": [1, 2, 3, 4]})
>>> idx = bootstrap(df, n_boot=2, random_state=42)
>>> len(idx)
2
>>> len(idx[0])
4
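The index-based resampling described in the Notes can be sketched as follows. This is a minimal stand-in written for illustration, not the package's implementation: the helper name draw_bootstrap_indices and the use of numpy.random.default_rng are assumptions.

```python
import numpy as np


def draw_bootstrap_indices(n_obs, n_boot, random_state=None):
    """Hypothetical stand-in: draw n_boot index arrays with replacement."""
    if n_obs <= 0:
        raise ValueError("the input dataset is empty")
    if n_boot < 0:
        raise ValueError("n_boot must not be negative")
    rng = np.random.default_rng(random_state)
    # Each array holds n_obs indices drawn uniformly from 0 .. n_obs - 1.
    return [rng.integers(0, n_obs, size=n_obs) for _ in range(n_boot)]


values = np.array([10.0, 20.0, 30.0, 40.0])
indices = draw_bootstrap_indices(len(values), n_boot=3, random_state=42)

# Standard NumPy indexing builds each resample only when it is needed.
resamples = [values[idx] for idx in indices]
```

Storing indices rather than resampled copies means the same index arrays can be applied to several aligned objects.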
- risk_tools.coerce_v_char_input(obj: Series | DataFrame) -> DataFrame
Normalize wind-speed input to a one-column DataFrame named "v_char".
- Parameters:
- obj : pandas.Series or pandas.DataFrame
Wind-speed data loaded from file. It may already be a one-column DataFrame or it may be a Series without a column name.
- Returns:
- pandas.DataFrame
One-column DataFrame with the column name "v_char".
- Raises:
- TypeError
If the input is neither a pandas Series nor a one-column DataFrame.
Examples
>>> import pandas as pd
>>> from risk_tools.utils import coerce_v_char_input
>>> s = pd.Series([1.0, 2.0, 3.0])
>>> df = coerce_v_char_input(s)
>>> list(df.columns)
['v_char']
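The normalization can be pictured with the sketch below. It is a hypothetical stand-in, not the package's code: the name to_v_char_frame is invented, and renaming the single column of an existing DataFrame is an assumption about how the one-column case is handled.

```python
import pandas as pd


def to_v_char_frame(obj):
    """Hypothetical stand-in for the normalization described above."""
    if isinstance(obj, pd.Series):
        return obj.to_frame(name="v_char")
    if isinstance(obj, pd.DataFrame) and obj.shape[1] == 1:
        out = obj.copy()
        out.columns = ["v_char"]  # assumption: the single column is renamed
        return out
    raise TypeError("expected a pandas Series or a one-column DataFrame")


from_series = to_v_char_frame(pd.Series([1.0, 2.0, 3.0]))
from_frame = to_v_char_frame(pd.DataFrame({"speed": [4.0, 5.0]}))
```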
- risk_tools.format_mean_pm_std(mean_val: float, std_val: float, decimals: int = 6) -> str
Format a mean and standard deviation as mean ± std.
- Parameters:
- mean_val : float
Mean value.
- std_val : float
Standard deviation.
- decimals : int, optional
Number of decimal places to display. The default is 6.
- Returns:
- str
Formatted string of the form "0.123456 ± 0.012345". If either value is missing, returns "NaN".
- risk_tools.fp_params(df_fp: DataFrame, n_boot: int, random_state: int | None = None, bin_width: float = 1.0, bin_min: float = 0.0, bin_max: float = 32.0) -> DataFrame
Fit lognormal distributions to load data within wind-speed bins.
- Parameters:
- df_fp : pandas.DataFrame
DataFrame containing columns "p_load" and "v_char". p_load is the normalized load variable and v_char is the associated wind speed.
- n_boot : int
Number of bootstrap samples to generate and fit.
- random_state : int or None, optional
Seed for reproducible bootstrap resampling. The default is None.
- bin_width : float, optional
Width of the wind-speed bins. The default is 1.0.
- bin_min : float, optional
Lower bound of the wind-speed binning range. The default is 0.0.
- bin_max : float, optional
Upper bound of the wind-speed binning range. The default is 32.0.
- Returns:
- pandas.DataFrame
DataFrame indexed by ["sample", "bin"] with columns:
- shape: fitted lognormal shape parameter
- scale: fitted lognormal scale parameter
- n: number of valid observations used in the fit
The sample level contains "original" plus the bootstrap samples "boot_0", "boot_1", and so on.
- Raises:
- ValueError
If the input DataFrame does not contain both "p_load" and "v_char".
- ValueError
If bin_width is not positive.
- ValueError
If bin_max is not greater than bin_min.
Notes
The lognormal fit is performed with location fixed at zero by passing floc=0 to scipy.stats.lognorm.fit.
Only finite positive p_load values are retained before fitting.
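The fitting step for a single wind-speed bin can be sketched as follows. The data are synthetic and the variable names are chosen for illustration; only the call scipy.stats.lognorm.fit(..., floc=0) and the finite-positive filter come from the Notes above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic normalized loads for one wind-speed bin (illustrative only).
p_load = rng.lognormal(mean=-1.0, sigma=0.3, size=500)
p_load = np.append(p_load, [np.nan, -0.1, 0.0])  # values that must be dropped

# Keep only finite positive values before fitting.
finite = p_load[np.isfinite(p_load)]
valid = finite[finite > 0]

# floc=0 pins the location, leaving shape and scale as the free parameters.
shape, loc, scale = stats.lognorm.fit(valid, floc=0)
n = valid.size
```

With the location fixed at zero, the fitted shape corresponds to the standard deviation of log(p_load) and the fitted scale to the exponential of its mean.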
- risk_tools.fv_params(df_v: DataFrame, n_boot: int, random_state: int | None = None) -> DataFrame
Fit Weibull distributions to original and bootstrapped wind-speed data.
- Parameters:
- df_v : pandas.DataFrame
DataFrame containing a column named "v_char". The values are assumed to represent wind speed. The index is typically a timestamp, but only the data column is used in the fit.
- n_boot : int
Number of bootstrap samples to generate and fit.
- random_state : int or None, optional
Seed for reproducible bootstrap resampling. The default is None.
- Returns:
- pandas.DataFrame
DataFrame indexed by sample name. The first row corresponds to the original dataset and is labeled "original". Subsequent rows correspond to bootstrap samples labeled "boot_0", "boot_1", and so on. Columns are:
- shape: fitted Weibull shape parameter
- scale: fitted Weibull scale parameter
- Raises:
- ValueError
If the input DataFrame does not contain a column named "v_char".
- ValueError
If the wind-speed data contain no finite positive values.
Notes
The fit is a two-parameter Weibull fit with location fixed at zero by passing floc=0 to scipy.stats.weibull_min.fit.
Only finite positive values are retained before fitting.
Internal profiling is performed with aggregated timing categories so that the main cost centers can be identified without printing one line per bootstrap iteration.
Examples
>>> import pandas as pd
>>> from risk_tools.fv import fv_params
>>> df = pd.DataFrame({"v_char": [5.0, 6.0, 7.0, 8.0]})
>>> out = fv_params(df, n_boot=2, random_state=1)
>>> "original" in out.index
True
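The fit-and-label workflow can be sketched outside the package as shown below. The data are synthetic, the helper fit_weibull is hypothetical, and the bootstrap draw uses NumPy directly rather than the package's bootstrap function; only the floc=0 call and the sample labels follow the description above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Synthetic wind speeds (illustrative only), plus values that must be dropped.
v_char = np.append(8.0 * rng.weibull(2.0, size=1000), [np.nan, 0.0, -1.0])

finite = v_char[np.isfinite(v_char)]
valid = finite[finite > 0]


def fit_weibull(values):
    # floc=0 fixes the location, so only shape and scale are estimated.
    shape, _loc, scale = stats.weibull_min.fit(values, floc=0)
    return {"shape": shape, "scale": scale}


fits = {"original": fit_weibull(valid)}
for i in range(2):
    # Resample with replacement, then refit on the resampled values.
    idx = rng.integers(0, valid.size, size=valid.size)
    fits[f"boot_{i}"] = fit_weibull(valid[idx])
```

The spread of the boot_ entries around the original entry gives a rough picture of the sampling uncertainty in shape and scale.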
- risk_tools.risk_calculation(df_fp: DataFrame, df_v: DataFrame, df_plimits: DataFrame, n_boot: int, random_state: int | None = None, overall_v_lower: float = 0.0, overall_v_upper: float = 32.0, low_v_lower: float = 0.0, low_v_upper: float = 15.0, high_v_lower: float = 15.0, high_v_upper: float = 32.0, p_upper: float = 1.0, bin_width: float = 1.0, fp_bin_min: float = 0.0, fp_bin_max: float = 32.0) -> dict[str, DataFrame]
Calculate original and bootstrap risk metrics.
- Parameters:
- df_fp : pandas.DataFrame
DataFrame containing columns "p_load" and "v_char" for the paired load-wind dataset.
- df_v : pandas.DataFrame
DataFrame containing a column named "v_char" for the wind time series used to fit the wind-speed distribution.
- df_plimits : pandas.DataFrame
DataFrame indexed by wind-speed bin with columns "danger" and "limit". These columns define the lower integration limits of the risk calculation.
- n_boot : int
Number of bootstrap samples to generate.
- random_state : int or None, optional
Seed for reproducible bootstrap resampling. The default is None.
- overall_v_lower, overall_v_upper : float, optional
Wind-speed bounds for the overall risk calculation. The defaults are 0.0 and 32.0.
- low_v_lower, low_v_upper : float, optional
Wind-speed bounds for the low-wind regime. The defaults are 0.0 and 15.0.
- high_v_lower, high_v_upper : float, optional
Wind-speed bounds for the high-wind regime. The defaults are 15.0 and 32.0.
- p_upper : float, optional
Upper integration limit in normalized load space. The default is 1.0.
- bin_width : float, optional
Width of the wind-speed bins. The default is 1.0.
- fp_bin_min, fp_bin_max : float, optional
Wind-speed binning range used when fitting the lognormal conditional load distributions. The defaults are 0.0 and 32.0.
- Returns:
- dict of pandas.DataFrame
Dictionary containing the following DataFrames:
- original_overall: overall risk for original data
- original_by_regime: low- and high-wind risks for original data
- bootstrap_overall: overall bootstrap risks
- bootstrap_by_regime: low- and high-wind bootstrap risks
- stats_overall: mean and standard deviation of overall bootstrap risks
- stats_by_regime: mean and standard deviation of regime bootstrap risks
- Raises:
- ValueError
If required input columns are missing.
Notes
The risk calculation approximates
\[\int_{v_{lower}}^{v_{upper}} f_v(v) \int_{p_{lower}(v)}^{p_{upper}} f_p(p \mid v) \, dp \, dv\]
using integer wind-speed bins and fitted distribution functions.
For each wind-speed bin, the contribution is the product of:
- the Weibull probability mass of the wind-speed bin, and
- the lognormal probability mass between the bin's lower integration limit p_lower and p_upper.
Bins whose lower integration limit is greater than or equal to p_upper contribute zero risk.
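The binned approximation in the Notes can be illustrated with a small pure-Python sketch. Everything here is an assumption made for illustration: the helper names, the made-up parameter values, unit-width bins labeled by their lower edge, and closed-form CDFs standing in for the fitted scipy distributions.

```python
import math


def weibull_cdf(v, shape, scale):
    # Two-parameter Weibull CDF with location fixed at zero.
    return 1.0 - math.exp(-((v / scale) ** shape)) if v > 0 else 0.0


def lognormal_cdf(p, shape, scale):
    # Lognormal CDF with location zero (shape = sigma, scale = exp(mu)).
    if p <= 0:
        return 0.0
    z = math.log(p / scale) / (shape * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def binned_risk(fv, fp_by_bin, p_lower_by_bin, v_lower, v_upper, p_upper=1.0):
    """Hypothetical helper: sum per-bin contributions over [v_lower, v_upper)."""
    total = 0.0
    for b in range(int(v_lower), int(v_upper)):
        p_lower = p_lower_by_bin[b]
        if p_lower >= p_upper:
            continue  # such bins contribute zero risk
        wind_mass = weibull_cdf(b + 1, *fv) - weibull_cdf(b, *fv)
        shape, scale = fp_by_bin[b]
        load_mass = (lognormal_cdf(p_upper, shape, scale)
                     - lognormal_cdf(p_lower, shape, scale))
        total += wind_mass * load_mass
    return total


# Made-up parameters for three unit-width bins: 0-1, 1-2, 2-3.
fv = (2.0, 8.0)  # Weibull (shape, scale)
fp_by_bin = {0: (0.3, 0.2), 1: (0.3, 0.3), 2: (0.3, 0.4)}  # lognormal (shape, scale)
p_lower_by_bin = {0: 0.5, 1: 0.5, 2: 1.2}  # bin 2 has p_lower >= p_upper

risk = binned_risk(fv, fp_by_bin, p_lower_by_bin, v_lower=0, v_upper=3)
```

Changing v_lower and v_upper in the same helper would give regime-specific sums in the spirit of the low-wind and high-wind bounds.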
- risk_tools.split_by_season(df: DataFrame) -> dict[str, DataFrame]
Split a timestamp-indexed DataFrame into seasonal subsets.
- Parameters:
- df : pandas.DataFrame
Input DataFrame with a DatetimeIndex.
- Returns:
- dict of pandas.DataFrame
Dictionary with keys "spring", "summer", "fall", and "winter". Each value is the subset of the input DataFrame corresponding to that season.
- Raises:
- TypeError
If the input DataFrame does not have a DatetimeIndex.
Notes
Seasons are defined by calendar month as follows:
- winter: December, January, February
- spring: March, April, May
- summer: June, July, August
- fall: September, October, November
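The month-based definition above can be sketched with pandas as follows. The function name split_seasons and the lookup table are hypothetical stand-ins, not the package's code.

```python
import pandas as pd

# Month-to-season mapping taken from the list above.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}


def split_seasons(df):
    """Hypothetical stand-in for the seasonal split."""
    if not isinstance(df.index, pd.DatetimeIndex):
        raise TypeError("df must have a DatetimeIndex")
    # Label every row by the season of its calendar month.
    seasons = df.index.month.map(SEASON_BY_MONTH)
    names = ("spring", "summer", "fall", "winter")
    return {name: df[seasons == name] for name in names}


index = pd.date_range("2023-01-01", periods=12, freq="MS")
df = pd.DataFrame({"v_char": range(12)}, index=index)
parts = split_seasons(df)
```

Because winter spans December through February, a winter subset built this way mixes months from the start and end of the same calendar year.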
- risk_tools.timed(func: Callable) -> Callable
Decorate a function so that its total execution time is printed.
- Parameters:
- func : callable
Function to decorate.
- Returns:
- callable
Wrapped function that prints elapsed runtime when profiling is enabled.
Notes
This decorator is best suited to top-level functions whose total runtime is of interest. For finer-grained timing inside a function, use timed_block or ProfileAccumulator.
Examples
>>> @timed
... def add(a, b):
...     return a + b
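A decorator with this behavior could look like the sketch below. It is not the package's implementation: the name timed_sketch, the module-level PROFILING_ENABLED switch, and the printed format are all assumptions, since the reference does not show how profiling is enabled.

```python
import functools
import time

PROFILING_ENABLED = True  # hypothetical switch; the package's own flag is not shown


def timed_sketch(func):
    """Hypothetical stand-in: print the total runtime of each call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Report even when func raises, but only if profiling is on.
            if PROFILING_ENABLED:
                elapsed = time.perf_counter() - start
                print(f"{func.__name__}: {elapsed:.6f} s")
    return wrapper


@timed_sketch
def add(a, b):
    return a + b


result = add(2, 3)
```

Using functools.wraps keeps the decorated function's name and docstring intact, which matters for generated documentation such as this reference.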
bootstrap module
Bootstrap tools for resampling indexed data.
This module contains the bootstrap function used throughout the risk-calculation workflow. The function generates bootstrap index arrays that can be applied to pandas objects or NumPy arrays without copying the full datasets in advance.
The design keeps the resampling step separate from the fitting steps so that bootstrap logic can be tested independently and reused across the different parts of the workflow.
- risk_tools.bootstrap.bootstrap(data: int | Series | DataFrame | ndarray, n_boot: int, random_state: int | None = None) -> list[ndarray]
Generate bootstrap index arrays for an input dataset.
- Parameters:
- data : int or pandas.Series or pandas.DataFrame or numpy.ndarray
Input data or number of observations. If an integer is supplied, it is interpreted directly as the number of observations. If a pandas or NumPy object is supplied, the number of observations is inferred using len(data).
- n_boot : int
Number of bootstrap samples to generate.
- random_state : int or None, optional
Seed for the random number generator. If provided, the bootstrap samples are reproducible. The default is None.
- Returns:
- list of numpy.ndarray
List of bootstrap index arrays. Each array has length equal to the number of observations in the original dataset and contains indices sampled with replacement.
- Raises:
- ValueError
If the input dataset is empty.
- ValueError
If n_boot is negative.
Notes
Bootstrap resampling is performed by drawing indices with replacement from the range 0 to n_obs - 1. The returned indices can then be applied to the relevant pandas objects using .iloc or to NumPy arrays using standard indexing.
Examples
>>> import pandas as pd
>>> from risk_tools.bootstrap import bootstrap
>>> df = pd.DataFrame({"x": [1, 2, 3, 4]})
>>> idx = bootstrap(df, n_boot=2, random_state=42)
>>> len(idx)
2
>>> len(idx[0])
4
fv module
Wind-speed distribution fitting.
This module contains functions for fitting Weibull distributions to
wind-speed time series. The main public function, fv_params, fits a
two-parameter Weibull distribution to the original dataset and to a set
of bootstrap resamples.
The output is intended for later use in the risk calculation, where the fitted Weibull distribution describes the probability of wind speed falling within each integer wind-speed bin.
This module also includes internal profiling hooks so that the runtime of the major steps in the fitting workflow can be inspected. Timing is aggregated across repeated operations to produce a concise summary rather than one line of output per bootstrap iteration.
- risk_tools.fv.fv_params(df_v: DataFrame, n_boot: int, random_state: int | None = None) -> DataFrame
Fit Weibull distributions to original and bootstrapped wind-speed data.
- Parameters:
- df_v : pandas.DataFrame
DataFrame containing a column named "v_char". The values are assumed to represent wind speed. The index is typically a timestamp, but only the data column is used in the fit.
- n_boot : int
Number of bootstrap samples to generate and fit.
- random_state : int or None, optional
Seed for reproducible bootstrap resampling. The default is None.
- Returns:
- pandas.DataFrame
DataFrame indexed by sample name. The first row corresponds to the original dataset and is labeled "original". Subsequent rows correspond to bootstrap samples labeled "boot_0", "boot_1", and so on. Columns are:
- shape: fitted Weibull shape parameter
- scale: fitted Weibull scale parameter
- Raises:
- ValueError
If the input DataFrame does not contain a column named "v_char".
- ValueError
If the wind-speed data contain no finite positive values.
Notes
The fit is a two-parameter Weibull fit with location fixed at zero by passing floc=0 to scipy.stats.weibull_min.fit.
Only finite positive values are retained before fitting.
Internal profiling is performed with aggregated timing categories so that the main cost centers can be identified without printing one line per bootstrap iteration.
Examples
>>> import pandas as pd
>>> from risk_tools.fv import fv_params
>>> df = pd.DataFrame({"v_char": [5.0, 6.0, 7.0, 8.0]})
>>> out = fv_params(df, n_boot=2, random_state=1)
>>> "original" in out.index
True
fp module
Conditional load-distribution fitting.
This module contains functions for fitting lognormal distributions to
normalized load data conditioned on wind-speed bins. The main public
function, fp_params, bins the paired p_load and v_char data
into integer wind-speed bins and fits a lognormal distribution to the
load values in each bin.
The output includes the original dataset and a set of bootstrap resamples. The fitted parameters are intended for later use in the risk integration.
- risk_tools.fp.fp_params(df_fp: DataFrame, n_boot: int, random_state: int | None = None, bin_width: float = 1.0, bin_min: float = 0.0, bin_max: float = 32.0) -> DataFrame
Fit lognormal distributions to load data within wind-speed bins.
- Parameters:
- df_fp : pandas.DataFrame
DataFrame containing columns "p_load" and "v_char". p_load is the normalized load variable and v_char is the associated wind speed.
- n_boot : int
Number of bootstrap samples to generate and fit.
- random_state : int or None, optional
Seed for reproducible bootstrap resampling. The default is None.
- bin_width : float, optional
Width of the wind-speed bins. The default is 1.0.
- bin_min : float, optional
Lower bound of the wind-speed binning range. The default is 0.0.
- bin_max : float, optional
Upper bound of the wind-speed binning range. The default is 32.0.
- Returns:
- pandas.DataFrame
DataFrame indexed by ["sample", "bin"] with columns:
- shape: fitted lognormal shape parameter
- scale: fitted lognormal scale parameter
- n: number of valid observations used in the fit
The sample level contains "original" plus the bootstrap samples "boot_0", "boot_1", and so on.
- Raises:
- ValueError
If the input DataFrame does not contain both "p_load" and "v_char".
- ValueError
If bin_width is not positive.
- ValueError
If bin_max is not greater than bin_min.
Notes
The lognormal fit is performed with location fixed at zero by passing floc=0 to scipy.stats.lognorm.fit.
Only finite positive p_load values are retained before fitting.
risk module
Risk integration functions.
This module contains the main risk-calculation logic. It combines Weibull fits for wind speed and lognormal fits for load conditioned on wind speed to approximate exceedance risk above externally defined operating-envelope boundaries.
The module calculates:
- overall risk,
- low-wind risk,
- high-wind risk,
- bootstrap realizations of each, and
- summary statistics of the bootstrap results.
- risk_tools.risk.risk_calculation(df_fp: DataFrame, df_v: DataFrame, df_plimits: DataFrame, n_boot: int, random_state: int | None = None, overall_v_lower: float = 0.0, overall_v_upper: float = 32.0, low_v_lower: float = 0.0, low_v_upper: float = 15.0, high_v_lower: float = 15.0, high_v_upper: float = 32.0, p_upper: float = 1.0, bin_width: float = 1.0, fp_bin_min: float = 0.0, fp_bin_max: float = 32.0) -> dict[str, DataFrame]
Calculate original and bootstrap risk metrics.
- Parameters:
- df_fp : pandas.DataFrame
DataFrame containing columns "p_load" and "v_char" for the paired load-wind dataset.
- df_v : pandas.DataFrame
DataFrame containing a column named "v_char" for the wind time series used to fit the wind-speed distribution.
- df_plimits : pandas.DataFrame
DataFrame indexed by wind-speed bin with columns "danger" and "limit". These columns define the lower integration limits of the risk calculation.
- n_boot : int
Number of bootstrap samples to generate.
- random_state : int or None, optional
Seed for reproducible bootstrap resampling. The default is None.
- overall_v_lower, overall_v_upper : float, optional
Wind-speed bounds for the overall risk calculation. The defaults are 0.0 and 32.0.
- low_v_lower, low_v_upper : float, optional
Wind-speed bounds for the low-wind regime. The defaults are 0.0 and 15.0.
- high_v_lower, high_v_upper : float, optional
Wind-speed bounds for the high-wind regime. The defaults are 15.0 and 32.0.
- p_upper : float, optional
Upper integration limit in normalized load space. The default is 1.0.
- bin_width : float, optional
Width of the wind-speed bins. The default is 1.0.
- fp_bin_min, fp_bin_max : float, optional
Wind-speed binning range used when fitting the lognormal conditional load distributions. The defaults are 0.0 and 32.0.
- Returns:
- dict of pandas.DataFrame
Dictionary containing the following DataFrames:
- original_overall: overall risk for original data
- original_by_regime: low- and high-wind risks for original data
- bootstrap_overall: overall bootstrap risks
- bootstrap_by_regime: low- and high-wind bootstrap risks
- stats_overall: mean and standard deviation of overall bootstrap risks
- stats_by_regime: mean and standard deviation of regime bootstrap risks
- Raises:
- ValueError
If required input columns are missing.
Notes
The risk calculation approximates
\[\int_{v_{lower}}^{v_{upper}} f_v(v) \int_{p_{lower}(v)}^{p_{upper}} f_p(p \mid v) \, dp \, dv\]
using integer wind-speed bins and fitted distribution functions.
For each wind-speed bin, the contribution is the product of:
- the Weibull probability mass of the wind-speed bin, and
- the lognormal probability mass between the bin's lower integration limit p_lower and p_upper.
Bins whose lower integration limit is greater than or equal to p_upper contribute zero risk.
utils module
Utility functions for the risk_tools package.
This module contains helper functions for:
- normalizing wind-speed input loaded from pickle files,
- splitting timestamp-indexed data into seasonal subsets, and
- formatting summary statistics for display.
- risk_tools.utils.coerce_v_char_input(obj: Series | DataFrame) -> DataFrame
Normalize wind-speed input to a one-column DataFrame named "v_char".
- Parameters:
- obj : pandas.Series or pandas.DataFrame
Wind-speed data loaded from file. It may already be a one-column DataFrame or it may be a Series without a column name.
- Returns:
- pandas.DataFrame
One-column DataFrame with the column name "v_char".
- Raises:
- TypeError
If the input is neither a pandas Series nor a one-column DataFrame.
Examples
>>> import pandas as pd
>>> from risk_tools.utils import coerce_v_char_input
>>> s = pd.Series([1.0, 2.0, 3.0])
>>> df = coerce_v_char_input(s)
>>> list(df.columns)
['v_char']
- risk_tools.utils.format_mean_pm_std(mean_val: float, std_val: float, decimals: int = 6) -> str
Format a mean and standard deviation as mean ± std.
- Parameters:
- mean_val : float
Mean value.
- std_val : float
Standard deviation.
- decimals : int, optional
Number of decimal places to display. The default is 6.
- Returns:
- str
Formatted string of the form "0.123456 ± 0.012345". If either value is missing, returns "NaN".
- risk_tools.utils.split_by_season(df: DataFrame) -> dict[str, DataFrame]
Split a timestamp-indexed DataFrame into seasonal subsets.
- Parameters:
- df : pandas.DataFrame
Input DataFrame with a DatetimeIndex.
- Returns:
- dict of pandas.DataFrame
Dictionary with keys "spring", "summer", "fall", and "winter". Each value is the subset of the input DataFrame corresponding to that season.
- Raises:
- TypeError
If the input DataFrame does not have a DatetimeIndex.
Notes
Seasons are defined by calendar month as follows:
- winter: December, January, February
- spring: March, April, May
- summer: June, July, August
- fall: September, October, November