strata preprocessing

StrataBuilderDiscretizer

Builds strata keys from numeric data.

StrataBuilderEncoder

Builds strata keys from categorical data.

StrataColumnTransformer

Applies StrataBuilders to columns of an array or DataFrame.

make_strata_column_transformer

Construct a StrataColumnTransformer from the given transformers and columns.

class survivalpredict.strata_preprocessing.StrataBuilderDiscretizer(n_bins=5, strategy='quantile', splits=None)

Builds strata keys from numeric data.

If predefined ‘splits’ are given, strata are built from the given bins, and ‘n_bins’ and ‘strategy’ are ignored. Otherwise, ‘n_bins’ and ‘strategy’ are used to generate the bins. Largely inspired by scikit-learn’s KBinsDiscretizer. If existing strata are passed in, the new strata are added onto them.

Parameters:
  • n_bins (int , default=5) –

    The number of bins to produce. Raises ValueError if n_bins < 2.

    ‘n_bins’ is ignored if ‘splits’ is not None.

  • strategy ({'uniform','quantile','kmeans'}, default='quantile') –

    Strategy used to define the widths of the bins.

    • ‘uniform’: All bins in each feature have identical widths.

    • ‘quantile’: All bins in each feature have the same number of points.

    • ‘kmeans’: Values in each bin have the same nearest center of a 1D k-means cluster.

    ‘strategy’ is ignored if ‘splits’ is not None.

  • splits (numeric array-like, default=None) – Predefined splits used to build bins. If ‘splits’ is not None, ‘strategy’ and ‘n_bins’ are ignored.
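The difference between the ‘uniform’ and ‘quantile’ strategies, and how predefined splits bypass both, can be illustrated with a minimal standard-library sketch. This is not the library's implementation: the helper names (uniform_splits, quantile_splits, assign_bins) are hypothetical, and the handling of values that fall exactly on a split point is an assumption.

```python
from bisect import bisect_right
from statistics import quantiles


def uniform_splits(values, n_bins):
    """Interior split points for bins of identical width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def quantile_splits(values, n_bins):
    """Interior split points so each bin holds about the same number of points."""
    return quantiles(values, n=n_bins, method="inclusive")


def assign_bins(values, splits):
    """Bin index = number of split points that are <= the value (assumed convention)."""
    return [bisect_right(splits, v) for v in values]


ages = [21, 25, 30, 34, 41, 47, 52, 58, 63, 90]

uniform_bins = assign_bins(ages, uniform_splits(ages, n_bins=3))
quantile_bins = assign_bins(ages, quantile_splits(ages, n_bins=3))
# Predefined splits skip both strategies; n_bins plays no role here.
predefined_bins = assign_bins(ages, [40, 65])
```

With a skewed column such as this one, equal-width bins leave the top bin nearly empty, while quantile bins stay roughly balanced. Balanced strata are often preferable because very small strata contribute little information.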

_splits

Splits used to generate bins.

Type:

ndarray of ndarray of shape (n_features,)

_uses_strata

True if fitted on pre-existing strata, False otherwise.

Type:

bool

Methods

fit(X[, times, events, strata, check_input])

Learn the strata.

fit_transform(X[, times, events, strata])

Fit and build strata.

set_output(*[, transform])

Set output container.

set_transform_request(*[, events, strata, times])

Configure whether metadata should be requested to be passed to the transform method.

transform(X[, times, events, strata])

Discretize numerical data to build strata.

fit(X, times=None, events=None, strata=None, check_input=True)

Learn the strata.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Data to be discretized.

  • times (array-like of shape n_samples, default=None) – Ignored.

  • events (array-like of shape n_samples, default=None) – Ignored.

  • strata (array-like of shape n_samples, default=None) – Pre-existing strata; the newly built strata are added onto them.

  • check_input (bool, default True) – If True, runs checks and casting on data to ensure data is valid.

Returns:

Returns the instance itself.

Return type:

object

fit_transform(X, times=None, events=None, strata=None)

Fit and build strata.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Data to be discretized.

  • times (array-like of shape n_samples, default=None) – Ignored.

  • events (array-like of shape n_samples, default=None) – Ignored.

  • strata (array-like of shape n_samples, default=None) – Pre-existing strata; the newly built strata are added onto them.

Returns:

The built strata.

Return type:

ndarray of shape (n_samples), dtype=np.int64
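One way to read "adding onto" pre-existing strata is that every distinct combination of an existing stratum and a newly built bin becomes its own integer key. The sketch below illustrates that reading with the standard library only; combine_strata is a hypothetical helper, and the actual key numbering used by the library may differ.

```python
def combine_strata(existing, new):
    """Map each distinct (existing, new) pair to a single integer key."""
    pairs = list(zip(existing, new))
    # Sort the distinct pairs so the numbering is deterministic.
    key_of = {pair: key for key, pair in enumerate(sorted(set(pairs)))}
    return [key_of[pair] for pair in pairs]


existing_strata = [0, 0, 1, 1, 1, 0]  # e.g. a treatment arm supplied by the caller
age_bins = [0, 1, 0, 1, 1, 0]         # e.g. bins produced by a discretizer

combined = combine_strata(existing_strata, age_bins)
```

Two samples share a combined key only if they agree on both the existing stratum and the new bin, so the result is at least as fine-grained as either input.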

set_output(*, transform=None)

Set output container.

See scikit-learn’s “Introducing the set_output API” example for how to use the API.

Parameters:

transform ({"default", "pandas", "polars"}, default=None) –

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:

self – Estimator instance.

Return type:

estimator instance

set_transform_request(*, events='$UNCHANGED$', strata='$UNCHANGED$', times='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • events (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for events parameter in transform.

  • strata (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for strata parameter in transform.

  • times (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for times parameter in transform.


Returns:

self – The updated object.

Return type:

object

transform(X, times=None, events=None, strata=None)

Discretize numerical data to build strata.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Data to be discretized.

  • times (array-like of shape n_samples, default=None) – Ignored.

  • events (array-like of shape n_samples, default=None) – Ignored.

  • strata (array-like of shape n_samples, default=None) – Pre-existing strata; the newly built strata are added onto them.

Returns:

The built strata.

Return type:

array-like of shape (n_samples), dtype=np.int64

class survivalpredict.strata_preprocessing.StrataBuilderEncoder

Builds strata keys from categorical data.

If existing strata are passed in, it adds onto existing strata. StrataBuilderEncoder works on categorical data encoded in numerical or string types. One or many columns of mixed types can be used.
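Encoding categorical columns of mixed types into strata keys amounts to giving each distinct row of category values its own integer. The following is a minimal sketch of that idea, not the library's code: encode_rows is a hypothetical helper, and numbering keys by order of first appearance is an assumption.

```python
def encode_rows(rows):
    """Assign one integer key per distinct row, in order of first appearance."""
    key_of = {}
    keys = []
    for row in rows:
        row = tuple(row)
        if row not in key_of:
            key_of[row] = len(key_of)
        keys.append(key_of[row])
    return keys, key_of


# Two categorical columns of different types: a string and an integer code.
rows = [
    ("clinic_a", 1),
    ("clinic_b", 1),
    ("clinic_a", 2),
    ("clinic_a", 1),
    ("clinic_b", 1),
]

keys, mapping = encode_rows(rows)
```

The mapping plays the role of the fitted state: it is learned once and then reused so that the same category combination always receives the same key.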

Methods

fit(X[, times, events, strata, check_input])

Learn the strata.

fit_transform(X[, times, events, strata])

Fit and build strata.

set_output(*[, transform])

Set output container.

set_transform_request(*[, events, strata, times])

Configure whether metadata should be requested to be passed to the transform method.

transform(X[, times, events, strata])

Encode categorical data to build strata.

fit(X, times=None, events=None, strata=None, check_input=True)

Learn the strata.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Data to be encoded.

  • times (array-like of shape n_samples, default=None) – Ignored.

  • events (array-like of shape n_samples, default=None) – Ignored.

  • strata (array-like of shape n_samples, default=None) – Pre-existing strata; the newly built strata are added onto them.

  • check_input (bool, default True) – If True, runs checks and casting on data to ensure data is valid.

Returns:

Returns the instance itself.

Return type:

object

fit_transform(X, times=None, events=None, strata=None)

Fit and build strata.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Data to be encoded.

  • times (array-like of shape n_samples, default=None) – Ignored.

  • events (array-like of shape n_samples, default=None) – Ignored.

  • strata (array-like of shape n_samples, default=None) – Pre-existing strata; the newly built strata are added onto them.

Returns:

The built strata.

Return type:

ndarray of shape (n_samples), dtype=np.int64

set_output(*, transform=None)

Set output container.

See scikit-learn’s “Introducing the set_output API” example for how to use the API.

Parameters:

transform ({"default", "pandas", "polars"}, default=None) –

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:

self – Estimator instance.

Return type:

estimator instance

set_transform_request(*, events='$UNCHANGED$', strata='$UNCHANGED$', times='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • events (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for events parameter in transform.

  • strata (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for strata parameter in transform.

  • times (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for times parameter in transform.


Returns:

self – The updated object.

Return type:

object

transform(X, times=None, events=None, strata=None)

Encode categorical data to build strata.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Data to be encoded.

  • times (array-like of shape n_samples, default=None) – Ignored.

  • events (array-like of shape n_samples, default=None) – Ignored.

  • strata (array-like of shape n_samples, default=None) – Pre-existing strata; the newly built strata are added onto them.

Returns:

The built strata.

Return type:

ndarray of shape (n_samples), dtype=np.int64

class survivalpredict.strata_preprocessing.StrataColumnTransformer(strata_transformers)

Applies StrataBuilders to columns of an array or DataFrame.

Functions much like scikit-learn’s ColumnTransformer class, but for survivalpredict’s StrataBuilders instead of scikit-learn’s Transformers. Different columns or column subsets of the input are run separately through different StrataBuilders. If pre-existing strata are passed in, the newly built strata are added onto them. After the strata are built, the columns used to build them are removed from the feature set. Works on NumPy arrays as well as pandas and Polars DataFrames.

Designed to be used with ‘survivalpredict.pipeline.SklearnSurvivalPipeline’. The columns in the strata_transformers tuples are exposed as tunable parameters, which is useful in model selection when different strata are tried. These parameters are named ‘{name}_columns’.
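The ‘{name}_columns’ naming rule can be made concrete with a small sketch. The tuple names ("age", "site"), the column names, and the placeholder strings standing in for builder objects are all hypothetical; only the naming pattern comes from the description above.

```python
# Hypothetical (name, builder, columns) tuples; strings stand in for builder objects.
strata_transformers = [
    ("age", "discretizer-placeholder", ["age"]),
    ("site", "encoder-placeholder", ["clinic", "region"]),
]

# One tunable parameter per tuple, named '{name}_columns'.
tunable = {f"{name}_columns": columns for name, _, columns in strata_transformers}

# A candidate search space that tries different stratification columns.
param_grid = {
    "age_columns": [["age"], ["age", "bmi"]],
    "site_columns": [["clinic"], ["clinic", "region"]],
}
```

Inside a pipeline, these keys would presumably also need the step-name prefix that scikit-learn uses for nested parameters; that detail is an assumption and should be checked against the pipeline documentation.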

Parameters:

strata_transformers (list of tuples) – List of (name, strata builder object, columns) tuples specifying the name, the strata builder object, and the columns used for building the strata.
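The overall flow described for this class (run each builder on its own columns, merge the resulting keys, then drop the used columns from the feature set) can be sketched in plain Python. Everything here is a stand-in: discretize, encode, and build_strata are hypothetical functions, each builder takes a single column for brevity, and the key numbering is an assumption rather than the library's behavior.

```python
from bisect import bisect_right


def discretize(values, splits=(50,)):
    """Stand-in for a numeric strata builder: bin by predefined splits."""
    return [bisect_right(splits, v) for v in values]


def encode(values):
    """Stand-in for a categorical strata builder: one key per distinct value."""
    key_of = {}
    return [key_of.setdefault(v, len(key_of)) for v in values]


def build_strata(rows, spec):
    """Run each builder on its column, merge the keys, drop the used columns."""
    used = {column for _, _, column in spec}
    per_builder = [builder([row[column] for row in rows]) for _, builder, column in spec]
    merged = list(zip(*per_builder))
    key_of = {combo: key for key, combo in enumerate(sorted(set(merged)))}
    strata = [key_of[combo] for combo in merged]
    features = [[v for i, v in enumerate(row) if i not in used] for row in rows]
    return features, strata


rows = [
    # age, clinic, biomarker
    [34, "a", 0.7],
    [61, "b", 1.2],
    [36, "a", 0.4],
    [58, "a", 0.9],
]
spec = [("age", discretize, 0), ("clinic", encode, 1)]

features, strata = build_strata(rows, spec)
```

After the call, only the biomarker column remains in features, and each sample carries a single stratum key that reflects both its age bin and its clinic.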

Methods

fit(X[, times, events, strata, check_input])

Learn the strata.

fit_transform(X[, times, events, strata, ...])

Fit and build strata.

set_output(*[, transform])

Set output container.

set_transform_request(*[, check_input, ...])

Configure whether metadata should be requested to be passed to the transform method.

transform(X[, times, events, strata, ...])

Build strata.

fit(X, times=None, events=None, strata=None, check_input=True)

Learn the strata.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Dataset used for building strata.

  • times (array-like of shape n_samples, default=None) – Ignored.

  • events (array-like of shape n_samples, default=None) – Ignored.

  • strata (array-like of shape n_samples, default=None) – Pre-existing strata; the newly built strata are added onto them.

  • check_input (bool, default True) – If True, runs checks and casting on data to ensure data is valid.

Returns:

Returns the instance itself.

Return type:

self

fit_transform(X, times=None, events=None, strata=None, check_input=True)

Fit and build strata.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Dataset used for building strata.

  • times (array-like of shape n_samples, default=None) – Ignored.

  • events (array-like of shape n_samples, default=None) – Ignored.

  • strata (array-like of shape n_samples, default=None) – Pre-existing strata; the newly built strata are added onto them.

  • check_input (bool, default True) – If True, runs checks and casting on data to ensure data is valid.

Returns:

  • X (ndarray of shape (n_samples, ???)) – The feature set without the columns used for strata.

  • times (ndarray of shape (n_samples)) – The same times array passed into method.

  • events (ndarray of shape (n_samples)) – The same events array passed into method.

  • strata (ndarray of shape (n_samples), dtype=np.int64) – The strata built from the original feature set.

set_output(*, transform=None)

Set output container.

See scikit-learn’s “Introducing the set_output API” example for how to use the API.

Parameters:

transform ({"default", "pandas", "polars"}, default=None) –

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:

self – Estimator instance.

Return type:

estimator instance

set_transform_request(*, check_input='$UNCHANGED$', events='$UNCHANGED$', strata='$UNCHANGED$', times='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • check_input (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for check_input parameter in transform.

  • events (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for events parameter in transform.

  • strata (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for strata parameter in transform.

  • times (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for times parameter in transform.


Returns:

self – The updated object.

Return type:

object

transform(X, times=None, events=None, strata=None, check_input=True)

Build strata.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Dataset used for building strata.

  • times (array-like of shape n_samples, default=None) – Ignored.

  • events (array-like of shape n_samples, default=None) – Ignored.

  • strata (array-like of shape n_samples, default=None) – Pre-existing strata; the newly built strata are added onto them.

  • check_input (bool, default True) – If True, runs checks and casting on data to ensure data is valid.

Returns:

  • X (ndarray of shape (n_samples, ???)) – The feature set without the columns used for strata.

  • times (ndarray of shape (n_samples)) – The same times array passed into method.

  • events (ndarray of shape (n_samples)) – The same events array passed into method.

  • strata (ndarray of shape (n_samples), dtype=np.int64) – The strata built from the original feature set.

survivalpredict.strata_preprocessing.make_strata_column_transformer(*strata_transformer_columns)

Construct a StrataColumnTransformer from the given transformers and columns.

Functions much like scikit-learn’s make_column_transformer function, but for survivalpredict’s StrataBuilders instead of scikit-learn’s Transformers. A utility for building a StrataColumnTransformer without naming the strata builders.

Parameters:
  • strata_transformers (list of tuples) – List of (strata builder object, columns) tuples specifying the strata builder object and the columns used for building the strata.

  • strata_transformer_columns (tuple[StrataBuilderProtocal, Sequence[Any] | int | str | slice])

Returns:

Returns a StrataColumnTransformer object.

Return type:

StrataColumnTransformer
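Because this helper accepts builders without names, it has to generate the names itself. How survivalpredict does this is not stated here; the sketch below assumes it mirrors scikit-learn's convention of lowercasing the class name and appending a numeric suffix to duplicates. The Discretizer and Encoder classes and the name_builders function are hypothetical placeholders.

```python
from collections import Counter


class Discretizer:  # placeholder class, not the real strata builder
    pass


class Encoder:  # placeholder class, not the real strata builder
    pass


def name_builders(pairs):
    """Turn (builder, columns) pairs into (name, builder, columns) triples."""
    base = [type(builder).__name__.lower() for builder, _ in pairs]
    totals = Counter(base)
    seen = Counter()
    triples = []
    for name, (builder, columns) in zip(base, pairs):
        if totals[name] > 1:
            # Disambiguate repeated class names with a running suffix.
            seen[name] += 1
            name = f"{name}-{seen[name]}"
        triples.append((name, builder, columns))
    return triples


pairs = [
    (Discretizer(), ["age"]),
    (Encoder(), ["clinic"]),
    (Discretizer(), ["bmi"]),
]
triples = name_builders(pairs)
names = [name for name, _, _ in triples]
```

Under this assumed scheme, the generated names determine the tunable ‘{name}_columns’ parameters, so explicit names via StrataColumnTransformer may be easier to work with during model selection.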