strata preprocessing
StrataBuilderDiscretizer | Builds strata keys from numeric data.
StrataBuilderEncoder | Builds strata keys from categorical data.
StrataColumnTransformer | Applies StrataBuilders to columns of an array or DataFrame.
make_strata_column_transformer | Construct a StrataColumnTransformer from the given transformers and columns.
- class survivalpredict.strata_preprocessing.StrataBuilderDiscretizer(n_bins=5, strategy='quantile', splits=None)
Builds strata keys from numeric data.
If predefined ‘splits’ are given, strata are built from the given bins and ‘n_bins’ and ‘strategy’ are ignored. Otherwise, ‘n_bins’ and ‘strategy’ are used to generate bins. Largely inspired by scikit-learn’s KBinsDiscretizer. Adds onto existing strata if existing strata are passed in.
- Parameters:
n_bins (int, default=5) –
The number of bins to produce. Raises ValueError if n_bins < 2. ‘n_bins’ is ignored if ‘splits’ is not None.
strategy ({'uniform', 'quantile', 'kmeans'}, default='quantile') –
Strategy used to define the widths of the bins.
‘uniform’: All bins in each feature have identical widths.
‘quantile’: All bins in each feature have the same number of points.
‘kmeans’: Values in each bin have the same nearest center of a 1D k-means cluster.
‘strategy’ is ignored if ‘splits’ is not None.
splits (numeric array-like, default=None) – Predefined splits used to build bins. If ‘splits’ is not None, ‘strategy’ and ‘n_bins’ are ignored.
- _splits
Splits used to generate bins.
- Type:
ndarray of ndarray of shape (n_features,)
- _uses_strata
True if fitted on preexisting strata, False otherwise.
- Type:
bool
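The binning strategies described above can be illustrated with a small standard-library sketch. This is not survivalpredict's implementation: the helper names (`uniform_splits`, `quantile_splits`, `assign_bins`) are hypothetical, the 'kmeans' strategy is left out, and the exact edge and tie handling of the real class may differ.

```python
from bisect import bisect_right
from statistics import quantiles


def uniform_splits(values, n_bins):
    """Interior edges of n_bins equal-width bins (hypothetical helper)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def quantile_splits(values, n_bins):
    """Interior edges chosen so that bins hold roughly equal counts."""
    return quantiles(values, n=n_bins, method="inclusive")


def assign_bins(values, splits):
    """Map each value to the index of the bin it falls into."""
    return [bisect_right(splits, v) for v in values]


ages = [21, 25, 30, 34, 41, 47, 52, 58, 63, 90]

# Splits derived from the data, analogous to 'strategy' plus 'n_bins'.
uniform_bins = assign_bins(ages, uniform_splits(ages, n_bins=3))
quantile_bins = assign_bins(ages, quantile_splits(ages, n_bins=3))

# Predefined splits, analogous to passing 'splits' directly.
predefined_bins = assign_bins(ages, [40, 60])
```

With the outlier at 90, the equal-width bins are unevenly populated while the quantile bins stay balanced, which is one reason a quantile default is convenient for building strata.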
Methods
fit(X[, times, events, strata, check_input]) Learn the strata.
fit_transform(X[, times, events, strata]) Fit and build strata.
set_output(*[, transform]) Set output container.
set_transform_request(*[, events, strata, times]) Configure whether metadata should be requested to be passed to the transform method.
transform(X[, times, events, strata]) Discretize numerical data to build strata.
- fit(X, times=None, events=None, strata=None, check_input=True)
Learn the strata.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Data to be discretized.
times (array-like of shape n_samples, default=None) – Ignored.
events (array-like of shape n_samples, default=None) – Ignored.
strata (array-like of shape n_samples, default=None) – Preexisting strata. The strata built are added onto the preexisting strata.
check_input (bool, default True) – If True, runs checks and casting on data to ensure data is valid.
- Returns:
Returns the instance itself.
- Return type:
object
- fit_transform(X, times=None, events=None, strata=None)
Fit and build strata.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Data to be discretized.
times (array-like of shape n_samples, default=None) – Ignored.
events (array-like of shape n_samples, default=None) – Ignored.
strata (array-like of shape n_samples, default=None) – Preexisting strata. The strata built are added onto the preexisting strata.
- Returns:
The built strata.
- Return type:
ndarray of shape (n_samples,), dtype=np.int64
- set_output(*, transform=None)
Set output container.
See the scikit-learn example “Introducing the set_output API” for an example on how to use the API.
- Parameters:
transform ({"default", "pandas", "polars"}, default=None) –
Configure output of transform and fit_transform.
"default": Default output format of a transformer
"pandas": DataFrame output
"polars": Polars output
None: Transform configuration is unchanged
Added in version 1.4: "polars" option was added.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- set_transform_request(*, events='$UNCHANGED$', strata='$UNCHANGED$', times='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
events (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the events parameter in transform.
strata (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the strata parameter in transform.
times (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the times parameter in transform.
- Returns:
self – The updated object.
- Return type:
object
- transform(X, times=None, events=None, strata=None)
Discretize numerical data to build strata.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Data to be discretized.
times (array-like of shape n_samples, default=None) – Ignored.
events (array-like of shape n_samples, default=None) – Ignored.
strata (array-like of shape n_samples, default=None) – Preexisting strata. The strata built are added onto the preexisting strata.
- Returns:
The built strata.
- Return type:
array-like of shape (n_samples,), dtype=np.int64
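One way to picture "adding onto" preexisting strata is to give every distinct pair of (existing stratum, new bin) its own integer key. The sketch below is an assumption about the general idea only: `combine_strata` is a hypothetical helper, and the key values the library actually produces may be different.

```python
def combine_strata(existing, new):
    """Give each distinct (existing, new) pair its own integer key."""
    pairs = list(zip(existing, new))
    # Sort the distinct pairs so that key assignment is deterministic.
    key_of = {pair: i for i, pair in enumerate(sorted(set(pairs)))}
    return [key_of[pair] for pair in pairs]


existing_strata = [0, 0, 1, 1, 1]  # e.g. strata passed in via `strata`
age_bins = [0, 1, 0, 1, 1]         # e.g. bin indices from a discretizer

combined = combine_strata(existing_strata, age_bins)
```

Two samples share a combined stratum only when they agree on both the preexisting stratum and the new bin, so the combined strata are always at least as fine as either input.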
- class survivalpredict.strata_preprocessing.StrataBuilderEncoder
Builds strata keys from categorical data.
If existing strata are passed in, it adds onto existing strata. StrataBuilderEncoder works on categorical data encoded in numerical or string types. One or many columns of mixed types can be used.
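The idea of turning one or more categorical columns of mixed types into integer strata keys can be sketched as follows. `TupleStrataEncoder` is a hypothetical stand-in written for illustration, not the library class. In particular, raising `KeyError` for combinations unseen during `fit` is an assumption of this sketch, not documented behavior of StrataBuilderEncoder.

```python
class TupleStrataEncoder:
    """Hypothetical stand-in: one integer key per distinct row of categories."""

    def fit(self, rows):
        # Sort by repr so rows mixing strings and numbers stay orderable.
        distinct = sorted({tuple(row) for row in rows}, key=repr)
        self.key_of_ = {combo: i for i, combo in enumerate(distinct)}
        return self

    def transform(self, rows):
        # An unseen combination raises KeyError in this sketch.
        return [self.key_of_[tuple(row)] for row in rows]


rows = [["male", 1], ["female", 2], ["male", 1], ["female", 1]]
encoder = TupleStrataEncoder().fit(rows)
keys = encoder.transform(rows)
```

Rows 0 and 2 hold the same combination and therefore receive the same key, while the other two rows each form their own stratum.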
Methods
fit(X[, times, events, strata, check_input]) Learn the strata.
fit_transform(X[, times, events, strata]) Fit and build strata.
set_output(*[, transform]) Set output container.
set_transform_request(*[, events, strata, times]) Configure whether metadata should be requested to be passed to the transform method.
transform(X[, times, events, strata]) Encode categorical data to build strata.
- fit(X, times=None, events=None, strata=None, check_input=True)
Learn the strata.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Data to be encoded.
times (array-like of shape n_samples, default=None) – Ignored.
events (array-like of shape n_samples, default=None) – Ignored.
strata (array-like of shape n_samples, default=None) – Preexisting strata. The strata built are added onto the preexisting strata.
check_input (bool, default True) – If True, runs checks and casting on data to ensure data is valid.
- Returns:
Returns the instance itself.
- Return type:
object
- fit_transform(X, times=None, events=None, strata=None)
Fit and build strata.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Data to be encoded.
times (array-like of shape n_samples, default=None) – Ignored.
events (array-like of shape n_samples, default=None) – Ignored.
strata (array-like of shape n_samples, default=None) – Preexisting strata. The strata built are added onto the preexisting strata.
- Returns:
The built strata.
- Return type:
ndarray of shape (n_samples,), dtype=np.int64
- set_output(*, transform=None)
Set output container.
See the scikit-learn example “Introducing the set_output API” for an example on how to use the API.
- Parameters:
transform ({"default", "pandas", "polars"}, default=None) –
Configure output of transform and fit_transform.
"default": Default output format of a transformer
"pandas": DataFrame output
"polars": Polars output
None: Transform configuration is unchanged
Added in version 1.4: "polars" option was added.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- set_transform_request(*, events='$UNCHANGED$', strata='$UNCHANGED$', times='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
events (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the events parameter in transform.
strata (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the strata parameter in transform.
times (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the times parameter in transform.
- Returns:
self – The updated object.
- Return type:
object
- transform(X, times=None, events=None, strata=None)
Encode categorical data to build strata.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Data to be encoded.
times (array-like of shape n_samples, default=None) – Ignored.
events (array-like of shape n_samples, default=None) – Ignored.
strata (array-like of shape n_samples, default=None) – Preexisting strata. The strata built are added onto the preexisting strata.
- Returns:
The built strata.
- Return type:
ndarray of shape (n_samples,), dtype=np.int64
- class survivalpredict.strata_preprocessing.StrataColumnTransformer(strata_transformers)
Applies StrataBuilders to columns of an array or DataFrame.
Functions much like scikit-learn’s ColumnTransformer class, but for survivalpredict’s StrataBuilders instead of scikit-learn’s Transformers. Different columns or column subsets of the input are run separately through different StrataBuilders. If there are preexisting strata, the created strata are added onto them. After the strata are built, the columns used to build them are removed from the feature set. Works on NumPy arrays as well as pandas/Polars DataFrames.
Designed to be used with ‘survivalpredict.pipeline.SklearnSurvivalPipeline’. The columns in the strata_transformers tuples are exposed as tunable parameters, which is useful in model selection when different strata are tried. These parameters are named ‘{name}_columns’.
- Parameters:
strata_transformers (list of tuples) – List of (name, strata builder object, columns) tuples specifying the name, the strata builder object, and the columns used for building the strata.
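The column handling described above (build strata from selected columns, then drop those columns from the feature set) can be sketched with plain lists. `split_strata_columns` is a hypothetical helper, not part of survivalpredict, and it uses a simple tuple-to-integer mapping in place of a real strata builder.

```python
def split_strata_columns(rows, strata_columns):
    """Return (features without the strata columns, integer strata keys)."""
    strata_columns = list(strata_columns)
    dropped = set(strata_columns)
    # One tuple per row holding only the values of the strata columns.
    combos = [tuple(row[c] for c in strata_columns) for row in rows]
    key_of = {combo: i for i, combo in enumerate(sorted(set(combos)))}
    features = [
        [value for c, value in enumerate(row) if c not in dropped]
        for row in rows
    ]
    strata = [key_of[combo] for combo in combos]
    return features, strata


rows = [
    [63.0, "A", 1.2],
    [48.0, "B", 0.7],
    [55.0, "A", 2.1],
]

# Stratify on column 1; it is removed from the returned features.
features, strata = split_strata_columns(rows, strata_columns=[1])

# Choosing no columns keeps every feature and yields a single stratum.
all_features, single_stratum = split_strata_columns(rows, strata_columns=[])
```

Treating the column selection as an ordinary argument mirrors why exposing ‘{name}_columns’ as a tunable parameter is useful: a model-selection loop can compare stratifying on different column subsets, including none.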
Methods
fit(X[, times, events, strata, check_input]) Learn the strata.
fit_transform(X[, times, events, strata, ...]) Fit and build strata.
set_output(*[, transform]) Set output container.
set_transform_request(*[, check_input, ...]) Configure whether metadata should be requested to be passed to the transform method.
transform(X[, times, events, strata, ...]) Build strata.
- fit(X, times=None, events=None, strata=None, check_input=True)
Learn the strata.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Dataset used for building strata.
times (array-like of shape n_samples, default=None) – Ignored.
events (array-like of shape n_samples, default=None) – Ignored.
strata (array-like of shape n_samples, default=None) – Preexisting strata. The strata built are added onto the preexisting strata.
check_input (bool, default True) – If True, runs checks and casting on data to ensure data is valid.
- Returns:
Returns the instance itself.
- Return type:
self
- fit_transform(X, times=None, events=None, strata=None, check_input=True)
Fit and build strata.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Dataset used for building strata.
times (array-like of shape n_samples, default=None) – Ignored.
events (array-like of shape n_samples, default=None) – Ignored.
strata (array-like of shape n_samples, default=None) – Preexisting strata. The strata built are added onto the preexisting strata.
check_input (bool, default True) – If True, runs checks and casting on data to ensure data is valid.
- Returns:
X (ndarray of shape (n_samples, n_remaining_features)) – The feature set without the columns used for strata.
times (ndarray of shape (n_samples,)) – The same times array passed into the method.
events (ndarray of shape (n_samples,)) – The same events array passed into the method.
strata (ndarray of shape (n_samples,), dtype=np.int64) – The strata built from the original feature set.
- set_output(*, transform=None)
Set output container.
See the scikit-learn example “Introducing the set_output API” for an example on how to use the API.
- Parameters:
transform ({"default", "pandas", "polars"}, default=None) –
Configure output of transform and fit_transform.
"default": Default output format of a transformer
"pandas": DataFrame output
"polars": Polars output
None: Transform configuration is unchanged
Added in version 1.4: "polars" option was added.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- set_transform_request(*, check_input='$UNCHANGED$', events='$UNCHANGED$', strata='$UNCHANGED$', times='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
check_input (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the check_input parameter in transform.
events (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the events parameter in transform.
strata (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the strata parameter in transform.
times (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the times parameter in transform.
- Returns:
self – The updated object.
- Return type:
object
- transform(X, times=None, events=None, strata=None, check_input=True)
Build strata.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Dataset used for building strata.
times (array-like of shape n_samples, default=None) – Ignored.
events (array-like of shape n_samples, default=None) – Ignored.
strata (array-like of shape n_samples, default=None) – Preexisting strata. The strata built are added onto the preexisting strata.
check_input (bool, default True) – If True, runs checks and casting on data to ensure data is valid.
- Returns:
X (ndarray of shape (n_samples, n_remaining_features)) – The feature set without the columns used for strata.
times (ndarray of shape (n_samples,)) – The same times array passed into the method.
events (ndarray of shape (n_samples,)) – The same events array passed into the method.
strata (ndarray of shape (n_samples,), dtype=np.int64) – The strata built from the original feature set.
- survivalpredict.strata_preprocessing.make_strata_column_transformer(*strata_transformer_columns)
Construct a StrataColumnTransformer from the given transformers and columns.
Functions much like scikit-learn’s make_column_transformer function, but for survivalpredict’s StrataBuilders instead of scikit-learn’s Transformers. A utility for building a StrataColumnTransformer without naming the strata builders.
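For orientation, scikit-learn's make_column_transformer names each transformer after its lowercased class name and appends a numeric suffix when a class appears more than once. Whether make_strata_column_transformer generates names in exactly this way is not stated here, so the sketch below is an assumption modeled on scikit-learn's convention. `auto_names`, `Discretizer`, and `Encoder` are hypothetical placeholders, not survivalpredict objects.

```python
from collections import Counter


def auto_names(builders):
    """Lowercased class names, numbered when a class appears repeatedly."""
    names = [type(builder).__name__.lower() for builder in builders]
    totals = Counter(names)
    seen = Counter()
    result = []
    for name in names:
        if totals[name] > 1:
            # Duplicated classes get "-1", "-2", ... in order of appearance.
            seen[name] += 1
            result.append(f"{name}-{seen[name]}")
        else:
            result.append(name)
    return result


class Discretizer:  # placeholder for a numeric strata builder
    pass


class Encoder:  # placeholder for a categorical strata builder
    pass


names = auto_names([Discretizer(), Encoder(), Discretizer()])
```

Generated names matter because they determine how the per-builder parameters are addressed later, so explicit names via StrataColumnTransformer may be preferable when the parameters will be tuned.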
- Parameters:
*strata_transformer_columns (tuple[StrataBuilderProtocal, Sequence[Any] | int | str | slice]) – One or more (strata builder object, columns) tuples specifying the strata builder object and the columns used for building the strata.
- Returns:
Returns a StrataColumnTransformer object.
- Return type: