pipeline

build_sklearn_pipeline_target

Target builder for survivalpredict's ‘SklearnSurvivalPipeline’.

SklearnSurvivalPipeline

Scikit-learn compatible pipeline class for survivalpredict.

make_sklearn_survival_pipeline

Construct a SklearnSurvivalPipeline from given steps.

survivalpredict.pipeline.build_sklearn_pipeline_target(times, events, strata=None, times_start=None)

Target builder for survivalpredict’s ‘SklearnSurvivalPipeline’.

Takes the ‘times’ and ‘events’ arrays, plus the optional ‘strata’ and ‘times_start’ inputs, and builds a single numpy array that can serve as the ‘y’ (observed target) for ‘SklearnSurvivalPipeline’ and scikit-learn’s API.

Parameters:
  • times (array-like of shape (n_samples,), dtype=np.int64) – Point in time at which each observation was last observed.

  • events (array-like of shape (n_samples,), dtype=np.bool_) – Whether each observation experienced the event.

  • strata (array-like of shape (n_samples,), dtype=np.int64, default=None) – If passed in, the stratum associated with each observation.

  • times_start (array-like of shape (n_samples,), dtype=np.int64, default=None) – Starting point for each observation. If not passed in, all start times are assumed to be 0.

Returns:

A numpy array that survivalpredict knows how to unpack and that can still flow through scikit-learn’s machinery.

Return type:

ndarray
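The documented dtypes and shapes can be checked before the target is built. The following is a minimal sketch, assuming numpy is available; `prepare_target_inputs` is a hypothetical helper written for illustration and is not part of survivalpredict. It only coerces and validates the inputs, and makes no claim about how the library lays out the resulting array.

```python
import numpy as np


def prepare_target_inputs(times, events, strata=None, times_start=None):
    """Hypothetical helper: coerce inputs to the documented dtypes and shapes."""
    times = np.asarray(times, dtype=np.int64)
    events = np.asarray(events, dtype=np.bool_)
    if times.ndim != 1 or events.shape != times.shape:
        raise ValueError("times and events must be 1-D arrays of equal length")

    kwargs = {"times": times, "events": events}
    for name, values in (("strata", strata), ("times_start", times_start)):
        if values is None:
            continue  # optional inputs are simply left unset
        values = np.asarray(values, dtype=np.int64)
        if values.shape != times.shape:
            raise ValueError(f"{name} must have shape (n_samples,)")
        kwargs[name] = values
    return kwargs


kwargs = prepare_target_inputs(times=[5, 12, 7], events=[1, 0, 1], strata=[0, 1, 0])
# With survivalpredict installed, the result could then be passed on:
# y = build_sklearn_pipeline_target(**kwargs)
```

Leaving `times_start` out of the returned keyword arguments relies on the documented default, under which all start times are treated as 0.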

class survivalpredict.pipeline.SklearnSurvivalPipeline(steps, max_time, *, memory=None)

Scikit-learn compatible pipeline class for survivalpredict.

A sequence of data transformers and strata preprocessing steps with a final predictor. It takes a feature matrix X and, as ‘y’, the output of ‘build_sklearn_pipeline_target’. Combined with survivalpredict’s ‘sklearn_scorer’s, it lets users build pipelines that interface with the rest of scikit-learn’s API. Parameters of the individual steps can be set using the step name and the parameter name separated by ‘__’, which allows them to be tuned during cross-validation searches.

Parameters:
  • steps (list[tuple[str, BaseEstimator]]) – List of tuples, each holding a name and a class instance, that are chained together. The class instances are assumed to be scikit-learn transformers or survivalpredict StrataBuilders/StrataColumnTransformers. The final instance is assumed to be a survivalpredict estimator (predictor).

  • max_time (int) – Maximum time for building survival curves.

  • memory (str or object with the joblib.Memory interface, default=None) – Used to cache the fitted transformers of the pipeline. The last step will never be cached, even if it is a transformer. By default, no caching is performed. If a string is given, it is the path to the caching directory. Enabling caching triggers a clone of the transformers before fitting. Therefore, the transformer instance given to the pipeline cannot be inspected directly. Use the attribute named_steps or steps to inspect estimators within the pipeline. Caching the transformers is advantageous when fitting is time consuming.
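The ‘__’ naming convention for step parameters can be illustrated without any library code. The sketch below is a simplified illustration of how a key such as ‘model__alpha’ separates into a step name and a parameter name; it is not the pipeline’s actual implementation, and the step and parameter names are hypothetical.

```python
def split_step_params(params):
    """Group 'step__param' keys by step name (illustrative only)."""
    grouped = {}
    for key, value in params.items():
        # Split on the first double underscore: left is the step, right the parameter.
        step, delim, param = key.partition("__")
        if not delim:
            raise ValueError(f"{key!r} does not name a step parameter")
        grouped.setdefault(step, {})[param] = value
    return grouped


# Hypothetical step names and parameters, e.g. one point of a search grid.
grid_point = {"scaler__with_mean": False, "model__alpha": 0.1, "model__tol": 1e-4}
print(split_step_params(grid_point))
```

Keys of this form are what a cross-validation search would use in its parameter grid to address individual steps.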

Methods

fit(X, y)

Fit the model.

predict(X[, strata])

Predict using the pipeline.

fit_predict

fit(X, y)

Fit the model.

Parameters:
  • X (ndarray of shape (n_samples, n_features)) – Training data.

  • y (ndarray) – Target values. Assumed to be the output of ‘build_sklearn_pipeline_target’.

Returns:

Returns the instance itself.

Return type:

object

predict(X, strata=None)

Predict using the pipeline.

Parameters:
  • X (ndarray of shape (n_samples, n_features)) – Samples.

  • strata (array-like of shape (n_samples,), dtype=np.int64, default=None) – If the y used in fit included prebuilt strata, the strata for these samples can be passed in here.

Returns:

The estimated survival curves. The left-most column is the probability of survival at time 1, and the right-most column is the probability of survival at max_time.

Return type:

ndarray of shape (n_samples, max_time), dtype=np.float64
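Because the left-most column corresponds to time 1, the survival probability at time t sits at column index t - 1. A minimal sketch of that lookup follows, assuming numpy is available; the curve values are made up for illustration and `survival_at` is a hypothetical helper, not a survivalpredict function.

```python
import numpy as np

# Made-up curves for 2 samples with max_time = 4 (not real model output).
curves = np.array([
    [0.95, 0.90, 0.80, 0.70],
    [0.99, 0.97, 0.96, 0.90],
])


def survival_at(curves, t):
    """Return the survival probability at time t for every sample."""
    max_time = curves.shape[1]
    if not 1 <= t <= max_time:
        raise ValueError(f"t must be between 1 and {max_time}")
    return curves[:, t - 1]  # column 0 holds time 1


print(survival_at(curves, 3))
```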

survivalpredict.pipeline.make_sklearn_survival_pipeline(*steps_no_names, max_time, memory=None)

Construct a SklearnSurvivalPipeline from given steps.

This is a shorthand for the SklearnSurvivalPipeline constructor; it does not require, and does not permit, naming the steps. Instead, their names will be set to the lowercase of their types automatically.

Parameters:
  • *steps_no_names (list of Estimator objects) – Class instances that are chained together. The class instances are assumed to be scikit-learn transformers or survivalpredict StrataBuilders/StrataColumnTransformers. The final instance is assumed to be a survivalpredict estimator (predictor).

  • max_time (int) – Maximum time for building survival curves.

  • memory (str or object with the joblib.Memory interface, default=None) – Used to cache the fitted transformers of the pipeline. The last step will never be cached, even if it is a transformer. By default, no caching is performed. If a string is given, it is the path to the caching directory. Enabling caching triggers a clone of the transformers before fitting. Therefore, the transformer instance given to the pipeline cannot be inspected directly. Use the attribute named_steps or steps to inspect estimators within the pipeline. Caching the transformers is advantageous when fitting is time consuming.
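The automatic naming described for make_sklearn_survival_pipeline, where each step is named after the lowercase of its type, can be sketched as follows. This is an illustration only, not the library’s implementation; the classes are stand-ins, and how the library names two steps of the same type is not stated here, so that case is not handled.

```python
def name_steps(*steps):
    """Pair each step with the lowercased name of its class (illustrative)."""
    return [(type(step).__name__.lower(), step) for step in steps]


class StandardScaler:  # stand-in class, not the real scikit-learn transformer
    pass


class MyEstimator:  # hypothetical final estimator
    pass


named = name_steps(StandardScaler(), MyEstimator())
print([name for name, _ in named])
```

The resulting names are the ones that would appear before the ‘__’ separator when addressing step parameters.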