survivalpredict.estimators.ParametricDiscreteTimePH
- class survivalpredict.estimators.ParametricDiscreteTimePH(*, distribution='chen', alpha=0.0, l1_ratio=0.5, pytensor_mode='JAX', strata_uses_pytensor_scan=False, coef_prior_normal_sigma=1.5, base_harard_prior_exponential_lam=5.0, scipy_minimize_method='L-BFGS-B')
Parametric Discrete Time Proportional Hazards.
A fully parametric linear proportional hazards model. Unlike Cox regression, both the coefficients and the base hazard are estimated directly from observed survival over time. The base hazard distribution is a hyperparameter; Chen, Weibull, Log-Normal, Log-Logistic, Gompertz, Gamma and Additive-Chen-Weibull[1] are available. The model is fit by maximizing a discrete-time survival likelihood[2] that accounts for censoring. Implemented with PyMC/PyTensor, with either a JAX or Numba backend.
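As a rough illustration of the discrete-time likelihood with censoring described in [2], the sketch below scores a set of per-subject hazards. It is not taken from survivalpredict: the function name `discrete_time_log_likelihood` and the `hazards[i][t - 1]` layout are hypothetical, and it assumes a censored subject is known to have survived its last observed period.

```python
import math


def discrete_time_log_likelihood(hazards, times, events):
    """Log-likelihood of right-censored discrete-time survival data.

    hazards[i][t - 1] is subject i's hazard in period t (t = 1, 2, ...).
    times[i] is the last observed period; events[i] is True for an event.
    (Hypothetical helper; the layout is an assumption for illustration.)
    """
    total = 0.0
    for h_i, t_i, d_i in zip(hazards, times, events):
        # The subject survived every period strictly before the last one.
        for t in range(1, t_i):
            total += math.log1p(-h_i[t - 1])
        last = h_i[t_i - 1]
        # Event: failed in period t_i. Censored: survived period t_i as well.
        total += math.log(last) if d_i else math.log1p(-last)
    return total


# Two subjects with a constant hazard of 0.5 per period.
ll = discrete_time_log_likelihood(
    hazards=[[0.5, 0.5], [0.5, 0.5]],
    times=[2, 1],
    events=[True, False],
)
```

A fitting routine would maximize this quantity (plus any prior or penalty terms) over the coefficients and the base hazard parameters.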
- Parameters:
distribution ({"chen", "weibull","log_normal","log_logistic","gamma","gompertz","additive_chen_weibull"}, default='chen') – Distribution of base hazard.
alpha (float, default=0.0) – Constant that multiplies the penalty terms. Used to penalize coefficients during training.
l1_ratio (float, default=0.5) – The ElasticNet mixing parameter, with 0 <= l1_ratio <= 1. For l1_ratio = 0 the penalty is an L2 penalty. For l1_ratio = 1 it is an L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.
pytensor_mode ({"JAX", "NUMBA", "FAST_COMPILE"}, default='JAX') – PyTensor backend. ‘JAX’ has the fastest compile time, but is not multiprocessing safe. ‘NUMBA’ is multiprocessing safe, but has a long compile time. PyTensor’s ‘FAST_COMPILE’ mode is multiprocessing safe and has a fast compile time, but runs slower than the other modes. ‘JAX’ is a good default, but ‘NUMBA’ is recommended when using multiprocessing.
strata_uses_pytensor_scan (bool, default=False) – If strata are present and ‘strata_uses_pytensor_scan’ is True, PyTensor’s ‘scan’ functionality is used to map strata to observations during training. Using scan might increase the PyTensor compile time, but leads to a faster runtime. For large datasets with many strata, it is recommended to set strata_uses_pytensor_scan to True.
coef_prior_normal_sigma (float, default=1.5) – This class runs a PyMC model under the hood. The coefficients are modeled as normal distributions. This parameter is the sigma of the prior. The larger the sigma, the wider the range of coefficient values the prior covers. It is recommended to scale the data to avoid tuning this parameter.
base_harard_prior_exponential_lam (float, default=5.0) – This class runs a PyMC model under the hood. The base hazard distribution’s parameters are modeled as exponential distributions. This parameter is the ‘lam’ of the prior on the base hazard distribution’s parameters. It is recommended to scale the data to avoid tuning this parameter.
scipy_minimize_method ({"nelder-mead", "powell", "CG", "BFGS", "Newton-CG", "L-BFGS-B", "TNC", "COBYLA", "SLSQP", "trust-constr", "dogleg", "trust-ncg", "trust-exact", "trust-krylov", "basinhopping"}, default='L-BFGS-B') – This class runs a PyMC model under the hood. During training, the model simply finds the maximum likelihood estimate (MLE) or maximum a posteriori estimate (MAP). This parameter sets the SciPy optimization method that PyMC uses to find the MLE/MAP.
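To make the roles of alpha and l1_ratio concrete, here is a minimal sketch of an elastic-net penalty on the coefficients. The helper `elastic_net_penalty` is hypothetical, and the 0.5 factor on the L2 term follows the scikit-learn ElasticNet convention; whether this class scales the L2 term the same way is an assumption.

```python
def elastic_net_penalty(coefs, alpha=0.0, l1_ratio=0.5):
    """Elastic-net penalty (hypothetical helper, scikit-learn style scaling).

    alpha scales the whole penalty; l1_ratio mixes the L1 and L2 parts.
    """
    if not 0.0 <= l1_ratio <= 1.0:
        raise ValueError("l1_ratio must be in [0, 1]")
    l1 = sum(abs(c) for c in coefs)   # sum of absolute values
    l2 = sum(c * c for c in coefs)    # sum of squares
    return alpha * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2)


coefs = [1.0, -2.0]
pure_l1 = elastic_net_penalty(coefs, alpha=1.0, l1_ratio=1.0)
pure_l2 = elastic_net_penalty(coefs, alpha=1.0, l1_ratio=0.0)
no_penalty = elastic_net_penalty(coefs, alpha=0.0, l1_ratio=0.5)
```

With the default alpha=0.0 the penalty vanishes regardless of l1_ratio, so l1_ratio only matters once alpha is positive.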
References
[1] Thanh Thach T, Briš R. An additive Chen-Weibull distribution and its applications in reliability modeling. Qual Reliab Engng Int. 2021;37:352–373. https://doi.org/10.1002/qre.2740
[2] Suresh K, Severn C, Ghosh D. Survival prediction models: an introduction to discrete-time modeling. BMC Med Res Methodol. 2022 Jul 26;22(1):207. doi: 10.1186/s12874-022-01679-6. PMID: 35883032; PMCID: PMC9316420.
Methods
fit(X, times, events[, strata, check_input, ...]) – Fit model.
fit_predict(*args, **kwargs) – Fit model and build survival curves.
get_base_hazard([max_time]) – Retrieve base hazards estimated by model.
get_pymc_model(X, times, events[, max_time, ...]) – Return the underlying PyMC model.
predict(X[, strata, max_time]) – Build survival curves on an array of vectors X.
predict_risk(X) – Build relative risk on an array of vectors X.
- __init__(*, distribution='chen', alpha=0.0, l1_ratio=0.5, pytensor_mode='JAX', strata_uses_pytensor_scan=False, coef_prior_normal_sigma=1.5, base_harard_prior_exponential_lam=5.0, scipy_minimize_method='L-BFGS-B')
- Parameters:
distribution (Literal['chen', 'weibull', 'log_normal', 'log_logistic', 'gamma', 'gompertz', 'additive_chen_weibull'] | None)
alpha (float)
l1_ratio (float)
pytensor_mode (Literal['JAX', 'NUMBA', 'FAST_COMPILE'])
strata_uses_pytensor_scan (bool)
coef_prior_normal_sigma (float)
base_harard_prior_exponential_lam (float)
scipy_minimize_method (Literal['nelder-mead', 'powell', 'CG', 'BFGS', 'Newton-CG', 'L-BFGS-B', 'TNC', 'COBYLA', 'SLSQP', 'trust-constr', 'dogleg', 'trust-ncg', 'trust-exact', 'trust-krylov', 'basinhopping'])
Methods
__init__(*[, distribution, alpha, l1_ratio, ...])
fit(X, times, events[, strata, check_input, ...]) – Fit model.
fit_predict(*args, **kwargs) – Fit model and build survival curves.
get_base_hazard([max_time]) – Retrieve base hazards estimated by model.
get_metadata_routing() – Get metadata routing of this object.
get_params([deep]) – Get parameters for this estimator.
get_pymc_model(X, times, events[, max_time, ...]) – Return the underlying PyMC model.
predict(X[, strata, max_time]) – Build survival curves on an array of vectors X.
predict_risk(X) – Build relative risk on an array of vectors X.
set_fit_request(*[, check_input, events, ...]) – Configure whether metadata should be requested to be passed to the fit method.
set_params(**params) – Set the parameters of this estimator.
set_predict_request(*[, max_time, strata]) – Configure whether metadata should be requested to be passed to the predict method.
- fit(X, times, events, strata=None, check_input=True, times_start=None)
Fit model.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data.
times (array-like of shape (n_samples,), dtype=np.int64) – Point in time last observed.
events (array-like of shape (n_samples,), dtype=np.bool_) – Whether the event was experienced.
strata (array-like of shape (n_samples,), dtype=np.int64, default=None) – If passed in, the stratum associated with each observation.
check_input (bool, default=True) – If True, validates and casts inputs.
times_start (array-like of shape (n_samples,), dtype=np.int64, default=None) – Starting point for observation. If not passed in, all start times are assumed to be 0.
- Returns:
Fitted Estimator.
- Return type:
object
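One way to read the times, events and times_start arguments is through the person-period layout used in discrete-time survival modeling [2], where each subject contributes one row per period at risk. The sketch below is illustrative only: `to_person_period` is a hypothetical helper, not part of survivalpredict, and it assumes a subject with start time s and last observed time t is at risk in periods s + 1 through t.

```python
def to_person_period(times, events, times_start=None):
    """Expand subject-level data into (subject, period, event) rows.

    Hypothetical helper: a subject is assumed to be at risk in periods
    times_start[i] + 1, ..., times[i]; the event flag is set only on the
    final period, and only when events[i] is True.
    """
    if times_start is None:
        times_start = [0] * len(times)  # mirrors the documented default of 0
    rows = []
    for i, (start, stop, event) in enumerate(zip(times_start, times, events)):
        for t in range(start + 1, stop + 1):
            rows.append((i, t, bool(event) and t == stop))
    return rows


rows = to_person_period(times=[2, 3], events=[True, False], times_start=[0, 1])
```

Here subject 0 is followed from the origin and has the event in period 2, while subject 1 enters late (after period 1) and is censored at period 3.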
- fit_predict(*args, **kwargs)
Fit model and build survival curves.
- get_base_hazard(max_time=None)
Retrieve base hazards estimated by model.
- Parameters:
max_time (int, default=None) – Maximum time of built survival curves. If None, the maximum time seen in the training data is used.
- Returns:
The estimated base hazard; used in building survival curves.
- Return type:
ndarray of shape (max_time,) or (n_strata, max_time), dtype=np.float64
- get_pymc_model(X, times, events, max_time=None, labes_names=None, strata=None, strata_names=None, times_start=None)
Return the underlying PyMC model.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data.
times (array-like of shape (n_samples,), dtype=np.int64) – Point in time last observed.
events (array-like of shape (n_samples,), dtype=np.bool_) – Whether the event was experienced.
max_time (int, default=None) – Maximum time of built survival curves. If None, the maximum time seen in the training data is used.
labes_names (list of str, default=None) – Names for features, allowing the parameters associated with each feature to be named accordingly.
strata (array-like of shape (n_samples,), dtype=np.int64, default=None) – If passed in, the stratum associated with each observation.
strata_names (list of str, default=None) – Names for strata, allowing the parameters associated with each stratum to be named accordingly.
times_start (array-like of shape (n_samples,), dtype=np.int64, default=None) – Starting point for observation. If not passed in, all start times are assumed to be 0.
- Return type:
pymc.Model
- predict(X, strata=None, max_time=None)
Build survival curves on an array of vectors X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Data to predict on.
strata (array-like of shape (n_samples,), dtype=np.int64, default=None) – If passed in, the stratum associated with each observation.
max_time (Optional[int], default=None) – Maximum time of built survival curves. If None, the maximum time seen in the training data is used.
- Returns:
The estimated survival curves. The left-most column is the probability of survival at time 1, and the right-most column is the probability of survival at max_time.
- Return type:
ndarray of shape (n_samples, max_time), dtype=np.float64
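For intuition about how a base hazard and a relative risk can combine into the (n_samples, max_time) array described above, here is a hedged sketch. It assumes the proportional hazards relation S_i(t) = S_0(t) ** risk_i applied period by period; `survival_curves` is a hypothetical helper, and the exact construction used inside this class is not confirmed by this page.

```python
def survival_curves(base_hazard, relative_risk):
    """Build one survival curve per subject (hypothetical helper).

    base_hazard[t - 1] is the baseline hazard in period t.
    relative_risk[i] scales the baseline for subject i, assuming
    S_i(t) = S_0(t) ** relative_risk[i].
    """
    curves = []
    for r in relative_risk:
        surv, row = 1.0, []
        for h0 in base_hazard:
            # Probability of surviving this period, raised to the risk.
            surv *= (1.0 - h0) ** r
            row.append(surv)
        curves.append(row)
    return curves


curves = survival_curves(base_hazard=[0.5, 0.5], relative_risk=[1.0, 2.0])
```

The first column of each row corresponds to time 1 and the last to the final period, matching the column layout of the documented return value.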
- predict_risk(X)
Build relative risk on an array of vectors X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Data to predict on.
- Returns:
The relative risk of X, used under the hood for building survival curves. Relative risk is what the concordance index evaluates.
- Return type:
ndarray of shape (n_samples,), dtype=np.float64
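Since relative risk is the quantity a concordance index evaluates, the following sketch shows a plain Harrell-style concordance computation on risk scores such as those returned by predict_risk. The function `concordance_index` is a hypothetical, unoptimized helper written for illustration; it ignores pairs with tied times and is not the scoring routine shipped with any particular library.

```python
def concordance_index(times, events, risk):
    """Fraction of comparable pairs ranked correctly by risk.

    A pair (i, j) is comparable when subject i had an observed event
    strictly before subject j's last observed time. It is concordant
    when the earlier failure has the higher risk; risk ties count 0.5.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # pairs are anchored on a subject with an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


perfect = concordance_index([1, 2, 3], [True, True, True], [3.0, 2.0, 1.0])
reversed_order = concordance_index([1, 2, 3], [True, True, True], [1.0, 2.0, 3.0])
```

A value near 1 means higher predicted risk goes with earlier events, 0.5 is what random scores would give on average, and values near 0 indicate an inverted ranking.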