Internal
This page documents lower-level classes, utilities, and internal components of uniPairs.estimator.
- class uniPairs.estimator.CVResult
Bases: TypedDict
- active_set: List[str]
- best_lmda: float
- cv_errors: ndarray
- lmda_path: ndarray
- n_folds: int
- prevalidated_preds: ndarray
- class uniPairs.estimator.FamilySpec
Bases: TypedDict
- family: Literal['gaussian', 'binomial', 'cox']
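As a rough illustration of how the two TypedDict containers above can be read, the following sketch re-declares them locally with the documented field names and types. These local classes are stand-ins written for this example; the real definitions live in uniPairs.estimator and may differ in detail.

```python
from typing import List, Literal, TypedDict

import numpy as np


class FamilySpec(TypedDict):
    # Local stand-in mirroring the documented field.
    family: Literal["gaussian", "binomial", "cox"]


class CVResult(TypedDict):
    # Local stand-in mirroring the documented fields.
    active_set: List[str]
    best_lmda: float
    cv_errors: np.ndarray
    lmda_path: np.ndarray
    n_folds: int
    prevalidated_preds: np.ndarray


# A TypedDict is a plain dict at runtime; the annotations only guide type checkers.
spec: FamilySpec = {"family": "binomial"}
```

Because a TypedDict is an ordinary dict at runtime, fields are accessed with subscript syntax such as `spec["family"]`.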
- class uniPairs.estimator.GLM(fit_intercept: bool = True, vars_names: List[str] | None = None, family_spec: FamilySpec | None = None, p_value: bool = False, preds_loo: bool = False)
Bases: BaseEstimator
Generalized Linear Model (GLM) wrapper using adelie.grpnet for Gaussian, Binomial (logistic), and Cox proportional hazards regression.
Supports optional intercept, leave-one-out (LOO) predictions, and likelihood ratio p-values.
- Parameters:
fit_intercept (bool, default=True) – Whether to include an intercept term. For Cox regression, the intercept is always excluded.
vars_names (list of str, optional) – Names of predictor variables. If None, names are generated as ["X0", "X1", ..., "Xp"].
family_spec (dict, optional) – Dictionary containing the model family. Must contain key "family" with one of {"gaussian", "binomial", "cox"}. Example: {"family": "binomial"}.
p_value (bool, default=False) – If True, computes a two-sided p-value via a likelihood ratio test comparing the full model vs. the model without the last coefficient.
preds_loo (bool, default=False) – If True, computes leave-one-out linear predictors following Rad & Maleki (2020).
- slopes_
Estimated regression coefficients.
- Type:
ndarray of shape (n_features,)
- intercept_
Estimated intercept. Zero for Cox models.
- Type:
float
- preds_loo_
Leave-one-out predictions if preds_loo=True.
- Type:
ndarray of shape (n_samples,), optional
- p_value_
Two-sided p-value for the last predictor coefficient.
- Type:
float, optional
- vars_names_
Names of variables used in the model.
- Type:
list of str
- offset_
Optional observation-specific offsets supplied to the GLM.
- Type:
ndarray or None
Examples
>>> from uniPairs.estimator import GLM
>>> glm = GLM(family_spec={"family": "binomial"}, preds_loo=True)
>>> glm.fit(X, y)
>>> eta = glm.predict(X)  # linear predictor
>>> p = glm.predict(X, response_scale=True)  # response scale (inverse link applied)
>>> active = glm.get_active_variables()
>>> formula = glm.get_fitted_function()
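The likelihood ratio test described for p_value can be sketched independently of the library. The block below is a minimal NumPy illustration for the binomial family, assuming an unpenalized logistic fit by Newton's method and a chi-square reference distribution with one degree of freedom. The helper names fit_logistic and lrt_p_value_last are hypothetical, and GLM itself fits through adelie.grpnet, so its numerical details may differ.

```python
import math

import numpy as np


def fit_logistic(X, y, n_iter=50):
    """Unpenalized logistic regression with intercept (Newton's method).

    Returns the coefficient vector and the maximized log-likelihood.
    """
    Z = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(Z @ beta)))
        w = mu * (1.0 - mu)
        grad = Z.T @ (y - mu)
        hess = Z.T @ (Z * w[:, None])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    eta = Z @ beta
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    return beta, loglik


def lrt_p_value_last(X, y):
    """Likelihood ratio p-value for the last column of X."""
    _, ll_full = fit_logistic(X, y)
    _, ll_reduced = fit_logistic(X[:, :-1], y)
    stat = max(2.0 * (ll_full - ll_reduced), 0.0)
    # Survival function of chi-square(1): P(chi2 > s) = erfc(sqrt(s / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
prob = 1.0 / (1.0 + np.exp(-1.5 * X[:, 0]))  # only column 0 carries signal
y = (rng.random(300) < prob).astype(float)

p_noise = lrt_p_value_last(X, y)            # last column is pure noise
p_signal = lrt_p_value_last(X[:, ::-1], y)  # last column is the signal
```

Here the column order decides which coefficient is tested, which matches the documented convention of testing the last coefficient.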
- class uniPairs.estimator.Lasso(lmda_path: ndarray | None = None, fit_intercept: bool = True, vars_names: List[str] | None = None, family_spec: FamilySpec | None = None)
Bases: BaseEstimator
Lasso regression using adelie.grpnet for Gaussian, Binomial (logistic), or Cox proportional hazards models.
The model standardizes inputs, fits a regularization path over lmda_path_ (lambda values), and returns intercepts and slopes on the original data scale.
- Parameters:
lmda_path (ndarray of shape (n_lmdas,), optional) – Regularization path to use. If None, it is generated automatically via generate_lmda_path().
fit_intercept (bool, default=True) – Whether to include an intercept term. For Cox models, the intercept is always excluded.
vars_names (list of str, optional) – Names of variables used in the model. If None, they are generated automatically as ["X0", "X1", ..., "Xp"].
family_spec (dict, optional) – Dictionary containing the response family. Must include key "family" with one of {"gaussian", "binomial", "cox"}. Example: {"family": "binomial"}.
- lmda_path_
Lambda values used during fitting.
- Type:
ndarray of shape (n_lmdas,)
- slopes_
Regression coefficients for each lambda.
- Type:
ndarray of shape (n_lmdas, n_features)
- intercept_
Intercept for each lambda.
- Type:
ndarray of shape (n_lmdas, 1)
- vars_names_
Names of variables in the model.
- Type:
list of str
- X_mean_
Feature means used for standardization.
- Type:
ndarray of shape (n_features,)
- X_std_
Feature standard deviations used for standardization.
- Type:
ndarray of shape (n_features,)
- offset_
Observation-specific offset passed into the GLM, if provided.
- Type:
ndarray or None
Examples
>>> from uniPairs.estimator import Lasso
>>> model = Lasso(family_spec={"family": "binomial"})
>>> model.fit(X, y)
>>> y_pred = model.predict(X, response_scale=True)
>>> active = model.get_active_variables(lmda=0.1)
>>> formula = model.get_fitted_function()
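The statement that coefficients are fitted on standardized inputs and reported on the original data scale rests on a simple algebraic identity. The sketch below shows that back-transformation with plain NumPy. It uses an ordinary least-squares fit as a stand-in for the penalized fit, which is an assumption made only to keep the example self-contained; the identity holds for any linear predictor fitted on standardized columns. The variable names are illustrative and are not taken from the library.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=[1.0, 10.0, 0.1], size=(50, 3))
y = 2.0 + X @ np.array([1.0, -0.5, 3.0]) + rng.normal(scale=0.1, size=50)

# Standardize each column (analogous to X_mean_ and X_std_).
X_mean = X.mean(axis=0)
X_std = X.std(axis=0)
Xs = (X - X_mean) / X_std

# Stand-in fit on the standardized scale (least squares instead of Lasso).
Z = np.column_stack([np.ones(len(y)), Xs])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
a_std, b_std = coef[0], coef[1:]

# Map back: a + sum_j b_j (x_j - m_j) / s_j = (a - sum_j b_j m_j / s_j) + sum_j (b_j / s_j) x_j
slopes = b_std / X_std
intercept = a_std - np.sum(b_std * X_mean / X_std)

pred_std = a_std + Xs @ b_std      # prediction using standardized inputs
pred_orig = intercept + X @ slopes  # same prediction using raw inputs
```

Both prediction vectors agree up to floating-point error, so a user can apply the reported slopes and intercept directly to unstandardized data.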
- class uniPairs.estimator.OLS(fit_intercept: bool = True, vars_names: List[str] | None = None, p_value: bool = False, preds_loo: bool = False)
Bases: BaseEstimator
Ordinary Least Squares (OLS) regression with optional intercept, leave-one-out predictions, and p-value estimation.
- Parameters:
fit_intercept (bool, default=True) – If True, an intercept term is included in the model.
vars_names (list of str, optional) – Names of the variables. If None, names are automatically generated as ["X0", "X1", ..., "Xp"].
p_value (bool, default=False) – If True, a two-sided p-value is computed for the last coefficient after fitting.
preds_loo (bool, default=False) – If True, leave-one-out predictions are computed after fitting.
- slopes_
Estimated regression coefficients (excluding intercept).
- Type:
ndarray of shape (n_features,)
- intercept_
Estimated intercept term (if fit_intercept=True).
- Type:
float
- preds_loo_
Leave-one-out predictions. Present only if preds_loo=True.
- Type:
ndarray of shape (n_samples,), optional
- p_value_
Two-sided p-value for the last coefficient. Present only if p_value=True.
- Type:
ndarray of shape (1,), optional
- vars_names_
Names of the variables used in the model.
- Type:
list of str
Examples
>>> from uniPairs.estimator import OLS
>>> model = OLS(fit_intercept=True, p_value=True, preds_loo=True)
>>> model.fit(X, y)
>>> y_pred = model.predict(X)
>>> active = model.get_active_variables()
>>> formula = model.get_fitted_function()
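For least squares, leave-one-out predictions do not require refitting the model n times. The sketch below uses the standard hat-matrix identity, where the leave-one-out prediction for sample i equals y_i - e_i / (1 - h_ii), with e_i the ordinary residual and h_ii the leverage. This is a generic NumPy illustration under the assumption of a full-rank design with intercept; the function name ols_loo_predictions is hypothetical and the page does not state how OLS computes preds_loo_ internally.

```python
import numpy as np


def ols_loo_predictions(X, y):
    """Leave-one-out predictions for OLS with intercept, without refitting."""
    Z = np.column_stack([np.ones(len(y)), X])
    # Reduced QR gives an orthonormal basis of the column space of Z.
    Q, _ = np.linalg.qr(Z)
    leverage = np.sum(Q**2, axis=1)  # diagonal of the hat matrix
    fitted = Q @ (Q.T @ y)
    resid = y - fitted
    return y - resid / (1.0 - leverage)


rng = np.random.default_rng(0)
X = rng.normal(size=(25, 3))
y = 1.0 + X @ np.array([2.0, 0.0, -1.0]) + rng.normal(scale=0.3, size=25)
loo = ols_loo_predictions(X, y)
```

The result has one entry per sample, matching the documented shape (n_samples,) of preds_loo_.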
- class uniPairs.estimator.Stage1Results
Bases: TypedDict
- intercepts: ndarray
- loo_preds: ndarray
- slopes: ndarray
- class uniPairs.estimator.UniLasso(lmda_path: ndarray | None = None, fit_intercept: bool = True, vars_names: List[str] | None = None, family_spec: FamilySpec | None = None)
Bases: BaseEstimator
UniLasso regression using adelie.grpnet for Gaussian, Binomial (logistic), or Cox proportional hazards models.
- Parameters:
lmda_path (ndarray of shape (n_lmdas,), optional) – Regularization path. If None, it is generated automatically using the leave-one-out predictions from stage 1.
fit_intercept (bool, default=True) – Include an intercept term in the non-negative Lasso stage. This is ignored for Cox models.
vars_names (list of str, optional) – Feature names. If None, automatically generated as ["X0", "X1", ..., "Xp"].
family_spec (dict, optional) – Must contain key "family" with one of {"gaussian", "binomial", "cox"}.
- uni_slopes_
Slopes from the univariate models.
- Type:
ndarray of shape (n_features,)
- uni_intercepts_
Intercepts from the univariate models.
- Type:
ndarray of shape (n_features,)
- loo_preds_
Leave-one-out predictions from the univariate models.
- Type:
ndarray of shape (n_samples, n_features)
- slopes_
Final UniLasso slopes.
- Type:
ndarray of shape (n_lmdas, n_features)
- intercept_
Final UniLasso intercepts.
- Type:
ndarray of shape (n_lmdas, 1)
- vars_names_
Variable names used in the model.
- Type:
list of str
- lmda_path_
Lambda values used in stage 2.
- Type:
ndarray of shape (n_lmdas,)
Examples
>>> from uniPairs.estimator import UniLasso
>>> model = UniLasso(family_spec={"family": "gaussian"})
>>> model.fit(X, y)
>>> y_hat = model.predict(X)
>>> active = model.get_active_variables(lmda=0.1)
>>> model.get_fitted_function()
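The stage-1 quantities (uni_slopes_, uni_intercepts_, loo_preds_) can be pictured with a small Gaussian-family sketch: one simple linear regression per feature, plus leave-one-out predictions obtained from the leverage identity rather than by refitting. The function univariate_stage and its returned dictionary are illustrative assumptions that reuse the field names of Stage1Results; the library's own implementation, and its handling of the binomial and Cox families, are not shown on this page.

```python
import numpy as np


def univariate_stage(X, y):
    """Per-feature simple linear regressions with leave-one-out predictions."""
    n = X.shape[0]
    x_mean = X.mean(axis=0)
    xc = X - x_mean
    sxx = np.sum(xc**2, axis=0)

    slopes = xc.T @ (y - y.mean()) / sxx   # shape (n_features,)
    intercepts = y.mean() - slopes * x_mean  # shape (n_features,)

    fitted = intercepts + X * slopes        # shape (n_samples, n_features)
    leverage = 1.0 / n + xc**2 / sxx        # h_ii for each univariate fit
    resid = y[:, None] - fitted
    loo_preds = y[:, None] - resid / (1.0 - leverage)
    return {"slopes": slopes, "intercepts": intercepts, "loo_preds": loo_preds}


rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))
y = X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=30)
stage1 = univariate_stage(X, y)
```

The matrix stage1["loo_preds"] has shape (n_samples, n_features) and would play the role of the design matrix passed to the non-negative Lasso in stage 2.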
- uniPairs.estimator.cv(base: UniLasso | Lasso, X: ndarray, y: ndarray, n_folds: int, lmda_path: ndarray | None = None, plot_cv_curve: bool = False, cv1se: bool = False, seed: int = 305, save_plots: str | None = None, offset: ndarray | None = None) → CVResult
Cross-validation for Lasso or UniLasso models.
If lmda_path is not provided:
- For UniLasso: the path is generated from the leave-one-out predictions of the univariate models.
- For Lasso: the path is generated from the design matrix X.
- Parameters:
base (UniLasso or Lasso instance) – The model to cross-validate. Must implement fit and predict and contain the attribute family_.
X (ndarray of shape (n_samples, n_features)) – Design matrix.
y (ndarray) – Response vector for Gaussian/Binomial, or array of shape (n_samples, 2) for Cox models, containing (time, status).
n_folds (int) – Number of cross-validation folds.
lmda_path (ndarray of shape (n_lmdas,), optional) – Regularization path. If None, it is generated automatically.
plot_cv_curve (bool, default=False) – If True, plots the cross-validation R2 curve against -log(lambda) and annotates model size along the curve.
cv1se (bool, default=False) – If True, selects the largest lambda within one standard error of the minimum validation error (1-SE rule). Otherwise, selects the minimizer.
seed (int, default=305) – Random seed for fold shuffling.
save_plots (str, optional) – If provided, the CV plot is saved to this file path.
offset (ndarray, optional) – Optional observation-specific offset added during fitting and prediction.
- Returns:
cv_result – A dictionary with the following fields:
- cv_errors (ndarray of shape (n_folds, n_lmdas)) – Validation losses for each fold and lambda.
- lmda_path (ndarray of shape (n_lmdas,)) – The lambda path used.
- prevalidated_preds (ndarray of shape (n_samples,)) – Out-of-fold predictions at the selected lambda.
- best_lmda (float) – Selected value of lambda.
- n_folds (int) – Number of folds used.
- active_set (list of str) – Names of active variables at the selected lambda.
- Return type:
dict
Examples
>>> from uniPairs.estimator import UniLasso, cv
>>> model = UniLasso(family_spec={"family": "gaussian"})
>>> results = cv(model, X, y, n_folds=5)
>>> results["best_lmda"]
0.031
>>> results["active_set"]
['X2', 'X7']
>>> y_hat = model.predict(X)
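The two selection rules controlled by cv1se can be sketched from a cv_errors matrix of shape (n_folds, n_lmdas). The helper select_lambda below is hypothetical and assumes the standard error is the sample standard deviation across folds divided by the square root of the number of folds; the library may compute it differently.

```python
import numpy as np


def select_lambda(cv_errors, lmda_path, one_se=False):
    """Pick lambda by minimum mean CV error, or by the 1-SE rule."""
    mean_err = cv_errors.mean(axis=0)
    i_min = int(np.argmin(mean_err))
    if not one_se:
        return float(lmda_path[i_min])
    n_folds = cv_errors.shape[0]
    se = cv_errors.std(axis=0, ddof=1) / np.sqrt(n_folds)
    # Largest lambda whose mean error is within one SE of the minimum.
    within = mean_err <= mean_err[i_min] + se[i_min]
    return float(np.max(lmda_path[within]))


lmda_path = np.array([1.0, 0.5, 0.25, 0.125])
cv_errors = np.array([
    [2.0, 1.00, 0.9, 1.0],   # fold 1
    [2.0, 1.16, 1.1, 1.1],   # fold 2
])

lmda_min = select_lambda(cv_errors, lmda_path)               # minimizer
lmda_1se = select_lambda(cv_errors, lmda_path, one_se=True)  # 1-SE rule
```

In this toy input the minimum mean error is at lambda 0.25, while lambda 0.5 is still within one standard error, so the 1-SE rule prefers the larger penalty and typically a sparser model.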
- uniPairs.estimator.generate_lmda_path(X: ndarray, y: ndarray, family: Literal['gaussian', 'binomial', 'cox'] = 'gaussian', n_lmdas: int = 100, lmda_min_ratio: float | None = None, fit_intercept: bool = True) → ndarray
Generate a sequence of lambda values for regularized GLM fitting.
This function computes a decreasing path of penalty values for use in Lasso-type models. For the Gaussian family, the lambda path is computed analytically from the KKT conditions. For Binomial and Cox, the path is obtained by calling ad.cv_grpnet.
- Parameters:
X (ndarray of shape (n_samples, n_features)) – Design matrix.
y (ndarray) – Response vector. For family='cox', y must be of shape (n_samples, 2) where the columns are (time, status).
family ({'gaussian', 'binomial', 'cox'}, default='gaussian') – Family of the GLM used to determine the lambda path.
n_lmdas (int, default=100) – Number of lambda values to generate.
lmda_min_ratio (float, optional) – Ratio of the smallest lambda to the largest lambda. If None, 1e-4 is used when n_samples > n_features and 1e-2 otherwise.
fit_intercept (bool, default=True) – Whether to include an intercept term (ignored for the Cox family).
- Returns:
lmda_path – A decreasing sequence of lambda values, spaced on a log scale.
- Return type:
ndarray of shape (n_lmdas,)
Examples
>>> from uniPairs.estimator import generate_lmda_path
>>> lmda_path = generate_lmda_path(X, y, family='gaussian', n_lmdas=50)
>>> lmda_path.shape
(50,)
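For the Gaussian family, the analytic path can be sketched as follows. Under the objective (1 / 2n) * ||y - b0 - X b||^2 + lambda * ||b||_1, the KKT conditions imply that all slopes are zero once lambda is at least max_j |x_j^T (y - mean(y))| / n, so that value serves as the start of the path. The function gaussian_lmda_path is a hypothetical stand-in: the scaling by n, the centering, and the absence of column standardization are assumptions, and the library's generate_lmda_path may differ on each of them. Only the default ratios (1e-4 and 1e-2) are taken from the parameter description above.

```python
import numpy as np


def gaussian_lmda_path(X, y, n_lmdas=100, lmda_min_ratio=None, fit_intercept=True):
    """Decreasing, log-spaced lambda path for a Gaussian Lasso (sketch)."""
    n, p = X.shape
    if lmda_min_ratio is None:
        lmda_min_ratio = 1e-4 if n > p else 1e-2
    if fit_intercept:
        # With an intercept, the null model predicts mean(y).
        Xc = X - X.mean(axis=0)
        r = y - y.mean()
    else:
        Xc, r = X, y
    # Smallest lambda at which the all-zero slope vector satisfies the KKT conditions.
    lmda_max = np.max(np.abs(Xc.T @ r)) / n
    return np.geomspace(lmda_max, lmda_max * lmda_min_ratio, n_lmdas)


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X[:, 0] + rng.normal(size=100)
path = gaussian_lmda_path(X, y, n_lmdas=50)
```

Since n_samples exceeds n_features in this toy data, the default ratio is 1e-4, so the last value of the path is 1e-4 times the first.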