Internal

This page documents lower-level classes, utilities, and internal components of uniPairs.estimator.

class uniPairs.estimator.CVResult

Bases: TypedDict

active_set: List[str]
best_lmda: float
cv_errors: ndarray
lmda_path: ndarray
n_folds: int
prevalidated_preds: ndarray
class uniPairs.estimator.FamilySpec

Bases: TypedDict

family: Literal['gaussian', 'binomial', 'cox']
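
As a rough illustration of how a TypedDict such as FamilySpec constrains its single key, the sketch below redeclares an equivalent type and adds a small runtime check. The names FamilySpecSketch and check_family_spec are hypothetical and are not part of uniPairs; only the three family values come from the documentation above.

```python
from typing import Literal, TypedDict, get_args

# The three family values documented for FamilySpec.
Family = Literal["gaussian", "binomial", "cox"]


class FamilySpecSketch(TypedDict):
    """Hypothetical stand-in mirroring the documented FamilySpec field."""

    family: Family


def check_family_spec(spec: dict) -> FamilySpecSketch:
    """Return a clean spec if its "family" value is one of the documented options."""
    allowed = get_args(Family)
    if spec.get("family") not in allowed:
        raise ValueError(f"family must be one of {allowed}, got {spec.get('family')!r}")
    return {"family": spec["family"]}
```

A TypedDict is only checked statically, so a runtime check like this one is what would catch a misspelled family name in practice.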
class uniPairs.estimator.GLM(fit_intercept: bool = True, vars_names: List[str] | None = None, family_spec: FamilySpec | None = None, p_value: bool = False, preds_loo: bool = False)

Bases: BaseEstimator

Generalized Linear Model (GLM) wrapper using adelie.grpnet for Gaussian, Binomial (logistic), and Cox proportional hazards regression.

Supports optional intercept, leave-one-out (LOO) predictions, and likelihood ratio p-values.

Parameters:
  • fit_intercept (bool, default=True) – Whether to include an intercept term. For Cox regression, the intercept is always excluded.

  • vars_names (list of str, optional) – Names of predictor variables. If None, names are generated as ["X0", "X1", ..., "Xp"].

  • family_spec (dict, optional) –

    Dictionary containing the model family. Must contain key "family" with one of {"gaussian", "binomial", "cox"}.

    Examples: {"family": "binomial"}

  • p_value (bool, default=False) – If True, computes a two-sided p-value via a likelihood ratio test comparing the full model vs. the model without the last coefficient.

  • preds_loo (bool, default=False) – If True, computes leave-one-out linear predictors following Rad & Maleki (2020).
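
The likelihood ratio test behind p_value can be sketched in a few lines. This is a minimal illustration, assuming the two log-likelihoods (full model and model without the last coefficient) are already available; the helper name lr_p_value is hypothetical and this is not the uniPairs implementation.

```python
import math


def lr_p_value(loglik_full: float, loglik_reduced: float) -> float:
    """P-value of a likelihood ratio test with one degree of freedom."""
    # LR statistic, clipped at zero to guard against tiny negative values
    # caused by numerical error in the two fits.
    stat = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    # Survival function of a chi-square(1) variable:
    # P(Z**2 > stat) = erfc(sqrt(stat / 2)) for standard normal Z.
    return math.erfc(math.sqrt(stat / 2.0))
```

Dropping a single coefficient removes one degree of freedom, which is why the chi-square(1) reference distribution is used in this sketch.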

slopes_

Estimated regression coefficients.

Type:

ndarray of shape (n_features,)

intercept_

Estimated intercept. Zero for Cox models.

Type:

float

preds_loo_

Leave-one-out predictions if preds_loo=True.

Type:

ndarray of shape (n_samples,), optional

p_value_

Two-sided p-value for the last predictor coefficient.

Type:

float, optional

vars_names_

Names of variables used in the model.

Type:

list of str

offset_

Optional observation-specific offsets supplied to the GLM.

Type:

ndarray or None

Examples

>>> glm = GLM(family_spec={"family": "binomial"}, preds_loo=True)
>>> glm.fit(X, y)
>>> eta = glm.predict(X)               # linear predictor
>>> p = glm.predict(X, response_scale=True)  # inverse-link
>>> active = glm.get_active_variables()
>>> formula = glm.get_fitted_function()
fit(X: ndarray, y: ndarray, offset: ndarray | None = None) → None
get_active_variables(tolerance: float = 1e-10) → List[str]
get_fitted_function() → str
predict(X: ndarray, response_scale: bool = False, offset: ndarray | None = None) → ndarray
set_vars_names(vars_names: List[str]) → None
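
The response_scale flag on predict switches between the linear predictor and the inverse-link scale. The sketch below shows what that transformation usually means for the Gaussian and binomial families. The function inverse_link is a hypothetical helper, and the Cox case is left out on purpose because this page does not state how it is handled.

```python
import math


def inverse_link(eta: float, family: str) -> float:
    """Map a linear predictor value to the response scale (illustrative only)."""
    if family == "gaussian":
        # Identity link: the linear predictor is already the mean.
        return eta
    if family == "binomial":
        # Logistic inverse link, written to avoid overflow for large |eta|.
        if eta >= 0:
            return 1.0 / (1.0 + math.exp(-eta))
        z = math.exp(eta)
        return z / (1.0 + z)
    raise ValueError(f"family {family!r} is not covered by this sketch")
```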
class uniPairs.estimator.Lasso(lmda_path: ndarray | None = None, fit_intercept: bool = True, vars_names: List[str] | None = None, family_spec: FamilySpec | None = None)

Bases: BaseEstimator

Lasso regression using adelie.grpnet for Gaussian, Binomial (logistic), or Cox proportional hazards models.

The model standardizes inputs, fits a regularization path over lmda_path_ (lambda values), and returns intercepts and slopes on the original data scale.
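
Returning coefficients on the original data scale after fitting on standardized inputs is a small algebraic step. The sketch below shows the usual back-transformation for one lambda value, assuming each feature was standardized as (x - mean) / std. The helper name unstandardize is hypothetical and the code is not taken from uniPairs.

```python
from typing import List, Sequence, Tuple


def unstandardize(
    slopes_std: Sequence[float],
    intercept_std: float,
    x_mean: Sequence[float],
    x_std: Sequence[float],
) -> Tuple[List[float], float]:
    """Convert coefficients fitted on standardized features to the original scale."""
    # a + sum_j b_j * (x_j - m_j) / s_j
    #   = (a - sum_j (b_j / s_j) * m_j) + sum_j (b_j / s_j) * x_j
    slopes = [b / s for b, s in zip(slopes_std, x_std)]
    intercept = intercept_std - sum(b * m for b, m in zip(slopes, x_mean))
    return slopes, intercept
```

With this transformation, predictions computed from raw features match those computed from standardized features, which is what makes X_mean_ and X_std_ sufficient to reproduce the fit.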

Parameters:
  • lmda_path (ndarray of shape (n_lmdas,), optional) – Regularization path to use. If None, it is generated automatically via generate_lmda_path().

  • fit_intercept (bool, default=True) – Whether to include an intercept term. For Cox models, the intercept is always excluded.

  • vars_names (list of str, optional) – Names of variables used in the model. If None, they are generated automatically as ["X0", "X1", ..., "Xp"].

  • family_spec (dict, optional) –

    Dictionary containing the response family. Must include key "family" with one of: {"gaussian", "binomial", "cox"}.

    Examples

    >>> {"family": "binomial"}
    

lmda_path_

Lambda values used during fitting.

Type:

ndarray of shape (n_lmdas,)

slopes_

Regression coefficients for each lambda.

Type:

ndarray of shape (n_lmdas, n_features)

intercept_

Intercept for each lambda.

Type:

ndarray of shape (n_lmdas, 1)

vars_names_

Names of variables in the model.

Type:

list of str

X_mean_

Feature means used for standardization.

Type:

ndarray of shape (n_features,)

X_std_

Feature standard deviations used for standardization.

Type:

ndarray of shape (n_features,)

offset_

Observation-specific offset passed into the GLM, if provided.

Type:

ndarray or None

Examples

>>> model = Lasso(family_spec={"family": "binomial"})
>>> model.fit(X, y)
>>> y_pred = model.predict(X, response_scale=True)
>>> active = model.get_active_variables(lmda=0.1)
>>> formula = model.get_fitted_function()
fit(X: ndarray, y: ndarray, offset: ndarray | None = None, tolerance: float = 1e-10) → None
fit_binomial(X_std: ndarray, y: ndarray) → None
fit_cox(X_std: ndarray, y: ndarray) → None
fit_gaussian(X_std: ndarray, y: ndarray) → None
get_active_variables(lmda: float | None = None, tolerance: float = 1e-10) → List[str]
get_fitted_function(lmda: float | None = None, tolerance: float = 1e-10) → str
predict(X: ndarray, response_scale: bool = False, offset: ndarray | None = None) → ndarray
set_lmda_path(lmda_path: ndarray) → None
set_vars_names(vars_names: List[str]) → None
class uniPairs.estimator.OLS(fit_intercept: bool = True, vars_names: List[str] | None = None, p_value: bool = False, preds_loo: bool = False)

Bases: BaseEstimator

Ordinary Least Squares (OLS) regression with optional intercept, leave-one-out predictions, and p-value estimation.

Parameters:
  • fit_intercept (bool, default=True) – If True, an intercept term is included in the model.

  • vars_names (list of str, optional) – Names of the variables. If None, names are automatically generated as ["X0", "X1", ..., "Xp"].

  • p_value (bool, default=False) – If True, a two-sided p-value is computed for the last coefficient after fitting.

  • preds_loo (bool, default=False) – If True, leave-one-out predictions are computed after fitting.
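
For least squares, leave-one-out predictions do not require refitting the model n times: each one follows from the full fit and the leverage of the held-out point. The sketch below illustrates that shortcut for the simplest case of one predictor plus an intercept. The function names are hypothetical, and whether the OLS class uses this exact identity internally is an assumption.

```python
from typing import List, Sequence, Tuple


def simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Fit y = intercept + slope * x by least squares; return (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope


def loo_predictions(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Leave-one-out predictions from a single fit, via the leverage identity."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    intercept, slope = simple_ols(x, y)
    preds = []
    for xi, yi in zip(x, y):
        fitted = intercept + slope * xi
        # Leverage h_ii of observation i in simple linear regression.
        leverage = 1.0 / n + (xi - x_bar) ** 2 / sxx
        # LOO residual = ordinary residual / (1 - h_ii).
        preds.append(yi - (yi - fitted) / (1.0 - leverage))
    return preds
```

The result agrees with refitting the model once per held-out observation, at the cost of a single fit.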

slopes_

Estimated regression coefficients (excluding intercept).

Type:

ndarray of shape (n_features,)

intercept_

Estimated intercept term (if fit_intercept=True).

Type:

float

preds_loo_

Leave-one-out predictions. Present only if preds_loo=True.

Type:

ndarray of shape (n_samples,), optional

p_value_

Two-sided p-value for the last coefficient. Present only if p_value=True.

Type:

ndarray of shape (1,), optional

vars_names_

Names of the variables used in the model.

Type:

list of str

Examples

>>> model = OLS(fit_intercept=True, p_value=True, preds_loo=True)
>>> model.fit(X, y)
>>> y_pred = model.predict(X)
>>> active = model.get_active_variables()
>>> formula = model.get_fitted_function()
fit(X: ndarray, y: ndarray) → None
get_active_variables(tolerance: float = 1e-10) → List[str]
get_fitted_function() → str
predict(X: ndarray) → ndarray
set_vars_names(vars_names: List[str]) → None
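
The tolerance argument of get_active_variables suggests that a variable counts as active when its coefficient is not numerically zero. A minimal sketch of that idea follows. The helper active_variables is hypothetical, and the strict comparison abs(coefficient) > tolerance is an assumption about the rule, not something this page confirms.

```python
from typing import List, Sequence


def active_variables(
    slopes: Sequence[float],
    names: Sequence[str],
    tolerance: float = 1e-10,
) -> List[str]:
    """Names of variables whose coefficient magnitude exceeds the tolerance."""
    return [name for name, b in zip(names, slopes) if abs(b) > tolerance]
```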
class uniPairs.estimator.Stage1Results

Bases: TypedDict

intercepts: ndarray
loo_preds: ndarray
slopes: ndarray
class uniPairs.estimator.UniLasso(lmda_path: ndarray | None = None, fit_intercept: bool = True, vars_names: List[str] | None = None, family_spec: FamilySpec | None = None)

Bases: BaseEstimator

UniLasso regression using adelie.grpnet for Gaussian, Binomial (logistic), or Cox proportional hazards models.

Parameters:
  • lmda_path (ndarray of shape (n_lmdas,), optional) – Regularization path. If None, it is generated automatically using the leave-one-out predictions from stage 1.

  • fit_intercept (bool, default=True) – Include an intercept term in the non-negative Lasso stage. This is ignored for Cox models.

  • vars_names (list of str, optional) – Feature names. If None, automatically generated as ["X0", "X1", ..., "Xp"].

  • family_spec (dict, optional) – Must contain key "family" with one of: {"gaussian", "binomial", "cox"}.
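
In the UniLasso construction as it is usually described, stage 1 fits one univariate model per feature and stage 2 fits a non-negative Lasso on the resulting per-feature predictions. The final slopes and intercept on the original features then follow by composing the two stages. The sketch below shows that composition for a single lambda value. It assumes uniPairs follows this standard construction, and the names combine_stages, theta0, and theta are hypothetical.

```python
from typing import List, Sequence, Tuple


def combine_stages(
    theta0: float,
    theta: Sequence[float],
    uni_intercepts: Sequence[float],
    uni_slopes: Sequence[float],
) -> Tuple[float, List[float]]:
    """Compose stage 2 weights with stage 1 univariate fits.

    Stage 1: f_j(x_j) = uni_intercepts[j] + uni_slopes[j] * x_j
    Stage 2: eta = theta0 + sum_j theta[j] * f_j(x_j), with theta[j] >= 0
    """
    slopes = [t * b for t, b in zip(theta, uni_slopes)]
    intercept = theta0 + sum(t * a for t, a in zip(theta, uni_intercepts))
    return intercept, slopes
```

Because the stage 2 weights are non-negative, each final slope is either zero or has the same sign as its univariate slope.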

uni_slopes_

Slopes from the univariate models.

Type:

ndarray of shape (n_features,)

uni_intercepts_

Intercepts from the univariate models.

Type:

ndarray of shape (n_features,)

loo_preds_

Leave-one-out predictions from the univariate models.

Type:

ndarray of shape (n_samples, n_features)

slopes_

Final UniLasso slopes.

Type:

ndarray of shape (n_lmdas, n_features)

intercept_

Final UniLasso intercepts.

Type:

ndarray of shape (n_lmdas, 1)

vars_names_

Variable names used in the model.

Type:

list of str

lmda_path_

Lambda values used in stage 2.

Type:

ndarray of shape (n_lmdas,)

Examples

>>> model = UniLasso(family_spec={"family": "gaussian"})
>>> model.fit(X, y)
>>> y_hat = model.predict(X)
>>> active = model.get_active_variables(lmda=0.1)
>>> model.get_fitted_function()
fit(X: ndarray, y: ndarray, offset: ndarray | None = None) → None
get_active_variables(lmda: float | None = None, tolerance: float = 1e-10) → List[str]
get_fitted_function(lmda: float | None = None, tolerance: float = 1e-10) → str
predict(X: ndarray, response_scale: bool = False, offset: ndarray | None = None) → ndarray
set_lmda_path(lmda_path: ndarray) → None
set_vars_names(vars_names: List[str]) → None
uniPairs.estimator.cv(base: UniLasso | Lasso, X: ndarray, y: ndarray, n_folds: int, lmda_path: ndarray | None = None, plot_cv_curve: bool = False, cv1se: bool = False, seed: int = 305, save_plots: str | None = None, offset: ndarray | None = None) → CVResult

Cross-validation for Lasso or UniLasso models.

If lmda_path is not provided:

  • For UniLasso: the path is generated from the leave-one-out predictions of the univariate models.

  • For Lasso: the path is generated from the design matrix X.

Parameters:
  • base (UniLasso or Lasso instance) – The model to cross-validate. Must implement fit and predict and contain the attribute family_.

  • X (ndarray of shape (n_samples, n_features)) – Design matrix.

  • y (ndarray) – Response vector for Gaussian/Binomial, or array of shape (n_samples, 2) for Cox models, containing (time, status).

  • n_folds (int) – Number of cross-validation folds.

  • lmda_path (ndarray of shape (n_lmdas,), optional) – Regularization path. If None, it is generated automatically.

  • plot_cv_curve (bool, default=False) – If True, plots the cross-validation R2 curve against -log(lambda) and annotates model size along the curve.

  • cv1se (bool, default=False) – If True, selects the largest lambda within one standard error of the minimum validation error (1-SE rule). Otherwise, selects the minimizer.

  • seed (int, default=305) – Random seed for fold shuffling.

  • save_plots (str, optional) – If provided, the CV plot is saved to this file path.

  • offset (ndarray, optional) – Observation-specific offset added during fitting and prediction.
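
The cv1se option can be illustrated on a small table of validation errors. The sketch below selects either the minimizer of the mean error or the largest lambda whose mean error lies within one standard error of that minimum. It is a stand-alone illustration: select_lambda is a hypothetical helper, and the standard error formula (sample standard deviation across folds divided by the square root of the number of folds) is an assumption about the convention used.

```python
import math
import statistics
from typing import Sequence


def select_lambda(
    cv_errors: Sequence[Sequence[float]],
    lmda_path: Sequence[float],
    cv1se: bool = False,
) -> float:
    """Pick a lambda from fold-by-lambda validation errors (needs >= 2 folds)."""
    n_folds = len(cv_errors)
    # Transpose so that each entry holds the errors of one lambda across folds.
    columns = list(zip(*cv_errors))
    means = [statistics.mean(col) for col in columns]
    ses = [statistics.stdev(col) / math.sqrt(n_folds) for col in columns]
    best = min(range(len(means)), key=means.__getitem__)
    if not cv1se:
        return lmda_path[best]
    # 1-SE rule: largest lambda whose mean error is within one SE of the minimum.
    threshold = means[best] + ses[best]
    return max(lam for lam, m in zip(lmda_path, means) if m <= threshold)
```

The 1-SE choice trades a small amount of validation error for a larger penalty, which typically yields a sparser model.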

Returns:

cv_result – A dictionary with the following fields:

  • cv_errors (ndarray of shape (n_folds, n_lmdas)) – Validation losses for each fold and lambda.

  • lmda_path (ndarray of shape (n_lmdas,)) – The lambda path used.

  • prevalidated_preds (ndarray of shape (n_samples,)) – Out-of-fold predictions at the selected lambda.

  • best_lmda (float) – Selected value of lambda.

  • n_folds (int) – Number of folds used.

  • active_set (list of str) – Names of active variables at the selected lambda.
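
Out-of-fold (prevalidated) predictions are built so that each sample is predicted by a model that never saw it. The sketch below shows the bookkeeping with a deliberately trivial model, the mean of the training folds, standing in for a Lasso or UniLasso fit. Both function names are hypothetical, and shuffling with Python's random module is an assumption; only the default seed of 305 comes from the signature above.

```python
import random
from typing import List, Sequence


def assign_folds(n_samples: int, n_folds: int, seed: int = 305) -> List[int]:
    """Assign each sample to a fold after a seeded shuffle."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [0] * n_samples
    for position, index in enumerate(indices):
        folds[index] = position % n_folds
    return folds


def prevalidated_means(y: Sequence[float], n_folds: int, seed: int = 305) -> List[float]:
    """Out-of-fold predictions using the training-fold mean as a placeholder model."""
    folds = assign_folds(len(y), n_folds, seed)
    preds = [0.0] * len(y)
    for k in range(n_folds):
        # "Fit" on every sample outside fold k.
        train = [yi for yi, f in zip(y, folds) if f != k]
        fold_mean = sum(train) / len(train)
        # Predict only the held-out samples of fold k.
        for i, f in enumerate(folds):
            if f == k:
                preds[i] = fold_mean
    return preds
```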

Return type:

dict

Examples

>>> model = UniLasso(family_spec={"family": "gaussian"})
>>> results = cv(model, X, y, n_folds=5)
>>> results["best_lmda"]
0.031
>>> results["active_set"]
['X2', 'X7']
>>> y_hat = model.predict(X)
uniPairs.estimator.generate_lmda_path(X: ndarray, y: ndarray, family: Literal['gaussian', 'binomial', 'cox'] = 'gaussian', n_lmdas: int = 100, lmda_min_ratio: float | None = None, fit_intercept: bool = True) → ndarray

Generate a sequence of lambda values for regularized GLM fitting.

This function computes a decreasing path of penalty values for use in Lasso-type models. For the Gaussian family, the lambda path is computed analytically from the KKT conditions. For the Binomial and Cox families, the path is obtained by calling ad.cv_grpnet.

Parameters:
  • X (ndarray of shape (n_samples, n_features)) – Design matrix.

  • y (ndarray) – Response vector. For family='cox', y must be of shape (n_samples, 2) where the columns are (time, status).

  • family ({'gaussian', 'binomial', 'cox'}, default='gaussian') – Family of the GLM used to determine the lambda path.

  • n_lmdas (int, default=100) – Number of lambda values to generate.

  • lmda_min_ratio (float, optional) – Ratio of the smallest lambda to the largest lambda. If None, 1e-4 is used when n_samples > n_features and 1e-2 otherwise.

  • fit_intercept (bool, default=True) – Whether to include an intercept term (ignored for the Cox family).
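
For the Gaussian family, the analytic path can be sketched as follows: the KKT conditions give the smallest lambda at which all coefficients are zero, and the remaining values are spaced geometrically down to lmda_min_ratio times that maximum. This is an illustration under stated assumptions, not the library code: it assumes the objective (1 / (2n)) * ||y - Xb||^2 + lambda * ||b||_1 with an intercept and no feature rescaling, and gaussian_lmda_path is a hypothetical name.

```python
import math
from typing import List, Optional, Sequence


def gaussian_lmda_path(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    n_lmdas: int = 100,
    lmda_min_ratio: Optional[float] = None,
) -> List[float]:
    """Decreasing, log-spaced lambda path for a Gaussian Lasso (n_lmdas >= 2)."""
    n, p = len(X), len(X[0])
    if lmda_min_ratio is None:
        # Default documented above: 1e-4 when n_samples > n_features, else 1e-2.
        lmda_min_ratio = 1e-4 if n > p else 1e-2
    # With an intercept, the null-model residual is y minus its mean.
    y_bar = sum(y) / n
    resid = [yi - y_bar for yi in y]
    # KKT at b = 0: all coefficients stay zero iff |x_j' resid| / n <= lambda.
    lmda_max = max(
        abs(sum(X[i][j] * resid[i] for i in range(n))) / n for j in range(p)
    )
    # Geometric spacing from lmda_max down to lmda_min_ratio * lmda_max.
    step = math.log(lmda_min_ratio) / (n_lmdas - 1)
    return [lmda_max * math.exp(k * step) for k in range(n_lmdas)]
```

Geometric spacing places more grid points at small penalties, where the active set tends to change more quickly.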

Returns:

lmda_path – A decreasing sequence of lambda values, spaced on a log scale.

Return type:

ndarray of shape (n_lmdas,)

Examples

>>> lmda_path = generate_lmda_path(X, y, family='gaussian', n_lmdas=50)
>>> lmda_path.shape
(50,)