Internal

This page documents lower-level classes, utilities, and internal components of uniPairs.estimator.

class uniPairs.estimator.CVResult

Bases: TypedDict

active_set: List[str]

best_lmda: float

cv_errors: ndarray

lmda_path: ndarray

n_folds: int

prevalidated_preds: ndarray

class uniPairs.estimator.FamilySpec

Bases: TypedDict

family: Literal['gaussian', 'binomial', 'cox']

class uniPairs.estimator.GLM(fit_intercept: bool = True, vars_names: List[str] | None = None, family_spec: FamilySpec | None = None, p_value: bool = False, preds_loo: bool = False)

Bases: BaseEstimator

Generalized Linear Model (GLM) wrapper using adelie.grpnet for Gaussian, Binomial (logistic), and Cox proportional hazards regression.

Supports optional intercept, leave-one-out (LOO) predictions, and likelihood ratio p-values.

Parameters:

fit_intercept (bool, default=True) – Whether to include an intercept term. For Cox regression, the intercept is always excluded.
vars_names (list of str, optional) – Names of predictor variables. If None, names are generated as ["X0", "X1", ..., "Xp"].
family_spec (dict, optional) –
Dictionary containing the model family. Must contain key "family" with one of {"gaussian", "binomial", "cox"}.

Example: {"family": "binomial"}
p_value (bool, default=False) – If True, computes a two-sided p-value via a likelihood ratio test comparing the full model vs. the model without the last coefficient.
preds_loo (bool, default=False) – If True, computes leave-one-out linear predictors following Rad & Maleki (2020).
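The likelihood ratio test behind p_value can be sketched as follows. This is a minimal illustration for a logistic model, not the library's implementation: the helper names fit_logistic and lrt_p_value_last are hypothetical, fitting uses plain Newton iterations rather than adelie, and the statistic is assumed to be compared against a chi-square distribution with one degree of freedom.

```python
import math
import numpy as np

def fit_logistic(X, y, n_iter=50):
    """Fit an unpenalized logistic regression with intercept by Newton's
    method and return the maximized log-likelihood (hypothetical helper)."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        eta = A @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))
        w = mu * (1.0 - mu)
        grad = A.T @ (y - mu)
        hess = A.T @ (A * w[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    eta = A @ beta
    # Bernoulli log-likelihood: sum(y * eta - log(1 + exp(eta)))
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))

def lrt_p_value_last(X, y):
    """p-value for the last column: full model vs. model without it."""
    ll_full = fit_logistic(X, y)
    ll_reduced = fit_logistic(X[:, :-1], y)
    stat = max(2.0 * (ll_full - ll_reduced), 0.0)
    # Survival function of chi-square(1): P(Z**2 > stat) = erfc(sqrt(stat / 2))
    return math.erfc(math.sqrt(stat / 2.0))

# Toy data in which only the first column drives the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
prob = 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * X[:, 0])))
y = (rng.uniform(size=300) < prob).astype(float)

p_noise = lrt_p_value_last(X, y)            # last column is pure noise
p_signal = lrt_p_value_last(X[:, ::-1], y)  # last column carries the signal
```

Reversing the column order moves the informative feature into the last position, so p_signal should be far smaller than p_noise.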

slopes\_

Estimated regression coefficients.

Type:: ndarray of shape (n_features,)

intercept\_

Estimated intercept. Zero for Cox models.

Type:: float

preds_loo\_

Leave-one-out predictions if preds_loo=True.

Type:: ndarray of shape (n_samples,), optional

p_value\_

Two-sided p-value for the last predictor coefficient.

Type:: float, optional

vars_names\_

Names of variables used in the model.

Type:: list of str

offset\_

Optional observation-specific offsets supplied to the GLM.

Type:: ndarray or None

Examples

>>> glm = GLM(family_spec={"family": "binomial"}, preds_loo=True)
>>> glm.fit(X, y)
>>> eta = glm.predict(X)               # linear predictor
>>> p = glm.predict(X, response_scale=True)  # inverse-link
>>> active = glm.get_active_variables()
>>> formula = glm.get_fitted_function()
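The difference between the linear predictor and the response scale in the example above can be illustrated with a small sketch. The function to_response_scale is hypothetical. The identity and logistic inverse links are the standard choices for the Gaussian and Binomial families; returning exp(eta) (a relative risk) for Cox is an assumption, since this page does not say what predict returns for Cox models on the response scale.

```python
import numpy as np

def to_response_scale(eta, family):
    """Map a linear predictor to the response scale (hypothetical helper)."""
    eta = np.asarray(eta, dtype=float)
    if family == "gaussian":
        return eta                          # identity link
    if family == "binomial":
        return 1.0 / (1.0 + np.exp(-eta))   # logistic inverse link
    if family == "cox":
        return np.exp(eta)                  # assumed: relative risk
    raise ValueError(f"unknown family: {family!r}")

eta = np.array([-2.0, 0.0, 2.0])
probs = to_response_scale(eta, "binomial")
```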

fit(X: ndarray, y: ndarray, offset: ndarray | None = None) → None

get_active_variables(tolerance: float = 1e-10) → List[str]

get_fitted_function() → str

predict(X: ndarray, response_scale: bool = False, offset: ndarray | None = None) → ndarray

set_vars_names(vars_names: List[str]) → None

class uniPairs.estimator.Lasso(lmda_path: ndarray | None = None, fit_intercept: bool = True, vars_names: List[str] | None = None, family_spec: FamilySpec | None = None)

Bases: BaseEstimator

Lasso regression using adelie.grpnet for Gaussian, Binomial (logistic), or Cox proportional hazards models.

The model standardizes inputs, fits a regularization path over lmda_path_ (lambda values), and returns intercepts and slopes on the original data scale.
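How coefficients fitted on standardized inputs map back to the original data scale can be shown with a short sketch. It uses ordinary least squares as a stand-in for the penalized fit, and all variable names here are illustrative rather than taken from the library. The back-transformation itself is plain algebra and does not depend on the penalty.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=[1.0, 3.0, 0.5], size=(100, 3))
y = 2.0 + X @ np.array([1.0, -0.5, 0.0]) + rng.normal(scale=0.1, size=100)

# Standardize each column using its mean and standard deviation.
X_mean = X.mean(axis=0)
X_std = X.std(axis=0)
Z = (X - X_mean) / X_std

# Stand-in for the penalized fit: least squares on the standardized design.
A = np.column_stack([np.ones(len(y)), Z])
coef = np.linalg.lstsq(A, y, rcond=None)[0]
intercept_std, slopes_std = coef[0], coef[1:]

# Map back: b0 + ((x - m) / s) @ b == (b0 - m @ (b / s)) + x @ (b / s)
slopes = slopes_std / X_std
intercept = intercept_std - X_mean @ slopes

pred_std = intercept_std + Z @ slopes_std
pred_orig = intercept + X @ slopes
```

Both prediction vectors agree, which is why the fitted attributes can be reported on the original scale without changing predictions.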

Parameters:

lmda_path (ndarray of shape (n_lmdas,), optional) – Regularization path to use. If None, it is generated automatically via generate_lmda_path().
fit_intercept (bool, default=True) – Whether to include an intercept term. For Cox models, the intercept is always excluded.
vars_names (list of str, optional) – Names of variables used in the model. If None, they are generated automatically as ["X0", "X1", ..., "Xp"].
family_spec (dict, optional) –
Dictionary containing the response family. Must include key "family" with one of: {"gaussian", "binomial", "cox"}.

Example: {"family": "binomial"}

lmda_path\_

Lambda values used during fitting.

Type:: ndarray of shape (n_lmdas,)

slopes\_

Regression coefficients for each lambda.

Type:: ndarray of shape (n_lmdas, n_features)

intercept\_

Intercept for each lambda.

Type:: ndarray of shape (n_lmdas, 1)

vars_names\_

Names of variables in the model.

Type:: list of str

X_mean\_

Feature means used for standardization.

Type:: ndarray of shape (n_features,)

X_std\_

Feature standard deviations used for standardization.

Type:: ndarray of shape (n_features,)

offset\_

Observation-specific offset passed into the GLM, if provided.

Type:: ndarray or None

Examples

>>> model = Lasso(family_spec={"family": "binomial"})
>>> model.fit(X, y)
>>> y_pred = model.predict(X, response_scale=True)
>>> active = model.get_active_variables(lmda=0.1)
>>> formula = model.get_fitted_function()

fit(X: ndarray, y: ndarray, offset: ndarray | None = None, tolerance: float = 1e-10) → None

fit_binomial(X_std: ndarray, y: ndarray) → None

fit_cox(X_std: ndarray, y: ndarray) → None

fit_gaussian(X_std: ndarray, y: ndarray) → None

get_active_variables(lmda: float | None = None, tolerance: float = 1e-10) → List[str]

get_fitted_function(lmda: float | None = None, tolerance: float = 1e-10) → str

predict(X: ndarray, response_scale: bool = False, offset: ndarray | None = None) → ndarray

set_lmda_path(lmda_path: ndarray) → None

set_vars_names(vars_names: List[str]) → None

class uniPairs.estimator.OLS(fit_intercept: bool = True, vars_names: List[str] | None = None, p_value: bool = False, preds_loo: bool = False)

Bases: BaseEstimator

Ordinary Least Squares (OLS) regression with optional intercept, leave-one-out predictions, and p-value estimation.

Parameters:

fit_intercept (bool, default=True) – If True, an intercept term is included in the model.
vars_names (list of str, optional) – Names of the variables. If None, names are automatically generated as ["X0", "X1", ..., "Xp"].
p_value (bool, default=False) – If True, a two-sided p-value is computed for the last coefficient after fitting.
preds_loo (bool, default=False) – If True, leave-one-out predictions are computed after fitting.
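For least squares, leave-one-out predictions do not require refitting the model n times. The sketch below uses the standard hat-matrix identity, in which the LOO prediction for observation i equals y_i - e_i / (1 - h_ii). Whether the library computes preds_loo_ this way is not stated on this page, and ols_loo_predictions is a hypothetical name.

```python
import numpy as np

def ols_loo_predictions(X, y):
    """Leave-one-out predictions for OLS with intercept, from a single fit."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])
    AtA_inv = np.linalg.inv(A.T @ A)
    # Leverages: h_ii = a_i^T (A^T A)^{-1} a_i
    h = np.einsum("ij,jk,ik->i", A, AtA_inv, A)
    beta = AtA_inv @ A.T @ y
    resid = y - A @ beta
    return y - resid / (1.0 - h)

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(size=30)
loo = ols_loo_predictions(X, y)

# Brute-force check for the first observation: refit without it.
A_rest = np.column_stack([np.ones(29), X[1:]])
beta_rest = np.linalg.lstsq(A_rest, y[1:], rcond=None)[0]
brute_first = beta_rest[0] + X[0] @ beta_rest[1:]
```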

slopes\_

Estimated regression coefficients (excluding intercept).

Type:: ndarray of shape (n_features,)

intercept\_

Estimated intercept term (if fit_intercept=True).

Type:: float

preds_loo\_

Leave-one-out predictions. Present only if preds_loo=True.

Type:: ndarray of shape (n_samples,), optional

p_value\_

Two-sided p-value for the last coefficient. Present only if p_value=True.

Type:: ndarray of shape (1,), optional

vars_names\_

Names of the variables used in the model.

Type:: list of str

Examples

>>> model = OLS(fit_intercept=True, p_value=True, preds_loo=True)
>>> model.fit(X, y)
>>> y_pred = model.predict(X)
>>> active = model.get_active_variables()
>>> formula = model.get_fitted_function()

fit(X: ndarray, y: ndarray) → None

get_active_variables(tolerance: float = 1e-10) → List[str]

get_fitted_function() → str

predict(X: ndarray) → ndarray

set_vars_names(vars_names: List[str]) → None

class uniPairs.estimator.Stage1Results

Bases: TypedDict

intercepts: ndarray

loo_preds: ndarray

slopes: ndarray

class uniPairs.estimator.UniLasso(lmda_path: ndarray | None = None, fit_intercept: bool = True, vars_names: List[str] | None = None, family_spec: FamilySpec | None = None)

Bases: BaseEstimator

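UniLasso regression using adelie.grpnet for Gaussian, Binomial (logistic), or Cox proportional hazards models. Fitting proceeds in two stages: stage 1 fits one univariate model per feature and computes its leave-one-out predictions; stage 2 fits a non-negative Lasso on those predictions.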

Parameters:

lmda_path (ndarray of shape (n_lmdas,), optional) – Regularization path. If None, it is generated automatically using the leave-one-out predictions from stage 1.
fit_intercept (bool, default=True) – Whether to include an intercept term in the non-negative Lasso stage. Ignored for Cox models.
vars_names (list of str, optional) – Feature names. If None, automatically generated as ["X0", "X1", ..., "Xp"].
family_spec (dict, optional) – Must contain key "family" with one of: {"gaussian", "binomial", "cox"}.
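The two stages can be sketched for the Gaussian case with NumPy alone. This is an illustration under stated assumptions, not the library's code: stage1_univariate and nonneg_lasso are hypothetical helpers, stage 2 uses a simple coordinate descent instead of adelie.grpnet, and the final slopes are assumed to be the stage 2 weights multiplied by the univariate slopes.

```python
import numpy as np

def stage1_univariate(X, y):
    """One least-squares line per feature, plus its leave-one-out predictions."""
    n, p = X.shape
    slopes, intercepts = np.empty(p), np.empty(p)
    loo = np.empty((n, p))
    for j in range(p):
        x = X[:, j]
        xc = x - x.mean()
        sxx = xc @ xc
        slopes[j] = xc @ (y - y.mean()) / sxx
        intercepts[j] = y.mean() - slopes[j] * x.mean()
        fitted = intercepts[j] + slopes[j] * x
        h = 1.0 / n + xc**2 / sxx               # leverage of each observation
        loo[:, j] = y - (y - fitted) / (1.0 - h)
    return slopes, intercepts, loo

def nonneg_lasso(Z, y, lmda, n_iter=500):
    """Coordinate descent for
    (1 / 2n) * ||y - b0 - Z @ theta||^2 + lmda * sum(theta), theta >= 0."""
    n, p = Z.shape
    z_mean = Z.mean(axis=0)
    Zc = Z - z_mean
    col_sq = (Zc**2).sum(axis=0) / n
    theta = np.zeros(p)
    resid = y - y.mean()
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the partial residual
            rho = Zc[:, j] @ resid / n + col_sq[j] * theta[j]
            new = max(rho - lmda, 0.0) / col_sq[j]
            resid += Zc[:, j] * (theta[j] - new)
            theta[j] = new
    b0 = y.mean() - z_mean @ theta
    return b0, theta

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)

uni_slopes, uni_intercepts, loo_preds = stage1_univariate(X, y)
b0, theta = nonneg_lasso(loo_preds, y, lmda=0.05)

# Assumed composition of the two stages into coefficients on X
final_slopes = theta * uni_slopes
final_intercept = b0 + theta @ uni_intercepts
```

Because theta is constrained to be non-negative, each final slope in this sketch keeps the sign of its univariate slope or is exactly zero.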

uni_slopes\_

Slopes from the univariate models.

Type:: ndarray of shape (n_features,)

uni_intercepts\_

Intercepts from the univariate models.

Type:: ndarray of shape (n_features,)

loo_preds\_

Leave-one-out predictions from the univariate models.

Type:: ndarray of shape (n_samples, n_features)

slopes\_

Final UniLasso slopes.

Type:: ndarray of shape (n_lmdas, n_features)

intercept\_

Final UniLasso intercepts.

Type:: ndarray of shape (n_lmdas, 1)

vars_names\_

Variable names used in the model.

Type:: list of str

lmda_path\_

Lambda values used in stage 2.

Type:: ndarray of shape (n_lmdas,)

Examples

>>> model = UniLasso(family_spec={"family": "gaussian"})
>>> model.fit(X, y)
>>> y_hat = model.predict(X)
>>> active = model.get_active_variables(lmda=0.1)
>>> model.get_fitted_function()

fit(X: ndarray, y: ndarray, offset: ndarray | None = None) → None

get_active_variables(lmda: float | None = None, tolerance: float = 1e-10) → List[str]

get_fitted_function(lmda: float | None = None, tolerance: float = 1e-10) → str

predict(X: ndarray, response_scale: bool = False, offset: ndarray | None = None) → ndarray

set_lmda_path(lmda_path: ndarray) → None

set_vars_names(vars_names: List[str]) → None

uniPairs.estimator.cv(base: UniLasso | Lasso, X: ndarray, y: ndarray, n_folds: int, lmda_path: ndarray | None = None, plot_cv_curve: bool = False, cv1se: bool = False, seed: int = 305, save_plots: str | None = None, offset: ndarray | None = None) → CVResult

Cross-validation for Lasso or UniLasso models.

If lmda_path is not provided:

For UniLasso: the path is generated from the leave-one-out predictions of the univariate models.
For Lasso: the path is generated from the design matrix X.

Parameters:

base (UniLasso or Lasso instance) – The model to cross-validate. Must implement fit and predict and contain the attribute family_.
X (ndarray of shape (n_samples, n_features)) – Design matrix.
y (ndarray) – Response vector for Gaussian/Binomial, or array of shape (n_samples, 2) for Cox models, containing (time, status).
n_folds (int) – Number of cross-validation folds.
lmda_path (ndarray of shape (n_lmdas,), optional) – Regularization path. If None, it is generated automatically.
plot_cv_curve (bool, default=False) – If True, plots the cross-validation R2 curve against -log(lambda) and annotates model size along the curve.
cv1se (bool, default=False) – If True, selects the largest lambda whose mean validation error is within one standard error of the minimum (1-SE rule). Otherwise, selects the lambda that minimizes the mean validation error.
seed (int, default=305) – Random seed for fold shuffling.
save_plots (str, optional) – If provided, the CV plot is saved to this file path.
offset (ndarray, optional) – Optional observation-specific offset added during fitting and prediction.
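The effect of cv1se can be illustrated with a small selection routine. It assumes the standard error is the across-fold standard deviation divided by sqrt(n_folds), evaluated at the minimizer. The function select_lmda is hypothetical and operates on an error matrix shaped like cv_errors.

```python
import numpy as np

def select_lmda(cv_errors, lmda_path, cv1se=False):
    """Pick a lambda from a (n_folds, n_lmdas) matrix of validation errors."""
    mean_err = cv_errors.mean(axis=0)
    best = int(np.argmin(mean_err))
    if not cv1se:
        return float(lmda_path[best])
    n_folds = cv_errors.shape[0]
    se = cv_errors.std(axis=0, ddof=1) / np.sqrt(n_folds)
    threshold = mean_err[best] + se[best]
    # Largest lambda whose mean error stays within one SE of the minimum
    return float(lmda_path[mean_err <= threshold].max())

lmda_path = np.array([1.0, 0.5, 0.1, 0.01])
cv_errors = np.array([
    [2.0, 1.10, 1.0, 1.2],
    [2.2, 1.20, 0.7, 1.3],
    [1.8, 1.15, 1.3, 1.1],
])

lmda_min = select_lmda(cv_errors, lmda_path)              # minimizer: 0.1
lmda_1se = select_lmda(cv_errors, lmda_path, cv1se=True)  # 1-SE rule: 0.5
```

In this toy matrix the minimum mean error is 1.0 at lambda 0.1 with a standard error of about 0.17, so lambda 0.5 (mean error 1.15) is the largest value within the band.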

Returns:

cv_result – A dictionary with the following fields:

cv_errors (ndarray of shape (n_folds, n_lmdas)) – Validation losses for each fold and lambda.
lmda_path (ndarray of shape (n_lmdas,)) – The lambda path used.
prevalidated_preds (ndarray of shape (n_samples,)) – Out-of-fold predictions at the selected lambda.
best_lmda (float) – Selected value of lambda.
n_folds (int) – Number of folds used.
active_set (list of str) – Names of active variables at the selected lambda.

Return type:

dict
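Out-of-fold (prevalidated) predictions, as stored in prevalidated_preds, can be sketched as follows: each observation is predicted by a model trained on the folds that do not contain it. The fold assignment scheme and the least-squares stand-in for the penalized model are assumptions made for illustration, and prevalidated_predictions is a hypothetical name.

```python
import numpy as np

def prevalidated_predictions(X, y, n_folds, seed=305):
    """Out-of-fold predictions using least squares as a stand-in model."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    # Shuffle, then deal observations into folds of near-equal size.
    fold_id = rng.permutation(n) % n_folds
    preds = np.empty(n)
    for k in range(n_folds):
        held_out = fold_id == k
        A_train = np.column_stack([np.ones((~held_out).sum()), X[~held_out]])
        beta = np.linalg.lstsq(A_train, y[~held_out], rcond=None)[0]
        preds[held_out] = beta[0] + X[held_out] @ beta[1:]
    return preds, fold_id

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y = 0.5 + X @ np.array([1.0, -2.0]) + rng.normal(scale=0.3, size=60)
preds, fold_id = prevalidated_predictions(X, y, n_folds=5)
```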

Examples

>>> model = UniLasso(family_spec={"family": "gaussian"})
>>> results = cv(model, X, y, n_folds=5)
>>> results["best_lmda"]
0.031
>>> results["active_set"]
['X2', 'X7']
>>> y_hat = model.predict(X)

uniPairs.estimator.generate_lmda_path(X: ndarray, y: ndarray, family: Literal['gaussian', 'binomial', 'cox'] = 'gaussian', n_lmdas: int = 100, lmda_min_ratio: float | None = None, fit_intercept: bool = True) → ndarray

Generate a sequence of lambda values for regularized GLM fitting.

This function computes a decreasing path of penalty values for use in Lasso-type models. For Gaussian families, the lambda path is computed analytically from the KKT conditions. For Binomial and Cox, the path is obtained by calling ad.cv_grpnet.

Parameters:

X (ndarray of shape (n_samples, n_features)) – Design matrix.
y (ndarray) – Response vector. For family='cox', y must be of shape (n_samples, 2) where the columns are (time, status).
family ({'gaussian', 'binomial', 'cox'}, default='gaussian') – Family of the GLM used to determine the lambda path.
n_lmdas (int, default=100) – Number of lambda values to generate.
lmda_min_ratio (float, optional) – Ratio of the smallest lambda to the largest lambda. If None, 1e-4 is used when n_samples > n_features and 1e-2 otherwise.
fit_intercept (bool, default=True) – Whether to include an intercept term (ignored for the Cox family).
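For the Gaussian family, the analytic path can be sketched as below. The sketch assumes standardized columns, a centered response, and the objective (1 / 2n) * ||y - b0 - Z b||^2 + lambda * ||b||_1, for which the KKT conditions give the smallest lambda with an all-zero solution as max|Z^T (y - mean(y))| / n. The library's exact scaling may differ, and gaussian_lmda_path is a hypothetical name.

```python
import numpy as np

def gaussian_lmda_path(X, y, n_lmdas=100, lmda_min_ratio=None):
    """Log-spaced, decreasing lambda path for a Gaussian lasso."""
    n, p = X.shape
    if lmda_min_ratio is None:
        lmda_min_ratio = 1e-4 if n > p else 1e-2
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    # Smallest lambda at which every coefficient is zero (KKT condition)
    lmda_max = np.max(np.abs(Z.T @ yc)) / n
    return np.geomspace(lmda_max, lmda_max * lmda_min_ratio, n_lmdas)

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 4))
y = X @ np.array([1.0, 0.0, -2.0, 0.0]) + rng.normal(size=80)
path = gaussian_lmda_path(X, y, n_lmdas=50)
```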

Returns:

lmda_path – A decreasing sequence of lambda values spaced on a log scale.

Return type:

ndarray of shape (n_lmdas,)

Examples

>>> lmda_path = generate_lmda_path(X, y, family='gaussian', n_lmdas=50)
>>> lmda_path.shape
(50,)