API Reference

class uniPairs.estimator.BaseInteractionModel(interaction_candidates: List[int] | None = None, interaction_pairs: List[Tuple[int, int]] | None = None, vars_names: List[str] | None = None, zero_cutoff: float = 1e-20, interactions_threshold: float | None = None, save_plots: str | None = None, family_spec: FamilySpec | None = None, plot_cv_curve: bool = False, cv1se: bool = False, verbose: bool = False)

Bases: BaseEstimator, ABC

Base class for interaction discovery models.

This class implements the common logic required for fitting models with pairwise interactions. It provides:

input standardization,
selection of interaction candidates or explicit interaction pairs,
optional hierarchy constraints (weak, strong),
fitting of triplet regressions for screening interactions,
p-value based interaction selection,
plotting utilities.

Actual estimation of main effects and interactions must be implemented in subclasses via fit, predict, get_active_variables and get_fitted_function.

Parameters:

interaction_candidates (list of int, optional) – Indices of features that are allowed to form interactions. If provided, all pairs (j, k) are generated such that j is in this list and k ranges over all other features. Mutually exclusive with interaction_pairs.
interaction_pairs (list of tuple(int, int), optional) – Explicit list of feature index pairs to consider as interactions. Mutually exclusive with interaction_candidates.
vars_names (list of str, optional) – Variable names. If None, names are generated as ["X0", ..., "X{p-1}"], where p is the number of features.
zero_cutoff (float, default=1e-20) – Numerical tolerance used to determine whether a p-value is considered zero when scanning interactions.
interactions_threshold (float, optional) – Absolute threshold for selecting interaction pairs. If None, the data-adaptive largest log-gap rule is used.
save_plots (str, optional) – Directory where plots will be saved. If None, figures are not saved.
family_spec (dict, optional) – Must contain a "family" key. Supports: "gaussian", "binomial", "cox".
plot_cv_curve (bool, default=False) – Whether to plot cross-validation curves when applicable.
cv1se (bool, default=False) – Whether to apply the 1-SE rule when choosing the optimal lambda.
verbose (bool, default=False) – If True, prints progress and diagnostic information during fitting.
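
The pair enumeration described for interaction_candidates and interaction_pairs can be sketched as follows. This is a minimal illustration, not the library's code: the helper name candidate_pairs is hypothetical, and treating pairs as unordered, deduplicating them, and falling back to all pairs when neither argument is given are assumptions.

```python
from typing import List, Optional, Tuple


def candidate_pairs(
    n_features: int,
    interaction_candidates: Optional[List[int]] = None,
    interaction_pairs: Optional[List[Tuple[int, int]]] = None,
) -> List[Tuple[int, int]]:
    """Hypothetical helper: enumerate the feature index pairs to screen."""
    if interaction_candidates is not None and interaction_pairs is not None:
        raise ValueError(
            "interaction_candidates and interaction_pairs are mutually exclusive"
        )
    if interaction_pairs is not None:
        # Explicit pairs: store each one as (smaller index, larger index).
        pairs = {tuple(sorted(p)) for p in interaction_pairs}
    elif interaction_candidates is not None:
        # j comes from the candidate list, k ranges over all other features.
        pairs = {
            tuple(sorted((j, k)))
            for j in interaction_candidates
            for k in range(n_features)
            if k != j
        }
    else:
        # Assumption: with neither argument, every unordered pair is considered.
        pairs = {(j, k) for j in range(n_features) for k in range(j + 1, n_features)}
    return sorted(pairs)


print(candidate_pairs(4, interaction_candidates=[0, 3]))
# [(0, 1), (0, 2), (0, 3), (1, 3), (2, 3)]
```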

family_

Lowercase model family name: "gaussian", "binomial" or "cox".

Type: str

interaction_candidates_

Standardized version of interaction_candidates.

Type: ndarray of shape (m,), optional

interaction_pairs_

Standardized version of interaction_pairs.

Type: ndarray of shape (k, 2), optional

triplet_regressors_

Dictionary mapping feature pairs (j, k) to triplet regressors.

Type: dict

allowed_pairs_

Interaction pairs allowed after applying hierarchy constraints and removing unstable pairs.

Type: list of tuple(int, int)

selected_pairs_

Interaction pairs selected based on p-values.

Type: ndarray of shape (r, 2)

instable_pairs_

Pairs excluded because the triplet model was numerically unstable (rank deficiency, large condition number or leverage).

Type: list of tuple(int, int)

main_effects_names_

Names of main effect variables.

Type: list of str

interactions_names_

Names of interactions "Xj*Xk".

Type: list of str

main_effects_active_set_

Active main effects when hierarchy is imposed.

Type: list of int, optional

pvals_

Mapping from (name_j, name_k) to p-values for screening interactions. Only for allowed pairs.

Type: dict

Notes

Triplet regression. For each pair of features (j, k), a local model is fit on:

\[[1, X_j, X_k, X_j X_k]\]

to assess whether an interaction is statistically significant.
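
A triplet regression of this kind can be sketched in plain Python for the Gaussian case. The function names are hypothetical, the fit is ordinary least squares through the normal equations, and the p-value uses a normal approximation to the t reference distribution; the library may use a different test, particularly for the binomial and Cox families.

```python
import math
import random
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def triplet_fit(
    xj: Sequence[float], xk: Sequence[float], y: Sequence[float]
) -> Tuple[List[float], float]:
    """OLS on [1, X_j, X_k, X_j X_k]; returns (coefficients, interaction p-value)."""
    rows = [[1.0, a, b, a * b] for a, b in zip(xj, xk)]
    n, p = len(rows), 4
    xtx = [[sum(r[i] * r[c] for r in rows) for c in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    rss = sum((t - sum(b * v for b, v in zip(beta, r))) ** 2 for r, t in zip(rows, y))
    sigma2 = rss / (n - p)  # residual variance estimate
    # Last diagonal entry of (X'X)^{-1}, obtained by solving against e_4.
    var_scale = solve(xtx, [0.0, 0.0, 0.0, 1.0])[3]
    z = beta[3] / math.sqrt(sigma2 * var_scale)
    # Assumption: two-sided p-value from a normal approximation.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, p_value


# Synthetic data with a true interaction coefficient of 1.5.
rng = random.Random(0)
xj = [rng.gauss(0.0, 1.0) for _ in range(200)]
xk = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [1.0 + 2.0 * a - b + 1.5 * a * b + rng.gauss(0.0, 0.1) for a, b in zip(xj, xk)]
beta, p_value = triplet_fit(xj, xk, y)
print([round(b, 2) for b in beta], p_value < 1e-6)
```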

Hierarchy constraints. If hierarchy="strong", an interaction (j, k) is allowed only when both X_j and X_k are active main effects. If hierarchy="weak", at least one of the two must be active.
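
The strong and weak rules can be expressed as a small filter over candidate pairs. This is a sketch only: filter_by_hierarchy is a hypothetical name, and the set of active main effects is assumed to be available as a set of feature indices.

```python
from typing import Iterable, List, Optional, Set, Tuple

Pair = Tuple[int, int]


def filter_by_hierarchy(
    pairs: Iterable[Pair], active: Set[int], hierarchy: Optional[str]
) -> List[Pair]:
    """Hypothetical helper: keep the pairs permitted by the hierarchy rule."""
    if hierarchy is None:
        return list(pairs)  # no constraint
    if hierarchy not in ("weak", "strong"):
        raise ValueError("hierarchy must be 'weak', 'strong' or None")
    # strong: both indices must be active; weak: at least one must be.
    required = all if hierarchy == "strong" else any
    return [(j, k) for j, k in pairs if required(idx in active for idx in (j, k))]


pairs = [(0, 1), (0, 2), (1, 2), (2, 3)]
print(filter_by_hierarchy(pairs, {0, 1}, "strong"))  # [(0, 1)]
print(filter_by_hierarchy(pairs, {0, 1}, "weak"))    # [(0, 1), (0, 2), (1, 2)]
```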

Unstable pairs. Models with rank deficiency, large condition number or unit leverage are flagged as unstable and removed before selection.

Interaction selection. If interactions_threshold is given, a pair (j, k) is selected when its p-value satisfies:

\[p_{jk} \le \text{threshold}\]

Otherwise, the data-adaptive largest log-gap rule determines the cutoff.
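
This reference does not spell out the largest log-gap rule. One plausible reading, sketched below, sorts the p-values, takes base-10 logarithms, and cuts at the widest gap between consecutive values. The function name select_pairs, the use of zero_cutoff as a floor before taking logs, and the handling of fewer than two pairs are all assumptions rather than the library's documented behaviour.

```python
import math
from typing import Dict, List, Optional, Tuple

Pair = Tuple[int, int]


def select_pairs(
    pvals: Dict[Pair, float],
    interactions_threshold: Optional[float] = None,
    zero_cutoff: float = 1e-20,
) -> List[Pair]:
    """Hypothetical helper: choose interaction pairs from screening p-values."""
    if interactions_threshold is not None:
        # Absolute rule: keep every pair with p_jk <= threshold.
        return sorted(pair for pair, p in pvals.items() if p <= interactions_threshold)
    ranked = sorted(pvals.items(), key=lambda item: item[1])
    if len(ranked) < 2:
        return []  # assumption: no gap can be measured, so nothing is selected
    # Assumption: p-values below zero_cutoff are floored so that log10 is finite.
    logs = [math.log10(max(p, zero_cutoff)) for _, p in ranked]
    gaps = [logs[i + 1] - logs[i] for i in range(len(logs) - 1)]
    cut = max(range(len(gaps)), key=gaps.__getitem__)  # index of the widest gap
    return sorted(pair for pair, _ in ranked[: cut + 1])


pvals = {(0, 1): 1e-12, (0, 2): 3e-11, (1, 2): 0.04, (1, 3): 0.3, (2, 3): 0.7}
print(select_pairs(pvals))                               # [(0, 1), (0, 2)]
print(select_pairs(pvals, interactions_threshold=0.05))  # [(0, 1), (0, 2), (1, 2)]
```

In the example, the widest gap on the log scale lies between 3e-11 and 0.04, so only the two smallest p-values fall below the data-adaptive cutoff.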

Subclasses must implement:

fit(X, y, **kwargs),
predict(X),
get_active_variables(),
get_fitted_function().

abstractmethod fit(X: ndarray, y: ndarray, **kwargs: Any) → None

abstractmethod get_active_variables() → List[str]

abstractmethod get_fitted_function(tolerance: float = 1e-10) → str

abstractmethod predict(X: ndarray) → ndarray

class uniPairs.estimator.UniPairs(two_stage: bool = True, **kwargs: Any)

Bases: BaseEstimator

Unified wrapper for UniPairs interaction models.

This class provides a high-level interface for fitting either the one-stage or two-stage UniPairs procedure. It delegates all computations to an internal model (UniPairsOneStage or UniPairsTwoStage).

Parameters:

two_stage (bool, default=True) – Determines which UniPairs procedure to use: True selects the two-stage UniPairs estimator, False selects the one-stage UniPairs estimator.
**kwargs (dict) – Additional keyword arguments forwarded directly to the selected internal model.

model_

The underlying fitted UniPairs estimator. Set after calling fit.

Type: UniPairsOneStage or UniPairsTwoStage

two_stage

Whether the wrapper is using the two-stage or one-stage method.

Type: bool

kwargs

Saved keyword arguments passed at construction time.

Type: dict

version

Version string inherited from the internal estimator (read-only). Returns None if the model has not yet been fitted.

Type: str or None

Notes

This class does not implement modeling logic itself. Instead:

1. During fit:

If two_stage=True, an instance of UniPairsTwoStage is created and fitted.
If two_stage=False, an instance of UniPairsOneStage is created and fitted.

2. All subsequent calls to predict, get_active_variables, get_fitted_function are delegated to the fitted internal model.
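
The delegation pattern described in these notes can be illustrated with stand-in classes. Everything below is hypothetical: the stub estimators only mimic the method names listed in this reference, and the Wrapper class is a sketch of the pattern, not the UniPairs source.

```python
from typing import Any, List, Sequence


class _OneStageStub:
    """Stand-in for a one-stage estimator (hypothetical)."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> None:
        self.fitted_ = True

    def predict(self, X: Sequence[Sequence[float]]) -> List[float]:
        return [0.0 for _ in X]

    def get_active_variables(self) -> List[str]:
        return ["X0"]


class _TwoStageStub(_OneStageStub):
    """Stand-in for a two-stage estimator (hypothetical)."""

    def get_active_variables(self) -> List[str]:
        return ["X0", "X0*X1"]


class Wrapper:
    """Chooses the internal model at fit time, then forwards calls to it."""

    def __init__(self, two_stage: bool = True, **kwargs: Any) -> None:
        self.two_stage = two_stage
        self.kwargs = kwargs  # stored and forwarded to the internal model

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> None:
        cls = _TwoStageStub if self.two_stage else _OneStageStub
        self.model_ = cls(**self.kwargs)
        self.model_.fit(X, y)

    def _fitted(self) -> _OneStageStub:
        if not hasattr(self, "model_"):
            raise RuntimeError("call fit before using the model")
        return self.model_

    def predict(self, X: Sequence[Sequence[float]]) -> List[float]:
        return self._fitted().predict(X)

    def get_active_variables(self) -> List[str]:
        return self._fitted().get_active_variables()


wrapper = Wrapper(two_stage=True, verbose=True)
wrapper.fit([[1.0, 2.0]], [0.5])
print(type(wrapper.model_).__name__, wrapper.get_active_variables())
# _TwoStageStub ['X0', 'X0*X1']
```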

Examples

>>> from uniPairs.estimator import UniPairs
>>> model = UniPairs(two_stage=True, n_folds_main_effects=5, n_folds_interactions=5)
>>> model.fit(X, y)
>>> y_pred = model.predict(X)
>>> model.get_active_variables()
['X0', 'X2', 'X0*X2']

Switching to one-stage:

>>> model = UniPairs(two_stage=False, n_folds=5)
>>> model.fit(X, y)
>>> model.get_fitted_function()
'1.203 + 0.44*X0 + 0.08*X0*X3'

fit(X: ndarray, y: ndarray, **fit_kwargs: Any) → None

get_active_variables(*args: Any, **kwargs: Any) → List[str]

get_fitted_function(*args: Any, **kwargs: Any) → str

predict(X: ndarray, **kwargs: Any) → ndarray

class uniPairs.estimator.UniPairsOneStage(lmda_path: ndarray | None = None, n_folds: int = 10, **kwargs: Any)

Bases: BaseInteractionModel

One-stage UniPairs model for estimation of main effects and interactions.

Parameters:

lmda_path (ndarray of shape (n_lmdas,), optional) – Lambda path for the joint UniLasso fit. If None, it is generated automatically.
n_folds (int, default=10) – Number of folds for cross-validation when selecting the regularization parameter.
**kwargs (dict) – Additional keyword arguments passed to BaseInteractionModel, such as interaction_candidates, plotting options, and verbosity.

regressor_

Fitted UniLasso model containing both main-effect and interaction terms.

Type: UniLasso

lmda_path_

Lambda path used during CV.

Type: ndarray of shape (n_lmdas,)

cv_errors_

Cross-validation errors over the lambda path.

Type: ndarray of shape (n_folds, n_lmdas)

selected_pairs_

Interaction index pairs retained after triplet screening.

Type: ndarray of shape (r, 2)

main_effects_active_set_

Active main effects identified after rescaling.

Type: ndarray of indices

interactions_active_set_

Active interaction effects identified after rescaling.

Type: ndarray of indices

Notes

This estimator fits a linear model in both main effects and interactions in a single stage.

Triplet regressions are fit for every allowed pair (j, k) to obtain p-values for interaction terms. Unstable models (rank deficiency, large condition number, or unit leverage) are discarded.

Interaction candidates are selected either via a user-defined p-value threshold or the largest log-gap rule.

After screening, a single UniLasso model is fitted on the expanded design combining all main effects and the selected interactions.

Cross-validation over a path of n_lmdas lambda values is performed once, using n_folds folds.
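
Choosing lambda from a cross-validation error array of shape (n_folds, n_lmdas) might look like the sketch below. The helper name choose_lambda is hypothetical, and the 1-SE rule is taken in its usual form (the largest lambda whose mean error lies within one standard error of the minimum); how the library applies cv1se internally is an assumption.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence


def choose_lambda(
    lmda_path: Sequence[float],
    cv_errors: Sequence[Sequence[float]],
    cv1se: bool = False,
) -> float:
    """Hypothetical helper: pick lambda from per-fold errors (rows are folds)."""
    n_folds = len(cv_errors)
    columns: List[tuple] = list(zip(*cv_errors))  # one tuple of fold errors per lambda
    means = [mean(col) for col in columns]
    best = min(range(len(means)), key=means.__getitem__)
    if not cv1se:
        return lmda_path[best]
    # Standard error of the mean CV error at the minimizing lambda.
    se = stdev(columns[best]) / math.sqrt(n_folds)
    eligible = [i for i, m in enumerate(means) if m <= means[best] + se]
    # Assumption: the most regularized eligible model is the largest lambda.
    return max(lmda_path[i] for i in eligible)


lmda_path = [1.0, 0.5, 0.1, 0.01]
cv_errors = [
    [2.0, 1.10, 1.00, 1.05],
    [2.2, 1.20, 1.10, 1.20],
    [2.1, 1.15, 1.20, 1.15],
]
print(choose_lambda(lmda_path, cv_errors))              # 0.1
print(choose_lambda(lmda_path, cv_errors, cv1se=True))  # 0.5
```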

All coefficients are finally transformed back to the original scale of the input variables. Active sets for both main effects and interactions are extracted from the refitted coefficients.

During prediction, interaction features are generated only for the selected pairs, stacked alongside main effects, and passed through the model. If a non-Gaussian family is used, response_scale=True applies the inverse link.
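
The prediction step can be sketched as building the linear predictor from main effects plus products for the selected pairs, then optionally applying an inverse link. The function name and argument layout are hypothetical, and only the Gaussian (identity) and binomial (logistic) links are covered; the Cox case is left out of this sketch.

```python
import math
from typing import Dict, List, Sequence, Tuple


def predict_with_pairs(
    X: Sequence[Sequence[float]],
    intercept: float,
    main_coefs: Sequence[float],
    pair_coefs: Dict[Tuple[int, int], float],
    family: str = "gaussian",
    response_scale: bool = False,
) -> List[float]:
    """Hypothetical helper: linear predictor with selected pairwise products."""
    if family not in ("gaussian", "binomial"):
        raise ValueError("this sketch handles only 'gaussian' and 'binomial'")
    out = []
    for row in X:
        eta = intercept + sum(b * x for b, x in zip(main_coefs, row))
        # Interaction features exist only for the selected pairs.
        eta += sum(g * row[j] * row[k] for (j, k), g in pair_coefs.items())
        if response_scale and family == "binomial":
            eta = 1.0 / (1.0 + math.exp(-eta))  # inverse logit link
        out.append(eta)
    return out


X = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]]
linear = predict_with_pairs(X, 0.5, [1.0, 0.0, -1.0], {(0, 1): 2.0})
print(linear)  # [5.5, -2.5]
probs = predict_with_pairs(
    X, 0.5, [1.0, 0.0, -1.0], {(0, 1): 2.0}, family="binomial", response_scale=True
)
print(all(0.0 < p < 1.0 for p in probs))  # True
```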

Examples

>>> from uniPairs.estimator import UniPairsOneStage
>>> model = UniPairsOneStage(
...     n_folds=5,
... )
>>> model.fit(X, y)
>>> y_pred = model.predict(X)
>>> model.get_active_variables()
['X0', 'X1*X4', 'X3']

fit(X: ndarray, y: ndarray, tolerance: float = 1e-10) → None

get_active_variables() → List[str]

get_fitted_function(tolerance: float = 1e-10) → str

predict(X: ndarray, response_scale: bool = False) → ndarray

class uniPairs.estimator.UniPairsTwoStage(hierarchy: Literal['weak', 'strong'] | None = None, lmda_path_main_effects: ndarray | None = None, lmda_path_interactions: ndarray | None = None, n_folds_main_effects: int = 10, n_folds_interactions: int = 10, **kwargs)

Bases: BaseInteractionModel

Two-stage UniPairs model for estimation of main effects and interactions.

Parameters:

hierarchy ({"weak", "strong", None}, optional) – Type of hierarchy constraint enforced between main effects and interactions. If None, no hierarchy is used.
lmda_path_main_effects (ndarray of shape (n_lmdas,), optional) – Lambda path for the UniLasso in stage 1. If None, it is generated automatically.
lmda_path_interactions (ndarray of shape (n_lmdas,), optional) – Lambda path for the Lasso in stage 2. If None, it is generated automatically.
n_folds_main_effects (int, default=10) – Number of folds for cross-validation of main effects.
n_folds_interactions (int, default=10) – Number of folds for cross-validation of interactions.
**kwargs (dict) – Additional keyword arguments passed to BaseInteractionModel.

main_effects_regressor_

Fitted model for main effects after stage 1.

Type: UniLasso

interactions_regressor_

Fitted model for interaction terms after stage 2.

Type: Lasso

lmda_path_main_effects_

Lambda path used in stage 1.

Type: ndarray of shape (n_lmdas,)

lmda_path_interactions_

Lambda path used in stage 2.

Type: ndarray of shape (n_lmdas,)

stage1_cv_errors_

Cross-validation errors for main effects.

Type: ndarray of shape (n_folds_main_effects, n_lmdas)

stage2_cv_errors_

Cross-validation errors for interactions.

Type: ndarray of shape (n_folds_interactions, n_lmdas)

main_effects_active_set_

Set of active main effects.

Type: ndarray of indices

interactions_active_set_

Set of active interactions.

Type: ndarray of indices

selected_pairs_

Interaction index pairs selected after triplet screening.

Type: ndarray of shape (r, 2)

Notes

This estimator fits a linear model in main effects and pairwise interactions using a two-stage procedure.

Stage 1 — Main effects:

A UniLasso regression is used to select and estimate main effects. The regularization parameter is selected via K-fold cross-validation using n_folds_main_effects. The path of lambda values (n_lmdas long) may be user-specified or generated automatically.

Stage 2 — Interaction screening and refitting:

Triplet regressions are fit for every allowed pair (j, k) to obtain p-values for interaction terms. Unstable models (rank deficiency, large condition number, or unit leverage) are discarded.

Interaction candidates are selected either via a user-defined p-value threshold or the largest log-gap rule. A Lasso model is then fitted on the selected interaction features with cross-validation using n_folds_interactions and a separate n_lmdas lambda path.

Both stages apply hierarchy if specified:

hierarchy="strong": interactions allowed only if both main effects are active,
hierarchy="weak": allowed if at least one is active,
None: no hierarchy imposed.

Coefficients are finally converted back to the original scale of the input variables.
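
For main effects, converting coefficients back to the original scale can be sketched as below, assuming each column was standardized as z = (x - mean) / sd before fitting. The helper name unscale is hypothetical, and how the library treats standardized interaction columns is not stated here, so they are left out.

```python
from typing import List, Sequence, Tuple


def unscale(
    intercept_std: float,
    coefs_std: Sequence[float],
    means: Sequence[float],
    sds: Sequence[float],
) -> Tuple[float, List[float]]:
    """Hypothetical helper: map coefficients fitted on z = (x - mean) / sd back to x."""
    coefs = [b / s for b, s in zip(coefs_std, sds)]
    # The centering term moves into the intercept.
    intercept = intercept_std - sum(c * m for c, m in zip(coefs, means))
    return intercept, coefs


means, sds = [2.0, -1.0], [0.5, 4.0]
intercept, coefs = unscale(1.0, [3.0, -2.0], means, sds)
print(intercept, coefs)  # -11.5 [6.0, -0.5]

# Both parameterizations give the same prediction at x = [3.0, 7.0].
x = [3.0, 7.0]
z = [(v - m) / s for v, m, s in zip(x, means, sds)]
pred_std = 1.0 + sum(b * v for b, v in zip([3.0, -2.0], z))
pred_orig = intercept + sum(c * v for c, v in zip(coefs, x))
print(pred_std, pred_orig)  # 3.0 3.0
```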

During prediction, interaction features are generated only for the selected pairs, and the main-effect and interaction components are summed to form the linear predictor. If a non-Gaussian family is used, response_scale=True applies the inverse link.

Examples

>>> from uniPairs.estimator import UniPairsTwoStage
>>> model = UniPairsTwoStage(
...     interaction_candidates=[0, 3, 5],
...     hierarchy="weak",
...     n_folds_main_effects=5,
...     n_folds_interactions=5,
... )
>>> model.fit(X, y)
>>> y_pred = model.predict(X)
>>> model.get_active_variables()
['X0', 'X3', 'X5', 'X0*X3']

fit(X: ndarray, y: ndarray, tolerance: float = 1e-10) → None

get_active_variables() → List[str]

get_fitted_function(tolerance: float = 1e-10) → str

predict(X: ndarray, response_scale: bool = False) → ndarray