API Reference
- class uniPairs.estimator.BaseInteractionModel(interaction_candidates: List[int] | None = None, interaction_pairs: List[Tuple[int, int]] | None = None, vars_names: List[str] | None = None, zero_cutoff: float = 1e-20, interactions_threshold: float | None = None, save_plots: str | None = None, family_spec: FamilySpec | None = None, plot_cv_curve: bool = False, cv1se: bool = False, verbose: bool = False)
Bases: BaseEstimator, ABC
Base class for interaction discovery models.
This class implements the common logic required for fitting models with pairwise interactions. It provides:
input standardization,
selection of interaction candidates or explicit interaction pairs,
optional hierarchy constraints (weak, strong),
fitting of triplet regressions for screening interactions,
p-value based interaction selection,
plotting utilities.
Actual estimation of main effects and interactions must be implemented in subclasses via fit, predict, get_active_variables and get_fitted_function.
- Parameters:
interaction_candidates (list of int, optional) – Indices of features that are allowed to form interactions. If provided, all pairs (j, k) are generated such that j is in this list and k ranges over all other features. Mutually exclusive with interaction_pairs.
interaction_pairs (list of tuple(int, int), optional) – Explicit list of feature index pairs to consider as interactions. Mutually exclusive with interaction_candidates.
vars_names (list of str, optional) – Variable names. If None, names are generated as ["X0", ..., "Xp"].
zero_cutoff (float, default=1e-20) – Numerical tolerance used to determine whether a p-value is considered zero when scanning interactions.
interactions_threshold (float, optional) – Absolute threshold for selecting interaction pairs. If None, the data-adaptive largest log-gap rule is used.
save_plots (str, optional) – Directory where plots will be saved. If None, figures are not saved.
family_spec (dict, optional) – Must contain a "family" key. Supports: "gaussian", "binomial", "cox".
plot_cv_curve (bool, default=False) – Whether to plot cross-validation curves when applicable.
cv1se (bool, default=False) – Whether to apply the 1-SE rule when choosing the optimal lambda.
verbose (bool, default=False) – If True, prints progress and diagnostic information during fitting.
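As a rough illustration of how interaction_candidates might expand into pairs, the sketch below builds the pair list in plain Python. The helper name candidate_pairs, the n_features argument, the smaller-index-first ordering and the all-pairs fallback when neither argument is given are assumptions made for this example, not uniPairs internals.

```python
from typing import List, Optional, Tuple

Pair = Tuple[int, int]


def candidate_pairs(
    n_features: int,
    interaction_candidates: Optional[List[int]] = None,
    interaction_pairs: Optional[List[Pair]] = None,
) -> List[Pair]:
    """Return a sorted, de-duplicated list of index pairs (hypothetical helper)."""
    if interaction_candidates is not None and interaction_pairs is not None:
        raise ValueError(
            "interaction_candidates and interaction_pairs are mutually exclusive"
        )
    if interaction_pairs is not None:
        raw = [(j, k) for j, k in interaction_pairs if j != k]
    elif interaction_candidates is not None:
        # j comes from the candidate list, k ranges over all other features.
        raw = [
            (j, k)
            for j in interaction_candidates
            for k in range(n_features)
            if k != j
        ]
    else:
        # Assumption: with neither argument, every pair is a candidate.
        raw = [(j, k) for j in range(n_features) for k in range(j + 1, n_features)]
    # Store each unordered pair once, smaller index first.
    return sorted({(min(j, k), max(j, k)) for j, k in raw})


pairs = candidate_pairs(n_features=4, interaction_candidates=[0, 3])
# pairs == [(0, 1), (0, 2), (0, 3), (1, 3), (2, 3)]
```

With four features and candidates 0 and 3, only the pair (1, 2) is left out, because neither of its members is a candidate.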
- family\_
Lowercase model family name: "gaussian", "binomial" or "cox".
- Type:
str
- interaction_candidates\_
Standardized version of interaction_candidates.
- Type:
ndarray of shape (m,), optional
- interaction_pairs\_
Standardized version of interaction_pairs.
- Type:
ndarray of shape (k, 2), optional
- triplet_regressors\_
Dictionary mapping feature pairs (j, k) to triplet regressors.
- Type:
dict
- allowed_pairs\_
Interaction pairs allowed after applying hierarchy constraints and removing unstable pairs.
- Type:
list of tuple(int, int)
- selected_pairs\_
Interaction pairs selected based on p-values.
- Type:
ndarray of shape (r, 2)
- instable_pairs\_
Pairs excluded because the triplet model was numerically unstable (rank deficiency, large condition number or leverage).
- Type:
list of tuple(int, int)
- main_effects_names\_
Names of main effect variables.
- Type:
list of str
- interactions_names\_
Names of interactions, in the form "Xj*Xk".
- Type:
list of str
- main_effects_active_set\_
Active main effects when hierarchy is imposed.
- Type:
list of int, optional
- pvals\_
Mapping from (name_j, name_k) to p-values for screening interactions. Only for allowed pairs.
- Type:
dict
Notes
Triplet regression. For each pair of features (j, k), a local model is fit on:
\[[1, X_j, X_k, X_j X_k]\]
to assess whether an interaction is statistically significant.
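The triplet regression can be sketched for the Gaussian case with ordinary least squares in NumPy. This is a minimal illustration, not the library's implementation: the function name interaction_pvalue is hypothetical, and the p-value uses a normal approximation to the t distribution, which assumes a reasonably large sample.

```python
import math

import numpy as np


def interaction_pvalue(xj: np.ndarray, xk: np.ndarray, y: np.ndarray) -> float:
    """Two-sided p-value for the X_j * X_k coefficient in an OLS triplet fit."""
    n = y.shape[0]
    design = np.column_stack([np.ones(n), xj, xk, xj * xk])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    dof = n - design.shape[1]
    sigma2 = float(resid @ resid) / dof
    cov = sigma2 * np.linalg.inv(design.T @ design)
    z = beta[3] / math.sqrt(cov[3, 3])
    # Normal approximation to the t distribution (assumption: n is large).
    return math.erfc(abs(z) / math.sqrt(2.0))


rng = np.random.default_rng(0)
n = 500
xj, xk, noise = rng.normal(size=(3, n))
y_additive = 1.0 + 0.5 * xj - 0.3 * xk + noise
y_interact = y_additive + 2.0 * xj * xk

p_additive = interaction_pvalue(xj, xk, y_additive)
p_interact = interaction_pvalue(xj, xk, y_interact)
```

A strong interaction can drive the p-value so low that it underflows to exactly zero in floating point, which is presumably why a zero_cutoff tolerance is useful when scanning p-values.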
Hierarchy constraints. If hierarchy = "strong" then only interactions with both variables active are allowed. If "weak", at least one must be active.
Unstable pairs. Models with rank deficiency, large condition number or unit leverage are flagged as unstable and removed before selection.
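The three instability criteria named above (rank deficiency, large condition number, unit leverage) could be checked on a triplet design matrix roughly as follows. The function name is_unstable and both tolerances are illustrative assumptions; the thresholds actually used by uniPairs are not stated in this reference.

```python
import numpy as np


def is_unstable(
    design: np.ndarray, max_cond: float = 1e8, leverage_tol: float = 1e-8
) -> bool:
    """Flag a design matrix as unstable (thresholds are illustrative only)."""
    _, p = design.shape
    if np.linalg.matrix_rank(design) < p:
        return True  # rank deficiency
    if np.linalg.cond(design) > max_cond:
        return True  # ill-conditioned design
    # Leverages are the diagonal of the hat matrix, obtained from a thin QR.
    q, _ = np.linalg.qr(design)
    leverage = np.sum(q * q, axis=1)
    return bool(np.max(leverage) > 1.0 - leverage_tol)


rng = np.random.default_rng(0)
n = 50
xj, xk = rng.normal(size=(2, n))
ones = np.ones(n)

well_behaved = np.column_stack([ones, xj, xk, xj * xk])
duplicated = np.column_stack([ones, xj, xj, xj * xj])  # X_k is a copy of X_j
indicator = np.zeros(n)
indicator[0] = 1.0
single_point = np.column_stack([ones, indicator])  # one observation has leverage 1
```

Here is_unstable(well_behaved) is False, while the duplicated design fails the rank check and the single_point design fails the leverage check.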
Interaction selection. If interactions_threshold is given, pairs are selected when:
\[p_{jk} \le \text{threshold}\]
Otherwise, the data-adaptive largest log-gap rule determines the cutoff.
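The reference does not spell out the largest log-gap rule, so the sketch below shows only one plausible reading: sort the p-values, clip them at zero_cutoff, take logs, and cut at the largest jump between consecutive values. The function name select_pairs, the clipping step and the handling of fewer than two p-values are all assumptions.

```python
import math
from typing import Dict, List, Optional, Tuple

Pair = Tuple[int, int]


def select_pairs(
    pvals: Dict[Pair, float],
    interactions_threshold: Optional[float] = None,
    zero_cutoff: float = 1e-20,
) -> List[Pair]:
    """Select pairs by a fixed threshold or by the largest gap in log p-values."""
    if interactions_threshold is not None:
        return sorted(p for p, v in pvals.items() if v <= interactions_threshold)
    if len(pvals) < 2:
        return []  # Assumption: no gap can be measured, so nothing is selected.
    ranked = sorted(pvals.items(), key=lambda item: item[1])
    # Clip at zero_cutoff so that p-values that underflowed to 0 have a finite log.
    logs = [math.log(max(v, zero_cutoff)) for _, v in ranked]
    gaps = [logs[i + 1] - logs[i] for i in range(len(logs) - 1)]
    cut = max(range(len(gaps)), key=gaps.__getitem__)
    # Keep everything on the small-p side of the largest gap.
    return sorted(pair for pair, _ in ranked[: cut + 1])


pvals = {(0, 1): 1e-12, (0, 2): 3e-10, (1, 2): 0.04, (1, 3): 0.2, (2, 3): 0.7}
by_gap = select_pairs(pvals)
by_threshold = select_pairs(pvals, interactions_threshold=0.05)
# by_gap == [(0, 1), (0, 2)]
# by_threshold == [(0, 1), (0, 2), (1, 2)]
```

In this toy input the largest jump in log p-value falls between 3e-10 and 0.04, so the gap rule keeps two pairs while a fixed 0.05 threshold keeps three.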
Subclasses must implement:
fit(X, y, **kwargs), predict(X), get_active_variables(), get_fitted_function().
- class uniPairs.estimator.UniPairs(two_stage: bool = True, **kwargs: Any)
Bases: BaseEstimator
Unified wrapper for UniPairs interaction models.
This class provides a high-level interface for fitting either the one-stage or two-stage UniPairs procedure. It delegates all computations to an internal model (UniPairsOneStage or UniPairsTwoStage).
- Parameters:
two_stage (bool, default=True) – Determines which UniPairs procedure to use:
True: use the two-stage UniPairs estimator
False: use the one-stage UniPairs estimator
**kwargs (dict) – Additional keyword arguments forwarded directly to the selected internal model.
- model\_
The underlying fitted UniPairs estimator. Set after calling fit.
- Type:
- two_stage
Whether the wrapper is using the two-stage or one-stage method.
- Type:
bool
- kwargs
Saved keyword arguments passed at construction time.
- Type:
dict
- version
Version string inherited from the internal estimator (read-only). Returns None if the model has not yet been fitted.
- Type:
str or None
Notes
This class does not implement modeling logic itself. Instead:
1. During fit:
- If two_stage=True, an instance of UniPairsTwoStage is created and fitted.
- If two_stage=False, an instance of UniPairsOneStage is created and fitted.
2. All subsequent calls to predict, get_active_variables, get_fitted_function are delegated to the fitted internal model.
Examples
>>> model = UniPairs(two_stage=True, n_folds_main_effects=5, n_folds_interactions=5)
>>> model.fit(X, y)
>>> y_pred = model.predict(X)
>>> model.get_active_variables()
['X0', 'X2', 'X0*X2']
Switching to one-stage:
>>> model = UniPairs(two_stage=False, n_folds=5)
>>> model.fit(X, y)
>>> model.get_fitted_function()
'1.203 + 0.44*X0 + 0.08*X0*X3'
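The delegation pattern described in the Notes can be sketched with stand-in classes. Everything below is hypothetical: WrapperSketch and the two stub estimators only mimic the documented behavior (pick an internal model in fit, then forward calls to it) and are not the uniPairs source.

```python
from typing import Any, List


class _OneStageStub:
    """Stand-in for UniPairsOneStage; not the real estimator."""

    label = "one-stage stub"

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs

    def fit(self, X: Any, y: Any) -> "_OneStageStub":
        return self

    def get_active_variables(self) -> List[str]:
        return [self.label]


class _TwoStageStub(_OneStageStub):
    """Stand-in for UniPairsTwoStage; not the real estimator."""

    label = "two-stage stub"


class WrapperSketch:
    """Chooses an internal model at fit time and forwards calls to it."""

    def __init__(self, two_stage: bool = True, **kwargs: Any) -> None:
        self.two_stage = two_stage
        self.kwargs = kwargs
        self.model_ = None

    def fit(self, X: Any, y: Any) -> "WrapperSketch":
        model_cls = _TwoStageStub if self.two_stage else _OneStageStub
        # Constructor kwargs are forwarded unchanged to the internal model.
        self.model_ = model_cls(**self.kwargs).fit(X, y)
        return self

    def get_active_variables(self) -> List[str]:
        if self.model_ is None:
            raise RuntimeError("call fit before get_active_variables")
        return self.model_.get_active_variables()


wrapper = WrapperSketch(two_stage=False, n_folds=5).fit(None, None)
```

Because the keyword arguments are forwarded unchanged, they need to match the signature of whichever internal estimator two_stage selects.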
- class uniPairs.estimator.UniPairsOneStage(lmda_path: ndarray | None = None, n_folds: int = 10, **kwargs: Any)
Bases: BaseInteractionModel
One-stage UniPairs model for estimation of main effects and interactions.
- Parameters:
lmda_path (ndarray of shape (n_lmdas,), optional) – Lambda path for the joint UniLasso fit. If None, it is generated automatically.
n_folds (int, default=10) – Number of folds for cross-validation when selecting the regularization parameter.
**kwargs (dict) – Additional keyword arguments passed to BaseInteractionModel such as hierarchy, plotting options, and verbosity.
- regressor\_
Fitted UniLasso model containing both main-effect and interaction terms.
- Type:
- lmda_path\_
Lambda path used during CV.
- Type:
ndarray of shape (n_lmdas,)
- cv_errors\_
Cross-validation errors over the lambda path.
- Type:
ndarray of shape (n_folds, n_lmdas)
- selected_pairs\_
Interaction index pairs retained after triplet screening.
- Type:
ndarray of shape (r, 2)
- main_effects_active_set\_
Active main effects identified after rescaling.
- Type:
ndarray of indices
- interactions_active_set\_
Active interaction effects identified after rescaling.
- Type:
ndarray of indices
Notes
This estimator fits a linear model with both main effects and interactions in a single stage.
Triplet regressions are fit for every allowed pair (j, k) to obtain p-values for interaction terms. Unstable models (rank deficiency, large condition number, or unit leverage) are discarded.
Interaction candidates are selected either via a user-defined p-value threshold or the largest log-gap rule.
After screening, a single UniLasso model is fitted on the expanded design combining all main effects and the selected interactions.
Cross-validation over a path of n_lmdas lambda values is performed once, using n_folds folds.
All coefficients are finally transformed back to the original scale of the input variables. Active sets for both main effects and interactions are extracted from the refitted coefficients.
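Given a cv_errors_ array of shape (n_folds, n_lmdas), choosing lambda with or without the 1-SE rule (the cv1se option) might look like the sketch below. It follows the conventional definition of the rule: take the largest lambda whose mean error is within one standard error of the minimum. Whether uniPairs computes it exactly this way is an assumption, and the function name choose_lambda is hypothetical.

```python
import numpy as np


def choose_lambda(lmda_path, cv_errors, cv1se: bool = False) -> float:
    """Pick lambda from a (n_folds, n_lmdas) error array."""
    lmda_path = np.asarray(lmda_path, dtype=float)
    cv_errors = np.asarray(cv_errors, dtype=float)
    mean_err = cv_errors.mean(axis=0)
    best = int(np.argmin(mean_err))
    if not cv1se:
        return float(lmda_path[best])
    # Standard error of the mean across folds, per lambda.
    se = cv_errors.std(axis=0, ddof=1) / np.sqrt(cv_errors.shape[0])
    within = mean_err <= mean_err[best] + se[best]
    # The most regularized model that is still within one SE of the best.
    return float(lmda_path[within].max())


lmda_path = [1.0, 0.5, 0.1, 0.01]
cv_errors = [
    [2.0, 1.15, 1.0, 1.1],
    [2.2, 1.25, 1.2, 1.3],
    [1.8, 1.05, 1.1, 1.2],
]
lmda_min = choose_lambda(lmda_path, cv_errors)
lmda_1se = choose_lambda(lmda_path, cv_errors, cv1se=True)
# lmda_min == 0.1, lmda_1se == 0.5
```

The 1-SE choice trades a small increase in estimated error for a sparser model, which is the usual motivation for enabling it.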
During prediction, interaction features are generated only for the selected pairs, stacked alongside main effects, and passed through the model. If a non-Gaussian family is used, response_scale=True applies the inverse link.
Examples
>>> model = UniPairsOneStage(
...     n_folds=5,
... )
>>> model.fit(X, y)
>>> y_pred = model.predict(X)
>>> model.get_active_variables()
['X0', 'X1*X4', 'X3']
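The expanded design used by the one-stage fit and at prediction time (main effects stacked with products for the selected pairs) can be sketched in NumPy. The helper names expand_design and expanded_names are hypothetical, and any standardization the library applies before forming products is omitted here.

```python
from typing import List, Sequence, Tuple

import numpy as np


def expand_design(X, selected_pairs: Sequence[Tuple[int, int]]) -> np.ndarray:
    """Stack main effects with X_j * X_k columns for the selected pairs."""
    X = np.asarray(X, dtype=float)
    products = [X[:, j] * X[:, k] for j, k in selected_pairs]
    if not products:
        return X.copy()
    return np.column_stack([X, *products])


def expanded_names(
    vars_names: Sequence[str], selected_pairs: Sequence[Tuple[int, int]]
) -> List[str]:
    """Column names matching expand_design, using the "Xj*Xk" convention."""
    interactions = [f"{vars_names[j]}*{vars_names[k]}" for j, k in selected_pairs]
    return list(vars_names) + interactions


X_demo = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
selected = [(0, 2), (1, 2)]
design = expand_design(X_demo, selected)
names = expanded_names(["X0", "X1", "X2"], selected)
# design has 3 main-effect columns followed by X0*X2 and X1*X2
```

For the two demo rows the interaction columns are [3, 24] and [6, 30], the elementwise products of the corresponding main-effect columns.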
- class uniPairs.estimator.UniPairsTwoStage(hierarchy: Literal['weak', 'strong'] | None = None, lmda_path_main_effects: ndarray | None = None, lmda_path_interactions: ndarray | None = None, n_folds_main_effects: int = 10, n_folds_interactions: int = 10, **kwargs)
Bases: BaseInteractionModel
Two-stage UniPairs model for estimation of main effects and interactions.
- Parameters:
hierarchy ({"weak", "strong", None}, optional) – Type of hierarchy constraint enforced between main effects and interactions. If None, no hierarchy is used.
lmda_path_main_effects (ndarray of shape (n_lmdas,), optional) – Lambda path for the UniLasso in stage 1. If None, it is generated automatically.
lmda_path_interactions (ndarray of shape (n_lmdas,), optional) – Lambda path for the Lasso in stage 2. If None, it is generated automatically.
n_folds_main_effects (int, default=10) – Number of folds for cross-validation of main effects.
n_folds_interactions (int, default=10) – Number of folds for cross-validation of interactions.
**kwargs (dict) – Additional keyword arguments passed to BaseInteractionModel.
- main_effects_regressor\_
Fitted model for main effects after stage 1.
- Type:
- interactions_regressor\_
Fitted model for interaction terms after stage 2.
- Type:
- lmda_path_main_effects\_
Lambda path used in stage 1.
- Type:
ndarray of shape (n_lmdas,)
- lmda_path_interactions\_
Lambda path used in stage 2.
- Type:
ndarray of shape (n_lmdas,)
- stage1_cv_errors\_
Cross-validation errors for main effects.
- Type:
ndarray of shape (n_folds_main_effects, n_lmdas)
- stage2_cv_errors\_
Cross-validation errors for interactions.
- Type:
ndarray of shape (n_folds_interactions, n_lmdas)
- main_effects_active_set\_
Set of active main effects.
- Type:
ndarray of indices
- interactions_active_set\_
Set of active interactions.
- Type:
ndarray of indices
- selected_pairs\_
Interaction index pairs selected after triplet screening.
- Type:
ndarray of shape (r, 2)
Notes
This estimator fits a linear model with main effects and pairwise interactions using a two-stage procedure.
Stage 1 — Main effects:
A UniLasso regression is used to select and estimate main effects. The regularization parameter is selected via K-fold cross-validation using n_folds_main_effects. The path of lambda values (of length n_lmdas) may be user-specified or generated automatically.
Stage 2 — Interaction screening and refitting:
Triplet regressions are fit for every allowed pair (j, k) to obtain p-values for interaction terms. Unstable models (rank deficiency, large condition number, or unit leverage) are discarded.
Interaction candidates are selected either via a user-defined p-value threshold or the largest log-gap rule. A Lasso model is then fitted on the selected interaction features with cross-validation using n_folds_interactions and a separate lambda path of length n_lmdas.
Both stages apply hierarchy if specified:
hierarchy="strong": interactions allowed only if both main effects are active,
hierarchy="weak": allowed if at least one is active,
None: no hierarchy imposed.
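The three hierarchy settings listed above amount to a simple filter on candidate pairs. The following sketch assumes the active main effects are available as a collection of feature indices; the function name apply_hierarchy is hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Pair = Tuple[int, int]


def apply_hierarchy(
    pairs: Iterable[Pair],
    active_main_effects: Iterable[int],
    hierarchy: Optional[str] = None,
) -> List[Pair]:
    """Keep only the pairs permitted by the chosen hierarchy constraint."""
    active = set(active_main_effects)
    if hierarchy is None:
        return list(pairs)
    if hierarchy == "strong":
        # Both members of the pair must be active main effects.
        return [(j, k) for j, k in pairs if j in active and k in active]
    if hierarchy == "weak":
        # At least one member of the pair must be an active main effect.
        return [(j, k) for j, k in pairs if j in active or k in active]
    raise ValueError("hierarchy must be 'weak', 'strong' or None")


all_pairs = [(0, 1), (0, 2), (1, 3)]
active = [0, 2]
strong = apply_hierarchy(all_pairs, active, "strong")  # [(0, 2)]
weak = apply_hierarchy(all_pairs, active, "weak")      # [(0, 1), (0, 2)]
unconstrained = apply_hierarchy(all_pairs, active)     # all three pairs
```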
Coefficients are finally converted back to the original scale of the input variables.
During prediction, interaction features are generated only for the selected pairs, and both components are added. If a non-Gaussian family is used, response_scale=True applies the inverse link.
Examples
>>> model = UniPairsTwoStage(
...     interaction_candidates=[0, 3, 5],
...     hierarchy="weak",
...     n_folds_main_effects=5,
...     n_folds_interactions=5,
... )
>>> model.fit(X, y)
>>> y_pred = model.predict(X)
>>> model.get_active_variables()
['X0', 'X3', 'X5', 'X0*X3']