MRMR
Module with tools to perform forward feature selection using the Minimum Redundancy Maximum Relevance (MRMR) framework.
This module contains:
- MRMRRanker: stateful importance-getter callable implementing the MRMR score (mutual information relevance / absolute Pearson correlation redundance), suitable for use with ForwardSelectorCV.
- MRMRCV: preset of ForwardSelectorCV wired with MRMRRanker.
MRMRCV(estimator, *, step=1, min_features_to_select=None, max_features_to_select=None, cv=None, scoring=None, verbose=0, n_jobs=None, random_state=None, scheme='difference', n_neighbors=3, discrete_features='auto', relevance_func=None, redundance_func=None, redundancy_aggregation='max', min_relevance_perc=0.01, max_redundancy=None, discrete_imputer=None, continuous_imputer=None, max_samples=None, callbacks=None, best_iteration_selection_criteria='mean_test_score')
Bases: ForwardSelectorCV
Forward feature selector using Minimum Redundancy Maximum Relevance (MRMR) scoring.
Performs forward feature selection driven by MRMR scores, using cross-validation to determine the optimal number of features.
The selector starts by ranking all features by their relevance to the target and picks the highest-scoring one. It then iteratively selects the feature that maximises relevance minus (or divided by) redundance with already-selected features. Cross-validation scores the model at each evaluated step.
By default both relevance (feature-vs-target) and redundance
(feature-vs-already-selected-feature) are computed with mutual
information, which handles continuous and categorical features
transparently when discrete_features is supplied. Both functions
can be swapped out via relevance_func / redundance_func.
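The greedy loop can be sketched on precomputed scores. This is a minimal illustration of the selection rule, not the library's implementation; greedy_mrmr and the toy score arrays below are hypothetical:

```python
import numpy as np

def greedy_mrmr(relevance, redundancy, k, scheme="difference"):
    """Greedy forward MRMR on precomputed scores (illustrative sketch).

    relevance  : (n_features,) feature-vs-target scores.
    redundancy : (n_features, n_features) pairwise feature-vs-feature scores.
    """
    selected = [int(np.argmax(relevance))]      # seed: most relevant feature
    while len(selected) < k:
        # aggregate redundancy vs the already-selected set ('max' default)
        red = redundancy[selected].max(axis=0)
        if scheme == "difference":              # MID-style
            score = relevance - red
        else:                                   # 'ratio', MIQ-style (assumes nonzero redundance)
            score = relevance / red
        score[selected] = -np.inf               # never re-select a feature
        selected.append(int(np.argmax(score)))
    return selected

# toy data: features 0 and 1 are highly redundant; feature 2 is independent
relevance = np.array([0.9, 0.8, 0.5])
redundancy = np.array([[1.0, 0.7, 0.0],
                       [0.7, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])

greedy_mrmr(relevance, redundancy, 2)  # → [0, 2]
```

Feature 1 has the second-highest relevance, but its redundancy with the already-selected feature 0 pushes it behind the independent feature 2.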
Parameters:
- estimator (Estimator instance) – A supervised learning estimator with a fit method, used to score candidate feature subsets via cross-validation. Also used to detect classification vs. regression for the MRMR ranker (via is_classifier).
- step (int or float, default: 1) – Number of features added between two consecutive cross-validation evaluations. If greater than or equal to 1, this is the integer number of features added per evaluation. If within (0.0, 1.0), it is the fraction (rounded down, with a floor of 1) of the already-selected features added per evaluation, growing the selection geometrically. Selection within a step still happens one feature at a time.
- min_features_to_select (int, default: None) – Minimum number of features that must be selected before the first cross-validation evaluation. Features are still selected via MRMR scoring before this threshold, but no CV scoring takes place. If None, defaults to 1 (CV evaluation starts from the very first selected feature).
- max_features_to_select (int, default: None) – Maximum number of features to select. The forward process stops once this many features have been selected. If None, defaults to all features in X.
- cv (int, cross-validation generator or an iterable, default: None) – Determines the cross-validation splitting strategy. See sklearn.model_selection.check_cv for accepted inputs.
- scoring (str, callable or None, default: None) – Scorer used to evaluate the estimator on each CV fold.
- verbose (int, default: 0) – Controls verbosity of output.
- n_jobs (int or None, default: None) – Number of cores to run in parallel while fitting across folds. Also forwarded to the default mutual information estimators used for MRMR scoring.
- random_state (int, RandomState instance or None, default: None) – Seed used by the default mutual information estimators and by plot.
- scheme ('ratio' or 'difference', default: 'difference') – How to combine relevance and redundance:
  - 'ratio': relevance / redundance (MIQ-style).
  - 'difference': relevance - redundance (MID-style).
- n_neighbors (int, default: 3) – Number of neighbors used by the default mutual information estimators. Ignored when both relevance_func and redundance_func are overridden.
- discrete_features ('auto', bool or array-like, default: 'auto') – Indicates which input features are categorical. Accepted formats match sklearn.feature_selection.mutual_info_classif:
  - 'auto': infer from dtype when X is a pandas.DataFrame (columns with categorical, string, or object dtype are treated as discrete; all others as continuous); falls back to all-continuous for plain arrays.
  - True: treat all features as discrete.
  - boolean mask of shape (n_features,).
  - integer array of indices of the discrete features.
  Used by the default relevance and redundance functions, both to tell the mutual information estimator which inputs are categorical and to decide whether to use the classifier or regressor estimator when a categorical feature is the target of a redundance computation. Ignored when both relevance_func and redundance_func are overridden.
- relevance_func (callable, default: None) – Optional override for the relevance computation. Signature: relevance_func(X, y) -> ndarray of shape (n_features,), scoring each feature against the target. When None (default), mutual information is used (handles categorical features via discrete_features). Use abs_pearson_correlation for a fast Pearson-based alternative on purely numeric data.
- redundance_func (callable, default: None) – Optional override for the redundance computation. Signature: redundance_func(X, y_feature) -> ndarray of shape (n_features,), scoring each feature against the already-selected y_feature. When None (default), mutual information is used (the classifier vs. regressor estimator is chosen based on whether the target column is marked as categorical in discrete_features).
- redundancy_aggregation ('max', 'mean' or callable, default: 'max') – How to aggregate per-selected-feature redundancy scores into a single redundancy value before combining with relevance:
  - 'max': take the element-wise maximum across all already-selected features. A candidate is penalised as soon as it is highly redundant with any selected feature, making the criterion more conservative.
  - 'mean': take the element-wise mean, matching the formulation in the original MRMR paper (Peng et al., 2005).
  - callable: a function with signature f(redundancy_matrix) -> ndarray of shape (n_features,), where redundancy_matrix has shape (n_selected, n_features). Rows correspond to already-selected features; columns to candidate features.
  Note: the default 'max' deviates from the original MRMR paper, which uses the mean. 'max' is chosen as the default because it more aggressively avoids adding features that duplicate information already captured, which tends to work better in practice for forward selection with CV scoring.
- min_relevance_perc (float or None, default: 0.01) – If set, features are filtered based on cumulative relevance. After computing relevance scores, a minimum relevance threshold is derived as min_relevance_perc * sum(relevance scores). Features are then ordered by relevance ascending and their cumulative relevance is computed; any feature whose cumulative relevance (from the least relevant up to and including itself) is strictly below the threshold is assigned -inf and will never be selected. This removes the low-relevance tail that together contributes less than min_relevance_perc of the total relevance.
- max_redundancy (float or None, default: None) – If set, features whose aggregated redundancy with the already-selected features exceeds this threshold are assigned -inf and will not be selected in that round. The aggregation is controlled by redundancy_aggregation. Only applied when at least one feature has already been selected.
- discrete_imputer (sklearn-compatible transformer or None, default: None) – Forwarded to MRMRRanker. Imputer for discrete (categorical) columns. When None, defaults to SimpleImputer(strategy='constant', fill_value='MISSING').
- continuous_imputer (sklearn-compatible transformer or None, default: None) – Forwarded to MRMRRanker. Imputer for continuous (numeric) columns. When None, defaults to SimpleImputer(strategy='median').
- max_samples (int, float or None, default: None) – Forwarded to MRMRRanker. Number of samples used when computing mutual information scores. None means all samples. See MRMRRanker for the full description.
- callbacks (list of callable, default: None) – List of callables called at the end of each evaluated step. Each callable receives (selector, scores), where scores is the last array of MRMR scores.
- best_iteration_selection_criteria (str or callable, default: 'mean_test_score') – Either a key into cv_results_ (the iteration that maximises that key is picked) or a callable f(cv_results) -> n_features that must return one of the values in cv_results_["n_features"].
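The cumulative-relevance filter behind min_relevance_perc can be illustrated in a few lines of NumPy. The toy numbers and variable names below are made up for illustration, not the library's internals:

```python
import numpy as np

relevance = np.array([50.0, 30.0, 15.0, 4.0, 1.0])
min_relevance_perc = 0.10

# threshold: a fraction of the total relevance mass
threshold = min_relevance_perc * relevance.sum()

order = np.argsort(relevance)             # least relevant first
cumulative = np.cumsum(relevance[order])  # [1, 5, 20, 50, 100]
dropped = order[cumulative < threshold]   # strictly below the threshold

scores = relevance.copy()
scores[dropped] = -np.inf                 # these can never be selected
```

Features 3 and 4 together contribute only 5% of the total relevance, below the 10% cut, so both are masked; feature 2 survives because including it pushes the cumulative share to 20%.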
Examples:
>>> from felimination.mrmr import MRMRCV
>>> from sklearn.datasets import make_classification
>>> from sklearn.linear_model import LogisticRegression
>>> X, y = make_classification(n_samples=200, n_features=10, random_state=0)
>>> selector = MRMRCV(
... LogisticRegression(),
... min_features_to_select=2,
... max_features_to_select=8,
... step=1,
... cv=3,
... random_state=0,
... ).fit(X, y)
>>> selector.support_.sum() > 0
True
Source code in felimination/mrmr.py
plot(**kwargs)
Plot the cross-validation curve over number of features.
Parameters:
- **kwargs (dict, default: {}) – Forwarded to seaborn.lineplot.
Returns:
- Axes – The matplotlib axes the curve was drawn on.
Source code in felimination/forward.py
select_best_iteration(cv_results)
Return the best n_features value given cv_results_.
Source code in felimination/forward.py
set_n_features_to_select(n_features_to_select)
Change the number of selected features after fitting.
The underlying estimator is not retrained — predict /
predict_proba keep using the model fit on the originally
selected features. Only support_, transform and
get_feature_names_out are affected.
Parameters:
- n_features_to_select (int) – Must be one of the values in cv_results_["n_features"].
Source code in felimination/forward.py
MRMRRanker(regression=False, scheme='difference', n_neighbors=3, discrete_features='auto', random_state=None, n_jobs=None, relevance_func=None, redundance_func=None, redundancy_aggregation='max', min_relevance_perc=0.01, max_redundancy=None, discrete_imputer=None, continuous_imputer=None, max_samples=None)
Importance getter implementing the Minimum Redundancy Maximum Relevance score.
By default both relevance (feature-vs-target) and redundance
(feature-vs-already-selected-feature) are computed with mutual
information, which handles continuous and categorical features
transparently when discrete_features is supplied. Both functions
can be swapped out via relevance_func / redundance_func.
The ranker is lazy and stateful. On every call it computes a lightweight
fingerprint of (X, y) (shape, dtype, boundary values). If the
fingerprint matches the previous call, all cached state — relevance and
per-feature redundance vectors — is reused. If it differs (different
dataset or different CV fold), the caches are reset automatically before
re-initialising. This means the same instance can be reused across
successive fit calls efficiently.
Redundance is stored per feature in _redundance_cache: the redundance
vector for a given feature is computed at most once per dataset — if the
same feature appears in selected_idx of a later call, the cached
vector is reused directly.
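The fingerprint-and-cache behaviour can be mimicked with a small stand-in. This is a sketch of the idea only; CachedScorer and its fingerprint are hypothetical, not felimination's actual internals:

```python
import numpy as np

def _fingerprint(X, y):
    # cheap identity proxy: shape, dtype and a few boundary values,
    # instead of hashing the full arrays
    X, y = np.asarray(X), np.asarray(y)
    return (X.shape, str(X.dtype), float(X[0, 0]), float(X[-1, -1]),
            float(y[0]), float(y[-1]))

class CachedScorer:
    """Reuses per-feature scores while (X, y) keeps the same fingerprint."""

    def __init__(self):
        self._fp = None
        self._cache = {}
        self.computed = 0                  # how many scores were actually computed

    def __call__(self, X, y, feature_idx):
        fp = _fingerprint(X, y)
        if fp != self._fp:                 # new dataset or CV fold: reset caches
            self._fp, self._cache = fp, {}
        if feature_idx not in self._cache:  # compute each feature at most once
            self.computed += 1
            col = np.asarray(X, dtype=float)[:, feature_idx]
            self._cache[feature_idx] = float(np.corrcoef(col, y)[0, 1])
        return self._cache[feature_idx]
```

Calling the same instance twice on the same (X, y) recomputes nothing; swapping in a different fold's data changes the fingerprint and silently resets the cache, which is what makes reuse across successive fit calls safe.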
The ranker auto-initialises on its first call.
Parameters:
- regression (bool, default: False) – Whether the target is continuous. Switches the default relevance between mutual_info_regression and mutual_info_classif. Ignored when relevance_func is set.
- scheme ('ratio' or 'difference', default: 'difference') – How to combine relevance and redundance:
  - 'ratio': relevance / redundance (MIQ-style).
  - 'difference': relevance - redundance (MID-style).
- n_neighbors (int, default: 3) – Number of neighbors used by the default mutual information estimators. Ignored when both functions are overridden.
- discrete_features ('auto', bool or array-like, default: 'auto') – Indicates which input features are categorical. Accepted formats match sklearn.feature_selection.mutual_info_classif:
  - 'auto': infer from dtype when X is a pandas.DataFrame (columns with categorical, string, or object dtype are treated as discrete; all others as continuous); falls back to all-continuous for plain arrays.
  - True: treat all features as discrete.
  - boolean mask of shape (n_features,).
  - integer array of indices of the discrete features.
  Used by the default relevance and redundance functions, both to tell the mutual information estimator which inputs are categorical and to decide whether to use the classifier or regressor estimator when a categorical feature is the target of a redundance computation. Ignored when both relevance_func and redundance_func are overridden.
- random_state (int, RandomState instance or None, default: None) – Seed used by the default mutual information estimators.
- n_jobs (int or None, default: None) – Forwarded to the default mutual information estimators.
- relevance_func (callable, default: None) – Optional override for the relevance computation. Signature: relevance_func(X, y) -> ndarray of shape (n_features,), scoring each feature against the target. When None (default), mutual information is used (handles categorical features via discrete_features). Use abs_pearson_correlation for a fast Pearson-based alternative on purely numeric data.
- redundance_func (callable, default: None) – Optional override for the redundance computation. Signature: redundance_func(X, y_feature) -> ndarray of shape (n_features,), scoring each feature against the already-selected y_feature. When None (default), mutual information is used (the classifier vs. regressor estimator is chosen based on whether the target column is marked as categorical in discrete_features).
- redundancy_aggregation ('max', 'mean' or callable, default: 'max') – How to aggregate per-selected-feature redundancy scores into a single redundancy value before combining with relevance:
  - 'max': take the element-wise maximum across all already-selected features. A candidate is penalised as soon as it is highly redundant with any selected feature, making the criterion more conservative.
  - 'mean': take the element-wise mean, matching the formulation in the original MRMR paper (Peng et al., 2005).
  - callable: a function with signature f(redundancy_matrix) -> ndarray of shape (n_features,), where redundancy_matrix has shape (n_selected, n_features). Rows correspond to already-selected features; columns to candidate features.
  Note: the default 'max' deviates from the original MRMR paper, which uses the mean. 'max' is chosen as the default because it more aggressively avoids adding features that duplicate information already captured, which tends to work better in practice for forward selection with CV scoring.
- min_relevance_perc (float or None, default: 0.01) – If set, features are filtered based on cumulative relevance. After computing relevance scores, a minimum relevance threshold is derived as min_relevance_perc * sum(relevance scores). Features are then ordered by relevance ascending and their cumulative relevance is computed; any feature whose cumulative relevance (from the least relevant up to and including itself) is strictly below the threshold is assigned -inf and will never be selected. This removes the low-relevance tail that together contributes less than min_relevance_perc of the total relevance.
- max_redundancy (float or None, default: None) – If set, features whose aggregated redundancy with the already-selected features exceeds this threshold are assigned -inf and will not be selected in that round. The aggregation is controlled by redundancy_aggregation. Only applied when at least one feature has already been selected.
- discrete_imputer (sklearn-compatible transformer or None, default: None) – Imputer applied to discrete (categorical) feature columns before encoding. When None, defaults to SimpleImputer(strategy='constant', fill_value='MISSING'), replacing missing values with the string 'MISSING' (treated as an additional category). Pass any sklearn-compatible transformer with fit/transform. Ignored when there are no discrete columns.
- continuous_imputer (sklearn-compatible transformer or None, default: None) – Imputer applied to continuous (numeric) feature columns before the mutual information computation. When None, defaults to SimpleImputer(strategy='median'). Pass any sklearn-compatible transformer with fit/transform. Ignored when there are no continuous columns. For arrays with non-object dtype, applied to all columns regardless of discrete_features.
- max_samples (int, float or None, default: None) – Number of samples used when computing mutual information scores. Imputers are still fitted on the full training set; only the MI scoring (relevance on the first call and redundance on subsequent calls) uses the subsample.
  - None: use all samples (no subsampling).
  - int: use exactly this many samples (capped at n_samples).
  - float in (0.0, 1.0]: use this fraction of the training set (at least 1 sample).
  The same row indices are drawn once per forward-selection run (controlled by random_state) and reused for every subsequent redundance computation, keeping relevance and redundance comparable.
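The difference between the two built-in aggregations is easiest to see on a tiny redundancy matrix (toy numbers, for illustration only):

```python
import numpy as np

# rows: already-selected features; columns: candidate features
redundancy_matrix = np.array([[0.9, 0.1, 0.3],
                              [0.2, 0.8, 0.3]])

agg_max = redundancy_matrix.max(axis=0)    # [0.9, 0.8, 0.3]
agg_mean = redundancy_matrix.mean(axis=0)  # ≈ [0.55, 0.45, 0.3]
```

Under 'max', candidates 0 and 1 are both heavily penalised because each duplicates one selected feature; under 'mean' the penalty is diluted by the feature they are not redundant with. A custom callable receives this same (n_selected, n_features) matrix and must return one value per column.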
Attributes:
- relevance_ (ndarray of shape (n_features,)) – Per-feature relevance, populated on the first call.
Source code in felimination/mrmr.py
abs_pearson_correlation(X, y)
Absolute Pearson correlation between each column of X and y.
Convenience helper for use as relevance_func or
redundance_func in MRMRRanker. Only suitable for numeric data;
use mutual-information based scoring (the default) when categorical
features are present.
Parameters:
- X (array-like of shape (n_samples, n_features)) – Feature matrix.
- y (array-like of shape (n_samples,)) – Target vector.
Returns:
- ndarray of shape (n_features,) – Absolute Pearson correlation per feature.
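A NumPy sketch of what this helper computes (my own vectorised reimplementation for illustration, not the felimination source):

```python
import numpy as np

def abs_pearson_correlation(X, y):
    """|Pearson r| between every column of X and y (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)                    # centre each column
    yc = y - y.mean()
    num = Xc.T @ yc                            # per-column co-variation with y
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.abs(num / den)

y = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([y, -2.0 * y, np.array([1.0, -1.0, 1.0, -1.0])])
```

The first two columns are perfectly correlated and anti-correlated with y, so both score 1. Note that a constant column would divide by zero here, and categorical columns are meaningless under Pearson correlation, which is why the mutual-information default is preferred outside purely numeric data.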