Genetic Algorithms x MRMR: Smarter Mutation Candidate Selection¶
This tutorial shows how to combine the genetic-algorithm feature selector with the MRMR (Minimum Redundancy Maximum Relevance) ranker to make the mutation step smarter.
In the standard GA flow, when a solution is mutated (a feature is swapped in or out), the
replacement candidate is chosen at random from the pool of unselected features.
By passing an MRMRRanker
as the mutation_candidate_scorer, the selector instead scores every candidate feature
relative to what is already in the solution being mutated — features that are highly
relevant to the target and not yet well-represented in the current feature set are
ranked first.
This leads to mutations that explore the search space more intelligently: instead of randomly swapping in a correlated copy of a feature that is already selected, the ranker steers the mutation toward genuinely complementary features.
# Install felimination
! pip install felimination
How MRMR-guided mutation works¶
At each mutation step, HybridImportanceGACVFeatureSelector picks a solution from the
pool and swaps one of its features for a candidate from outside the solution. When
mutation_candidate_scorer is set, the scoring function is called with:
scorer(X, y, selected_features) -> array of shape (n_features,)
where selected_features is the list of feature indices already in that particular
solution. MRMRRanker uses this list to compute the MRMR score for every feature:
- Relevance — mutual information between the candidate feature and the target y.
- Redundancy — average mutual information between the candidate and each already-selected feature.
The score is relevance / redundancy (or relevance - redundancy for the "difference"
scheme). Features that are relevant but not redundant with the current selection rank
highest and are therefore the most likely mutation candidates.
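As a rough sketch of this scoring (a standalone computation using scikit-learn's mutual-information estimators, not the actual `MRMRRanker` internals — the `X_demo`, `selected`, and epsilon guard are illustrative), the quotient scheme looks like:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=3, random_state=0
)
selected = [0, 1]  # features already in the solution being mutated

# Relevance: MI between each feature and the target.
relevance = mutual_info_classif(X_demo, y_demo, random_state=0)

# Redundancy: average MI between each feature and the already-selected ones.
redundancy = np.mean(
    [mutual_info_regression(X_demo, X_demo[:, j], random_state=0) for j in selected],
    axis=0,
)

# Quotient scheme; a small epsilon guards against division by zero.
scores = relevance / (redundancy + 1e-9)
scores[selected] = 0.0  # already-selected features are not candidates
print(scores.round(3))
```

Features with high relevance and low redundancy end up with the largest scores and therefore dominate the ranked candidate list.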
The mutation_candidate_selection parameter controls how the ranked list is used:
| Value | Behaviour |
|---|---|
| `"best"` | Always pick the top-ranked candidate (deterministic). |
| `"sample"` | Sample proportionally to the score (stochastic; default). |
"sample" is usually preferable because it preserves diversity — the best candidate is
picked most often but lower-ranked ones still get a chance, keeping the population from
collapsing prematurely.
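The difference between the two modes can be sketched independently of the selector; score-proportional sampling is just a weighted draw (the score values below are illustrative, not produced by the library):

```python
import numpy as np

rng = np.random.default_rng(42)
scores = np.array([0.05, 0.40, 0.15, 0.30, 0.10])  # illustrative candidate scores

# "best": deterministic, always the argmax.
best_pick = int(np.argmax(scores))

# "sample": stochastic, probability proportional to the score.
probs = scores / scores.sum()
sampled_picks = rng.choice(len(scores), size=1000, p=probs)

print(best_pick)  # always index 1, the highest score
print(np.bincount(sampled_picks, minlength=5) / 1000)
```

With `"sample"`, the top candidate is drawn about 40% of the time here, but every candidate with a nonzero score keeps a chance, which is what preserves population diversity.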
Create a dummy dataset¶
We use the same dataset as in the Genetic Algorithms x Feature Selection tutorial so that results are directly comparable: 200 features in total, of which 6 are informative, 10 are redundant (correlated with the informative ones), and the remaining 184 are pure noise.
from sklearn.datasets import make_classification
X, y = make_classification(
n_samples=1000,
n_features=200,
n_informative=6,
n_redundant=10,
n_clusters_per_class=1,
random_state=42,
shuffle=False,
)
Baseline: performance without feature selection¶
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
model = LogisticRegression(random_state=42)
cv = StratifiedKFold(n_splits=5, random_state=42, shuffle=True)
cv_results = cross_validate(model, X, y, cv=cv, scoring="roc_auc", return_train_score=True)
print(f"Baseline AUC: {cv_results['test_score'].mean():.4f}")
Baseline AUC: 0.8561
Standard GA (random mutation)¶
First we run the GA with its default random mutation, as a reference point.
from felimination.callbacks import plot_progress_callback
from felimination.ga import HybridImportanceGACVFeatureSelector
selector_random = HybridImportanceGACVFeatureSelector(
model,
callbacks=[plot_progress_callback],
scoring="roc_auc",
cv=cv,
init_avg_features_num=5,
min_features_to_select=3,
pool_size=20,
n_children_cross_over=20,
n_mutations=20,
patience=5,
random_state=42,
range_randomly_swapped_features_mutation=(1, 5)
)
selector_random.fit(X, y)
HybridImportanceGACVFeatureSelector(callbacks=[<function plot_progress_callback at 0x119cb5bc0>],
cv=StratifiedKFold(n_splits=5, random_state=42, shuffle=True),
estimator=LogisticRegression(random_state=42),
init_avg_features_num=5,
min_features_to_select=3,
n_children_cross_over=20, n_mutations=20,
random_state=42,
range_randomly_swapped_features_mutation=(1,
5),
scoring='roc_auc')
print(f"Best AUC (random mutation): {selector_random.best_solution_['mean_test_score']:.4f}")
print(f"Features selected: {sorted(selector_random.best_solution_['features'])}")
Best AUC (random mutation): 0.9329 Features selected: [np.int64(6), np.int64(8), np.int64(10), np.int64(14), np.int64(15), 69, np.int64(82), 113, np.int64(197)]
GA with MRMR mutation scorer¶
Now we pass an MRMRRanker as mutation_candidate_scorer. The ranker is
stateful and lazy: relevance scores are computed once on the first call and reused;
per-feature redundancy vectors are filled on demand and cached, so features that appear
in many solutions only pay the computation cost once.
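The caching strategy can be illustrated with a small hypothetical class (a sketch of the idea described above, not the actual `MRMRRanker` code; the class and method names are invented for illustration):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

class LazyRedundancyCache:
    """Illustrative sketch: compute each feature's redundancy vector
    on first request, then reuse it for every later solution."""

    def __init__(self, X, random_state=0):
        self.X = X
        self.random_state = random_state
        self._cache = {}  # feature index -> MI vector vs. all features

    def redundancy_vector(self, j):
        # Computed once per feature, on demand, then cached.
        if j not in self._cache:
            self._cache[j] = mutual_info_regression(
                self.X, self.X[:, j], random_state=self.random_state
            )
        return self._cache[j]

    def mean_redundancy(self, selected):
        # Average redundancy against the currently selected features.
        return np.mean([self.redundancy_vector(j) for j in selected], axis=0)
```

Because mutation scores many solutions per generation and solutions share most of their features, this caching turns the expensive per-feature MI computation into a one-time cost.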
We use mutation_candidate_selection="sample" so that the mutation remains stochastic
but biased toward high-scoring candidates.
from felimination.mrmr import MRMRRanker
mrmr_ranker = MRMRRanker(regression=False, random_state=42)
selector_mrmr = HybridImportanceGACVFeatureSelector(
model,
callbacks=[plot_progress_callback],
scoring="roc_auc",
cv=cv,
init_avg_features_num=5,
min_features_to_select=3,
pool_size=20,
n_children_cross_over=20,
n_mutations=20,
mutation_candidate_scorer=mrmr_ranker,
mutation_candidate_selection="sample",
random_state=42,
patience=5,
range_randomly_swapped_features_mutation=(1, 5)
)
selector_mrmr.fit(X, y)
HybridImportanceGACVFeatureSelector(callbacks=[<function plot_progress_callback at 0x119cb5bc0>],
cv=StratifiedKFold(n_splits=5, random_state=42, shuffle=True),
estimator=LogisticRegression(random_state=42),
init_avg_features_num=5,
min_features_to_select=3,
mutation_candidate_scorer=<felimination.mrmr.MRMRRanker object at 0x1207c2f90>,
n_children_cross_over=20, n_mutations=20,
random_state=42,
range_randomly_swapped_features_mutation=(1,
5),
scoring='roc_auc')
print(f"Best AUC (MRMR mutation): {selector_mrmr.best_solution_['mean_test_score']:.4f}")
print(f"Features selected: {sorted(selector_mrmr.best_solution_['features'])}")
Best AUC (MRMR mutation): 0.9307 Features selected: [np.int64(6), np.int64(8), np.int64(10), 11, 12, 105, 159, 168, np.int64(197)]
Compare results¶
Let's put the numbers side by side.
import pandas as pd
results = pd.DataFrame(
{
"Setup": ["Baseline (all features)", "GA (random mutation)", "GA (MRMR mutation)"],
"AUC": [
cv_results["test_score"].mean(),
selector_random.best_solution_["mean_test_score"],
selector_mrmr.best_solution_["mean_test_score"],
],
"n_features": [
X.shape[1],
len(selector_random.best_solution_["features"]),
len(selector_mrmr.best_solution_["features"]),
],
"n_iterations": [
0,
len(selector_random.best_solutions_),
len(selector_mrmr.best_solutions_),
],
}
).set_index("Setup")
results
| Setup | AUC | n_features | n_iterations |
|---|---|---|---|
| Baseline (all features) | 0.856136 | 200 | 0 |
| GA (random mutation) | 0.932938 | 9 | 18 |
| GA (MRMR mutation) | 0.930678 | 9 | 10 |
Why does MRMR guidance help?¶
In this dataset, 10 of the 200 features are linear combinations of the 6 informative ones — they carry similar signal but add redundancy. Random mutation has no way to distinguish them from features that add genuinely new information. The MRMR scorer, on the other hand, penalises candidates that are highly correlated with features already in the solution, so it naturally avoids swapping in a redundant copy when a complementary feature is available.
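This redundancy is easy to verify on the dataset itself: with `shuffle=False`, `make_classification` stacks columns in a fixed order (the 6 informative features first, then the 10 redundant linear combinations, then noise), so the cross-correlations can be inspected directly:

```python
import numpy as np
from sklearn.datasets import make_classification

# Recreate the tutorial dataset. With shuffle=False the column order is:
# 6 informative, 10 redundant (linear combinations of the informative
# ones), then 184 noise features.
X, y = make_classification(
    n_samples=1000,
    n_features=200,
    n_informative=6,
    n_redundant=10,
    n_clusters_per_class=1,
    random_state=42,
    shuffle=False,
)

# Correlation matrix of the first 16 columns (informative + redundant).
corr = np.corrcoef(X[:, :16], rowvar=False)

# For each redundant column, its strongest absolute correlation with an
# informative column; noise columns would hover near zero instead.
print(np.abs(corr[6:16, :6]).max(axis=1).round(2))
```

These are exactly the correlations the MRMR redundancy term picks up on, and that random mutation is blind to.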
The result is a search that tends to reach good solutions in fewer generations, because fewer mutations are "wasted" on substitutions that do not change the information content of the selected set.