Best iteration selection
Criteria for picking the best number of features from a fitted
selector's cv_results_.
Each function in this module accepts the cv_results_ dict produced by
FeliminationRFECV, ForwardSelectorCV, or any of their presets, and
returns one of the values in cv_results_["n_features"]. Pass any of
them to the best_iteration_selection_criteria argument of those
selectors.
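The contract described above (take the cv_results_ dict, return one of the values in cv_results_["n_features"]) can be illustrated with a minimal sketch. The function name select_best_by_max_test_score and the sample numbers are hypothetical and not part of this module; the sketch only assumes that the dict holds equal-length sequences under the keys mean_test_score and n_features.

```python
def select_best_by_max_test_score(cv_results):
    """Hypothetical criterion: pick the iteration with the highest mean test score."""
    scores = cv_results["mean_test_score"]
    n_features = cv_results["n_features"]
    # Index of the iteration with the largest mean test score.
    best = max(range(len(scores)), key=lambda i: scores[i])
    # Return the feature count of that iteration, as the contract requires.
    return int(n_features[best])


# Made-up results for three iterations, for illustration only.
cv_results = {
    "n_features": [5, 4, 3],
    "mean_test_score": [0.81, 0.83, 0.79],
}
chosen = select_best_by_max_test_score(cv_results)
assert chosen in cv_results["n_features"]
```

Whether the selectors accept arbitrary callables of this shape, beyond the criteria documented here, is an assumption to verify against the library.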
select_best_by_mean_test_score_and_overfit(cv_results)
Balance high test score against low overfit.
Ranks each iteration by descending mean_test_score and by ascending
overfit = mean_train_score - mean_test_score, then picks the
iteration that minimises the rank sum. Ties are broken by
mean_test_score.
Parameters:
- cv_results (dict) – Must contain the keys mean_test_score,
  mean_train_score and n_features.
Returns:
- int – The chosen number of features. Guaranteed to be one of the values in
  cv_results["n_features"].
Source code in felimination/best_iteration_selection.py
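The rank-sum rule described for select_best_by_mean_test_score_and_overfit can be sketched in plain Python as follows. This is not the library's source: the helper rank, the function name sketch_select_by_score_and_overfit, and the sample numbers are hypothetical. Two details are assumptions, since the description does not state them: equal values share the lowest rank, and a tie in rank sum goes to the higher mean_test_score.

```python
def rank(values, descending=False):
    """1-based ranks; equal values share the lowest rank (an assumption)."""
    ordered = sorted(values, reverse=descending)
    return [ordered.index(v) + 1 for v in values]


def sketch_select_by_score_and_overfit(cv_results):
    test = cv_results["mean_test_score"]
    train = cv_results["mean_train_score"]
    n_features = cv_results["n_features"]

    # overfit = mean_train_score - mean_test_score, per iteration.
    overfit = [tr - te for tr, te in zip(train, test)]
    score_rank = rank(test, descending=True)  # highest test score gets rank 1
    overfit_rank = rank(overfit)              # smallest overfit gets rank 1
    rank_sum = [s + o for s, o in zip(score_rank, overfit_rank)]

    # Smallest rank sum wins; ties assumed to go to the higher test score.
    best = min(range(len(n_features)), key=lambda i: (rank_sum[i], -test[i]))
    return n_features[best]


# Made-up results for four iterations, for illustration only.
cv_results = {
    "n_features": [10, 8, 6, 4],
    "mean_train_score": [0.99, 0.95, 0.90, 0.80],
    "mean_test_score": [0.85, 0.86, 0.84, 0.78],
}
chosen = sketch_select_by_score_and_overfit(cv_results)
```

With these numbers the score ranks are 2, 1, 3, 4 and the overfit ranks are 4, 3, 2, 1, so the rank sums are 6, 4, 5, 5 and the sketch picks 8 features.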
select_best_by_n_features_and_score(cv_results)
Balance high test score against a small number of features.
Ranks each iteration by descending mean_test_score and by ascending
n_features, then picks the iteration that minimises the rank sum.
Useful when you want a parsimonious model and are willing to give up a
bit of score to drop features. Ties are broken by mean_test_score.
Parameters:
- cv_results (dict) – Must contain the keys mean_test_score and
  n_features.
Returns:
- int – The chosen number of features. Guaranteed to be one of the values in
  cv_results["n_features"].
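The trade-off made by select_best_by_n_features_and_score can be sketched the same way. Again, this is an illustration and not the library's source: the helper rank, the function name sketch_select_by_n_features_and_score, and the sample numbers are hypothetical, and the handling of ties (shared lowest rank, rank-sum ties going to the higher mean_test_score) is an assumption.

```python
def rank(values, descending=False):
    """1-based ranks; equal values share the lowest rank (an assumption)."""
    ordered = sorted(values, reverse=descending)
    return [ordered.index(v) + 1 for v in values]


def sketch_select_by_n_features_and_score(cv_results):
    test = cv_results["mean_test_score"]
    n_features = cv_results["n_features"]

    score_rank = rank(test, descending=True)  # highest test score gets rank 1
    size_rank = rank(n_features)              # fewest features gets rank 1
    rank_sum = [s + k for s, k in zip(score_rank, size_rank)]

    # Smallest rank sum wins; ties assumed to go to the higher test score.
    best = min(range(len(n_features)), key=lambda i: (rank_sum[i], -test[i]))
    return n_features[best]


# Made-up results for five iterations, for illustration only.
cv_results = {
    "n_features": [12, 10, 8, 6, 4],
    "mean_test_score": [0.84, 0.86, 0.85, 0.855, 0.70],
}
chosen = sketch_select_by_n_features_and_score(cv_results)
```

Here the best score (0.86) occurs at 10 features, but the rank sums are 9, 5, 6, 4, 6, so the sketch picks 6 features: it gives up 0.005 of score to drop four features.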