Extend functionality of Sequential Feature Selector to allow repeating cross-validation. #272

nd26 · 2017-10-31T19:55:49Z

Description

Added an n_cv_repeats parameter and extended the functionality with RepeatedStratifiedKFold (for classifiers) and RepeatedKFold (for regressors) to allow repeated cross-validation.
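To illustrate the idea, here is a minimal sketch of how a repeat count could map onto scikit-learn's repeated splitters. The helper make_repeated_cv and its defaults are hypothetical and are not taken from this PR's diff; only the scikit-learn classes and cross_val_score are real APIs (the repeated splitters require sklearn >= 0.19).

```python
from sklearn.base import is_classifier
from sklearn.datasets import load_iris
from sklearn.model_selection import (
    RepeatedKFold,
    RepeatedStratifiedKFold,
    cross_val_score,
)
from sklearn.neighbors import KNeighborsClassifier


def make_repeated_cv(estimator, n_splits=5, n_cv_repeats=3, random_state=0):
    """Hypothetical helper: choose a repeated splitter by estimator type."""
    if is_classifier(estimator):
        # Classifiers get stratified folds so class ratios are preserved.
        splitter = RepeatedStratifiedKFold
    else:
        # Regressors (and anything else) get plain repeated k-fold.
        splitter = RepeatedKFold
    return splitter(
        n_splits=n_splits, n_repeats=n_cv_repeats, random_state=random_state
    )


X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3)
cv = make_repeated_cv(knn, n_splits=5, n_cv_repeats=3)

# One score per fold per repeat: 5 splits x 3 repeats.
scores = cross_val_score(knn, X, y, cv=cv)
print(len(scores), scores.mean())
```

Averaging over all folds of all repeats gives a less noisy estimate than a single k-fold run, which is presumably why it helps when feature subsets are compared against each other.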

Related issues or pull requests

Fixes #271

Pull Request requirements

Added appropriate unit test functions in the ./mlxtend/*/tests directories
Ran nosetests ./mlxtend -sv and made sure that all unit tests pass
Checked the test coverage by running nosetests ./mlxtend --with-coverage
Checked for style issues by running flake8 ./mlxtend
Added a note about the modification or contribution to the ./docs/sources/CHANGELOG.md file
Modified documentation in the appropriate location under mlxtend/docs/sources/ (optional)
Checked that the Travis-CI build passed at https://travis-ci.org/rasbt/mlxtend


pep8speaks · 2017-10-31T19:55:52Z

Hello @nd26! Thanks for updating the PR.

In the file mlxtend/feature_selection/sequential_feature_selector.py, the following PEP8 issues were found:

Line 21:80: E501 line too long (91 > 79 characters)
Line 24:1: E302 expected 2 blank lines, found 1
Line 28:80: E501 line too long (105 > 79 characters)
Line 30:80: E501 line too long (95 > 79 characters)
Line 30:96: W291 trailing whitespace
Line 33:80: E501 line too long (92 > 79 characters)
Line 111:78: W291 trailing whitespace
Line 112:77: W291 trailing whitespace
Line 113:77: W291 trailing whitespace
Line 138:30: E251 unexpected spaces around keyword / parameter equals
Line 138:32: E251 unexpected spaces around keyword / parameter equals
Line 164:80: E501 line too long (86 > 79 characters)
Line 166:80: E501 line too long (87 > 79 characters)
Line 169:9: E303 too many blank lines (2)

Comment last updated on November 01, 2017 at 15:51 Hours UTC

rasbt · 2017-10-31T20:31:23Z

Thanks for the PR! Based on the test results, it just seems to be a tabs-vs-spaces error, i.e., try replacing each tab with 4 spaces.

Btw., you can always test your code locally by running, for example, nosetests mlxtend/feature_selection/.

nd26 · 2017-10-31T23:18:25Z

No problem! Thanks for the reply. Will do tomorrow morning :)

rasbt · 2017-11-01T04:19:45Z

Sure, no hurry!

Btw., I added an issue (#274) regarding the random number generator you mentioned via email. I am not sure which is more convenient to add first, the repeated CV feature or the random seed. But I think adding repeated CV (using the current cross_val_score implementation) should work without an extra random seed. Once that is implemented, I am happy to rewrite the code to support random seeds.

However, I just wanted to mention it in case the repeated CV causes trouble without random seeds (i.e., irreproducible unit tests). In that case, we may want to add random seeds first and then the repeated CV.
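As a rough illustration of the reproducibility concern, the sketch below shows that scikit-learn's RepeatedKFold yields identical splits across runs when random_state is fixed. The toy data and the helper split_indices are made up for this example and do not come from mlxtend.

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold

# Toy data: 10 samples, 2 features (assumption, for illustration only).
X = np.arange(20).reshape(10, 2)


def split_indices(random_state):
    """Hypothetical helper: collect the test indices of every fold."""
    cv = RepeatedKFold(n_splits=5, n_repeats=2, random_state=random_state)
    return [test.tolist() for _, test in cv.split(X)]


# The same seed reproduces the same folds, so a unit test can rely on them.
seeded_a = split_indices(123)
seeded_b = split_indices(123)
print(seeded_a == seeded_b)

# With random_state=None the folds are drawn from the global NumPy RNG
# and will generally differ from run to run.
unseeded = split_indices(None)
print(len(unseeded))
```

With a fixed seed, a unit test can assert on exact selected features or scores; without one, it can only check looser properties such as the number of folds.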

Anyway, thanks a lot for contributing; both repeated CV and the random seeds are useful enhancements!

nd26 · 2017-11-01T13:13:43Z

Most likely, any unit tests that exercise the n_cv_repeats parameter will fail due to the lack of reproducibility (SFS is highly unstable when applied to a large number of features). But the current tests only test cross_val_score without using different values for n_cv_repeats, right?

No problem man!

coveralls · 2017-11-01T16:05:41Z

Coverage decreased (-0.08%) to 90.828% when pulling 2a5c9c5 on nd26:feature/RepeatedSKFold271 into 3028be8 on rasbt:master.

rasbt · 2017-11-01T17:55:45Z

I just noticed why one of the unit tests is failing: one of the test configurations runs sklearn 0.18, which doesn't yet have the RepeatedKFold class (it was added in sklearn 0.19).

I think it's fair to switch all the unit tests to sklearn >= 0.19 as it has been out now for a while.

The way to fix this would be to change SKLEARN_VERSION="0.18" to SKLEARN_VERSION="0.19" in the following section of the .travis.yml file:

- python: 3.5
  env: LATEST="false" COVERAGE="false" NUMPY_VERSION="1.11.2" SCIPY_VERSION="0.18.1" SKLEARN_VERSION="0.18" PANDAS_VERSION="0.19.1"
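The version boundary described above can also be expressed as a small check. This is only a sketch: the function names are hypothetical, and in practice the string would come from sklearn.__version__ rather than a hard-coded list.

```python
import re


def parse_major_minor(version):
    """Extract (major, minor) from a version string such as '0.19.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


def supports_repeated_cv(sklearn_version):
    """Hypothetical check: RepeatedKFold was added in sklearn 0.19."""
    return parse_major_minor(sklearn_version) >= (0, 19)


for v in ("0.18", "0.18.1", "0.19", "0.19.1"):
    print(v, supports_repeated_cv(v))
```

Comparing integer tuples avoids the pitfalls of comparing version strings lexicographically.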

Added n_cv_repeats parameter and extended functionality with RepeatedStratifiedKFold and RepeatedKFold

a46e200

Changed tabs to spaces + added parameter to constructor

2a5c9c5
