Automated Feature Engineering in Python
Machine learning is increasingly moving from hand-designed models to
automatically optimized pipelines using tools such as H2O, TPOT, and
auto-sklearn. These libraries, along with methods such as random search, aim to
simplify the model selection and tuning parts of machine learning by finding the
best model for a dataset with little to no manual intervention. However,
feature engineering, arguably a more valuable part of the machine learning
pipeline, remains almost entirely manual.
Typically, feature engineering is a drawn-out manual process that relies on
domain knowledge, intuition, and data manipulation. This process can be
extremely tedious, and the final features are limited by both human
subjectivity and time. Automated feature engineering aims to help the data
scientist by automatically creating many candidate features from a dataset;
the best of these can then be selected and used for training.
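
To make the "generate many candidates, then select the best" idea concrete, here is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not the method of any particular library: the toy columns (width, height), the pairwise sum/product transformations, and the use of absolute Pearson correlation as the selection score are all hypothetical choices made for this example.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def generate_candidates(columns):
    """Create candidate features: the raw columns plus pairwise sums and products."""
    candidates = dict(columns)  # raw columns are candidates too
    for a, b in combinations(sorted(columns), 2):
        candidates[f"{a}*{b}"] = [x * y for x, y in zip(columns[a], columns[b])]
        candidates[f"{a}+{b}"] = [x + y for x, y in zip(columns[a], columns[b])]
    return candidates


def select_best(candidates, target, k):
    """Return the names of the k candidates most correlated (in absolute value) with target."""
    ranked = sorted(
        candidates,
        key=lambda name: abs(pearson(candidates[name], target)),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy dataset: the target (area) depends on an interaction
# between the two raw columns, which no single raw column captures.
columns = {
    "width": [1.0, 2.0, 3.0, 4.0, 5.0],
    "height": [2.0, 1.0, 4.0, 3.0, 6.0],
}
target = [w * h for w, h in zip(columns["width"], columns["height"])]

candidates = generate_candidates(columns)
best = select_best(candidates, target, k=1)
print(best)
```

On this toy data the generated product feature ranks above both raw columns, which is the point of the approach: the useful feature is found by enumeration and scoring rather than by hand. Real tools use far richer transformations and more careful selection criteria than this sketch does.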