Practical - 2

Aim: Perform the following data pre-processing task (feature selection/elimination) using Python.


What is feature selection?

In machine learning and statistics, feature selection, also known as variable selection, attribute selection or variable subset selection, is the process of selecting a subset of relevant features for use in model construction.

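Why is it important?
  • It can improve model performance by removing irrelevant or noisy features.
  • It leads to faster machine learning models, since there are fewer features to train on.
  • It reduces the risk of overfitting. If the data has more columns than rows, a model can match the training data perfectly and still generalize poorly.
  • It removes garbage: features that carry no useful information.

Methods of Feature Selection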

Variance threshold: This method removes features whose variance falls below a chosen cutoff. The idea is that a feature that barely varies across samples generally has very little predictive power.
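
A minimal sketch of the variance threshold idea, assuming scikit-learn's VarianceThreshold. The small array and the 0.01 cutoff are made up for illustration and are not taken from the heart failure dataset.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Hypothetical toy data: column 0 is constant, column 1 barely varies,
# column 2 varies a lot.
X = np.array([
    [1.0, 0.0, 10.0],
    [1.0, 0.0, 20.0],
    [1.0, 0.1, 30.0],
    [1.0, 0.0, 40.0],
])

# Keep only the features whose variance is above the cutoff (assumed value).
selector = VarianceThreshold(threshold=0.01)
X_reduced = selector.fit_transform(X)

# Indices of the columns that survived the filter.
kept = selector.get_support(indices=True)

print("Variance per feature:", selector.variances_)
print("Kept column indices:", kept)
print("Reduced shape:", X_reduced.shape)
```

In practice the cutoff depends on the scale of each feature, so it is usually chosen after looking at the variances.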

Univariate feature selection: Univariate feature selection picks the best features using a univariate statistical test such as chi-square. It tests each feature independently to assess the strength of its relationship with the response variable.
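
A minimal sketch of univariate selection with the chi-square test, assuming scikit-learn's SelectKBest and chi2. The count data below is synthetic and stands in for a real dataset; chi2 requires non-negative feature values, which is why counts are used here.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(0)
n = 200

# Synthetic binary target (assumption: stands in for a real response variable).
y = rng.integers(0, 2, size=n)

# One informative feature whose mean depends on the class, two noise features.
informative = rng.poisson(lam=2 + 6 * y)
noise_a = rng.poisson(lam=4, size=n)
noise_b = rng.poisson(lam=4, size=n)
X = np.column_stack([noise_a, informative, noise_b])

# Score every feature independently against y and keep the single best one.
selector = SelectKBest(score_func=chi2, k=1)
X_best = selector.fit_transform(X, y)

print("Chi-square scores:", selector.scores_)
print("Selected column indices:", selector.get_support(indices=True))
```

The value of k is a choice, not something the test decides; looking at the scores first is a reasonable way to pick it.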

Recursive feature elimination: Recursive feature elimination starts by fitting a model on the entire set of features and computing an importance score for each predictor. The weakest features are then removed, the model is re-fitted, and the importance scores are computed again. This is repeated until only the specified number of features remains.
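
A minimal sketch of recursive feature elimination, assuming scikit-learn's RFE wrapped around a logistic regression model. The dataset comes from make_classification and is purely synthetic; the choice of estimator and of 3 features to keep are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data: 8 features, of which 3 are informative.
X, y = make_classification(
    n_samples=300,
    n_features=8,
    n_informative=3,
    n_redundant=0,
    random_state=0,
)

# The estimator supplies the importance scores (here, coefficient sizes).
estimator = LogisticRegression(max_iter=1000)

# Drop one feature per round (step=1) until 3 features remain.
rfe = RFE(estimator=estimator, n_features_to_select=3, step=1)
rfe.fit(X, y)

print("Selected mask:", rfe.support_)
print("Ranking (1 = selected):", rfe.ranking_)
```

Features with rank 1 are the ones kept; larger ranks were eliminated earlier.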

PCA: Principal component analysis is a dimensionality-reduction method often used on large data sets. It transforms a large set of variables into a smaller set of new variables (principal components) that still contains most of the information in the original set.
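
A minimal sketch of PCA, assuming scikit-learn's PCA and StandardScaler. The data is synthetic: six observed columns built from two hidden factors plus a little noise, so two components should capture most of the variance. Scaling first is a common choice, not a requirement stated above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two latent factors drive six observed columns (hypothetical setup).
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 6))

# Standardize so that no column dominates because of its scale.
X_scaled = StandardScaler().fit_transform(X)

# Project the six columns onto the first two principal components.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

print("Reduced shape:", X_pca.shape)
print("Explained variance ratio:", pca.explained_variance_ratio_)
```

Note that the components are combinations of the original columns, so PCA reduces dimensionality without picking a subset of the original features.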

Correlation: Correlation measures how close two variables are to having a linear relationship with each other. Features that are highly correlated with one another carry largely redundant information, so one of each such pair can often be dropped.
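
A minimal sketch of a correlation-based filter, assuming pandas and NumPy. The column names, the synthetic data and the 0.9 cutoff are all hypothetical; the idea is to drop one column from every pair that is almost perfectly linearly related.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
a = rng.normal(size=n)

df = pd.DataFrame({
    "a": a,
    "a_copy": 2 * a + 0.01 * rng.normal(size=n),  # nearly linear in "a"
    "b": rng.normal(size=n),                      # unrelated to "a"
})

# Absolute Pearson correlation between every pair of columns.
corr = df.corr().abs()

# Keep only the upper triangle so each pair is inspected once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Drop any column that is highly correlated with an earlier column.
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
df_reduced = df.drop(columns=to_drop)

print("Dropped:", to_drop)
print("Remaining:", list(df_reduced.columns))
```

Which column of a correlated pair gets dropped here depends only on column order, so it is worth checking which one is more meaningful before removing it.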

Importing Packages


Dataset (Heart Failure)


Univariate Selection


Recursive Feature Elimination


Principal Component Analysis


Feature Importance
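
A minimal sketch of ranking features by importance, assuming a tree ensemble (scikit-learn's ExtraTreesClassifier) as the model. Both the model choice and the synthetic make_classification data are assumptions for illustration, not the heart failure results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic data: 8 features, of which 3 are informative.
X, y = make_classification(
    n_samples=300,
    n_features=8,
    n_informative=3,
    n_redundant=0,
    random_state=0,
)

# Fit a tree ensemble; it exposes impurity-based importance scores.
model = ExtraTreesClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

importances = model.feature_importances_

# Print the features from most to least important.
for idx in importances.argsort()[::-1]:
    print(f"feature {idx}: {importances[idx]:.3f}")
```

The scores sum to 1, so each value can be read as a share of the total importance assigned by the model.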



Python Code Link
