Practical 1
Data Pre-processing tasks in Python using Scikit-learn.
In simple words, pre-processing refers to the transformations applied to your data before feeding it to the algorithm. In Python, the scikit-learn library provides pre-built functionality for this in its sklearn.preprocessing module.
Various data pre-processing techniques:
Encoding: One-hot encoding is a process that transforms categorical data into a form that ML algorithms can use to do a better prediction job. Most ML algorithms accept only numerical input, so an encoder such as LabelEncoder is used to transform the categorical data into numerical form.
Standardization: Data standardization is the method by which one or more attributes are rescaled so that they have a mean of 0 and a standard deviation of 1.
Normalization: The aim of normalization is to bring the numeric column values to a common scale within the dataset, without distorting the differences in the value ranges.
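As a minimal sketch of normalization, the snippet below uses scikit-learn's MinMaxScaler, which is one common way to bring columns to a shared [0, 1] range. The values in X are hypothetical stand-ins for two numeric columns and are not taken from the practical's dataset.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical values standing in for two numeric columns (e.g. age and fare).
X = np.array([[22.0, 7.25],
              [38.0, 71.28],
              [26.0, 7.93],
              [35.0, 53.10]])

# MinMaxScaler rescales each column independently to the range [0, 1]
# using (x - column_min) / (column_max - column_min).
scaler = MinMaxScaler()
X_norm = scaler.fit_transform(X)

print(X_norm.min(axis=0))  # smallest value in each column after scaling
print(X_norm.max(axis=0))  # largest value in each column after scaling
```

Note that scikit-learn also has a class called Normalizer, but it rescales each row to unit norm rather than each column to a common range, so it is a different operation.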
1 Importing Libraries:
Fig 1 import libraries
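A plausible set of imports for the steps in this practical might look like the following. The exact list is an assumption, based on the functions and encoders the write-up mentions.

```python
# pandas and NumPy for loading and handling the data
import numpy as np
import pandas as pd

# Encoders and scaler from scikit-learn's preprocessing module
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler
```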
2 Display Data: for this practical we use the Titanic dataset.
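A minimal sketch of displaying the data is shown below. So that it runs without any file, it builds a small hypothetical DataFrame with a few Titanic-style columns. The rows are made up for illustration; with the real file one would typically load it with pd.read_csv instead.

```python
import numpy as np
import pandas as pd

# Hypothetical rows mimicking a few Titanic columns (not the real data).
# With the actual CSV file, pd.read_csv(<path to the file>) would be used.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 0],
    "Pclass": [3, 1, 3, 1, 3],
    "Sex": ["male", "female", "female", "male", "male"],
    "Age": [22.0, 38.0, 26.0, np.nan, 35.0],
    "Fare": [7.25, 71.28, 7.93, 53.10, 8.05],
    "Embarked": ["S", "C", "S", None, "Q"],
})

print(df.head())   # first rows of the table
print(df.dtypes)   # data type of each column
print(df.shape)    # (number of rows, number of columns)
```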
3 Missing Data: for the given dataset, we first find the missing values using the isnull() function.
Fig 3 missing value
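Counting missing values per column could be sketched as follows. The small DataFrame is hypothetical and only stands in for the Titanic data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the dataset, with one gap in Age and one in Embarked.
df = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, np.nan, 35.0],
    "Fare": [7.25, 71.28, 7.93, 53.10, 8.05],
    "Embarked": ["S", "C", "S", None, "Q"],
})

# isnull() marks each missing cell as True; sum() then counts them per column.
missing_counts = df.isnull().sum()
print(missing_counts)
```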
Fill the missing data with the fillna() function.
Fig 4 missing value filling with fillna()
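One way to fill the gaps with fillna() is sketched below. The choice of the median for a numeric column and the most frequent value for a categorical column is an assumption made for illustration, and the data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the dataset, with one gap in Age and one in Embarked.
df = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, np.nan, 35.0],
    "Embarked": ["S", "C", "S", None, "Q"],
})

# Numeric column: replace missing values with the column median.
df["Age"] = df["Age"].fillna(df["Age"].median())

# Categorical column: replace missing values with the most frequent value.
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

print(df)
print(df.isnull().sum())  # no missing values should remain
```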
4 Encoding: in the given dataset some columns are of type object and others are float; the categorical (object) columns have to be encoded into numerical form.
Fig 5 Label Encoding
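Label encoding a single categorical column might look like the sketch below. The column name Sex_encoded and the sample values are hypothetical. LabelEncoder assigns integers in sorted order of the category names.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical column standing in for the dataset's "Sex" column.
df = pd.DataFrame({"Sex": ["male", "female", "female", "male"]})

# fit_transform learns the distinct categories and maps each one to an integer.
encoder = LabelEncoder()
df["Sex_encoded"] = encoder.fit_transform(df["Sex"])

print(encoder.classes_)  # categories in the order of their integer codes
print(df)
```

scikit-learn documents LabelEncoder as intended for target labels; for input features it offers OrdinalEncoder, which produces the same kind of integer codes.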
Fig 6 Onehot Encoding
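A minimal sketch of one-hot encoding with scikit-learn's OneHotEncoder follows. The sample column is hypothetical. Each distinct category becomes its own 0/1 column, so no artificial ordering is implied between categories.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical column standing in for the dataset's "Embarked" column.
df = pd.DataFrame({"Embarked": ["S", "C", "S", "Q"]})

# OneHotEncoder expects 2-D input, hence the double brackets.
# It returns a sparse matrix by default; toarray() makes it dense for printing.
encoder = OneHotEncoder()
onehot = encoder.fit_transform(df[["Embarked"]]).toarray()

print(encoder.categories_)  # categories, in the order of the output columns
print(onehot)
```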
5 Standardization: we standardize the numeric data so that features measured on different scales contribute comparably to the result.
Fig 7 Standardization
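Standardizing numeric columns with StandardScaler could be sketched as follows. The columns and values are hypothetical. Each column is transformed as (x - mean) / standard deviation.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric columns standing in for the dataset's Age and Fare.
df = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Fare": [7.25, 71.28, 7.93, 53.10],
})

# fit_transform computes each column's mean and standard deviation,
# then rescales the column to mean 0 and standard deviation 1.
scaler = StandardScaler()
scaled = scaler.fit_transform(df[["Age", "Fare"]])

print(scaled.mean(axis=0))  # approximately 0 for each column
print(scaled.std(axis=0))   # approximately 1 for each column
```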
Get the code from here