Practical 1

Data Pre-processing tasks in Python using Scikit-learn.

In simple words, pre-processing refers to the transformations applied to your data before feeding it to the algorithm. In Python, the scikit-learn library provides pre-built functionality for this in the sklearn.preprocessing module.

Various data pre-processing techniques:

Encoding: Most ML algorithms accept only numerical input, so categorical data must be converted into numbers before training. Label encoding (scikit-learn's LabelEncoder) maps each category to an integer, while one-hot encoding turns each category into its own binary (0/1) column, a form that helps many algorithms make better predictions.
Standardization: Data standardization is the process of rescaling one or more attributes so that they have a mean of 0 and a standard deviation of 1.
Normalization: The aim of normalization is to bring the values of the numeric columns in the dataset onto a common scale, without distorting the differences in their ranges of values.
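As a small illustration of normalization, here is a minimal sketch using scikit-learn's MinMaxScaler, which rescales each column to the range [0, 1] with (x - min) / (max - min). The fare values below are illustrative, not taken from the real dataset. Note that scikit-learn also has a class called Normalizer, which does something different: it rescales each row (sample) to unit norm rather than each column to a fixed range.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative fare values (one feature, five samples); not from the real dataset.
fares = np.array([[7.25], [71.28], [7.92], [53.10], [8.05]])

# MinMaxScaler learns each column's min and max, then applies (x - min) / (max - min).
scaler = MinMaxScaler()
fares_normalized = scaler.fit_transform(fares)

print(fares_normalized.ravel())
```

The smallest fare maps to 0 and the largest to 1; every other value keeps its relative position in between.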


1 Importing Libraries:

 

Fig 1 import libraries
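Since the screenshot in Fig 1 is not reproduced here, the following is an assumed set of imports that covers the steps in this practical (pandas for loading and inspecting data, scikit-learn's preprocessing classes for encoding and scaling); the original list may differ.

```python
# Assumed imports for this practical; Fig 1 shows the original list.
import numpy as np                     # numerical arrays
import pandas as pd                    # loading and inspecting the dataset
import sklearn
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

# Print versions, since some preprocessing options differ between releases.
print("pandas", pd.__version__)
print("scikit-learn", sklearn.__version__)
```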

2 Display Data: for this practical we use the Titanic dataset.


Fig 2 Titanic dataset
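A minimal sketch of loading and displaying the data with pandas. To keep it runnable without a download, it reads a few made-up rows that use some of the Titanic column names (the values are illustrative, not real passengers); with a real copy you would pass its path to pd.read_csv instead (the file name "titanic.csv" is an assumption).

```python
import io
import pandas as pd

# Stand-in for the real file; in practice: df = pd.read_csv("titanic.csv")  (path assumed)
csv_text = """PassengerId,Survived,Pclass,Sex,Age,Fare,Cabin,Embarked
1,0,3,male,22,7.25,,S
2,1,1,female,38,71.28,C85,C
3,1,3,female,,7.92,,S
4,1,1,female,35,53.10,C123,S
5,0,3,male,,8.05,,S
"""
df = pd.read_csv(io.StringIO(csv_text))

# Display the first rows and the data type pandas inferred for each column.
print(df.head())
print(df.dtypes)
```

The dtypes output is useful for the encoding step later: text columns such as Sex and Embarked show up as object.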

3 Missing Data: for the given dataset, we first find the missing data using the isnull() function.

Fig 3 missing values

Fig 4 missing values in Age and Cabin
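A minimal sketch of the detection step, using a few illustrative Titanic-style rows (not the real data). isnull() returns a DataFrame of True/False flags, and summing it counts the missing values in each column.

```python
import io
import pandas as pd

# Illustrative rows with gaps in Age and Cabin (values are made up).
csv_text = """PassengerId,Survived,Pclass,Sex,Age,Fare,Cabin,Embarked
1,0,3,male,22,7.25,,S
2,1,1,female,38,71.28,C85,C
3,1,3,female,,7.92,,S
4,1,1,female,35,53.10,C123,S
5,0,3,male,,8.05,,S
"""
df = pd.read_csv(io.StringIO(csv_text))

# True where a value is missing; summing the booleans gives a count per column.
missing_counts = df.isnull().sum()
print(missing_counts)
```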

We then fill the missing data with the fillna() function.


Fig 4 filling missing values with fillna()
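One possible filling strategy, sketched with fillna() on illustrative rows: the numeric Age column gets the median of the known ages, and the text Cabin column gets a placeholder label. Both choices (median, and the label "Unknown") are assumptions for illustration; the practical's screenshot may use different values. Assigning the result back to the column avoids the chained-assignment warnings that inplace=True on a single column can trigger in recent pandas versions.

```python
import io
import pandas as pd

# Illustrative rows with gaps in Age and Cabin (values are made up).
csv_text = """PassengerId,Survived,Pclass,Sex,Age,Fare,Cabin,Embarked
1,0,3,male,22,7.25,,S
2,1,1,female,38,71.28,C85,C
3,1,3,female,,7.92,,S
4,1,1,female,35,53.10,C123,S
5,0,3,male,,8.05,,S
"""
df = pd.read_csv(io.StringIO(csv_text))

# Numeric column: replace NaN with the median of the non-missing ages.
df["Age"] = df["Age"].fillna(df["Age"].median())
# Text column: replace NaN with a placeholder category (label is an assumption).
df["Cabin"] = df["Cabin"].fillna("Unknown")

print(df.isnull().sum())
```

The median is often preferred over the mean for ages because it is less affected by a few unusually large or small values.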


    

4 Encoding: in the given dataset, some columns hold text (object dtype) and others hold floats, so we encode the categorical columns into a single numeric form.




Fig 5 Label Encoding
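A minimal sketch of label encoding the Sex column with scikit-learn's LabelEncoder, on illustrative values. The encoder sorts the distinct categories and assigns them integers 0, 1, 2, and so on. Worth knowing: scikit-learn's documentation describes LabelEncoder as intended for target labels (y); for input feature columns, OrdinalEncoder is the feature-side equivalent.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Illustrative values for a categorical column.
df = pd.DataFrame({"Sex": ["male", "female", "female", "female", "male"]})

encoder = LabelEncoder()
# Categories are sorted, so 'female' -> 0 and 'male' -> 1.
df["Sex_encoded"] = encoder.fit_transform(df["Sex"])

print(encoder.classes_)
print(df)
```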
Fig 6 One-hot Encoding

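5 Standardization: we standardize the numeric columns so that features with very different ranges, such as Age and Fare, contribute comparably; this helps scale-sensitive algorithms give more accurate results.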

 
Fig 7 Standardization
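A minimal sketch of standardization with scikit-learn's StandardScaler on illustrative Age and Fare values (not the real data). For each column it computes z = (x - mean) / std, using the population standard deviation, so the result has mean 0 and standard deviation 1.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Illustrative numeric features on very different scales (values are made up).
df = pd.DataFrame({"Age": [22.0, 38.0, 26.0, 35.0, 35.0],
                   "Fare": [7.25, 71.28, 7.92, 53.10, 8.05]})

scaler = StandardScaler()
# Learns each column's mean and standard deviation, then applies (x - mean) / std.
scaled = scaler.fit_transform(df[["Age", "Fare"]])
scaled_df = pd.DataFrame(scaled, columns=["Age", "Fare"])

# Each column should now have mean ~0 and population standard deviation ~1.
print(scaled_df.mean().round(6))
print(scaled_df.std(ddof=0).round(6))
```

In a real workflow, fit the scaler on the training data only and then call scaler.transform on the test data, so that statistics from the test set do not leak into training.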






