Hello, guys! Today we will focus on the basic steps that we need to perform on a data set during EDA (Exploratory Data Analysis) when the data is continuous.
First, let's import all the required libraries for the analysis.
These imports cover the libraries we will use for Exploratory Data Analysis.
# NumPy Library
import numpy as np
# Pandas Library
import pandas as pd
# Matplotlib Library
import matplotlib.pyplot as plt
# Seaborn Library
import seaborn as sns
The first step is always to understand the variables present in the data set.
- This includes knowing the shape and info by using dataframe.info() and dataframe.shape.
- Going through each column by previewing the first few rows with dataframe.head().
- Finding the insignificant and significant columns for the analysis.
- Reclassifying the columns where needed. For example, suppose a column holds the categorical values ‘Very Bad’, ‘Bad’, ‘Average’, ‘Good’ and ‘Very Good’. If keeping ‘Very Bad’ separate from ‘Bad’, or ‘Very Good’ separate from ‘Good’, adds nothing to the analysis, we merge them: ‘Very Bad’ and ‘Bad’ become ‘Bad’, and ‘Very Good’ and ‘Good’ become ‘Good’. Fewer categories make the analysis easier.
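The steps above can be sketched roughly as follows. This is only a minimal illustration, not a fixed recipe: the data frame, its column names (respondent_id, age, rating) and its values are made up for the example, and treating respondent_id as an insignificant column is an assumption.

```python
import pandas as pd

# Hypothetical survey data; column names and values are invented for illustration.
df = pd.DataFrame({
    "respondent_id": [1, 2, 3, 4, 5, 6],
    "age": [23.0, 35.5, 41.2, 29.8, 52.1, 38.4],
    "rating": ["Very Bad", "Bad", "Average", "Good", "Very Good", "Good"],
})

# Shape and structure of the data set
print(df.shape)   # (number of rows, number of columns)
df.info()         # column names, non-null counts and dtypes
print(df.head())  # first five rows

# Assumption: the identifier column carries no analytical value, so we drop it.
df_clean = df.drop(columns=["respondent_id"])

# Reclassify: merge the five rating levels into three.
rating_map = {"Very Bad": "Bad", "Very Good": "Good"}
df_clean["rating_reclassified"] = df_clean["rating"].replace(rating_map)

# Check how many rows fall into each merged category.
print(df_clean["rating_reclassified"].value_counts())
```

Writing the merged values to a new column (here rating_reclassified) rather than overwriting rating keeps the original labels available in case the finer distinction turns out to matter later.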