Types consistency | Preprocessing Data: Part I
Data Manipulation using pandas

1. Preprocessing Data: Part I
2. Preprocessing Data: Part II
3. Grouping Data
4. Aggregating and Visualizing Data
5. Joining Data

Types consistency

One of the first steps in analyzing a received dataset is checking the value types. For example, we expect a column with ages to have an integer type, and a column with salaries to be either integer or float.

Remember, to get the column types in pandas, use the .dtypes attribute. Execute the code below to find out the value types.

# Importing the library
import pandas as pd

# Reading the file
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/f2947b09-5f0d-4ad9-992f-ec0b87cd4b3f/data.csv')

# Output values types
print(df.dtypes)

So many types... How can we avoid getting confused here? Let's see which columns have the object type. To do this, we use the same .dtypes attribute and put a condition inside square brackets. Since .dtypes returns a Series whose index holds the column names, we print only the index for a more convenient output.

# Importing the library
import pandas as pd

# Reading the file
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/f2947b09-5f0d-4ad9-992f-ec0b87cd4b3f/data.csv')

# Output only columns with 'object' type
print(df.dtypes[df.dtypes == object].index)
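The same filtering can be tried offline on a small frame. The sketch below uses a hypothetical toy DataFrame (the names toy, age, salary, and income are made up for illustration and are not the course dataset). It compares the condition-based filter from the text with select_dtypes, an alternative pandas method for choosing columns by type.

```python
import pandas as pd

# Hypothetical toy data (not the course dataset). The 'income' column is
# stored with the object dtype because it holds text rather than numbers.
toy = pd.DataFrame({
    "age": [34, 51, 27],
    "salary": [52000.0, 61500.5, 48000.0],
    "income": pd.Series(["52000", "unknown", "48000"], dtype=object),
})

# Approach from the text: filter the dtypes Series with a condition
object_cols = toy.dtypes[toy.dtypes == object].index.tolist()

# Alternative: let pandas select the columns by dtype directly
object_cols_alt = toy.select_dtypes(include=object).columns.tolist()

print(object_cols)
print(object_cols_alt)
```

Both approaches should report the same column names. select_dtypes is convenient when the next step needs the columns themselves rather than only their names.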

Here is the first problem. The columns 'totinch', 'morgh', 'valueh', 'grosrth', and 'omph' should be treated as numerical, given what they represent.

Column | Description
'TOTINCH' | Total Household Income
'MORGH' | Presence of Mortgage
'VALUEH' | Value of Dwelling
'GROSRTH' | Monthly Gross Rent
'OMPH' | Owner's Major Payments (Monthly)

Let's find out why pandas treated these columns as object.
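One general way to investigate such a column is to attempt a numeric conversion and look at the values that fail. The sketch below is only an illustration on a hypothetical Series (the name income and its values are made up). It does not claim to show the actual cause in the course dataset.

```python
import pandas as pd

# Hypothetical column standing in for one of the affected columns
income = pd.Series(["52000", "unknown", "48000", "61500"],
                   dtype=object, name="income")

# With errors="coerce", values that cannot be parsed as numbers become NaN
as_numbers = pd.to_numeric(income, errors="coerce")

# Entries that were present originally but failed to convert
offenders = income[as_numbers.isna() & income.notna()]

print(offenders.unique())
```

Any value listed in offenders is a candidate reason for the column being read as object. Whether to replace, drop, or recode such values depends on the dataset.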


Section 1. Chapter 2