Dealing with Dates and Times in Python
Challenge: Investigation
As you noticed in the previous chapter, some trips have negative or extremely long durations (more than 50 days, for example). Such values cannot be real, and we need to fix them before going further.
What causes the extremely long trips? Most likely, some drivers forgot to turn off the taximeter at the end of the route. The easiest way to deal with these trips is to remove them as outliers. We will remove all observations with durations of 2 days or more (trips with a 1-day duration will be investigated).
But what could be the real reason for negative durations? Let's try to find out. Do not forget about timedelta objects, since we want to compare durations (measured in hours, minutes, and seconds, and rarely in days).
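As a minimal sketch of comparing durations with timedelta objects, the snippet below uses only the standard library. The sample durations are made up for illustration and do not come from the course dataset.

```python
from datetime import timedelta

# Hypothetical trip durations, for illustration only.
durations = [
    timedelta(minutes=12),
    timedelta(days=2),
    timedelta(minutes=-10),
    timedelta(days=51, hours=4),
]

# The threshold is itself a timedelta, so the comparison is like-for-like.
two_days = timedelta(days=2)

# Flag trips lasting 2 days or more, and trips with a negative duration.
too_long = [d >= two_days for d in durations]
negative = [d < timedelta(0) for d in durations]
```

The same comparisons should also work on a pandas timedelta column, since pandas accepts timedelta objects on the right-hand side of a comparison.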
To avoid converting both columns to datetime every time, we can set the parse_dates argument of the .read_csv function to a list of the column names we want to convert.
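The following sketch shows one way parse_dates might be used. The column names pickup_datetime and dropoff_datetime and the in-memory CSV are assumptions made for illustration; the real file and its column names come from the course.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for the course CSV; the column names are assumptions.
csv_text = """pickup_datetime,dropoff_datetime
2016-01-01 10:00:00,2016-01-01 10:15:00
2016-01-02 09:30:00,2016-01-02 09:20:00
"""

# parse_dates takes a list of column names to convert to datetime while reading.
df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["pickup_datetime", "dropoff_datetime"],
)

# Subtracting two datetime columns yields a timedelta column.
df["duration"] = df["dropoff_datetime"] - df["pickup_datetime"]
```

With both columns parsed on load, no separate conversion step is needed before computing durations.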
Task
- Remove rows with abnormally long trips (duration of 2 days or more) from the df dataframe.
- Extract the first 10 rows with a negative trip duration (duration column).
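One possible approach to the two steps is sketched below, assuming the duration column already holds timedelta values. The small dataframe here is a made-up stand-in for the course data, and negative_trips is a hypothetical variable name.

```python
import pandas as pd

# Made-up stand-in for the course dataframe; only the duration column is shown.
df = pd.DataFrame({
    "duration": [
        pd.Timedelta(minutes=15),
        pd.Timedelta(minutes=-10),
        pd.Timedelta(days=2),
        pd.Timedelta(minutes=-5),
        pd.Timedelta(days=55),
    ]
})

# Keep only trips shorter than 2 days (drops durations of 2 days or more).
df = df[df["duration"] < pd.Timedelta(days=2)]

# First 10 rows with a negative duration (fewer if not that many exist).
negative_trips = df[df["duration"] < pd.Timedelta(0)].head(10)
```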