Learn Challenge: Investigation | Working with Dates and Times in pandas
Dealing with Dates and Times in Python

Challenge: Investigation

As you saw in the previous chapter, some trips have negative or extremely long durations (more than 50 days, for example). Such data cannot be real, so we need to fix it before going further.

Why are some trips so long? Most likely, some drivers forgot to turn off the taximeter at the end of the route. The easiest way to deal with such trips is to remove them as outliers. We will remove all observations with a duration of 2 days or more (1-day durations will be investigated).

But what could be the real reason for negative durations? Let's try to find out. Remember to use timedelta objects, since we want to compare durations (measured in hours, minutes, and seconds, and only rarely in days).
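As a minimal sketch of how durations can be compared against timedelta objects, the example below builds a small Series of made-up durations (the values are illustrative and not taken from the course dataset) and checks each one against two thresholds with pd.Timedelta.

```python
import pandas as pd

# Made-up durations for illustration only (not the course data)
durations = pd.Series([
    pd.Timedelta(minutes=25),        # an ordinary trip
    pd.Timedelta(minutes=-30),       # a negative duration
    pd.Timedelta(days=51, hours=2),  # an abnormally long trip
])

# Comparing a timedelta Series with a pd.Timedelta gives a boolean mask
is_negative = durations < pd.Timedelta(0)
is_too_long = durations >= pd.Timedelta(days=2)

print(is_negative.tolist())
print(is_too_long.tolist())
```

Comparing against pd.Timedelta(days=2) rather than a plain number keeps the units explicit, so there is no need to convert the durations to seconds or hours first.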

To avoid converting both columns to datetime every time, we can pass a list of the column names to convert to the parse_dates argument of the .read_csv function.
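A short sketch of the parse_dates approach is shown here. The column names pickup_datetime and dropoff_datetime and the inline CSV text are assumptions made for illustration; the real dataset's file and column names may differ.

```python
import io
import pandas as pd

# Hypothetical sample data; the column names are assumptions,
# not necessarily those of the course dataset
csv_text = """pickup_datetime,dropoff_datetime
2016-01-01 10:00:00,2016-01-01 10:25:00
2016-01-02 23:50:00,2016-01-03 00:10:00
"""

# parse_dates takes a list of column names to convert to datetime on load
df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["pickup_datetime", "dropoff_datetime"],
)

# Subtracting two datetime columns yields a timedelta column
df["duration"] = df["dropoff_datetime"] - df["pickup_datetime"]
print(df.dtypes)
```

With the columns parsed on load, no separate pd.to_datetime call is needed before computing durations.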

Task

  1. Remove the rows with abnormally long trips (duration of 2 days or more) from the df dataframe.
  2. Extract the first 10 rows with a negative trip duration (duration column).
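One possible way to approach the two steps is sketched below. It is not the course's reference solution: the df built here is a small hypothetical stand-in with made-up values, and only the duration column name comes from the task.

```python
import pandas as pd

# Hypothetical stand-in for df; durations are made-up values in seconds
df = pd.DataFrame({
    "duration": pd.to_timedelta(
        [1500, -1800, 4406400, 86400, -60, 172800], unit="s"
    )
})

# Step 1: keep only trips shorter than 2 days
# (this drops every row with duration >= 2 days)
df = df[df["duration"] < pd.Timedelta(days=2)]

# Step 2: take the first 10 rows with a negative duration
negative_trips = df[df["duration"] < pd.Timedelta(0)].head(10)
print(negative_trips)
```

Filtering with a strict "less than 2 days" mask is equivalent to removing rows that are "greater than or equal to 2 days", so a trip lasting exactly 2 days is removed while a 1-day trip is kept.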
