Removing Characters: Method 1 | Preprocessing Data: Part I
Analyzing and Visualizing Real-World Data
course content

1. Preprocessing Data: Part I
2. Preprocessing Data: Part II
3. Analyzing Data
4. Visualizing Data

Removing Characters: Method 1

There are at least two ways to remove redundant symbols from column values. The first is to treat the values as strings and apply a string method that strips the unwanted characters.

Note

To treat column values as strings, use the .str accessor.
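
As a minimal sketch of the note above, the snippet below applies a string method through the .str accessor. The Series name and its values are hypothetical sample data, not taken from the course dataset.

```python
import pandas as pd

# Hypothetical sample values; the '$' prefix forces pandas to store them as strings.
prices = pd.Series(['$2.57', '$2.55', '$2.51'])

# .str applies the string method to every element; lstrip('$') drops leading '$' characters.
cleaned = prices.str.lstrip('$')

print(cleaned.tolist())  # ['2.57', '2.55', '2.51']
print(cleaned.dtype)     # still a string column, not numeric yet
```

The values are still strings after stripping, so a separate conversion step is needed before doing arithmetic on them.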

After removing the symbols, the values are still strings, so we convert the columns to a numerical format.

There are at least two ways to do this:

  • The first is to call the .astype(type) method on a column, where type is int for integers or float for real numbers. For instance, df['column'] = df['column'].astype(int);
  • The second is to call the pd.to_numeric() function from pandas, passing the column as the argument. For instance, df['column'] = pd.to_numeric(df['column']).
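
The two conversion options above can be compared side by side. This is a small sketch on a hypothetical DataFrame; the column name 'column' follows the examples in the text, and the values are made up.

```python
import pandas as pd

# Hypothetical data: numbers stored as strings, as they would be after stripping symbols.
df = pd.DataFrame({'column': ['8.1', '7.8', '8.0']})

# Option 1: Series.astype with an explicit target type.
as_float = df['column'].astype(float)

# Option 2: pd.to_numeric, which infers a suitable numeric type.
as_numeric = pd.to_numeric(df['column'])

print(as_float.tolist())    # [8.1, 7.8, 8.0]
print(as_numeric.tolist())  # [8.1, 7.8, 8.0]
```

One practical difference: pd.to_numeric accepts errors='coerce', which turns values that cannot be parsed into NaN instead of raising an error, while .astype(float) raises on the first bad value.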

Task

  1. Import the pandas library with the pd alias.
  2. Read the CSV file and save it as a DataFrame in the df variable.
  3. Remove the redundant symbols from prices and convert them to float type:
    • Select the 'Fuel_Price' column;
    • Use the .str accessor;
    • Remove the '$' characters from the left using the .lstrip() method;
    • Convert the resulting values to numerical format (float) using the .astype() method;
    • Assign the result to the 'Fuel_Price' column of df.
  4. Remove the % symbols, which are on the right side of the values, from the 'Unemployment' column using the .rstrip() method, then convert the values to float in the same way as in step 3.
  5. Remove the °C symbols, which are on the right side of the values, from the 'Temperature' column using the .rstrip() method, then convert the values to float in the same way as in step 3.
  6. Display the first row of df and the data types of its columns.
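
The steps above might look roughly like the sketch below. The CSV file name is not given here, so the example builds a small DataFrame with hypothetical values in place of step 2; only the column names come from the task.

```python
import pandas as pd

# Hypothetical rows standing in for the course's CSV file (step 2 would use pd.read_csv).
df = pd.DataFrame({
    'Fuel_Price': ['$2.572', '$2.548'],
    'Unemployment': ['8.106%', '8.106%'],
    'Temperature': ['5.73°C', '3.62°C'],
})

# Step 3: strip the leading '$' and convert to float.
df['Fuel_Price'] = df['Fuel_Price'].str.lstrip('$').astype(float)

# Step 4: strip the trailing '%' and convert to float.
df['Unemployment'] = df['Unemployment'].str.rstrip('%').astype(float)

# Step 5: rstrip treats its argument as a set of characters,
# so '°C' removes any trailing '°' or 'C' characters.
df['Temperature'] = df['Temperature'].str.rstrip('°C').astype(float)

# Step 6: first row and column data types.
print(df.head(1))
print(df.dtypes)
```

Because .lstrip() and .rstrip() remove every leading or trailing character that appears in their argument, they suit symbols that sit only at one end of the value, as they do here.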

Once you've completed this task, click the button below the code to check your solution.
