Processing Tabular Data with Pandas | Python Basics
Introduction to Finance with Python

Processing Tabular Data with Pandas

Introduction

An important tool for data manipulation in Python is the Pandas library.

We can import it using `import pandas as pd`.

This library provides us with two main data structures: Series and DataFrame.

Series

The first one is called Series. It represents a one-dimensional labeled array: besides the default numerical (positional) index, it supports a customized index, so values can be accessed either by position or by label.

Additionally, it provides a more flexible set of operations for data processing than a plain Python list.

We can create a Series instance using, for example, the following code:

    import pandas as pd

    # Creating lists `array` with data and `ind` with indexation
    array = [1, 2, 3]
    ind = ['a', 'b', 'c']

    # Creating Series `s`
    s = pd.Series(data=array, index=ind)
    print(s)
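
To illustrate the two kinds of indexing described above, here is a minimal sketch that reads a value from the same Series by label and by position. The variable names `by_label` and `by_position` are ours, chosen only for illustration.

```python
import pandas as pd

s = pd.Series(data=[1, 2, 3], index=['a', 'b', 'c'])

# Customized index: access a value by its label
by_label = s['b']

# Default positional index: access a value by integer position with `iloc`
by_position = s.iloc[1]

print(by_label, by_position)
```

Both expressions refer to the same element, since label 'b' sits at position 1.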

Another way is to use a dictionary, whose key-value pairs become the corresponding index-value pairs:

    import pandas as pd

    # Creating Series `s`
    s = pd.Series({"a": 1, "b": 2, "c": 3})
    print(s)
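
As a rough sketch of what a "more flexible set of operations" can look like in practice, the example below applies element-wise arithmetic, boolean filtering, and label-aligned addition to a Series. The second Series `other` and its values are hypothetical, introduced only for this illustration.

```python
import pandas as pd

s = pd.Series({"a": 1, "b": 2, "c": 3})

# Element-wise arithmetic applies to every value at once
doubled = s * 2

# Boolean filtering keeps only the values that satisfy a condition
large = s[s > 1]

# Arithmetic between two Series aligns values by index label, not by position
other = pd.Series({"c": 10, "a": 20})  # hypothetical data
total = s + other  # label "b" has no match in `other`, so the result there is NaN

print(doubled)
print(large)
print(total)
```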

DataFrame

The other data structure implemented in Pandas is the DataFrame.

It represents a two-dimensional table with the corresponding attributes index (row labels) and columns (column labels). Unlike in NumPy arrays, different columns can have different data types, and, as in Series, each dimension supports two kinds of indexing: default and customized.

Also, like Series, it provides a more flexible set of operations for data processing.

We can create a DataFrame object using, for example, the following code:

    import pandas as pd
    import numpy as np

    # Creating `array` with data, `ind` with indexation and `cols` with names of columns
    array = np.arange(1, 10).reshape((3, 3))
    ind = ['a', 'b', 'c']
    cols = ['First', 'Second', 'Third']

    # Creating DataFrame `df`
    df = pd.DataFrame(data=array, index=ind, columns=cols)
    print(df)
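
The `index` and `columns` attributes mentioned above can be inspected directly. The following is a small sketch using the same data; the names `row_labels`, `column_labels`, and `second` are ours, not part of the lesson.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(
    data=np.arange(1, 10).reshape((3, 3)),
    index=['a', 'b', 'c'],
    columns=['First', 'Second', 'Third'],
)

# Row labels and column labels of the table
row_labels = list(df.index)
column_labels = list(df.columns)

# Selecting a single column by name returns a Series with the same row labels
second = df['Second']

print(row_labels)
print(column_labels)
print(second)
```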

As with Series, we can create a DataFrame using a dictionary.

However, here each key-value pair is a column name together with the collection of that column's values:

    import pandas as pd

    # Creating DataFrame `df`
    df = pd.DataFrame({"a": ['e', 'f', 'g'], "b": [3, 4, 5]})
    print(df)
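
Since this DataFrame mixes text and numbers, it is a convenient place to check that columns really can hold different data types. A minimal sketch follows; the exact dtype reported for the text column may differ between Pandas versions, so the code prints it without assuming a particular name.

```python
import pandas as pd

df = pd.DataFrame({"a": ['e', 'f', 'g'], "b": [3, 4, 5]})

# One dtype per column: `a` holds strings, `b` holds integers
print(df.dtypes)

# Helper checks from `pd.api.types` avoid relying on the exact dtype name
b_is_integer = pd.api.types.is_integer_dtype(df['b'])
a_is_numeric = pd.api.types.is_numeric_dtype(df['a'])

print(b_is_integer, a_is_numeric)
```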

Note that in the code above, the index is set to the default one (0, 1, 2).

If we want a different index, we should specify it explicitly:

    import pandas as pd

    # Creating DataFrame `df`
    df = pd.DataFrame({"a": ['e', 'f', 'g'], "b": [3, 4, 5]}, index=[6, 7, 8])
    print(df)
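
With a customized integer index like this one, labels and positions no longer coincide, which makes the difference between label-based and position-based access visible. The sketch below uses `loc` (labels) and `iloc` (integer positions); the variable names are ours.

```python
import pandas as pd

df = pd.DataFrame({"a": ['e', 'f', 'g'], "b": [3, 4, 5]}, index=[6, 7, 8])

# `loc` selects by label: the row labeled 6, column `b`
first_by_label = df.loc[6, 'b']

# `iloc` selects by position: first row (0), second column (1)
first_by_position = df.iloc[0, 1]

# Label 0 does not exist in this index, so `df.loc[0]` would raise a KeyError
has_label_zero = 0 in df.index

print(first_by_label, first_by_position, has_label_zero)
```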

In addition to the operations for working with vectors and matrices, DataFrames let us perform operations similar to those in relational databases, and more.

For example, we can perform aggregation by values of a specific column:

    import pandas as pd

    # Creating DataFrame `df`
    df = pd.DataFrame({'a': [0, 1, 0, 2, 1], 'b': [3, 4, 5, 6, 7]})

    # Grouping `df` by values of `a` column, and aggregating by taking `mean` values of `b` column
    df_grouped = df.groupby(by=["a"]).mean()
    print(df_grouped)
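
To push the relational-database analogy a little further, here is a sketch that computes several aggregates per group and then joins the data with a second table. The lookup table `labels` and its `name` column are hypothetical, made up only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 0, 2, 1], 'b': [3, 4, 5, 6, 7]})

# Several aggregates per group, roughly like
# SELECT a, MIN(b), MAX(b), COUNT(b) FROM df GROUP BY a
summary = df.groupby('a')['b'].agg(['min', 'max', 'count'])

# Hypothetical lookup table, joined on column `a` like a SQL LEFT JOIN
labels = pd.DataFrame({'a': [0, 1, 2], 'name': ['zero', 'one', 'two']})
joined = df.merge(labels, on='a', how='left')

print(summary)
print(joined)
```

The left join keeps every row of `df` and adds the matching `name` from `labels`.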

Additional Material

More concepts with examples are covered in the following video:

What is the difference between `loc` and `iloc` methods of `DataFrame` object?

Section 1. Chapter 5