Introduction to Finance with Python
Processing Tabular Data with Pandas
Introduction
The most important tool for data manipulation in Python is the Pandas library.
We can import it with `import pandas as pd`.
This library provides us with two new data structures.
Series
The first one is called Series. It represents a one-dimensional array whose elements can be accessed not only through the default numerical index but also through a custom index of labels, which is convenient in many cases.
Additionally, it provides us with a more flexible set of operations for data processing.
We can create a Series instance with, for example, the following code:
    import pandas as pd

    # Creating lists `array` with data and `ind` with indexation
    array = [1, 2, 3]
    ind = ['a', 'b', 'c']

    # Creating Series `s`
    s = pd.Series(data=array, index=ind)
    print(s)
Another way is to use a dictionary, whose key-value pairs become the corresponding index-value pairs:
    import pandas as pd

    # Creating Series `s`
    s = pd.Series({"a": 1, "b": 2, "c": 3})
    print(s)
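To illustrate what the custom index and the more flexible operations can look like in practice, here is a minimal sketch. The names `prices` and `shares` and all values are made up for illustration and are not part of the lesson.

```python
import pandas as pd

# Hypothetical data: labels and values are illustrative only
prices = pd.Series([10.0, 20.0, 30.0], index=["a", "b", "c"])

# The same element can be reached by its custom label or by its position
by_label = prices.loc["b"]
by_position = prices.iloc[1]

# Arithmetic is applied element-wise to the whole Series
doubled = prices * 2

# Operations between two Series align on index labels, not on position
shares = pd.Series([1, 2, 3], index=["c", "b", "a"])
total = (prices * shares).sum()

print(by_label, by_position)
print(doubled)
print(total)
```

Here `.loc` selects by label and `.iloc` by position, so both lookups return the same element even though they address it differently.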
DataFrame
The other data structure implemented in Pandas is the so-called DataFrame.
It represents a two-dimensional table with the corresponding attributes `index` and `columns`. Unlike in NumPy arrays, the columns can have different data types, and, as in Series, each dimension supports both default and custom indexing.
Also, like Series, it provides a more flexible set of operations for data processing.
We can create a DataFrame object with, for example, the following code:
    import pandas as pd
    import numpy as np

    # Creating `array` with data, `ind` with indexation and `cols` with names of columns
    array = np.arange(1, 10).reshape((3, 3))
    ind = ['a', 'b', 'c']
    cols = ['First', 'Second', 'Third']

    # Creating DataFrame `df`
    df = pd.DataFrame(data=array, index=ind, columns=cols)
    print(df)
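The `index` and `columns` attributes, together with label-based and position-based access, could be inspected roughly as follows. This is a sketch that rebuilds the same table; the variable names are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Same table as in the lesson: values 1..9 arranged in 3 rows and 3 columns
df = pd.DataFrame(
    data=np.arange(1, 10).reshape((3, 3)),
    index=["a", "b", "c"],
    columns=["First", "Second", "Third"],
)

# Row labels and column labels are stored in separate attributes
row_labels = list(df.index)
column_labels = list(df.columns)

# A single cell addressed by labels and by positions
value_by_label = df.loc["b", "Second"]
value_by_position = df.iloc[1, 1]

# Selecting one column returns a Series that keeps the row index
second_column = df["Second"]

print(row_labels, column_labels)
print(value_by_label, value_by_position)
print(second_column)
```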
As with Series, we can create a DataFrame from a dictionary.
Here, however, each key-value pair maps a column name to the collection of that column's values:
    import pandas as pd

    # Creating DataFrame `df`
    df = pd.DataFrame({"a": ['e', 'f', 'g'], "b": [3, 4, 5]})
    print(df)
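As a quick check of the claim that columns can hold different data types, the following sketch inspects the same DataFrame. The helper functions come from `pd.api.types`; the variable names are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": ["e", "f", "g"], "b": [3, 4, 5]})

# Without an explicit index, rows are labelled 0, 1, 2, ...
default_index = list(df.index)

# Each column carries its own data type
b_is_integer = pd.api.types.is_integer_dtype(df["b"])
a_is_numeric = pd.api.types.is_numeric_dtype(df["a"])

print(default_index)
print(df.dtypes)
print(b_is_integer, a_is_numeric)
```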
Note that in the code above, `index` is set to the default.
If we want a different index, we should specify it explicitly:
    import pandas as pd

    # Creating DataFrame `df`
    df = pd.DataFrame({"a": ['e', 'f', 'g'], "b": [3, 4, 5]}, index=[6, 7, 8])
    print(df)
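When the custom index consists of integers, labels and positions no longer coincide, which is easy to overlook. A minimal sketch of the difference, with variable names chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({"a": ["e", "f", "g"], "b": [3, 4, 5]}, index=[6, 7, 8])

# `.loc` looks up the row whose label is 7 (the second row)
row_by_label = df.loc[7]

# `.iloc` looks up the row at position 0 (the first row, labelled 6)
row_by_position = df.iloc[0]

# Dropping the custom index restores the default 0, 1, 2, ... labels
df_default = df.reset_index(drop=True)

print(row_by_label)
print(row_by_position)
print(df_default)
```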
In addition to the operations for working with vectors and matrices, DataFrames let us perform operations similar to those of relational databases, and more.
For example, we can perform aggregation by values of a specific column:
    import pandas as pd

    # Creating DataFrame `df`
    df = pd.DataFrame({'a': [0, 1, 0, 2, 1], 'b': [3, 4, 5, 6, 7]})

    # Grouping `df` by values of `a` column, and applying aggregation
    # by taking `mean` values of `b` column
    df_grouped = df.groupby(by=["a"]).mean()
    print(df_grouped)
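Other database-style operations follow the same pattern. The sketch below shows rough counterparts of a SQL filter, join and grouped aggregation; the tables `trades` and `sectors`, their columns and all values are hypothetical and serve only as an illustration.

```python
import pandas as pd

# Hypothetical tables: tickers, quantities and sectors are made up
trades = pd.DataFrame({
    "ticker": ["AAA", "BBB", "AAA", "CCC"],
    "quantity": [10, 5, 20, 7],
})
sectors = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC"],
    "sector": ["Tech", "Energy", "Tech"],
})

# Filtering rows, similar to a WHERE clause
large = trades[trades["quantity"] > 6]

# Joining two tables on a shared column, similar to a LEFT JOIN
joined = trades.merge(sectors, on="ticker", how="left")

# Grouping with several aggregations at once, similar to GROUP BY
per_sector = joined.groupby("sector")["quantity"].agg(["sum", "mean"])

print(large)
print(joined)
print(per_sector)
```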