
Introduction

An important library to know for data manipulation in Python is Pandas.

By convention, it is imported with the statement import pandas as pd.

This library provides us with two new data structures.

Series

The first one is called Series. It represents a one-dimensional labeled array: in addition to the default numerical index (0, 1, 2, ...), its values can be accessed through a customized index, such as string labels, which is convenient when the data has natural identifiers.

Additionally, it provides us with a more flexible set of operations for data processing.

We can create a Series instance with, for example, the following code:


import pandas as pd
# Creating lists: `array` holds the data, `ind` holds the index labels
array = [1, 2, 3]
ind = ['a', 'b', 'c']
# Creating Series `s` with the custom index
s = pd.Series(data=array, index=ind)
print(s)

Another way is to use a dictionary, whose key-value pairs become the corresponding index-value pairs:


import pandas as pd
# Creating Series `s`
s = pd.Series({"a": 1, "b": 2, "c": 3})
print(s)
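
To illustrate the two kinds of indexation and the element-wise operations mentioned above, here is a minimal sketch built on the same Series. The variable names (by_label, by_position, doubled, large) are chosen only for this illustration.

```python
import pandas as pd

s = pd.Series({"a": 1, "b": 2, "c": 3})

# Label-based access goes through the customized index
by_label = s.loc["b"]
# Position-based access goes through the default 0-based positions
by_position = s.iloc[1]
# Arithmetic is applied element-wise and the index labels are preserved
doubled = s * 2
# A boolean condition keeps only the elements that satisfy it
large = s[s > 1]

print(by_label, by_position)
print(doubled)
print(large)
```

Both .loc and .iloc return the same element here, because the label "b" sits at position 1.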

DataFrame

The other data structure implemented in Pandas is the so-called DataFrame.

It represents a two-dimensional table with the attributes index (row labels) and columns (column labels). Unlike in NumPy arrays, different columns can have different data types, and, as in Series, each dimension supports both default and customized indexation.

Also, like Series, it provides a more flexible set of operations for data processing.

We can create a DataFrame object with, for example, the following code:


import pandas as pd
import numpy as np
# Creating a 3x3 NumPy array `array` with the data,
# plus lists `ind` with row labels and `cols` with column names
array = np.arange(1, 10).reshape((3, 3))
ind = ['a', 'b', 'c']
cols = ['First', 'Second', 'Third']
# Creating DataFrame `df`
df = pd.DataFrame(data=array, index=ind, columns=cols)
print(df)

As with Series, we can create a DataFrame from a dictionary.

However, here each key-value pair maps a column name to the collection of that column's values:


import pandas as pd
# Creating DataFrame `df`
df = pd.DataFrame({"a": ['e', 'f', 'g'], "b": [3, 4, 5]})
print(df)
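
As a quick check of the claim that columns can have different data types, the sketch below inspects the same DataFrame through its dtypes, index and columns attributes. The exact dtype reported for the text column may vary between Pandas versions, so no specific output is shown.

```python
import pandas as pd

df = pd.DataFrame({"a": ["e", "f", "g"], "b": [3, 4, 5]})

# Each column keeps its own data type: text in `a`, integers in `b`
print(df.dtypes)
# The row labels default to 0, 1, 2 because no index was passed
print(df.index)
# The column labels come from the dictionary keys
print(df.columns)
```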

Note that in the code above the index is set to the default one (0, 1, 2).

To use a different index, we pass it explicitly:


import pandas as pd
# Creating DataFrame `df`
df = pd.DataFrame({"a": ['e', 'f', 'g'], "b": [3, 4, 5]}, index=[6, 7, 8])
print(df)
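
With a customized index like this one, labels and positions no longer coincide. The following sketch, using illustrative variable names, shows how the two can be told apart with .loc (labels) and .iloc (positions).

```python
import pandas as pd

df = pd.DataFrame({"a": ["e", "f", "g"], "b": [3, 4, 5]}, index=[6, 7, 8])

# The row labeled 6 is the first row of the table
row_by_label = df.loc[6]
# The same row, selected by its 0-based position
row_by_position = df.iloc[0]
# A single cell: row label 7, column `b`
value = df.loc[7, "b"]

print(row_by_label)
print(value)
```

Here df.loc[0] would raise a KeyError, since 0 is not among the labels 6, 7, 8.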

In addition to the usual operations on vectors and matrices, DataFrames let us perform operations similar to those of relational databases, among others.

For example, we can group rows by the values of a specific column and aggregate each group:


import pandas as pd
# Creating DataFrame `df`
df = pd.DataFrame({'a': [0, 1, 0, 2, 1], 'b': [3, 4, 5, 6, 7]})
# Grouping `df` by values of `a` column, and applying aggregation by taking `mean` values of `b` column
df_grouped = df.groupby(by=["a"]).mean()
print(df_grouped)
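
The grouping above is roughly comparable to GROUP BY in SQL. As a minimal sketch of the same idea with several aggregates at once, the code below uses named aggregation; the output column names (b_mean, b_sum, n_rows) are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 0, 2, 1], 'b': [3, 4, 5, 6, 7]})

# Grouping by `a` and computing several aggregates of `b` per group
summary = (
    df.groupby("a")
      .agg(b_mean=("b", "mean"), b_sum=("b", "sum"), n_rows=("b", "count"))
      .reset_index()  # turning the group keys back into a regular column
)
print(summary)
```

Calling reset_index is optional; without it, the values of a stay in the index of the result, as in the previous example.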

Addition

More concepts and examples are covered in the following video:


Processing Tabular Data with Pandas
