Understanding DataFrames
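```python
import pandas as pd

# A small product table: one dictionary key per column
products = pd.DataFrame({
    "Name": ["Book", "Pen", "Notebook", "Pencil", "Eraser", "Ruler", "Marker"],
    "Price": [12.5, 1.5, 4.0, 0.8, 0.5, 2.0, 1.2],
    "Quantity": [5, 20, 12, 30, 50, 15, 25]
})

print(products.head())

# info() prints its summary directly and returns None,
# so wrapping it in print() would also print "None"
products.info()
```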
A DataFrame is the core pandas data structure: a two-dimensional table with labeled rows and columns. You can think of it as a complete spreadsheet or SQL table inside Python, where each column is a Series.
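To make the "each column is a Series" idea concrete, here is a minimal sketch. The three-row `products` table is a shortened, illustrative version of the example above, and the variable name `prices` is an assumption made for this sketch.

```python
import pandas as pd

# Shortened, illustrative version of the products table
products = pd.DataFrame({
    "Name": ["Book", "Pen", "Notebook"],
    "Price": [12.5, 1.5, 4.0],
})

# Selecting a single column with square brackets returns a Series
prices = products["Price"]
print(type(prices))

# The Series carries the same row labels (index) as the DataFrame
print(prices.index.equals(products.index))
```

Selecting one column is the quickest way to see that a DataFrame is a set of Series sharing one index.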
Creating DataFrames
Just like Series, there's more than one way to build a DataFrame.
From a Dictionary of Lists
```python
import pandas as pd

data = {
    "Name": ["Alice", "Bob"],
    "Age": [25, 30],
    "City": ["New York", "Chicago"]
}

df = pd.DataFrame(data)
print(df)
```
Each key becomes a column name, and the values form the column data.
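Because every list becomes one column, the lists need to have the same length. The sketch below illustrates this with made-up data; the names `bad` and `mismatch_error` are hypothetical and exist only for this example.

```python
import pandas as pd

data = {"Name": ["Alice", "Bob"], "Age": [25, 30]}
df = pd.DataFrame(data)

# Column names come from the dictionary keys
print(list(df.columns))

# Lists of different lengths cannot be lined up into rows,
# so pandas raises a ValueError
bad = {"Name": ["Alice", "Bob"], "Age": [25]}
try:
    pd.DataFrame(bad)
    mismatch_error = None
except ValueError as err:
    mismatch_error = err

print(mismatch_error)
```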
From a List of Dictionaries
```python
import pandas as pd

people = [
    {"Name": "Alice", "Age": 25, "City": "New York"},
    {"Name": "Bob", "Age": 30, "City": "Chicago"}
]

df = pd.DataFrame(people)
print(df)
```
Each dictionary represents one row of data.
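One consequence of the row-per-dictionary layout is worth a quick look: if a dictionary lacks a key that other rows have, pandas fills that cell with a missing value (NaN). The following sketch assumes a variation of the data above in which the second row has no "City" entry.

```python
import pandas as pd

people = [
    {"Name": "Alice", "Age": 25, "City": "New York"},
    {"Name": "Bob", "Age": 30},  # no "City" key in this row
]

df = pd.DataFrame(people)
print(df)

# isna() marks the cells that were filled in as missing
print(df["City"].isna().tolist())
```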
A Quick Look at a DataFrame
When working with real data, you rarely want to print the whole table, especially if it has thousands of rows. pandas provides a few methods for quick checks:
```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", "Dan", "Eve", "Frank", "Grace"],
    "Age": [25, 30, 27, 22, 29, 31, 28],
    "City": ["New York", "Chicago", "Boston", "Seattle", "Austin", "Denver", "Miami"]
})

print(df.head())     # first 5 rows
print(df.tail())     # last 5 rows
print(df.sample(3))  # 3 randomly chosen rows
```
- .head(): shows the first rows (default 5);
- .tail(): shows the last rows (default 5);
- .sample(): shows a random selection of rows (default 1).
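Each of these methods accepts a row count, and `.sample()` also takes a `random_state` argument that makes the random pick repeatable. The sketch below uses a trimmed version of the table; the seed value 0 and the variable names are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", "Dan", "Eve", "Frank", "Grace"],
    "Age": [25, 30, 27, 22, 29, 31, 28],
})

first_two = df.head(2)    # first 2 rows instead of the default 5
last_three = df.tail(3)   # last 3 rows

# The same seed gives the same random rows on every run
picked = df.sample(3, random_state=0)
picked_again = df.sample(3, random_state=0)

print(first_two)
print(last_three)
print(picked)
```

Fixing the seed is mainly useful when you want a reproducible sample, for instance in a notebook you plan to share.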
Metadata
A DataFrame also carries information about itself:
```python
import pandas as pd

products = pd.DataFrame({
    "Name": ["Book", "Pen", "Notebook", "Pencil", "Eraser", "Ruler", "Marker"],
    "Price": [12.5, 1.5, 4.0, 0.8, 0.5, 2.0, 1.2],
    "Quantity": [5, 20, 12, 30, 50, 15, 25]
})

print(products.columns)  # column labels
print(products.index)    # row labels
print(products.dtypes)   # data type of each column
print(products.shape)    # (rows, columns)
print(products.size)     # total number of elements

# info() prints its summary directly and returns None,
# so it is called without print()
products.info()
```
- .columns: labels for columns;
- .index: labels for rows;
- .dtypes: data type of each column;
- .shape: number of rows and columns, as a (rows, columns) tuple;
- .size: total number of elements (rows times columns);
- .info(): prints a summary of the DataFrame's metadata.
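The relationship between `.shape` and `.size` can be checked directly. This is a minimal sketch on a shortened, illustrative version of the products table; the names `rows` and `cols` are introduced here only for the example.

```python
import pandas as pd

# Shortened, illustrative version of the products table
products = pd.DataFrame({
    "Name": ["Book", "Pen", "Notebook"],
    "Price": [12.5, 1.5, 4.0],
    "Quantity": [5, 20, 12],
})

# shape is a (rows, columns) tuple, so it can be unpacked
rows, cols = products.shape
print(rows, cols)

# size counts every cell, which equals rows times columns
print(products.size == rows * cols)

# dtypes is itself a Series, indexed by column name
print(products.dtypes["Price"])
```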
1. What is the main difference between a Series and a DataFrame?
2. By default, how many rows does df.head() display?
3. Which method provides a summary of the DataFrame's metadata?