Understanding DataFrames
AI in Action
```python
import pandas as pd

products = pd.DataFrame({
    "Name": ["Book", "Pen", "Notebook", "Pencil", "Eraser", "Ruler", "Marker"],
    "Price": [12.5, 1.5, 4.0, 0.8, 0.5, 2.0, 1.2],
    "Quantity": [5, 20, 12, 30, 50, 15, 25]
})

# Show the first five rows
print(products.head())
# info() prints its own summary and returns None, so it needs no print()
products.info()
```
A DataFrame is the core pandas data structure: a two-dimensional table with labeled rows and columns. You can think of it as a complete spreadsheet or SQL table inside Python, where each column is a Series.
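To make the "each column is a Series" point concrete, here is a minimal sketch. It assumes a shortened version of the products table, trimmed for illustration, and only checks the types involved.

```python
import pandas as pd

# Shortened products table, used only for illustration
products = pd.DataFrame({
    "Name": ["Book", "Pen", "Notebook"],
    "Price": [12.5, 1.5, 4.0],
})

# Selecting a single column by its label returns a pandas Series
prices = products["Price"]

print(type(products).__name__)
print(type(prices).__name__)

# The Series keeps the column label as its name and shares the row labels
print(prices.name)
print(prices.index.equals(products.index))
```

The column carries the same row labels as the table it came from, which is what keeps values aligned across columns.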
Creating DataFrames
Just like Series, there's more than one way to build a DataFrame.
From a Dictionary of Lists
```python
import pandas as pd

data = {
    "Name": ["Alice", "Bob"],
    "Age": [25, 30],
    "City": ["New York", "Chicago"]
}

df = pd.DataFrame(data)
print(df)
```
Each key becomes a column name, and the values form the column data.
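One consequence worth seeing in code: because each list becomes a full column, the lists need to be the same length. The sketch below assumes a deliberately broken dictionary (`bad_data`, a hypothetical name) to show what happens when they are not.

```python
import pandas as pd

data = {
    "Name": ["Alice", "Bob"],
    "Age": [25, 30],
}
df = pd.DataFrame(data)

# Column order follows the order of the dictionary keys
print(list(df.columns))

# Hypothetical broken input: "Age" has one value fewer than "Name"
bad_data = {"Name": ["Alice", "Bob"], "Age": [25]}
try:
    pd.DataFrame(bad_data)
    mismatch_rejected = False
except ValueError as err:
    # pandas cannot line up columns of different lengths into rows
    mismatch_rejected = True
    print("ValueError:", err)
```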
From a List of Dictionaries
```python
import pandas as pd

people = [
    {"Name": "Alice", "Age": 25, "City": "New York"},
    {"Name": "Bob", "Age": 30, "City": "Chicago"}
]

df = pd.DataFrame(people)
print(df)
```
Each dictionary represents one row of data.
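Since every dictionary is a separate row, the rows do not have to contain exactly the same keys. As a small sketch, assume a second record that has no "City" key (a made-up gap for illustration); pandas fills the missing cell with a missing-value marker.

```python
import pandas as pd

people = [
    {"Name": "Alice", "Age": 25, "City": "New York"},
    {"Name": "Bob", "Age": 30},  # hypothetical record with no "City" key
]

df = pd.DataFrame(people)
print(df)

# The columns are the union of all keys; the absent value shows up as missing
print(df["City"].isna().tolist())
```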
A Quick Look at a DataFrame
When working with real data, you often don't want to print the whole table - especially if it has thousands of rows. Pandas gives you a few handy methods for quick checks:
```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", "Dan", "Eve", "Frank", "Grace"],
    "Age": [25, 30, 27, 22, 29, 31, 28],
    "City": ["New York", "Chicago", "Boston", "Seattle", "Austin", "Denver", "Miami"]
})

print(df.head())     # first five rows
print(df.tail())     # last five rows
print(df.sample(3))  # three randomly chosen rows
```
- .head(): shows the first rows (default 5);
- .tail(): shows the last rows (default 5);
- .sample(): shows a random selection of rows (default 1).
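The defaults listed above can be overridden by passing a row count. The following sketch uses a two-column version of the table and, as an assumption made for reproducibility, fixes the seed of `.sample()` through its `random_state` parameter.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", "Dan", "Eve", "Frank", "Grace"],
    "Age": [25, 30, 27, 22, 29, 31, 28],
})

first_two = df.head(2)  # only the first two rows
last_two = df.tail(2)   # only the last two rows

# random_state fixes the seed, so the same three rows come back on every run
picked = df.sample(3, random_state=0)

print(first_two)
print(last_two)
print(picked)
```

Without `random_state`, each call to `.sample()` may return different rows, which is usually what you want for a quick spot check but makes results harder to reproduce.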
Metadata
A DataFrame also carries information about itself:
```python
import pandas as pd

products = pd.DataFrame({
    "Name": ["Book", "Pen", "Notebook", "Pencil", "Eraser", "Ruler", "Marker"],
    "Price": [12.5, 1.5, 4.0, 0.8, 0.5, 2.0, 1.2],
    "Quantity": [5, 20, 12, 30, 50, 15, 25]
})

print(products.columns)
print(products.index)
print(products.dtypes)
print(products.shape)
print(products.size)
# info() prints its own summary and returns None, so it needs no print()
products.info()
```
- .columns: labels for columns;
- .index: labels for rows;
- .dtypes: data type of each column;
- .shape: number of rows and columns;
- .size: total number of elements;
- .info(): a summary of the DataFrame's metadata.
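As a rough illustration of how these attributes relate to one another, the sketch below reuses the products table and checks that `.size` is simply the number of rows multiplied by the number of columns.

```python
import pandas as pd

products = pd.DataFrame({
    "Name": ["Book", "Pen", "Notebook", "Pencil", "Eraser", "Ruler", "Marker"],
    "Price": [12.5, 1.5, 4.0, 0.8, 0.5, 2.0, 1.2],
    "Quantity": [5, 20, 12, 30, 50, 15, 25]
})

# .shape is a (rows, columns) tuple, so it can be unpacked directly
rows, cols = products.shape
print(rows, cols)

# .size counts every cell in the table
print(products.size == rows * cols)

# With no explicit index given, rows are labeled 0, 1, 2, ...
print(list(products.index))
print(list(products.columns))
```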
1. What is the main difference between a Series and a DataFrame?
2. By default, how many rows does df.head() display?
3. Which method provides a summary of the DataFrame's metadata?