Understanding DataFrames
```python
import pandas as pd

products = pd.DataFrame({
    "Name": ["Book", "Pen", "Notebook", "Pencil", "Eraser", "Ruler", "Marker"],
    "Price": [12.5, 1.5, 4.0, 0.8, 0.5, 2.0, 1.2],
    "Quantity": [5, 20, 12, 30, 50, 15, 25]
})

print(products.head())
products.info()  # info() prints its summary itself and returns None
```
A DataFrame is the core pandas data structure: a two-dimensional table with labeled rows and columns. You can think of it as a spreadsheet or SQL table inside Python, where each column is a Series.
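To see the "each column is a Series" idea concretely, here is a minimal sketch using a shortened version of the products table. Selecting a single column label with square brackets returns a Series whose name is the column label and whose index is the DataFrame's row index.

```python
import pandas as pd

products = pd.DataFrame({
    "Name": ["Book", "Pen", "Notebook"],
    "Price": [12.5, 1.5, 4.0],
})

# Selecting one column label returns a Series
price = products["Price"]
print(isinstance(price, pd.Series))  # True
print(price.name)                    # Price

# The column Series shares the DataFrame's row index
print(price.index.equals(products.index))  # True
```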
Creating DataFrames
Just like Series, there's more than one way to build a DataFrame.
From a Dictionary of Lists
```python
import pandas as pd

data = {
    "Name": ["Alice", "Bob"],
    "Age": [25, 30],
    "City": ["New York", "Chicago"]
}

df = pd.DataFrame(data)
print(df)
```
Each key becomes a column name, and the values form the column data.
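Because each position in the lists becomes one row, all lists should have the same length. A quick sketch of what happens otherwise: pandas refuses to build the table and raises a ValueError (the exact wording of the message may differ between pandas versions).

```python
import pandas as pd

ragged = {
    "Name": ["Alice", "Bob"],
    "Age": [25],  # one value short
}

try:
    pd.DataFrame(ragged)
except ValueError as exc:
    error_message = str(exc)  # pandas rejects columns of unequal length
else:
    error_message = None

print(error_message is not None)  # True
```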
From a List of Dictionaries
```python
import pandas as pd

people = [
    {"Name": "Alice", "Age": 25, "City": "New York"},
    {"Name": "Bob", "Age": 30, "City": "Chicago"}
]

df = pd.DataFrame(people)
print(df)
```
Each dictionary represents one row of data.
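The dictionaries don't all need the same keys. In this sketch the second row has no "Age" key: the columns are the union of all keys in order of first appearance, and the missing cell is filled with NaN (which is why the Age column becomes a float column here).

```python
import pandas as pd

people = [
    {"Name": "Alice", "Age": 25, "City": "New York"},
    {"Name": "Bob", "City": "Chicago"},  # no "Age" key
]

df = pd.DataFrame(people)
print(df)

# Columns follow the order in which keys first appear
print(list(df.columns))           # ['Name', 'Age', 'City']
# The missing value is stored as NaN
print(df["Age"].isna().tolist())  # [False, True]
```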
A Quick Look at a DataFrame
When working with real data, you often don't want to print the whole table, especially if it has thousands of rows. Pandas gives you a few handy methods for quick checks:
```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", "Dan", "Eve", "Frank", "Grace"],
    "Age": [25, 30, 27, 22, 29, 31, 28],
    "City": ["New York", "Chicago", "Boston", "Seattle", "Austin", "Denver", "Miami"]
})

print(df.head())     # first 5 rows
print(df.tail())     # last 5 rows
print(df.sample(3))  # 3 random rows (changes on every run)
```
- .head(): shows the first rows (default 5);
- .tail(): shows the last rows (default 5);
- .sample(): shows a random selection of rows (default 1).
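All three methods accept a row count as their first argument. The sketch below also passes random_state to .sample(), which fixes the random seed so the same rows come back on every run; this is useful when you want reproducible output.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", "Dan", "Eve", "Frank", "Grace"],
    "Age": [25, 30, 27, 22, 29, 31, 28],
})

first_two = df.head(2)  # first 2 rows
last_two = df.tail(2)   # last 2 rows
print(first_two["Name"].tolist())  # ['Alice', 'Bob']
print(last_two["Name"].tolist())   # ['Frank', 'Grace']

# The same random_state gives the same sample every time
s1 = df.sample(3, random_state=42)
s2 = df.sample(3, random_state=42)
print(s1.equals(s2))  # True
```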
Metadata
A DataFrame also carries information about itself:
```python
import pandas as pd

products = pd.DataFrame({
    "Name": ["Book", "Pen", "Notebook", "Pencil", "Eraser", "Ruler", "Marker"],
    "Price": [12.5, 1.5, 4.0, 0.8, 0.5, 2.0, 1.2],
    "Quantity": [5, 20, 12, 30, 50, 15, 25]
})

print(products.columns)
print(products.index)
print(products.dtypes)
print(products.shape)
print(products.size)
products.info()  # info() prints its summary itself and returns None
```
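- .columns: the column labels;
- .index: the row labels;
- .dtypes: the data type of each column;
- .shape: a (rows, columns) tuple;
- .size: the total number of elements (rows × columns);
- .info(): prints a concise summary of the DataFrame: index, column names, non-null counts, dtypes, and memory usage.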
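A short sketch of how these attributes relate to each other for the products table: .shape can be unpacked into two variables, .size equals their product, and the default row index is simply 0 through n-1. It also shows why the examples call products.info() directly instead of print(products.info()): the method prints its summary and returns None, so wrapping it in print() only adds an extra "None" line.

```python
import pandas as pd

products = pd.DataFrame({
    "Name": ["Book", "Pen", "Notebook", "Pencil", "Eraser", "Ruler", "Marker"],
    "Price": [12.5, 1.5, 4.0, 0.8, 0.5, 2.0, 1.2],
    "Quantity": [5, 20, 12, 30, 50, 15, 25]
})

rows, cols = products.shape    # shape is a (rows, columns) tuple
print(rows, cols)              # 7 3
print(products.size)           # 21, i.e. rows * cols
print(list(products.columns))  # ['Name', 'Price', 'Quantity']
print(list(products.index))    # [0, 1, 2, 3, 4, 5, 6]

# info() writes the summary to the screen and returns None
result = products.info()
print(result is None)          # True
```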
1. What is the main difference between a Series and a DataFrame?
2. By default, how many rows does df.head() display?
3. Which method provides a summary of the DataFrame's metadata?