Displaying and Visualizing Results
The display() function is a built-in Databricks command used to render data in an interactive, tabular, or graphical format. It allows users to explore datasets and create visual charts directly within a notebook without needing external libraries.
In Databricks, simply evaluating a variable name or printing a query result shows the raw data as plain text. To make that data readable and "presentation-ready," use the display() command. It is the primary way to turn raw numbers into visual insights.
The Power of display()
When working with Python, you might be used to using print(). While print() works for text, it is not ideal for large datasets. By using display(your_dataframe), Databricks renders the data as an interactive table.
The interactive table lets you work with the data directly:
- You can scroll through thousands of rows;
- You can click on column headers to sort data in ascending or descending order;
- You can use the built-in search bar within the results to find specific values instantly.
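As a minimal sketch of the difference, the snippet below renders a small dataset with display() when it runs inside a Databricks notebook and falls back to print() anywhere else. The sample rows, the column names, and the show() helper are hypothetical. The names spark and display are assumed to be the globals that a Databricks notebook predefines.

```python
# Hypothetical sample data; the column names are illustrative only.
sales_rows = [
    {"region": "North", "product": "Widget", "sales": 120.0},
    {"region": "South", "product": "Widget", "sales": 80.0},
    {"region": "North", "product": "Gadget", "sales": 45.5},
]


def show(rows):
    """Render rows with display() if available, otherwise print them.

    Returns the name of the renderer used, so the choice is easy to inspect.
    """
    # Assumption: Databricks notebooks predefine `spark` and `display`.
    # Elsewhere these names are usually missing, so look them up defensively.
    spark_session = globals().get("spark")
    render = globals().get("display")

    if spark_session is not None and callable(render):
        df = spark_session.createDataFrame(rows)  # build a Spark DataFrame
        render(df)  # interactive table: scroll, sort, search
        return "display"

    # Fallback outside Databricks: plain text, one row per line.
    for row in rows:
        print(row)
    return "print"


renderer_used = show(sales_rows)
```

Inside a notebook the whole helper usually collapses to a single call such as display(df). The guard exists only so the sketch also runs outside Databricks.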
Creating Automatic Charts
Once you have executed a cell using display() or run a SQL query, a result table appears. Directly below this table, you will see a "+" icon. Clicking this allows you to select "Visualization".
- This opens the Visualization Editor;
- You can choose from a variety of chart types: Bar, Line, Area, Pie, Scatter, and more;
- You simply drag and drop the columns you want for your X-axis and Y-axis. Databricks handles the aggregation (like summing or averaging the values) automatically.
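To make the automatic aggregation concrete, here is a plain-Python sketch of what a bar chart with a category on the X-axis and a numeric column on the Y-axis roughly computes. This is not Databricks' implementation. The aggregate() helper and the sample rows are hypothetical and only illustrate the idea of grouping by X and reducing Y.

```python
from collections import defaultdict

# Hypothetical (region, sales) rows, as they might appear in a result table.
rows = [
    ("North", 120.0),
    ("South", 80.0),
    ("North", 45.5),
    ("South", 20.0),
]


def aggregate(rows, how="sum"):
    """Group (x, y) pairs by x and reduce the y values with sum or average."""
    buckets = defaultdict(list)
    for x_value, y_value in rows:
        buckets[x_value].append(y_value)

    if how == "sum":
        return {x: sum(ys) for x, ys in buckets.items()}
    if how == "average":
        return {x: sum(ys) / len(ys) for x, ys in buckets.items()}
    raise ValueError(f"unsupported aggregation: {how}")


totals = aggregate(rows, "sum")        # one bar per region
averages = aggregate(rows, "average")  # same grouping, different reducer
print(totals)
print(averages)
```

Each key in the result corresponds to one bar, and switching the aggregation changes only the reducer, not the grouping.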
Customizing the Visuals
The Visualization Editor is designed for speed. You can:
- Change Colors: assign specific colors to different data series;
- Label Axes: add custom titles to your horizontal and vertical axes;
- Group Data: use the "Group by" field to split a single line chart into multiple lines based on a category, such as "Region" or "Product Type."
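The effect of the "Group by" field can be sketched in plain Python as splitting one set of points into one series per category. The split_into_series() helper, the month values, and the region names below are hypothetical and stand in for whatever columns your data has.

```python
from collections import defaultdict

# Hypothetical (month, region, sales) rows.
rows = [
    ("2024-01", "North", 100.0),
    ("2024-01", "South", 70.0),
    ("2024-02", "North", 110.0),
    ("2024-02", "South", 90.0),
]


def split_into_series(rows):
    """Return {group: [(x, y), ...]} with each series sorted by x."""
    series = defaultdict(list)
    for month, region, sales in rows:
        series[region].append((month, sales))
    return {region: sorted(points) for region, points in series.items()}


lines = split_into_series(rows)
for region, points in lines.items():
    print(region, points)  # each region becomes its own line on the chart
```

With "Region" as the grouping column, the single line chart would be drawn as two lines here, one for North and one for South.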
Adding Visuals to the Notebook
Once you save your visualization, it appears as a separate tab alongside your data table. You can have multiple visualizations for the same cell. For example, one tab can show the raw sales data, the second can show a bar chart of sales by region, and the third can show a pie chart of product distribution. This keeps your notebook organized and allows stakeholders to see the "story" behind the data without looking at the underlying code.
Data Profiling
In addition to charts, Databricks provides a "Data Profile" tab in the results area. Clicking this gives you an instant statistical summary of your data, showing the distribution of values, missing counts, and min/max ranges for every column. This is an essential step for data cleaning before you begin a deeper analysis.
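For intuition about what such a profile contains, the sketch below computes a few of the same statistics (non-missing count, missing count, minimum, and maximum) per column in plain Python. The profile() helper and the sample records are hypothetical, and the real Data Profile reports more than this.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"region": "North", "sales": 120.0},
    {"region": "South", "sales": None},
    {"region": "North", "sales": 45.5},
    {"region": None, "sales": 80.0},
]


def profile(records):
    """Return per-column count, missing count, min, and max."""
    columns = sorted({key for record in records for key in record})
    summary = {}
    for column in columns:
        values = [record.get(column) for record in records]
        present = [value for value in values if value is not None]
        summary[column] = {
            "count": len(present),
            "missing": len(values) - len(present),
            "min": min(present) if present else None,
            "max": max(present) if present else None,
        }
    return summary


column_stats = profile(records)
for column, stats in column_stats.items():
    print(column, stats)
```

In a Databricks notebook, dbutils.data.summarize(df) should produce a comparable profile programmatically. Check the documentation for your runtime version before relying on it.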
1. What is the main advantage of using display() instead of print() for a dataset?
2. Where do you click to start creating a chart from your query results?