Writing and Running Python Code
Python is a primary language in Databricks used for data manipulation, machine learning, and automation. Databricks notebooks provide an interactive environment where Python code is written in cells and executed against a connected cluster.
Now that your notebook is created and attached to a cluster, you can begin writing and executing code. Databricks notebooks use the same cell-based model as Jupyter, so the experience will feel familiar if you have used other notebook environments.
Understanding Cells
The building block of a notebook is the cell, and a single notebook can contain as many cells as you like. To create a new cell, hover your mouse over the top or bottom edge of an existing cell and click the "+" icon next to either the Code or the Text option, depending on the type of cell you want. Each code cell has two parts:
- Input: you type your Python code into the gray box;
- Output: once executed, the results (such as data tables, printed text, or error messages) appear directly below that specific cell.
Running Code
There are three main ways to execute a Python cell in Databricks:
- The Play Icon: click the "Run Cell" (triangle) icon in the top-right corner of the cell;
- Shift + Enter: this runs the current cell and automatically moves your cursor to the next cell (or creates a new one);
- Ctrl + Enter (Cmd + Enter on Mac): this runs the current cell and keeps your cursor inside it. This is useful when you are testing and re-testing the same block of code.
A Simple Python Exercise
In the video for this chapter, we saw how to work with variables in cells. Now let us try a more interesting example to check that your environment is working: a simple calculation. Copy the following code into a cell:
    # Define variables
    price = 100
    quantity = 5
    tax_rate = 0.1

    # Perform calculation
    total_cost = (price * quantity) * (1 + tax_rate)

    # Print result
    print(f"The total cost of the items is: ${total_cost}")
When you run this cell, the cluster processes the variables and displays the text: The total cost of the items is: $550.0.
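The output above shows the raw float value. As an optional variation (not part of the original exercise), a format specifier such as `:.2f` limits the displayed value to two decimal places, which is a common way to show currency. The name `message` is introduced here purely for illustration.

```python
# Same values as in the exercise above
price = 100
quantity = 5
tax_rate = 0.1

total_cost = (price * quantity) * (1 + tax_rate)

# :.2f formats the float with exactly two digits after the decimal point
message = f"The total cost of the items is: ${total_cost:.2f}"
print(message)  # The total cost of the items is: $550.00
```

Formatting only changes how the number is displayed; `total_cost` itself keeps its full floating-point value for later calculations.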
Working with Variables Across Cells
A key feature of Databricks notebooks is state persistence. If you define a variable in one cell and run it, the variable remains available to every cell you run afterwards in that notebook, for as long as the notebook stays attached to the running cluster.
For example, if you create a new cell below the previous one and simply type print(total_cost), the notebook still remembers the value 550.0. If you restart the cluster or clear the notebook state ("Clear State"), you will need to run the cells again from the top to re-initialize those variables.
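As a rough sketch of this behaviour, the "cells" below are marked with comments so the snippet can also run as a single block. The name `discounted_cost`, the 10% discount, and the `state_was_cleared` flag are hypothetical additions for illustration. The `del` statement only imitates the effect that clearing state has on one variable; it is not how Databricks itself clears state.

```python
# --- Cell 1: define the variable ---
total_cost = (100 * 5) * (1 + 0.1)

# --- Cell 2: run later, in a separate cell ---
# total_cost is still held in the notebook's state, so it can be reused.
discounted_cost = total_cost * 0.9  # hypothetical 10% discount
print(f"Total: {total_cost:.2f}, discounted: {discounted_cost:.2f}")

# --- Cell 3: imitate a cleared state (assumption: del stands in for "Clear State") ---
del total_cost
state_was_cleared = False
try:
    print(total_cost)
except NameError as err:
    # Without re-running Cell 1, the name no longer exists
    state_was_cleared = True
    print(f"Variable is gone: {err}")
```

Re-running Cell 1 would define `total_cost` again, which is why running the notebook from the top restores everything after a restart.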
Comments and Documentation
In Python cells, any line starting with a # is a comment, and anything after a # at the end of a line of code is treated the same way. Comments are ignored by Python when the cell runs, but they are essential for explaining your logic to teammates. Using them helps you maintain professional code standards within the collaborative environment of the Workspace.
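A short illustration of full-line and inline comments, reusing the values from the earlier exercise so the snippet runs on its own. The wording of the comments is only a suggestion.

```python
# Full-line comment: explain why the code exists, not just what it does.
price = 100      # unit price in dollars (inline comment)
quantity = 5     # number of items ordered
tax_rate = 0.1   # 10% tax

# Compute the subtotal first, then apply tax on top of it.
total_cost = (price * quantity) * (1 + tax_rate)
print(total_cost)
```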
1. Which keyboard shortcut allows you to run a cell and stay inside that same cell?
2. What happens to a variable defined in Cell 1 when you try to use it in Cell 2?