Creating Your First Compute Resource
Creating a compute resource (Cluster) is the act of provisioning virtual hardware in the cloud to execute your data tasks. For learning purposes, we use a Single Node configuration to balance performance and cost.
It's time to turn on the "engine." In this chapter, we will walk through the exact steps to create your first cluster. This is the resource that will allow you to run the SQL and Python code we'll be writing later in the course. Follow these steps carefully to ensure your environment is set up correctly and cost-effectively.
Databricks also lets you create more specialised clusters, such as job compute, which is better suited to workflows. This chapter covers the basics, so we will create an all-purpose cluster, but everything here applies to creating and managing other cluster categories as well.
Step 1: Accessing the Compute Menu
On the left-hand sidebar, click on the Compute icon. This will take you to the compute overview page. In the top right corner, click the blue button labeled Create Compute.
Step 2: Choosing the Cluster Type
You will see two main options at the top: Multi Node and Single Node.
- Select Single Node.
- Why? Multi-node clusters are meant for large to massive, enterprise-scale datasets. For learning, a Single Node cluster is much cheaper (or even free in some editions) and provides plenty of power for the datasets we will be using.
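If you later create clusters through the Databricks Clusters REST API instead of the UI, the Single Node choice corresponds to a handful of fields in the request body. The snippet below is a minimal sketch of those fields as I understand them from the public API documentation. The helper name `single_node_settings` is hypothetical, and you should verify the field values against the documentation for your own workspace and cloud.

```python
import json


def single_node_settings():
    """Return the request-body fields that (to my knowledge) mark a cluster as Single Node.

    Hypothetical helper for illustration; not part of any Databricks library.
    """
    return {
        # A Single Node cluster has a driver and no worker nodes.
        "num_workers": 0,
        # Spark runs in local mode on the driver.
        "spark_conf": {
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",
        },
        # Tag that identifies the cluster as Single Node.
        "custom_tags": {"ResourceClass": "SingleNode"},
    }


if __name__ == "__main__":
    print(json.dumps(single_node_settings(), indent=2))
```

These fields would be merged with the name, runtime version, node type, and auto-termination settings chosen in the following steps.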
Step 3: Naming and Runtime
- Name: give your cluster a clear name, like Student_Cluster_1;
- Databricks Runtime Version: this dropdown determines the "engine" version. Look for the latest version that has LTS next to it. LTS stands for "Long Term Support." It is the most stable version and the one most companies use for their real-world projects.
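The "latest version marked LTS" rule can also be expressed in code. The sketch below assumes a list of runtime entries shaped like `{"key": ..., "name": ...}`, which is how I understand the Clusters API reports available runtimes. The sample entries are illustrative placeholders rather than a real listing, and `latest_lts` is a hypothetical helper.

```python
import re


def latest_lts(versions):
    """Return the entry with the highest version number whose name is marked LTS.

    Entries such as "14.3 LTS ML (...)" are skipped so only the standard
    runtime is considered. Returns None when no LTS entry is present.
    """
    candidates = []
    for entry in versions:
        # Match names that start like "14.3 LTS (" and capture major/minor.
        match = re.match(r"(\d+)\.(\d+) LTS \(", entry["name"])
        if match:
            number = (int(match.group(1)), int(match.group(2)))
            candidates.append((number, entry))
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[0])[1]


# Illustrative sample data (assumed shape, not a real API response).
sample_versions = [
    {"key": "13.3.x-scala2.12", "name": "13.3 LTS (illustrative entry)"},
    {"key": "15.1.x-scala2.12", "name": "15.1 (illustrative entry)"},
    {"key": "14.3.x-scala2.12", "name": "14.3 LTS (illustrative entry)"},
    {"key": "14.3.x-cpu-ml-scala2.12", "name": "14.3 LTS ML (illustrative entry)"},
]

if __name__ == "__main__":
    print(latest_lts(sample_versions)["key"])
```

Note that the newest runtime in the sample (15.1) is not selected because it is not marked LTS, which mirrors the advice above.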
Step 4: Configuring the "Hardware"
Under Node Type, you will see a list of cloud virtual machines (like Standard_DS3_v2 on Azure or i3.xlarge on AWS).
- For this course, the default selection is usually fine;
- Make sure it has at least 15 GB of memory if you plan on doing more advanced data science later. For basic SQL and Python, the smallest available option is often sufficient.
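The memory guideline above amounts to a simple filter: keep the node types that meet the minimum, then take the smallest one. Here is a rough sketch of that logic. The node names and sizes are made up for illustration, and the `memory_mb` field name is an assumption based on how I understand the Clusters API describes node types.

```python
def smallest_node_with_memory(node_types, min_memory_gb):
    """Return the smallest node type that has at least min_memory_gb of memory.

    Returns None when no node type is large enough.
    """
    min_memory_mb = min_memory_gb * 1024
    eligible = [n for n in node_types if n["memory_mb"] >= min_memory_mb]
    if not eligible:
        return None
    return min(eligible, key=lambda n: n["memory_mb"])


# Hypothetical node types; real names and sizes depend on your cloud provider.
sample_node_types = [
    {"node_type_id": "example_small", "memory_mb": 8192},
    {"node_type_id": "example_medium", "memory_mb": 16384},
    {"node_type_id": "example_large", "memory_mb": 32768},
]

if __name__ == "__main__":
    choice = smallest_node_with_memory(sample_node_types, min_memory_gb=15)
    print(choice["node_type_id"])
```

Picking the smallest eligible node keeps cost down while still meeting the memory requirement.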
Step 5: The Most Important Step - Auto-Termination
Look for the checkbox labeled "Terminate after ___ minutes of inactivity."
- Set this to 20 minutes;
- As we discussed in the previous chapter, this is your safety net. If you finish your work and close your laptop but forget to turn off your cluster, Databricks will detect that no code is running and automatically shut down the "engine" after 20 minutes to stop the billing clock.
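To make the "safety net" concrete, the sketch below compares how many idle minutes stay on the billing clock with and without auto-termination. It is a simplified model: it assumes billing stops exactly when the inactivity timer expires, and the function name `billed_idle_minutes` is hypothetical.

```python
def billed_idle_minutes(idle_minutes, auto_termination_minutes=None):
    """Return how many idle minutes are billed under a simplified model.

    Without auto-termination, the whole idle period is billed.
    With it, billing stops once the inactivity timer expires.
    """
    if auto_termination_minutes is None:
        return idle_minutes
    return min(idle_minutes, auto_termination_minutes)


if __name__ == "__main__":
    forgotten_overnight = 12 * 60  # cluster left on for 12 hours
    print(billed_idle_minutes(forgotten_overnight))      # no safety net
    print(billed_idle_minutes(forgotten_overnight, 20))  # 20-minute timer
```

Under this model, a cluster forgotten for 12 hours is billed for 720 idle minutes without the timer and only 20 with it.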
Step 6: Create and Wait
Click Create Compute. You will see a spinning solid circle next to your cluster name. It usually takes 3 to 5 minutes for the cloud provider to "warm up" the servers. Once the circle turns into a green checkmark or a green "Running" status, your engine is ready to go!
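If you ever script this waiting step instead of watching the UI, the usual pattern is a polling loop with a timeout. The sketch below is one possible shape for it. `get_state` is a hypothetical callable that you would implement yourself (for example, by querying the cluster's status), and the state names `PENDING`, `RUNNING`, `TERMINATED`, `ERROR`, and `UNKNOWN` follow my understanding of the Clusters API.

```python
import time


def wait_until_running(get_state, timeout_seconds=600, poll_seconds=15, sleep=time.sleep):
    """Poll get_state() until the cluster is RUNNING.

    Returns True once the state is RUNNING and False if the timeout elapses.
    Raises RuntimeError if the cluster ends up in a failed or stopped state.
    """
    waited = 0
    while True:
        state = get_state()
        if state == "RUNNING":
            return True
        if state in ("TERMINATED", "ERROR", "UNKNOWN"):
            raise RuntimeError(f"Cluster did not start, state is {state}")
        if waited >= timeout_seconds:
            return False
        sleep(poll_seconds)
        waited += poll_seconds


if __name__ == "__main__":
    # Simulated start-up: two PENDING checks, then RUNNING.
    simulated = iter(["PENDING", "PENDING", "RUNNING"])
    ready = wait_until_running(lambda: next(simulated), sleep=lambda s: None)
    print("ready" if ready else "timed out")
```

Passing `sleep` as a parameter keeps the function easy to test without real delays. The 10-minute default timeout is an arbitrary choice that leaves headroom over the typical 3 to 5 minute start-up.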
1. What is the correct way to access the menu for creating a new compute resource (cluster) in Databricks?
2. When setting up your first Databricks cluster for this course, why should you choose a Single Node cluster over a Multi Node cluster?