Importing Sample Data for Practice
Data Ingestion is the process of bringing data from outside sources into your Databricks environment. Using the Data Ingestion UI, you can transform a raw file, like a CSV, into a structured table in your Catalog with just a few clicks.
You have your Workspace set up, and your Cluster is running. Now, we need something to work with. In the real world, data might come from streaming sensors or massive cloud databases, but most data projects start with a simple file. In this chapter, we will use the modern Data Ingestion capability to upload a CSV file and turn it into a permanent table in your Catalog.
There are various ways to ingest data in Databricks, some more advanced than others (for example, you can set up your own endpoints in your cloud provider, or connect third-party applications). In this chapter, we are exploring the most basic one, to get you started: uploading data from your own computer.
Step 1: Accessing Data Ingestion
There are two quick ways to find this tool:
- Click the "New" button at the top of the sidebar and select "File Upload".
- Alternatively, go to the Catalog tab and click the "Create Table" button (often represented by a plus sign).
Step 2: Uploading the File
Once you are in the upload interface, you can drag and drop your file or browse your computer.
- The Scenario: for this exercise, we are using a sample file called diamonds.csv.
- The Upload: once the file is uploaded, Databricks will store it temporarily in a "staging" area while it prepares to move it into the Catalog.
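Before uploading, it can help to peek at the file locally to confirm it has a header row and looks well-formed. Here is a minimal sketch using Python's standard csv module; the diamonds.csv content shown is a tiny invented stand-in for the real file:

```python
import csv
import io

# A tiny stand-in for diamonds.csv -- the real file has many more
# rows and columns. Replace the StringIO with open("diamonds.csv").
sample = """carat,cut,color,price
0.23,Ideal,E,326
0.21,Premium,E,326
0.29,Premium,I,334
"""

reader = csv.reader(io.StringIO(sample))
header = next(reader)   # first line holds the column names
rows = list(reader)     # remaining lines are data rows

print(header)           # ['carat', 'cut', 'color', 'price']
print(len(rows))        # 3
```

If the header or row counts look wrong here, fix the file before uploading rather than after.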
Step 3: Configuring the Table (The Preview)
This is where the "magic" happens. Databricks will show you a preview of your data.
- Catalog and Schema: you must choose where the table will live. For now, we will use the workspace catalog and the default schema.
- Table Name: give your table a clear name, such as diamonds.
- Data Types: look at the columns. Databricks automatically guesses whether a column is a "String" (text), an "Integer" (number), or a "Timestamp" (date). If it guesses wrong, you can manually change the data type right here in the UI.
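The automatic type guessing works roughly like attempting progressively stricter parses on each value. The sketch below is a simplified illustration of that idea, not Databricks' actual inference code:

```python
from datetime import datetime

def guess_type(value: str) -> str:
    """Guess a column type for one cell: Integer, Timestamp, or String."""
    try:
        int(value)              # does it parse as a whole number?
        return "Integer"
    except ValueError:
        pass
    try:
        datetime.fromisoformat(value)   # does it parse as an ISO date/time?
        return "Timestamp"
    except ValueError:
        return "String"         # fallback: everything is valid text

print(guess_type("326"))                   # Integer
print(guess_type("2024-01-15T09:30:00"))   # Timestamp
print(guess_type("Ideal"))                 # String
```

A real engine inspects many rows per column and falls back to String if even one value fails to parse, which is exactly why the Preview lets you override a bad guess.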
Step 4: Creating the Table
Click Create Table. Databricks will now start a small background job (using your cluster) to read the CSV and write it as a high-performance Delta Table. Once finished, you will be taken to the Table UI, where you can see the schema, sample data, and even who has permission to view it.
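The end result is a queryable table rather than a raw file. As an analogy only (Databricks writes a Delta Table, not SQLite), this stdlib sketch shows the same shift: load CSV rows into a table once, then answer questions with queries instead of re-parsing the file. The sample data is invented:

```python
import csv
import io
import sqlite3

# Invented sample standing in for diamonds.csv.
sample = "carat,cut,price\n0.23,Ideal,326\n0.21,Premium,326\n0.29,Premium,334\n"
rows = list(csv.DictReader(io.StringIO(sample)))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE diamonds (carat REAL, cut TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO diamonds VALUES (:carat, :cut, :price)",
    rows,
)

# With the data in a table, questions become one-line queries.
count, avg_price = conn.execute(
    "SELECT COUNT(*), AVG(price) FROM diamonds"
).fetchone()
print(count, round(avg_price, 2))   # 3 328.67
```

In Databricks you would instead run `SELECT * FROM workspace.default.diamonds` in a notebook, but the principle is the same: the table, not the file, becomes the unit you work with.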
Congratulations! You have successfully moved data from your personal computer into the cloud-native Lakehouse.
1. When you upload a CSV file using the Data Ingestion UI, what does Databricks turn that file into?
2. Why is the "Preview" step important during the data ingestion process?
3. If you want to find your newly created table later, which sidebar tab should you visit?