Databricks Fundamentals: A Beginner's Guide

Importing Sample Data for Practice


Definition

Data Ingestion is the process of bringing data from outside sources into your Databricks environment. Using the Data Ingestion UI, you can transform a raw file, like a CSV, into a structured table in your Catalog with just a few clicks.

You have your Workspace set up, and your Cluster is running. Now, we need something to work with. In the real world, data might come from streaming sensors or massive cloud databases, but most data projects start with a simple file. In this chapter, we will use the modern Data Ingestion capability to upload a CSV file and turn it into a permanent table in your Catalog.

Note

There are various ways to ingest data into Databricks, some more advanced and complicated than others (for example, you can set up your own endpoints in your cloud provider, or connect third-party applications). In this chapter, we explore the most basic one: uploading data from your own computer, to get you started.

Step 1: Accessing Data Ingestion

There are two quick ways to find this tool:

  • Click the "New" button at the top of the sidebar and select "File Upload".
  • Alternatively, go to the Catalog tab and click the "Create Table" button (often represented by a plus sign).

Step 2: Uploading the File

Once you are in the upload interface, you can drag and drop your file or browse your computer.

  • The Scenario: for this exercise, we are using a sample file called diamonds.csv;
  • The Upload: once the file is uploaded, Databricks will store it temporarily in a "staging" area while it prepares to move it into the Catalog.
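Before uploading, it can help to peek at the file locally to confirm it has a header row and uses the delimiter you expect. A minimal sketch with Python's standard library, using an inlined stand-in for `diamonds.csv` (the real file has more rows and columns):

```python
import csv
import io

# A tiny stand-in for diamonds.csv; the real exercise file has
# more columns (carat, cut, color, clarity, price, ...).
sample = """carat,cut,price
0.23,Ideal,326
0.21,Premium,326
"""

reader = csv.reader(io.StringIO(sample))
header = next(reader)  # the first row should be the column names
rows = list(reader)    # the remaining rows are data

print(header)                  # ['carat', 'cut', 'price']
print(len(rows), "data rows")  # 2 data rows
```

If the header prints as one long string, the file probably uses a different delimiter (such as `;`), which you can also tell Databricks about in the upload UI.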

Step 3: Configuring the Table (The Preview)

This is where the "magic" happens. Databricks will show you a preview of your data.

  • Catalog and Schema: you must choose where the table will live. For now, we will use the workspace catalog and the default schema;
  • Table Name: give your table a clear name, such as diamonds;
  • Data Types: look at the columns. Databricks automatically guesses whether a column is a "String" (text), an "Integer" (number), or a "Timestamp" (date). If it guesses wrong, you can change the data type manually right here in the UI.
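The automatic type detection works roughly like this: sample each column's values and pick the narrowest type that fits all of them. Here is a simplified, hypothetical sketch of that idea, not Databricks' actual implementation:

```python
def infer_type(values):
    """Pick the narrowest type name that fits every value in the column."""
    def fits(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if fits(int):
        return "Integer"
    if fits(float):
        return "Double"
    return "String"

# One list per column, as read from a CSV preview
print(infer_type(["326", "326", "327"]))     # Integer
print(infer_type(["0.23", "0.21", "0.29"]))  # Double
print(infer_type(["Ideal", "Premium"]))      # String
```

This also shows why a single stray value matters: one non-numeric cell in an otherwise numeric column is enough to make the whole column a String, which is exactly the kind of thing the Preview lets you catch and override.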

Step 4: Creating the Table

Click Create Table. Databricks will now start a small background job (using your cluster) to read the CSV and write it as a high-performance Delta Table. Once finished, you will be taken to the Table UI, where you can see the schema, sample data, and even who has permission to view it.
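For reference, the same UI flow can also be reproduced in a notebook cell with PySpark. This is a sketch that assumes the file was uploaded to a hypothetical path like `/Volumes/workspace/default/staging/diamonds.csv`; adjust the path, Catalog, and Schema to match your workspace (the `spark` session is provided automatically in Databricks notebooks):

```python
# Read the uploaded CSV, letting Spark infer the column types
df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/Volumes/workspace/default/staging/diamonds.csv"))  # hypothetical path

# Write it as a managed Delta table in the workspace catalog, default schema
df.write.saveAsTable("workspace.default.diamonds")

# Verify: the table is now queryable from any notebook
spark.sql("SELECT COUNT(*) FROM workspace.default.diamonds").show()
```

You do not need this code for the exercise; it simply shows that the UI is doing the same read-infer-write work on your behalf.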

Congratulations! You have successfully moved data from your personal computer into the cloud-native Lakehouse.

1. When you upload a CSV file using the Data Ingestion UI, what does Databricks turn that file into?

2. Why is the "Preview" step important during the data ingestion process?

3. If you want to find your newly created table later, which sidebar tab should you visit?



Section 2. Chapter 6
