Managing Files in the Workspace | Setting Up the Workspace
Databricks Fundamentals: A Beginner's Guide

Managing Files in the Workspace

Definition

In Databricks, there is a clear distinction between Workspace Files (your notebooks and code) and Data Objects (your tables and raw files). The Catalog is the modern gateway used to manage and discover these data objects.

One of the first things you need to learn is that Databricks has "two sides to the house." One side is for your work - your scripts and notebooks. The other side is for the actual data you are analyzing. Understanding where each lives will save you a lot of frustration when you start writing code.

Workspace Files: Where your code lives

When you click on the Workspace tab in the sidebar, you are looking at a file system for your logic.

  • This is where you create folders, sub-folders, and notebooks.
  • You can also store non-notebook files here, like small Python scripts or requirement files.
  • Important: these are not "data tables." You don't store a 100 GB CSV file here. This area is for your intellectual property: the code that tells Databricks what to do.

The Catalog: Where your data lives

When you want to see your data, you go to the Catalog tab. In the past, Databricks relied heavily on something called DBFS (Databricks File System). While you might still see references to DBFS in older documentation, it is now considered a legacy approach.

Today, we use the Catalog (powered by Unity Catalog). It provides a structured, "SQL-like" way to view your data as a three-level hierarchy:

  • Catalogs: the top level, a logical grouping of schemas (e.g., production_data or marketing_data);
  • Schemas (or Databases): containers within a catalog that hold tables, as well as Volumes (see below), ML models, and functions;
  • Tables: the actual rows and columns you will query.
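
The three levels above combine into a single dotted name of the form catalog.schema.table. The following is a minimal sketch of that naming rule, not an official helper: the function qualified_table_name and the schema and table names (campaigns, email_clicks) are hypothetical, and it deliberately accepts only plain identifiers.

```python
import re

# Plain identifiers only: letters, digits, and underscores. Names containing
# other characters would need backtick quoting, which this sketch skips.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def qualified_table_name(catalog: str, schema: str, table: str) -> str:
    """Join the three namespace levels into catalog.schema.table."""
    parts = (catalog, schema, table)
    for part in parts:
        if not _IDENTIFIER.fullmatch(part):
            raise ValueError(f"not a simple identifier: {part!r}")
    return ".".join(parts)


# Hypothetical names, for illustration only.
name = qualified_table_name("marketing_data", "campaigns", "email_clicks")
query = f"SELECT * FROM {name} LIMIT 10"
print(query)
```

In a Databricks notebook, where a spark session is predefined, a string like this could be passed to spark.sql(query), assuming the table exists and you have permission to read it.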

Volumes: Handling Raw Files

Sometimes you have data that isn't a table yet, like a raw CSV or an image file. In Unity Catalog, these files are stored in Volumes. A Volume lives inside a schema, next to the tables, and its files are addressed with a path of the form /Volumes/<catalog>/<schema>/<volume>/. Think of a Volume as a bridge between the old "folder" way of thinking and the new, secure "Catalog" way of thinking. You can browse volumes directly inside the Catalog UI to see your raw files before they are loaded into tables.
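
Files in a Volume are addressed by a path that starts with /Volumes/ followed by the catalog, schema, and volume names. Here is a small sketch of how such a path is assembled. The helper volume_path and every name passed to it (raw, uploads, clicks.csv) are hypothetical and only illustrate the layout.

```python
import posixpath


def volume_path(catalog: str, schema: str, volume: str, *parts: str) -> str:
    """Build /Volumes/<catalog>/<schema>/<volume>/<parts...>."""
    for piece in (catalog, schema, volume):
        if not piece or "/" in piece:
            raise ValueError(f"invalid name: {piece!r}")
    for piece in parts:
        # A leading slash would make posixpath.join discard the prefix.
        if piece.startswith("/"):
            raise ValueError(f"path parts must be relative: {piece!r}")
    return posixpath.join("/Volumes", catalog, schema, volume, *parts)


# Hypothetical catalog, schema, volume, and file names.
csv_path = volume_path("marketing_data", "raw", "uploads", "2024", "clicks.csv")
print(csv_path)
```

In a notebook, a path like this could then be handed to a reader such as spark.read.csv(csv_path, header=True), assuming the file exists and you have read access to the volume.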

Why does the distinction matter?

It comes down largely to security. Because code lives in the Workspace and data lives in the Catalog, administrators can give a user permission to edit a notebook without necessarily giving them permission to see the sensitive data inside a table. This "separation of concerns" is a large part of what makes Databricks an enterprise-grade platform.
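
Because code and data live on different sides of the house, the path alone usually tells you which side a file belongs to. The sketch below is a rough heuristic under stated assumptions, not a Databricks API: classify_path is a hypothetical helper, the example paths are invented, and it only looks at the /Workspace/ and /Volumes/ prefixes and the legacy dbfs: scheme.

```python
def classify_path(path: str) -> str:
    """Guess which side of the house a path belongs to, from its prefix."""
    has_dbfs_scheme = path.startswith("dbfs:")
    bare = path[len("dbfs:"):] if has_dbfs_scheme else path
    # Volume paths are checked first because they may also be written
    # with a dbfs: scheme in front of /Volumes/.
    if bare.startswith("/Volumes/"):
        return "volume"
    if has_dbfs_scheme or bare.startswith("/dbfs/"):
        return "dbfs"  # legacy location
    if bare.startswith("/Workspace/"):
        return "workspace"
    return "unknown"


# Invented example paths.
for example in (
    "/Workspace/Users/someone@example.com/etl/clean.py",
    "/Volumes/marketing_data/raw/uploads/clicks.csv",
    "dbfs:/FileStore/old_export.csv",
):
    print(example, "->", classify_path(example))
```

A check like this can help when migrating older notebooks: anything reported as "dbfs" is a candidate for moving into a Volume.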

1. If you want to create a new folder to organize your Python Notebooks, which sidebar tab should you use?

2. What is the modern, recommended way to manage and discover data tables in Databricks?

3. Which legacy term might you see in older Databricks documentation that is now being replaced by the Catalog and Volumes?

Section 2. Chapter 5
