Databricks Fundamentals: A Beginner's Guide

Managing Files in the Workspace


Definition

In Databricks, there is a clear distinction between Workspace Files (your notebooks and code) and Data Objects (your tables and raw files). The Catalog is the modern gateway used to manage and discover these data objects.

One of the first things you need to learn is that Databricks has "two sides to the house." One side is for your work - your scripts and notebooks. The other side is for the actual data you are analyzing. Understanding where each lives will save you a lot of frustration when you start writing code.

Workspace Files: Where your code lives

When you click on the Workspace tab in the sidebar, you are looking at a file system for your logic.

  • This is where you create folders, sub-folders, and notebooks.
  • You can also store non-notebook files here, like small Python scripts or requirement files.
  • Important: these are not "data tables." You don't store a 100GB CSV file here. This area is for your intellectual property - the code that tells Databricks what to do.

The Catalog: Where your data lives

When you want to see your data, you go to the Catalog tab. In the past, Databricks relied heavily on something called DBFS (Databricks File System). While you might still see references to DBFS in older documentation, it is now considered a legacy approach.

Today, we use the Catalog (powered by Unity Catalog). This provides a structured, "SQL-like" way to view your data:

  • Catalogs: the top-level logical grouping of schemas (e.g., production_data or marketing_data).
  • Schemas (also called databases): organize tables within a catalog, along with Volumes (see below), ML models, and functions.
  • Tables: the actual rows and columns you will query.
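These three levels combine into a fully qualified name, catalog.schema.table, which is how you reference a table in SQL or Python. A minimal sketch (the catalog, schema, and table names here are hypothetical):

```python
def fq_name(catalog: str, schema: str, table: str) -> str:
    """Build the three-level Unity Catalog name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"

# Hypothetical names for illustration; on Databricks you would pass this
# string to spark.read.table(...) or use it directly in a SQL query.
orders = fq_name("production_data", "sales", "orders")
print(orders)  # production_data.sales.orders
```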

Volumes: Handling Raw Files

Sometimes you have data that isn't a table yet - like a raw CSV or an image file. In the modern Databricks UI, these are stored in Volumes. Think of a Volume as a bridge between the old "folder" way of thinking and the new, secure "Catalog" way of thinking. You can browse these volumes directly inside the Catalog UI to see your raw files before they are loaded into tables.
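Files in a Volume are addressed with a path of the form /Volumes/&lt;catalog&gt;/&lt;schema&gt;/&lt;volume&gt;/&lt;file&gt;. Here is a small sketch that builds such a path (all names below are hypothetical); on a Databricks cluster you could then hand the result to pandas or Spark to read the raw file:

```python
from pathlib import PurePosixPath

def volume_path(catalog: str, schema: str, volume: str, *parts: str) -> str:
    """Build a Unity Catalog Volume path: /Volumes/<catalog>/<schema>/<volume>/..."""
    return str(PurePosixPath("/Volumes", catalog, schema, volume, *parts))

# Hypothetical names; on Databricks, pd.read_csv(raw_csv) would read this file.
raw_csv = volume_path("main", "landing", "raw_files", "sales.csv")
print(raw_csv)  # /Volumes/main/landing/raw_files/sales.csv
```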

Why does the distinction matter?

It all comes down to security and governance. By keeping code in the Workspace and data in the Catalog, Databricks lets administrators grant a user permission to edit a notebook without necessarily granting them permission to see the sensitive data inside a table. This separation of concerns is what makes Databricks an enterprise-grade platform.
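In practice, this separation is expressed with Unity Catalog GRANT statements. The sketch below only composes the SQL strings (the privilege, table, and group names are hypothetical); on Databricks an administrator would run them via spark.sql(...) or in the SQL editor:

```python
def grant_sql(privilege: str, table: str, principal: str) -> str:
    """Compose a Unity Catalog GRANT statement as a plain string."""
    return f"GRANT {privilege} ON TABLE {table} TO `{principal}`"

# Hypothetical example: members of `analysts` may read the table, while a
# user who can edit the notebook but lacks this grant cannot see the data.
stmt = grant_sql("SELECT", "production_data.sales.orders", "analysts")
print(stmt)  # GRANT SELECT ON TABLE production_data.sales.orders TO `analysts`
```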

1. If you want to create a new folder to organize your Python Notebooks, which sidebar tab should you use?

2. What is the modern, recommended way to manage and discover data tables in Databricks?

3. Which legacy term might you see in older Databricks documentation that is now being replaced by the Catalog and Volumes?



Section 2. Chapter 5
