Managing Files in the Workspace
In Databricks, there is a clear distinction between Workspace Files (your notebooks and code) and Data Objects (your tables and raw files). The Catalog is the modern gateway used to manage and discover these data objects.
One of the first things you need to learn is that Databricks has "two sides to the house." One side is for your work - your scripts and notebooks. The other side is for the actual data you are analyzing. Understanding where each lives will save you a lot of frustration when you start writing code.
Workspace Files: Where your code lives
When you click on the Workspace tab in the sidebar, you are looking at a file system for your logic.
- This is where you create folders, sub-folders, and notebooks.
- You can also store non-notebook files here, like small Python scripts or requirement files.
- Important: these are not "data tables." You don't store a 100GB CSV file here. This area is for your intellectual property - the code that tells Databricks what to do.
The Catalog: Where your data lives
When you want to see your data, you go to the Catalog tab. In the past, Databricks relied heavily on something called DBFS (Databricks File System). While you might still see references to DBFS in older documentation, it is now considered a legacy approach.
Today, we use the Catalog (powered by Unity Catalog). This provides a structured, "SQL-like" way to view your data:
- Catalogs: the top-level logical grouping of schemas (e.g., production_data or marketing_data);
- Schemas (or Databases): a way to organize tables within a catalog, along with Volumes (see below), ML models, and functions;
- Tables: the actual rows and columns you will query.
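Putting the hierarchy together, every table in Unity Catalog is addressed by a three-level name: catalog.schema.table. A minimal sketch of building such a name (the catalog, schema, and table names here are hypothetical examples, not objects that exist in your workspace):

```python
# Hypothetical names for illustration: a "marketing_data" catalog,
# a "web" schema, and an "events" table.
catalog, schema, table = "marketing_data", "web", "events"

# Unity Catalog addresses every table with a three-level name.
fqn = f"{catalog}.{schema}.{table}"
print(fqn)  # marketing_data.web.events

# Inside a Databricks notebook you could then query it, e.g.:
# spark.sql(f"SELECT * FROM {fqn} LIMIT 10")
```

The same three-level name works anywhere you reference the table, whether in a SQL cell or through the Python API.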
Volumes: Handling Raw Files
Sometimes you have data that isn't a table yet - like a raw CSV or an image file. In the modern Databricks UI, these are stored in Volumes. Think of a Volume as a bridge between the old "folder" way of thinking and the new, secure "Catalog" way of thinking. You can browse these volumes directly inside the Catalog UI to see your raw files before they are loaded into tables.
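Files in a Volume are reached through a path that mirrors the catalog hierarchy: /Volumes/&lt;catalog&gt;/&lt;schema&gt;/&lt;volume&gt;/. A minimal sketch with hypothetical catalog, schema, volume, and file names:

```python
# Hypothetical names: catalog "marketing_data", schema "web",
# volume "raw_uploads", file "clicks.csv".
catalog, schema, volume = "marketing_data", "web", "raw_uploads"

# Volume paths follow the /Volumes/<catalog>/<schema>/<volume>/ pattern.
path = f"/Volumes/{catalog}/{schema}/{volume}/clicks.csv"
print(path)  # /Volumes/marketing_data/web/raw_uploads/clicks.csv

# In a Databricks notebook you could then load the raw file, e.g.:
# df = spark.read.csv(path, header=True)
```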
Why does the distinction matter?
It all comes down to Security and Performance. By keeping code in the Workspace and data in the Catalog, Databricks allows administrators to give a user permission to edit a notebook without necessarily giving them permission to see the sensitive data inside a table. This "separation of concerns" is what makes Databricks an enterprise-grade platform.
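Because data permissions live in the Catalog rather than the Workspace, table access is granted with SQL statements, independently of any notebook-editing rights. A hedged sketch of what such a grant looks like (the user and table names are made up for illustration):

```python
# Hypothetical principal and table names for illustration.
user = "analyst@example.com"
table_fqn = "marketing_data.web.events"

# Table access in Unity Catalog is granted with SQL, separately from
# whatever Workspace (notebook-editing) permissions the user holds.
grant_stmt = f"GRANT SELECT ON TABLE {table_fqn} TO `{user}`"
print(grant_stmt)

# An administrator would run it inside a Databricks notebook, e.g.:
# spark.sql(grant_stmt)
```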
Check Your Understanding
1. If you want to create a new folder to organize your Python notebooks, which sidebar tab should you use?
2. What is the modern, recommended way to manage and discover data tables in Databricks?
3. Which legacy term might you see in older Databricks documentation that is now being replaced by the Catalog and Volumes?