Architecture

The platform is divided into two planes: the control plane and the data plane.

Control Plane	Data Plane
Managed by Databricks	Controlled by the user
Runs in the Databricks cloud account	Runs in the user's cloud account
Web app, configuration, notebooks, Repos, DBSQL	Data, DBFS root (cloud storage)
Cluster manager	Clusters

Serverless Data Plane

Databricks-managed clusters
Available only to users of DBSQL
Pros: instant (no cluster provisioning), minimal configuration, automatic software updates, reduced idle time, no over-provisioning
Elastic: scales up and down automatically

Medallion architecture
Bronze: Raw ingestion, historical, source schema validated
Silver: cleansed, validated, conformed
Gold: business-level aggregates, analytical model, reporting model
DW layers to
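The three medallion layers above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Databricks code: the sample records, the field names (`order_id`, `region`, `amount`), and the helper functions `to_silver` and `to_gold` are all hypothetical. In practice each layer would typically be a Delta table.

```python
from collections import defaultdict

# Bronze: hypothetical raw records kept in their source shape (all strings).
bronze = [
    {"order_id": "1", "region": "eu ", "amount": "10.50"},
    {"order_id": "2", "region": "US", "amount": "7.25"},
    {"order_id": "2", "region": "US", "amount": "7.25"},  # duplicate record
    {"order_id": "3", "region": "EU", "amount": "oops"},  # malformed amount
]


def to_silver(rows):
    """Silver: cleanse, validate, and conform (typed, normalized, deduplicated)."""
    seen = set()
    silver_rows = []
    for row in rows:
        try:
            order_id = int(row["order_id"])
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows that fail validation
        if order_id in seen:
            continue  # drop duplicates
        seen.add(order_id)
        silver_rows.append(
            {
                "order_id": order_id,
                "region": row["region"].strip().upper(),
                "amount": amount,
            }
        )
    return silver_rows


def to_gold(rows):
    """Gold: business-level aggregate, here the total amount per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Keeping bronze untouched means the silver and gold layers can be rebuilt from it if the cleaning rules change.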
Delta Live Tables (DLT): declarative, full and incremental refresh, dependency management, checkpoint restart
Databricks Workflows: orchestrates pipelines written with DLT, dbt, or other tools
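The dependency management mentioned for DLT can be pictured as ordering tables so that each one is refreshed after its inputs. The sketch below uses Python's standard `graphlib` module to show that idea only. It is not how DLT is implemented, and the table names and the `refresh_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each table maps to the set of tables it reads from.
dependencies = {
    "bronze_orders": set(),
    "silver_orders": {"bronze_orders"},
    "gold_sales_by_region": {"silver_orders"},
    "gold_daily_totals": {"silver_orders"},
}


def refresh_order(deps):
    """Return a refresh order in which every table comes after its inputs."""
    return list(TopologicalSorter(deps).static_order())


order = refresh_order(dependencies)
print(order)
```

A declarative tool derives this graph from the table definitions, so the author states what each table contains rather than when it runs.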
Streaming
- DLT, Spark Structured Streaming
ML: built-in
- Frameworks: TensorFlow, Spark, Keras, XGBoost
- Distributed training: Spark, TensorFlow
- AutoML and hyperparameter tuning
- GPU acceleration

Photon: proprietary execution engine written in C++; it is not available in open-source Apache Spark

Photon: an optimized C++-based execution engine, typically faster than the standard JVM-based Spark engine
workers: type (CSP node type), min and max workers
driver: type, defaults to same as worker node
enable auto-scaling: scale between min and max workers based on load
terminate after: n minutes of inactivity
Access mode:
- Single user:
  - Allows credential passthrough, that is, logged-in user's credentials are used to access object storage
- Multi user:
Databricks Runtime: two categories, Standard and ML, with various versions available
- Some ML runtimes support GPU
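The cluster settings listed above could be collected into a single specification, sketched here as a Python dictionary. The field names follow the Databricks Clusters API as best recalled and should be checked against the current API reference. The `build_cluster_spec` helper, the cluster name, the node type `i3.xlarge`, and the runtime version string are illustrative assumptions, and a real request would likely need further fields.

```python
import json


def build_cluster_spec(node_type, min_workers, max_workers, idle_minutes,
                       runtime_version, driver_node_type=None, photon=True):
    """Build a cluster specification dict (hypothetical helper, not an official API)."""
    if not 0 <= min_workers <= max_workers:
        raise ValueError("need 0 <= min_workers <= max_workers")
    return {
        "cluster_name": "example-cluster",  # placeholder name
        "spark_version": runtime_version,   # Databricks Runtime version
        "node_type_id": node_type,          # worker node type (CSP-specific)
        # The driver defaults to the same node type as the workers.
        "driver_node_type_id": driver_node_type or node_type,
        # Auto-scaling keeps the worker count between these bounds.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # Terminate after this many minutes of inactivity.
        "autotermination_minutes": idle_minutes,
        "runtime_engine": "PHOTON" if photon else "STANDARD",
        "data_security_mode": "SINGLE_USER",  # access mode
    }


spec = build_cluster_spec(
    node_type="i3.xlarge",               # assumed example node type
    min_workers=2,
    max_workers=8,
    idle_minutes=30,
    runtime_version="13.3.x-scala2.12",  # assumed example runtime version
)
print(json.dumps(spec, indent=2))
```

Validating the worker bounds before building the dictionary catches an inverted min/max pair early, rather than at cluster creation time.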