All-purpose compute - R/W/X - More expensive
Serverless version of all-purpose compute
Non-serverless all-purpose compute is also known as Classic Compute.
Classic Compute - You pay the cloud provider for the VMs, plus Databricks consumption in DBU/hr.
Job Compute - R/X - Cheaper
Serverless version of Job Compute
You can't run Scala/R on Serverless compute.
Serverless DBU rate is higher because the VM cost is built into it.
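The billing difference can be sketched as simple arithmetic. All rates and counts below are made-up placeholders, not real Databricks or cloud prices, and the function names are hypothetical.

```python
def classic_cost(hours, dbu_per_hour, dbu_rate, vm_rate_per_hour, num_vms):
    """Classic compute: Databricks DBU charge plus a separate cloud VM bill."""
    dbu_charge = hours * dbu_per_hour * dbu_rate
    vm_charge = hours * vm_rate_per_hour * num_vms
    return dbu_charge + vm_charge


def serverless_cost(hours, dbu_per_hour, dbu_rate):
    """Serverless compute: a single DBU charge that already covers the VMs."""
    return hours * dbu_per_hour * dbu_rate


# Placeholder numbers for illustration only.
classic = classic_cost(hours=2, dbu_per_hour=4, dbu_rate=0.50,
                       vm_rate_per_hour=0.25, num_vms=4)
serverless = serverless_cost(hours=2, dbu_per_hour=4, dbu_rate=0.75)
print(classic, serverless)
```

A higher serverless DBU rate does not by itself mean a higher total bill. Compare it against the classic DBU charge plus the VM charge.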
RDD - Resilient Distributed Dataset
If a worker dies, Spark can recreate the lost data partitions from the RDD's lineage and keep running. RDD keeps extra RAM available.
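Spark's usual recovery mechanism is lineage: each RDD remembers the transformations that produced it, so a lost partition can be recomputed from the source data. The toy class below is a plain-Python sketch of that idea. `ToyRDD` and its methods are hypothetical and are not the Spark API.

```python
class ToyRDD:
    """Toy stand-in for an RDD: remembers how to rebuild each partition."""

    def __init__(self, source_partitions, transforms=()):
        self.source_partitions = source_partitions  # durable input data
        self.transforms = tuple(transforms)         # lineage: ordered functions
        self.cache = {}                             # partition index -> computed rows

    def map(self, fn):
        # A transformation only extends the lineage; nothing is computed yet.
        return ToyRDD(self.source_partitions, self.transforms + (fn,))

    def compute_partition(self, index):
        # Replay the lineage against the source data for one partition.
        rows = self.source_partitions[index]
        for fn in self.transforms:
            rows = [fn(row) for row in rows]
        return rows

    def get_partition(self, index):
        # Serve from memory if present, otherwise rebuild from lineage.
        if index not in self.cache:
            self.cache[index] = self.compute_partition(index)
        return self.cache[index]

    def lose_partition(self, index):
        # Simulate a worker dying and taking its in-memory partition with it.
        self.cache.pop(index, None)


rdd = ToyRDD([[1, 2], [3, 4]]).map(lambda x: x * 10)
before = rdd.get_partition(1)
rdd.lose_partition(1)
after = rdd.get_partition(1)  # rebuilt from lineage, not restored from a copy
print(before, after)
```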
Vector Search - Word embeddings are arrays of floats. A specialized engine builds an index over those vectors so similar items can be looked up quickly.
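A minimal sketch of what "searching arrays of floats" means, using brute-force cosine similarity. A real vector search engine would typically use an approximate nearest-neighbor index instead of scanning everything. The vectors and the names `cosine_similarity` and `nearest` are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, index, k=1):
    """Return the k item names whose vectors are most similar to the query."""
    scored = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]


# Made-up 3-dimensional "embeddings"; real ones have hundreds of dimensions.
index = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
result = nearest([1.0, 0.0, 0.0], index, k=2)
print(result)
```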
Pools - Pool of idle, pre-provisioned VMs that you pay the cloud provider for while they wait. Classic compute scenario. Pools have gone away.
Serverless Compute - cheaper version
Serverless Compute - performance optimized version - startup usually takes about 5 seconds
Cluster - Driver and Worker Nodes. Single-node cluster - the driver is also the worker. scikit-learn and pandas run on the driver and consume driver memory.
Use Job or Serverless clusters in production. Avoid interactive clusters in prod. Enable Photon for faster and cheaper execution. Reuse clusters to reduce startup time and cost.
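As a hedged illustration, a job cluster definition with Photon enabled might look like the dictionary below. The field names (`job_cluster_key`, `new_cluster`, `spark_version`, `node_type_id`, `num_workers`, `runtime_engine`) are believed to match the Databricks Jobs API, but verify them against the API reference. All values are placeholders.

```python
import json

# Hypothetical job cluster spec; values are placeholders, not recommendations.
job_cluster = {
    "job_cluster_key": "nightly_etl",          # hypothetical name
    "new_cluster": {
        "spark_version": "15.4.x-scala2.12",   # placeholder runtime version
        "node_type_id": "i3.xlarge",           # placeholder, cloud-specific node type
        "num_workers": 2,
        "runtime_engine": "PHOTON",            # request the Photon engine
    },
}

# Serialize to JSON, the form a REST request body would take.
payload = json.dumps(job_cluster, indent=2)
print(payload)
```

Tasks in the same job that share a `job_cluster_key` reuse one cluster, which is one way to cut startup time and cost.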
Serverless SQL warehouse - Photon engine, Predictive IO, Intelligent Workload Management
Pro SQL warehouse - Photon engine, Predictive IO
Classic SQL warehouse - Photon engine
Performance considerations: SKEW/SPILL/STORAGE/SHUFFLE/SERIALIZATION
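Skew is the easiest of these to show in a few lines: one hot key sends most rows to a single partition, and salting the key spreads them out. The sketch below is plain Python with a deliberately simple byte-sum hash so the result is deterministic. Spark uses a different hash, and all names here are hypothetical.

```python
from collections import Counter


def partition_of(key, num_partitions):
    # Toy hash (sum of the key's bytes), chosen only to keep the demo deterministic.
    return sum(key.encode("utf-8")) % num_partitions


def partition_sizes(keys, num_partitions):
    """Count how many records land in each partition."""
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def salt(keys, num_salts):
    # Append a rotating suffix so one hot key becomes several distinct keys.
    return [f"{k}_{i % num_salts}" for i, k in enumerate(keys)]


keys = ["hot"] * 90 + ["a", "b", "c", "d", "e"] * 2  # one key dominates
plain = partition_sizes(keys, 4)
salted = partition_sizes(salt(keys, 8), 4)
print(plain, salted)
```

In a real join the other side has to be expanded with the same salt values, so salting trades skew for extra data volume.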
Adaptive Query Execution (AQE) helps by re-optimizing the query plan at runtime using statistics collected during execution.
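For reference, these are Spark SQL settings associated with AQE. It is enabled by default in recent Spark versions, so setting them explicitly is usually unnecessary. The helper `apply_settings` is hypothetical and assumes an object that exposes `spark.conf.set`, as a SparkSession does.

```python
# Spark SQL settings related to Adaptive Query Execution.
aqe_settings = {
    "spark.sql.adaptive.enabled": "true",                     # turn AQE on
    "spark.sql.adaptive.coalescePartitions.enabled": "true",  # merge small shuffle partitions
    "spark.sql.adaptive.skewJoin.enabled": "true",            # split skewed join partitions
}


def apply_settings(spark, settings):
    """Apply settings to a SparkSession-like object exposing spark.conf.set."""
    for key, value in settings.items():
        spark.conf.set(key, value)
```

In a notebook, where `spark` is already defined, this would be called as `apply_settings(spark, aqe_settings)`.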
Row Filter: