Tuesday, December 16, 2025

Databricks Delta Sharing (D2O) with Open Delta Sharing – A Practical, Step‑by‑Step Guide for Data Engineers

Data products only create value when they can be shared and consumed easily, securely, and at scale. Delta Sharing was designed exactly for that: an open, cross‑platform protocol that lets you share live data from your Databricks lakehouse with any downstream tool or platform over HTTPS, without copies or custom integrations.

In this blog post, I walk through Databricks‑to‑Open (D2O) Delta Sharing using Open Delta Sharing in a practical, step‑by‑step way. The focus is on helping data teams move from theory to a concrete implementation pattern that works in real projects.

What the article covers:

  • How Delta Sharing fits into a modern data collaboration strategy and when to choose Open Sharing (D2O) over Databricks‑to‑Databricks (D2D).
  • The core workflow: creating recipients, configuring authentication (bearer token or federated/OIDC), defining shares in Unity Catalog, and granting access to tables and views.
  • How external consumers can connect using open connectors (Python/pandas, Apache Spark, Power BI, Tableau, Excel and others) without needing a Databricks workspace.
  • Security, governance, and operational considerations such as token TTL, auditing activity, and avoiding data duplication by sharing from your existing Delta Lake and Parquet data.

Whether you are building a data‑as‑a‑service offering, exposing governed data products to partners, or just trying to simplify ad‑hoc external access, D2O can significantly reduce friction and integration work.

Here is a step-by-step guide to Databricks Delta Sharing using Open Delta Sharing (D2O).

1. Create Recipient
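
In addition to the UI, recipients can be created in SQL. A minimal sketch for a token-based (open sharing) recipient, with an illustrative name:

-- Illustrative name; omitting USING ID creates a token-based (open sharing) recipient.
CREATE RECIPIENT IF NOT EXISTS partner_recipient
COMMENT 'External partner consuming data over open Delta Sharing';

-- Shows recipient details, including the activation link used to download the credential file.
DESCRIBE RECIPIENT partner_recipient;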

2. Create Delta Share and assign Recipients
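
A share is a named collection of Unity Catalog tables (and optionally views). As a sketch, with illustrative share and table names:

CREATE SHARE IF NOT EXISTS sales_share
COMMENT 'Curated sales data shared with external partners';

-- Add a Unity Catalog table to the share using its three-level name.
ALTER SHARE sales_share ADD TABLE main.sales.orders;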


You can configure either OIDC federation or token-based (bearer token) authentication for your recipients.

Note that tables with row filters (RLS) or column masks cannot be shared using Delta Sharing.
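
A common workaround is to put the filtering logic into a view and add the view to the share instead. A minimal sketch, reusing the illustrative names from above (the view and column names are hypothetical, and view sharing must be supported for your recipients):

-- Hypothetical view that bakes the row-level restriction into its definition.
CREATE OR REPLACE VIEW main.sales.orders_external AS
SELECT order_id, order_date, amount
FROM main.sales.orders
WHERE region = 'EU';

ALTER SHARE sales_share ADD VIEW main.sales.orders_external;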



Select the recipient you created earlier.
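
Assigning a recipient to a share is ultimately a grant; the SQL equivalent, using the same illustrative names:

-- Give the recipient read access to everything in the share.
GRANT SELECT ON SHARE sales_share TO RECIPIENT partner_recipient;

-- Verify what the recipient has been granted.
SHOW GRANTS TO RECIPIENT partner_recipient;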



Additional Information:
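
Recipient activity shows up in the Unity Catalog audit log, which is exposed as a system table. A sketch for reviewing recent Delta Sharing events, assuming system tables are enabled in your account (the action-name filter is an assumption; verify the exact action names in your own audit logs):

SELECT event_time, user_identity.email, action_name, request_params
FROM system.access.audit
WHERE action_name LIKE 'deltaSharing%'   -- assumed prefix for Delta Sharing audit events
ORDER BY event_time DESC
LIMIT 100;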







Thursday, December 11, 2025

Databricks Training Notes - Compute

All-purpose compute - R/W/X - more expensive.

A serverless version of all-purpose compute is also available.

Non-serverless all-purpose compute is also known as classic compute.

Classic compute - you pay for the VMs plus Databricks consumption (DBU/hr).
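
As a rough illustration with made-up numbers: a classic cluster rated at 2 DBU/hr running for 3 hours at a list price of $0.55 per DBU would incur 2 × 3 × 0.55 = $3.30 in DBU charges, on top of the VM cost billed by your cloud provider.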

Job compute - R/X - cheaper.

A serverless version of job compute is also available.

You can't run Scala or R on serverless compute.

Serverless DBU rates are higher because the VM cost is built into the DBU price.

RDD - Resilient Distributed Dataset.

If a worker dies, Spark can recreate the lost data partitions and keep running. RDDs keep extra RAM available.

Vector Search - word embeddings are arrays of floats; a specialized engine builds an index over those numbers.

Pools - a pool of VMs that you keep paying for; a classic compute scenario. Pools have largely gone away.

Serverless compute - standard, cost-optimized version (cheaper).

Serverless compute - performance-optimized version - startup is usually around 5 seconds.

Cluster - driver and worker nodes. In a single-node cluster, the driver is also the worker. scikit-learn and pandas workloads run on the driver and consume driver memory.

Use Job or Serverless clusters in production. Avoid interactive clusters in prod. Enable Photon for faster and cheaper execution. Reuse clusters to reduce startup time and cost.

SQL warehouse types: Serverless - Photon engine, Predictive I/O, Intelligent Workload Management.

Pro - Photon, Predictive I/O.

Classic - Photon engine only.

Performance considerations: SKEW/SPILL/STORAGE/SHUFFLE/SERIALIZATION 

Adaptive Query Execution (AQE) re-optimizes query plans at runtime, which helps with issues like skew and shuffle partition sizing.
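
AQE is enabled by default on recent Databricks Runtime versions; purely for illustration, the relevant Spark SQL settings can be inspected or toggled like this:

-- AQE master switch (already on by default in recent runtimes).
SET spark.sql.adaptive.enabled = true;

-- Let AQE detect and split skewed join partitions automatically.
SET spark.sql.adaptive.skewJoin.enabled = true;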

Row Filter:

-- Admins see every row; everyone else only sees devices with device_id below 30.
CREATE OR REPLACE FUNCTION device_filter(device_id INT)
  RETURN IF(IS_ACCOUNT_GROUP_MEMBER('admin'), true, device_id < 30);

-- Attach the filter to the table; it is evaluated against the device_id column on every query.
ALTER TABLE silver
SET ROW FILTER device_filter ON (device_id);

-- Non-admin users will only get rows where device_id < 30.
SELECT *
FROM silver
ORDER BY device_id DESC;


Reading raw Parquet files with read_files:

-- Convert a microsecond epoch timestamp to a date while reading raw Parquet files from a volume.
SELECT
  *,
  cast(from_unixtime(user_first_touch_timestamp/1000000) AS DATE) AS first_touch_date
FROM read_files(
  "/Volumes/dbacademy_ecommerce/v01/raw/users-historical",
  format => 'parquet')
LIMIT 10;