Refactoring the LastPass Data Warehouse Using Databricks and DBT

This case study explores how Closeloop refactored LastPass’s legacy data warehouse using Databricks and dbt to address fragmented data pipelines, poor data quality, and performance bottlenecks. The team implemented a scalable Medallion architecture, automated data ingestion, modular transformations, and strong data governance, resulting in faster reporting, improved data accuracy, enhanced compliance, and a reliable foundation for self-service analytics.

Feb 02, 2026 10 Minutes Read Password Management Agency

LastPass, a leading global password management platform, securely stores millions of user credentials, personal identifiers, and sensitive enterprise records. As data volumes grew to more than 70 TB in total, with roughly 120 GB ingested daily across multiple systems (including LastPass site data, Salesforce, Boss, Marketo, Segment, and AWS S3 storage), the company's existing Databricks setup faced significant performance, scalability, and governance challenges.

To address this, the Closeloop Data Engineering Team undertook a comprehensive refactoring of LastPass's Databricks environment, leveraging the Medallion Architecture (Bronze, Silver, and Gold layers) and implementing a DBT-based transformation layer at the Silver and Gold stages.

This project modernized data handling, ensured GDPR-compliant PII segregation, improved system performance, and introduced a scalable, documented, and governed data architecture.

Databricks Development

Modernizing the LastPass Data Warehouse

The objective of this project was to modernize the LastPass analytics infrastructure by restructuring the existing Databricks environment and implementing a scalable Medallion architecture powered by DBT transformations.



LastPass operates a large-scale data ecosystem that collects information from multiple operational systems including application logs, Salesforce, Boss, Marketo, Segment, and AWS S3 storage. As the platform grew globally, the existing data pipelines became difficult to manage due to fragmented data models, inconsistent transformations, and limited governance.

Closeloop's Data Engineering team refactored the entire Databricks environment by implementing a structured data pipeline based on the Medallion Architecture (Bronze, Silver, and Gold layers). Raw data ingestion pipelines were optimized, transformations were rebuilt using DBT, and a governed analytics layer was introduced to support faster and more reliable reporting.

The new architecture also introduced strong governance controls including PII data segregation, schema documentation, lineage tracking, and automated testing. This transformation enabled LastPass to scale their analytics platform while ensuring security, reliability, and faster business insights.

70+ TB

Total Data Managed

120 GB

Daily Data Ingestion

Medallion

Architecture Model

DBT

Transformation Framework

Foundation

Medallion Architecture

Reorganized fragmented pipelines into clearly defined Bronze, Silver, and Gold layers for governance and scale.

Transformation

DBT-Driven Models

Introduced governed transformation logic, lineage visibility, and standardized model development across analytics layers.

Security

PII-Safe Data Flows

Separated sensitive information early in the pipeline to support GDPR requirements and stronger operational controls.

Outcome

Faster Analytics Delivery

Improved ingestion performance, reduced manual intervention, and enabled more reliable reporting across the business.

Fragmented Data Impacting Insights

LastPass manages vast amounts of sensitive data, including user credentials, audit logs, billing details, and partner integrations. Over time, its data environment had grown organically, leading to fragmentation and a lack of standardization.

Unstructured Data Organization

Data from multiple sources such as Salesforce, Marketo, Segment, and AWS S3 lacked a consistent structure, making it difficult to organize raw data and maintain reliable analytics pipelines.

Ingestion and Pipeline Instability

Data ingestion pipelines frequently failed during peak loads, causing delays in processing and affecting downstream analytics and reporting workflows.

Governance Gaps & PII Exposure

Sensitive information such as user credentials and personal identifiers required strict governance policies and secure segregation to meet compliance standards including GDPR.

Lack of Documentation and Ownership

Data models and transformations lacked clear documentation and lineage, making it difficult for teams to understand pipeline logic and maintain the system efficiently.

Performance Bottlenecks

Inefficient SQL queries and heavy joins significantly slowed down analytics processing and increased query execution times.

Multi-System Integrations

Integrating data from multiple operational platforms required a scalable architecture capable of handling large volumes while maintaining consistency and reliability.

Why the Existing Warehouse Needed a Structural Reset

As data volume, source-system variety, and compliance requirements increased, the warehouse could no longer support reliable ingestion, trusted transformations, and scalable analytics without a foundational redesign.

Current state data warehouse architecture

Modernized Data Pipeline Architecture

Closeloop redesigned the LastPass data platform using Databricks, Medallion Architecture, DBT, and Unity Catalog — replacing fragmented pipelines with a governed, scalable, and analytics-ready lakehouse that processes over 120 GB of data every day.

Architecture Overview

Three-Layer Medallion Pipeline

Every byte flows through a structured Bronze → Silver → Gold pipeline, ensuring clean ingestion, trusted transformations, and business-ready analytics at every stage.

Bronze — Raw ingestion, full source fidelity & PII masking
Silver — Cleaned, deduplicated & cross-source joined
Gold — Business-ready datasets for dashboards & analytics
Databricks Medallion Architecture Diagram
01

Centralized Data Ingestion

14 enterprise systems, including Salesforce, Marketo, Segment, Gainsight, PayPal, Stripe, Pendo, and AWS S3, are ingested via REST API, S3 pull, FTP, and JDBC. Incremental loading patterns minimize reprocessing overhead and keep pipelines fault-tolerant.

REST API | S3 Pull | JDBC | FTP
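As an illustration of the incremental loading pattern mentioned above, here is a minimal sketch in plain Python. It assumes each record carries a `modified_at` timestamp and that the pipeline stores a watermark from the previous run; both names are hypothetical and not taken from the LastPass implementation.

```python
from datetime import datetime, timezone


def incremental_batch(records, watermark):
    """Return records modified after the watermark, plus the new watermark."""
    fresh = [r for r in records if r["modified_at"] > watermark]
    # If nothing new arrived, the watermark stays where it was.
    new_watermark = max((r["modified_at"] for r in fresh), default=watermark)
    return fresh, new_watermark


rows = [
    {"id": 1, "modified_at": datetime(2026, 1, 1, tzinfo=timezone.utc)},
    {"id": 2, "modified_at": datetime(2026, 1, 3, tzinfo=timezone.utc)},
]
last_run = datetime(2026, 1, 2, tzinfo=timezone.utc)
batch, last_run = incremental_batch(rows, last_run)
```

Only the record changed after the previous run is picked up, which is what keeps reprocessing overhead low when a source holds millions of unchanged rows.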
02

Bronze Layer – Raw Storage

Acts as the immutable system of record. All incoming data lands in Delta Lake partitioned by source and ingestion date. PII fields are identified and masked at this stage before any downstream transformation begins, enforcing compliance from day one.

PII Masking | Delta Lake | Audit Trail
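The case study does not say how PII is masked, so the following is only a sketch of one common approach: replacing sensitive values with a salted SHA-256 digest before the record is written, and deriving a landing path partitioned by source and ingestion date. The field list, the salt, and the path layout are all assumptions made for illustration.

```python
import hashlib
from datetime import date

# Hypothetical list of sensitive columns; a real pipeline would load this from config.
PII_FIELDS = {"email", "ip_address"}


def mask_value(value, salt="demo-salt"):
    """Replace a value with a salted SHA-256 hex digest (one-way, deterministic)."""
    return hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()


def mask_record(record):
    """Return a copy of the record with PII fields masked and other fields untouched."""
    return {
        key: mask_value(val) if key in PII_FIELDS and val is not None else val
        for key, val in record.items()
    }


def bronze_path(source, ingestion_date):
    """Build a landing path partitioned by source and ingestion date (assumed layout)."""
    return f"bronze/source={source}/ingestion_date={ingestion_date.isoformat()}"


masked = mask_record({"id": 7, "email": "jane@example.com"})
path = bronze_path("salesforce", date(2026, 2, 2))
```

Because the digest is deterministic, masked values can still be used as join keys downstream without exposing the original data.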
03

Silver Layer – Clean & Enriched

DBT models clean, validate, and deduplicate raw records into consistent schemas. Cross-source joins merge Salesforce accounts with Gainsight health scores and Pendo usage events. Data quality tests embedded in every model catch anomalies before they reach reporting.

DBT Models | Quality Tests | Cross-joins
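In the project this logic lives in DBT SQL models. The plain-Python sketch below only illustrates the two ideas described above: keeping the latest version of each record and enriching accounts with data from another source. Column names such as `account_id`, `updated`, and `health_score` are hypothetical.

```python
def deduplicate(records, key, order_by):
    """Keep only the most recent record per key, judged by the order_by column."""
    latest = {}
    for rec in records:
        current = latest.get(rec[key])
        if current is None or rec[order_by] > current[order_by]:
            latest[rec[key]] = rec
    return list(latest.values())


def left_join(left, right, key):
    """Enrich each left row with the matching right row, if one exists."""
    index = {row[key]: row for row in right}
    return [{**row, **index.get(row[key], {})} for row in left]


accounts = [
    {"account_id": "A1", "name": "Acme", "updated": 1},
    {"account_id": "A1", "name": "Acme Corp", "updated": 2},
    {"account_id": "A2", "name": "Beta", "updated": 1},
]
health = [{"account_id": "A1", "health_score": 82}]

clean = deduplicate(accounts, "account_id", "updated")
joined = left_join(clean, health, "account_id")
```

Accounts without a health score pass through unchanged, which mirrors the behavior of a SQL LEFT JOIN.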
04

Gold Layer – Business Analytics

Pre-computed Delta tables deliver MRR, churn rates, customer health scores, and product adoption trends. Each Gold model is owned by a business domain — Finance, Sales, or Customer Success — reducing time-to-insight and improving stakeholder accountability.

Dashboards | Domain Owned | Fast Insights
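To make the Gold-layer metrics concrete, here is a small sketch of how MRR and churn rate are commonly calculated. The definitions used (MRR as the sum of active monthly amounts, churn as customers lost divided by customers at the start of the period) are standard conventions assumed here, not formulas confirmed by the case study, and the field names are hypothetical.

```python
def monthly_recurring_revenue(subscriptions):
    """Sum the monthly amount of all active subscriptions."""
    return sum(s["monthly_amount"] for s in subscriptions if s["status"] == "active")


def churn_rate(customers_at_start, customers_lost):
    """Fraction of customers lost during the period; 0.0 if there were none to lose."""
    if customers_at_start == 0:
        return 0.0
    return customers_lost / customers_at_start


subs = [
    {"monthly_amount": 30, "status": "active"},
    {"monthly_amount": 50, "status": "cancelled"},
    {"monthly_amount": 20, "status": "active"},
]
mrr = monthly_recurring_revenue(subs)
churn = churn_rate(customers_at_start=200, customers_lost=10)
```

Pre-computing values like these in Gold tables means dashboards read a finished number instead of re-aggregating raw events on every query.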
05

Governance & Performance

Role-based access controls, automated DBT lineage tracking, and schema documentation govern every layer. Spark query plans were profiled and optimized, and Z-ORDER clustering was applied on high-cardinality columns of Delta tables. Pipeline restructuring improved end-to-end throughput by over 40%.

RBAC | Z-ORDER | +40% Throughput
06

DBT Transformation Layer

All transformation logic re-implemented as versioned, modular SQL models — replacing ad-hoc notebook scripts. Not-null, uniqueness, and referential integrity tests run on every model. Auto-generated DBT docs create a shared catalog with full column-level lineage. CI/CD validates changes in staging before production promotion.

SQL Models | CI/CD | DBT Docs
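DBT declares these checks in YAML using its built-in `not_null`, `unique`, and `relationships` generic tests, and a test passes when it returns zero failing rows. The sketch below mimics that convention in plain Python purely to show what each test verifies; it is not DBT code, and the table and column names are hypothetical.

```python
def not_null(rows, column):
    """Return rows where the column is missing or None."""
    return [r for r in rows if r.get(column) is None]


def unique(rows, column):
    """Return rows whose non-null value was already seen in an earlier row."""
    seen, duplicates = set(), []
    for r in rows:
        value = r.get(column)
        if value is None:
            continue  # nulls are left to the not_null check
        if value in seen:
            duplicates.append(r)
        seen.add(value)
    return duplicates


def relationships(child_rows, column, parent_rows, parent_column):
    """Return child rows whose non-null foreign key has no matching parent."""
    parents = {p[parent_column] for p in parent_rows}
    return [
        c for c in child_rows
        if c[column] is not None and c[column] not in parents
    ]


orders = [
    {"order_id": 1, "account_id": "A1"},
    {"order_id": 1, "account_id": "A9"},
    {"order_id": None, "account_id": "A1"},
]
accounts = [{"account_id": "A1"}]

null_failures = not_null(orders, "order_id")
duplicate_failures = unique(orders, "order_id")
orphan_failures = relationships(orders, "account_id", accounts, "account_id")
```

Each function returns the offending rows, so an empty list means the check passed.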
07

Unity Catalog for Databricks

Central Governance Layer

Databricks Unity Catalog was adopted as the single governance layer across the entire data platform. It provides a unified metastore for all assets — tables, views, volumes, and ML models — with fine-grained access control at the catalog, schema, and table level. Column-level security automatically enforces PII masking policies, while built-in lineage tracks the end-to-end data flow from raw ingestion through Bronze, Silver, and Gold. Audit logs captured via Unity Catalog support compliance reporting and proactive data governance without any custom tooling overhead.

Unified Metastore | Column-level Security | Data Lineage | Audit Logs | Fine-grained RBAC

Bronze

Immutable raw ingestion with Delta Lake partitioning, PII masking, and full source fidelity.

Silver

Cleaned, deduplicated, and enriched data modelled via DBT for quality and cross-source consistency.

Gold

Domain-owned, business-ready Delta tables powering dashboards, KPIs, and stakeholder reporting.

DBT

Versioned SQL models with built-in tests, CI/CD pipelines, and auto-generated data documentation.

Unity Catalog

Centralized governance with column-level security, lineage tracking, and compliance audit logs.

End-to-End Data Processing Architecture

The redesigned platform processes data through multiple Medallion layers, ensuring reliable ingestion, transformation, and analytics delivery.

Source Systems

Salesforce, Marketo, Segment

Bronze Layer

Raw Data Storage

Silver Layer

Cleaned & Transformed Data

Gold Layer

Analytics Ready Data

Analytics

Dashboards & BI

How Data Flows Across the Platform

The system processes data through a structured Medallion architecture, transforming raw inputs into analytics-ready insights.

Sources

APIs

AWS S3

CRM / Tools

Bronze

Raw Data

PII Segregation

Silver (DBT)

Cleaned Data

SCD-II + Validation

Gold

Business Data

Analytics Ready

Consumption

Dashboards

Insights
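The Silver step in the flow above mentions SCD-II (Slowly Changing Dimension Type 2), which keeps history by closing the old version of a row and adding a new one whenever a tracked attribute changes. The sketch below shows the idea in plain Python. The column names `valid_from`, `valid_to`, and `is_current` are a common convention assumed here, and the project itself implements this logic in DBT rather than Python.

```python
from datetime import date


def apply_scd2(history, incoming, key, tracked, as_of):
    """Return a new history list with SCD Type 2 changes applied."""
    result = [dict(row) for row in history]  # copy so the input is not mutated
    current = {row[key]: row for row in result if row["is_current"]}
    for rec in incoming:
        existing = current.get(rec[key])
        if existing is not None and all(existing[c] == rec[c] for c in tracked):
            continue  # no tracked attribute changed
        if existing is not None:
            # Close the previous version instead of overwriting it.
            existing["valid_to"] = as_of
            existing["is_current"] = False
        new_row = {
            key: rec[key],
            **{c: rec[c] for c in tracked},
            "valid_from": as_of,
            "valid_to": None,
            "is_current": True,
        }
        result.append(new_row)
        current[rec[key]] = new_row
    return result


history = [{
    "account_id": "A1", "plan": "free",
    "valid_from": date(2025, 1, 1), "valid_to": None, "is_current": True,
}]
incoming = [
    {"account_id": "A1", "plan": "premium"},
    {"account_id": "A2", "plan": "free"},
]
updated = apply_scd2(history, incoming, "account_id", ["plan"], date(2026, 2, 2))
```

Running the same batch a second time adds nothing, because unchanged records are skipped. That property matters when a pipeline retries after a failure.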

Tech Stack Overview

The platform is built using a layered stack of modern technologies, enabling scalable ingestion, transformation, and analytics.

Data Ingestion

Source systems and transfer mechanisms that bring raw enterprise data into the platform.

APIs | AWS S3 | SFTP

Silver Layer Transformations

Transformation logic that cleans, validates, and standardizes raw data for downstream use.

DBT | SCD-II | Data Validation

Gold Layer Models

Business-ready datasets designed to support trusted reporting, metrics, and decision-making.

DBT Models | Aggregations

Analytics & BI

Visualization and reporting tools used to surface insights from curated data layers.

Power BI | Tableau

Ensuring Data Security & Compliance

The platform enforces strong governance policies, ensuring secure access, compliance with regulations, and complete data transparency.

Role-Based Access

Granular access control for Admins, Engineers, Analysts, and Compliance teams.

PII Protection

Sensitive data is segregated and masked to ensure GDPR compliance.

Encryption

AES-256 encryption ensures secure storage and transmission of sensitive data.

Audit & Monitoring

All data access is tracked with logs and alerts for full transparency.

Data Lineage

DBT-powered lineage tracking ensures visibility across pipelines.

Dynamic Masking

Critical fields like email and credentials are masked dynamically.

Access Control Model

Admin

Full Access

Engineer

Bronze & Silver (Read/Write)

Analyst

Gold Layer (Read Only)

Compliance

PII Access (Masked)
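The access model and dynamic masking described above can be sketched as a lookup table plus a masking function. This is an illustration only: in the project these rules are enforced by Unity Catalog, and the assumption that only the Admin role sees unmasked values, along with the masking format, is hypothetical.

```python
# Layer permissions per role, following the access control model above.
LAYER_ACCESS = {
    "admin": {"bronze": "rw", "silver": "rw", "gold": "rw"},
    "engineer": {"bronze": "rw", "silver": "rw"},
    "analyst": {"gold": "r"},
}


def can_access(role, layer, write=False):
    """Check whether a role may read (or write) a given Medallion layer."""
    mode = LAYER_ACCESS.get(role, {}).get(layer, "")
    return ("w" in mode) if write else ("r" in mode)


def mask_email(email):
    """Keep the first character and the domain, hide the rest of the local part."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"


def read_email(role, email):
    """Return the email as the given role would see it."""
    # Assumption: only admins see the clear value; every other role gets a mask.
    return email if role == "admin" else mask_email(email)


visible_to_compliance = read_email("compliance", "jane@example.com")
```

Masking at read time means the stored value never changes; what differs is the view each role receives.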

Business Impact and Results

The refactored architecture significantly improved performance, reliability, and operational efficiency across the data platform.

50%

Faster Data Ingestion

47%

Improved Query Performance

99.8%

Pipeline Success Rate

80%

Reduction in Manual Effort

Metric | Before | After
Ingestion Time | 4 Hours | 2 Hours
Query Speed | 5.7 sec | 3.0 sec
Success Rate | 87% | 99.8%
Manual Work | Daily | Weekly
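The headline percentages follow from the before and after values. A short calculation, using only the figures reported above, shows how:

```python
def percent_reduction(before, after):
    """Percentage by which a value dropped relative to its starting point."""
    return (before - after) / before * 100


ingestion_gain = percent_reduction(4, 2)      # ingestion time in hours
query_gain = percent_reduction(5.7, 3.0)      # query time in seconds
success_gain = round(99.8 - 87, 1)            # percentage points, not percent
```

Ingestion time halves, query time drops by roughly 47%, and the pipeline success rate rises by 12.8 percentage points.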

Operational Efficiency

Automation and better orchestration reduced repetitive manual work across ingestion monitoring and data preparation workflows.

Trust in Data

Clearer lineage, governance controls, and standardized transformations improved confidence in downstream reporting outputs.

Scalability Readiness

The redesigned lakehouse foundation positioned the platform to absorb future growth without repeating legacy performance bottlenecks.

Data Sources & Retrieval Methods

Fourteen enterprise and third-party systems feed the LastPass data platform, each with a distinct retrieval protocol, format, and ingestion cadence before landing in the Bronze layer.

14

Total Sources

9

REST APIs

2

S3 Buckets

1

FTP Transfer

2

Internal Tables

REST API | AWS S3 | FTP | Internal DB

Zoomin

Documentation Platform
Daily Batch

Structured content exports pulled from remote FTP server and landed to AWS S3 for Bronze ingestion.

FTP Pull | CSV / JSON | Bronze

Salesforce

CRM Platform
Scheduled

Account, contact, and opportunity records extracted via SOQL over REST API with OAuth 2.0 and incremental LastModifiedDate filtering.

SOQL / REST API | JSON / CSV | Bronze → Silver
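A sketch of how an incremental SOQL query with a LastModifiedDate filter might be assembled is shown below. The helper name and the chosen fields are hypothetical. SOQL expects datetime literals unquoted and in ISO 8601 form, which is why the timestamp is formatted this way. The function expects a timezone-aware datetime.

```python
from datetime import datetime, timezone


def build_incremental_soql(sobject, fields, last_modified):
    """Build a SOQL query that returns only rows changed after last_modified."""
    # SOQL datetime literals are unquoted, e.g. 2026-02-01T00:00:00Z.
    stamp = last_modified.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        f"SELECT {', '.join(fields)} FROM {sobject} "
        f"WHERE LastModifiedDate > {stamp} ORDER BY LastModifiedDate ASC"
    )


query = build_incremental_soql(
    "Account",
    ["Id", "Name", "LastModifiedDate"],
    datetime(2026, 2, 1, tzinfo=timezone.utc),
)
```

Ordering by LastModifiedDate lets the loader store the last value it saw as the starting point for the next run.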

PayPal

Payment Processor
Real-Time

Transaction, refund, and billing events via OAuth 2.0 webhooks in real-time, with daily batch reconciliation. PII masked at Bronze.

REST API + Webhooks | JSON | PII Masked

Gainsight

Customer Success
Daily Batch

Health scores, NPS, and engagement metrics via paginated API queries. Scores merged with Salesforce at the Silver layer.

REST API + API Key | JSON | Silver Merge

IP2Geo

Geolocation Service
On-Demand

IP-to-region mapping called synchronously during Bronze→Silver enrichment. Results cached in Delta Lake to minimize repeat calls.

REST API | JSON | Delta Cached
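The caching idea can be sketched as a thin wrapper that only calls the remote service for IPs it has not seen before. In the project the cache is a Delta Lake table; the in-memory dictionary and the `fake_fetch` stand-in below are assumptions that keep the example self-contained.

```python
class CachedGeoLookup:
    """Wrap a lookup function so each distinct IP is fetched only once."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = {}
        self.remote_calls = 0

    def region_for(self, ip):
        if ip not in self._cache:
            self.remote_calls += 1
            self._cache[ip] = self._fetch(ip)
        return self._cache[ip]


def fake_fetch(ip):
    """Stand-in for the real geolocation API call (hypothetical mapping)."""
    return "EU" if ip.startswith("81.") else "US"


lookup = CachedGeoLookup(fake_fetch)
regions = [lookup.region_for(ip) for ip in ["81.2.69.1", "81.2.69.1", "8.8.8.8"]]
```

Three lookups result in only two remote calls, since the repeated IP is served from the cache.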

Stripe

Payment Platform
Real-Time

Subscription, invoice, and transaction events via API Key webhooks with batch backfill. PII masked at Bronze ingestion.

REST API + Webhooks | JSON | PII Masked

Boss

Internal Data Store
Daily Batch

Operational exports delivered to date-partitioned S3 buckets, pulled via boto3 into the Bronze landing zone with incremental partition scanning.

S3 File Pull | CSV / Parquet | Bronze
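Incremental partition scanning can be sketched as working out which date partitions have appeared since the last successful load, so that only those prefixes are listed and pulled. The `dt=YYYY-MM-DD` key layout and the base prefix are assumptions for illustration. The actual S3 listing with boto3 is left out to keep the example runnable without credentials.

```python
from datetime import date, timedelta


def pending_prefixes(base, last_loaded, today):
    """List the date-partition prefixes created after last_loaded, up to today."""
    days = (today - last_loaded).days
    return [
        f"{base}/dt={(last_loaded + timedelta(days=offset)).isoformat()}/"
        for offset in range(1, days + 1)
    ]


prefixes = pending_prefixes("boss/exports", date(2026, 1, 30), date(2026, 2, 1))
```

Each returned prefix would then be passed to the S3 listing call, which avoids rescanning partitions that were already ingested.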

Currency Conversion

FX Rate Service
On-Demand

Real-time and historical FX rates called during Silver-layer enrichment. Rate table cached in Delta Lake to avoid duplicate calls.

REST API | JSON | Delta Cached

Cybersource

Payment & Fraud Mgmt
Scheduled

Authorization, settlement, and risk-decision records via HTTP Signature auth. Fraud scores joined with PayPal and Stripe at the Gold layer.

REST API + HTTP Sig | JSON | Gold Join

Pendo

Product Analytics
Scheduled

Feature usage, page views, and guidance events via paginated Integration Key queries. Joined with Salesforce at Silver for adoption metrics.

REST API + Int. Key | JSON | Silver Join

Chase

Financial Data
Daily Batch

Financial transaction exports delivered to S3, pulled via boto3 into Bronze. Date-partitioned with Delta Lake incremental path scanning.

S3 File Pull | CSV / Parquet | Bronze

SSO

Internal Auth Tables
Scheduled

LastPass SSO authentication events, session records, and identity data extracted via JDBC SQL queries with timestamp-based incremental loading.

JDBC / SQL | Delta / Tabular | PII Masked

GetFeedback

Survey & Feedback
Daily Batch

NPS, CSAT, and open-text survey responses via OAuth 2.0 paginated pull. Scores merged with Gainsight health data at the Gold layer.

REST API + OAuth 2.0 | JSON | Gold Merge

Marketo

Marketing Automation
Scheduled

Lead, contact, and campaign activity records extracted from Marketo internal tables via scheduled SQL queries with incremental timestamp-based loading.

Internal Table | Delta / Tabular | Bronze → Silver

Building a Scalable & Future-Ready Data Platform

The transformation of the LastPass data warehouse marked a significant shift from a fragmented and performance-heavy system to a scalable, governed, and high-performing lakehouse architecture.

Modern Foundation

By implementing the Medallion Architecture and leveraging Databricks with DBT, the platform now supports reliable data ingestion, efficient transformations, and faster analytics delivery.

Better Governance

The introduction of governance controls, data lineage, and PII protection ensures compliance while maintaining flexibility for future growth.

Operational Impact

This modernization not only improved performance and reduced operational complexity but also enabled data teams to focus on delivering meaningful insights rather than managing pipeline inefficiencies.

As data continues to grow, the new architecture positions LastPass to scale seamlessly, adapt quickly to new business requirements, and unlock the full potential of data-driven decision-making.


Client Value & Feedback

"Closeloop helped us bring structure and reliability to a complex data environment. The Databricks and DBT refactor improved pipeline performance, gave us much better visibility into our data models, and created a stronger foundation for future analytics initiatives."

Head of Data & Integrations, LastPass

Frequently Asked Questions


Databricks and dbt help businesses modernize outdated data pipelines by improving scalability, automating transformations, and streamlining analytics workflows. Legacy systems often struggle with slow processing, inconsistent reporting, and growing maintenance costs. By implementing cloud-native architectures and structured transformation frameworks, organizations can process large volumes of data more efficiently, improve reporting accuracy, and reduce operational bottlenecks. Closeloop Technologies helps enterprises build modern, scalable data platforms that support long-term business growth and advanced analytics capabilities.

Refactoring existing data architectures allows businesses to improve performance and scalability without completely disrupting current operations. Rebuilding from scratch can be expensive, time-consuming, and risky for organizations that rely heavily on continuous data availability. Refactoring helps optimize existing workflows, remove inefficiencies, and integrate modern technologies such as Databricks and dbt while preserving critical business processes. Closeloop Technologies follows a strategic modernization approach that minimizes downtime and maximizes operational efficiency.

Modern data engineering solutions improve reporting by organizing and processing data in a structured and scalable way. Businesses can generate faster dashboards, improve data consistency, and reduce delays caused by fragmented systems or manual processes. Technologies like Databricks and dbt help automate data transformations and create analytics-ready datasets that support real-time decision-making. Closeloop Technologies builds high-performance reporting systems that help organizations gain faster and more reliable business insights.

Legacy data systems often create challenges such as slow query performance, inconsistent data quality, complex maintenance requirements, and limited scalability. As data volumes grow, these systems become harder to manage and can delay business-critical reporting and analytics. Organizations also face difficulties integrating modern tools and supporting real-time data processing. Closeloop Technologies helps businesses overcome these limitations by implementing scalable cloud-native data engineering solutions designed for long-term efficiency and growth.

DBT simplifies and standardizes the data transformation process by allowing teams to create modular, reusable, and version-controlled transformation logic. It helps businesses improve data quality, automate testing, and maintain better visibility into data workflows. By organizing transformations more efficiently, companies can reduce manual effort and accelerate analytics delivery. Closeloop Technologies uses dbt to build maintainable and scalable transformation pipelines that improve operational efficiency and business intelligence capabilities.

Real-time and near real-time data processing allow businesses to make faster decisions based on the most current information available. Delayed reporting can impact operational efficiency, customer experience, and strategic planning. Modern data engineering platforms enable organizations to process incoming data continuously and generate insights more quickly. Closeloop Technologies helps businesses implement high-performance data architectures that support faster analytics, improved responsiveness, and better decision-making.

Optimized data pipelines reduce operational costs by improving processing efficiency, minimizing redundant workflows, and decreasing the amount of manual intervention required to manage data systems. Businesses can process data faster, reduce infrastructure waste, and improve resource utilization. Automated transformation frameworks also reduce maintenance overhead and improve overall reliability. Closeloop Technologies designs optimized data engineering solutions that help organizations lower operational expenses while improving system performance.

Yes, modern data architectures are designed to integrate seamlessly with existing enterprise systems, applications, and analytics platforms. Businesses do not always need to replace their entire infrastructure to modernize data operations. Cloud-native solutions can connect with CRMs, ERPs, BI tools, APIs, and third-party platforms while improving overall performance and accessibility. Closeloop Technologies specializes in building integrated data ecosystems that enhance operational continuity and maximize technology investments.
