This case study explores how Closeloop refactored LastPass's legacy data warehouse using Databricks and dbt to address fragmented data pipelines, poor data quality, and performance bottlenecks. The team implemented a scalable Medallion architecture, automated data ingestion, modular transformations, and strong data governance, resulting in faster reporting, improved data accuracy, enhanced compliance, and a reliable foundation for self-service analytics.
LastPass, a leading global password management platform, securely stores millions of user credentials, personal identifiers, and sensitive enterprise records. As data volumes surged beyond hundreds of TB/day across multiple systems, including the LastPass site data, Salesforce, Boss, Marketo, Segment, and AWS S3 storage, the company's existing Databricks setup faced significant performance, scalability, and governance challenges.
To address this, the Closeloop Data Engineering Team undertook a comprehensive refactoring of LastPass's Databricks environment, leveraging the Medallion Architecture (Bronze, Silver, and Gold layers) and implementing a DBT-based transformation layer at the Silver and Gold stages.
This project modernized data handling, ensured GDPR-compliant PII segregation, improved system performance, and introduced a scalable, documented, and governed data architecture.
The objective of this project was to modernize the LastPass analytics infrastructure by restructuring the existing Databricks environment and implementing a scalable Medallion architecture powered by DBT transformations.
LastPass operates a large-scale data ecosystem that collects information from multiple operational systems including application logs, Salesforce, Boss, Marketo, Segment, and AWS S3 storage. As the platform grew globally, the existing data pipelines became difficult to manage due to fragmented data models, inconsistent transformations, and limited governance.
Closeloop's Data Engineering team refactored the entire Databricks environment by implementing a structured data pipeline based on the Medallion Architecture (Bronze, Silver, and Gold layers). Raw data ingestion pipelines were optimized, transformations were rebuilt using DBT, and a governed analytics layer was introduced to support faster and more reliable reporting.
The new architecture also introduced strong governance controls, including PII data segregation, schema documentation, lineage tracking, and automated testing. This transformation enabled LastPass to scale its analytics platform while ensuring security, reliability, and faster business insights.
Total Data Managed
Daily Data Ingestion
Architecture Model
Transformation Framework
Reorganized fragmented pipelines into clearly defined Bronze, Silver, and Gold layers for governance and scale.
Introduced governed transformation logic, lineage visibility, and standardized model development across analytics layers.
Separated sensitive information early in the pipeline to support GDPR requirements and stronger operational controls.
Improved ingestion performance, reduced manual intervention, and enabled more reliable reporting across the business.
LastPass manages vast amounts of sensitive data, including user credentials, audit logs, billing details, and partner integrations. Over time, its data environment had grown organically, leading to fragmentation and a lack of standardization.
Data from multiple sources such as Salesforce, Marketo, Segment, and AWS S3 lacked a consistent structure, making it difficult to organize raw data and maintain reliable analytics pipelines.
Data ingestion pipelines frequently failed during peak loads, causing delays in processing and affecting downstream analytics and reporting workflows.
Sensitive information such as user credentials and personal identifiers required strict governance policies and secure segregation to meet compliance standards including GDPR.
Data models and transformations lacked clear documentation and lineage, making it difficult for teams to understand pipeline logic and maintain the system efficiently.
Inefficient SQL queries and heavy joins significantly slowed down analytics processing and increased query execution times.
Integrating data from multiple operational platforms required a scalable architecture capable of handling large volumes while maintaining consistency and reliability.
As data volume, source-system variety, and compliance requirements increased, the warehouse could no longer support reliable ingestion, trusted transformations, and scalable analytics without a foundational redesign.
Closeloop redesigned the LastPass data platform using Databricks, Medallion Architecture, DBT, and Unity Catalog — replacing fragmented pipelines with a governed, scalable, and analytics-ready lakehouse that processes over 120 GB of data every day.
Data from 14 enterprise systems, including Salesforce, Marketo, Segment, Gainsight, PayPal, Stripe, Pendo, and AWS S3, is ingested via REST API, S3 pull, FTP, and JDBC. Incremental loading patterns minimize reprocessing overhead and keep pipelines fault-tolerant.
The Bronze layer acts as the immutable system of record. All incoming data lands in Delta Lake, partitioned by source and ingestion date. PII fields are identified and masked at this stage, before any downstream transformation begins, enforcing compliance from day one.
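The masking step described above can be illustrated with a minimal sketch. It assumes deterministic salted hashing, so masked values can still be joined on downstream. The case study does not say which masking technique was used, and the field names, salt, and helper names below are hypothetical.

```python
import hashlib

# ASSUMPTION: hypothetical PII column names; the real field list is not given in the case study.
PII_FIELDS = {"email", "full_name", "ip_address"}


def mask_value(value: str, salt: str) -> str:
    """Return a deterministic salted SHA-256 hex digest of the value."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def mask_record(record: dict, salt: str, pii_fields=PII_FIELDS) -> dict:
    """Return a copy of the record with non-null PII fields replaced by their hash."""
    masked = {}
    for key, value in record.items():
        if key in pii_fields and value is not None:
            masked[key] = mask_value(str(value), salt)
        else:
            masked[key] = value
    return masked


# Example record with one PII field (email) and two non-PII fields.
raw = {"account_id": "A-1", "email": "user@example.com", "plan": "teams"}
masked = mask_record(raw, salt="demo-salt")
```

Because the hash is deterministic for a given salt, the same email always maps to the same token, which keeps cross-source joins possible without exposing the original value.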
In the Silver layer, DBT models clean, validate, and deduplicate raw records into consistent schemas. Cross-source joins merge Salesforce accounts with Gainsight health scores and Pendo usage events. Data quality tests embedded in every model catch anomalies before they reach reporting.
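As an illustration of the deduplication step, the sketch below keeps only the most recently updated record per business key. It is plain Python rather than the project's DBT SQL, and the column names (`account_id`, `updated_at`) are assumptions made for the example.

```python
from datetime import datetime


def deduplicate_latest(records, key_field, updated_field):
    """Keep one record per key: the one with the greatest updated timestamp."""
    latest = {}
    for rec in records:
        key = rec[key_field]
        current = latest.get(key)
        if current is None or rec[updated_field] > current[updated_field]:
            latest[key] = rec
    return list(latest.values())


# Hypothetical raw rows: account A-1 arrives twice with different timestamps.
rows = [
    {"account_id": "A-1", "name": "Acme", "updated_at": datetime(2024, 1, 1)},
    {"account_id": "A-1", "name": "Acme Corp", "updated_at": datetime(2024, 2, 1)},
    {"account_id": "B-2", "name": "Globex", "updated_at": datetime(2024, 1, 15)},
]
deduped = deduplicate_latest(rows, "account_id", "updated_at")
```

In a SQL model the same rule is usually written with a window function ranking rows per key by timestamp; the Python version only shows the selection logic.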
In the Gold layer, pre-computed Delta tables deliver MRR, churn rates, customer health scores, and product adoption trends. Each Gold model is owned by a business domain (Finance, Sales, or Customer Success), reducing time-to-insight and improving stakeholder accountability.
Role-based access controls, automated DBT lineage tracking, and schema documentation govern every layer. Spark query plans were profiled and optimized, and Z-ORDER clustering was applied on high-cardinality columns of Delta tables. Pipeline restructuring improved end-to-end throughput by over 40%.
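For readers unfamiliar with Z-ordering, the sketch below builds the Databricks SQL statement `OPTIMIZE ... ZORDER BY (...)` for a given table and column list. The table and column names are hypothetical, and the helper `build_zorder_statement` is an illustration, not part of the project. On a cluster, the resulting string would typically be passed to `spark.sql(...)`.

```python
import re

# One to three dot-separated identifiers, e.g. catalog.schema.table.
_TABLE = re.compile(r"[A-Za-z_]\w*(\.[A-Za-z_]\w*){0,2}")
_COLUMN = re.compile(r"[A-Za-z_]\w*")


def build_zorder_statement(table: str, columns: list) -> str:
    """Build an OPTIMIZE ... ZORDER BY statement after validating identifiers."""
    if not _TABLE.fullmatch(table):
        raise ValueError(f"invalid table name: {table!r}")
    if not columns or not all(_COLUMN.fullmatch(c) for c in columns):
        raise ValueError(f"invalid column list: {columns!r}")
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(columns)})"


# ASSUMPTION: hypothetical table and column; the case study does not name them.
statement = build_zorder_statement("analytics.silver.usage_events", ["account_id"])
```

Z-ordering co-locates rows with similar values of the chosen columns in the same files, so it tends to help most on columns that are both high-cardinality and commonly used in filters.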
All transformation logic was re-implemented as versioned, modular SQL models, replacing ad-hoc notebook scripts. Not-null, uniqueness, and referential integrity tests run on every model. Auto-generated DBT docs create a shared catalog with full column-level lineage. CI/CD validates changes in staging before production promotion.
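The three kinds of tests named above correspond to DBT's built-in `not_null`, `unique`, and `relationships` tests. The sketch below shows, in plain Python, what each one asserts. The sample rows and column names are hypothetical, and each function returns the failing rows or values, so an empty result means the test passes.

```python
def not_null_failures(rows, column):
    """Rows where the column is missing or None."""
    return [r for r in rows if r.get(column) is None]


def unique_failures(rows, column):
    """Non-null values that appear more than once in the column."""
    seen, dupes = set(), set()
    for r in rows:
        value = r.get(column)
        if value is None:
            continue  # nulls are left to the not-null test
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return sorted(dupes)


def relationship_failures(child_rows, child_col, parent_rows, parent_col):
    """Child rows whose non-null foreign key has no matching parent key."""
    parent_keys = {p[parent_col] for p in parent_rows}
    return [
        r for r in child_rows
        if r.get(child_col) is not None and r[child_col] not in parent_keys
    ]


# Hypothetical sample data with one duplicate, one null, and one orphaned key.
accounts = [
    {"account_id": "A-1"},
    {"account_id": "B-2"},
    {"account_id": "B-2"},
    {"account_id": None},
]
health_scores = [
    {"account_id": "A-1", "score": 80},
    {"account_id": "Z-9", "score": 55},
]

null_rows = not_null_failures(accounts, "account_id")
duplicate_keys = unique_failures(accounts, "account_id")
orphans = relationship_failures(health_scores, "account_id", accounts, "account_id")
```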
Databricks Unity Catalog
Databricks Unity Catalog was adopted as the single governance layer across the entire data platform. It provides a unified metastore for all assets — tables, views, volumes, and ML models — with fine-grained access control at the catalog, schema, and table level. Column-level security automatically enforces PII masking policies, while built-in lineage tracks the end-to-end data flow from raw ingestion through Bronze, Silver, and Gold. Audit logs captured via Unity Catalog support compliance reporting and proactive data governance without any custom tooling overhead.
Immutable raw ingestion with Delta Lake partitioning, PII masking, and full source fidelity.
Cleaned, deduplicated, and enriched data modelled via DBT for quality and cross-source consistency.
Domain-owned, business-ready Delta tables powering dashboards, KPIs, and stakeholder reporting.
Versioned SQL models with built-in tests, CI/CD pipelines, and auto-generated data documentation.
Centralized governance with column-level security, lineage tracking, and compliance audit logs.
The redesigned platform processes data through multiple Medallion layers, ensuring reliable ingestion, transformation, and analytics delivery.
Data flow: Salesforce, Marketo, Segment → Raw Data Storage → Cleaned & Transformed Data → Analytics Ready Data → Dashboards & BI
The system processes data through a structured Medallion architecture, transforming raw inputs into analytics-ready insights.
Data flow: APIs, AWS S3, CRM / Tools → Raw Data → Cleaned Data → Business Data → Dashboards → Insights
The platform is built using a layered stack of modern technologies, enabling scalable ingestion, transformation, and analytics.
Source systems and transfer mechanisms that bring raw enterprise data into the platform.
The core data processing layer built for scalable ingestion, storage, and transformation performance.
Transformation logic that cleans, validates, and standardizes raw data for downstream use.
Business-ready datasets designed to support trusted reporting, metrics, and decision-making.
Visualization and reporting tools used to surface insights from curated data layers.
The platform enforces strong governance policies, ensuring secure access, compliance with regulations, and complete data transparency.
Granular access control for Admins, Engineers, Analysts, and Compliance teams.
Sensitive data is segregated and masked to ensure GDPR compliance.
AES-256 encryption ensures secure storage and transmission of sensitive data.
All data access is tracked with logs and alerts for full transparency.
DBT-powered lineage tracking ensures visibility across pipelines.
Critical fields like email and credentials are masked dynamically.
Full Access
Bronze & Silver (Read/Write)
Gold Layer (Read Only)
PII Access (Masked)
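A role-based access model like the one outlined above can be sketched as a simple grants table. The pairing of roles to access levels below is an assumption based on the order in which the roles and access levels are listed (Admins, Engineers, Analysts, Compliance), and the role, layer, and action names are hypothetical. In the actual platform these rules would be enforced by Unity Catalog rather than by application code.

```python
LAYERS = ("bronze", "silver", "gold")

# ASSUMPTION: role-to-grant pairing inferred from listing order, not confirmed by the source.
GRANTS = {
    # Full access: read and write on every layer.
    "admin": {(layer, action) for layer in LAYERS for action in ("read", "write")},
    # Bronze and Silver, read/write.
    "engineer": {
        (layer, action)
        for layer in ("bronze", "silver")
        for action in ("read", "write")
    },
    # Gold layer, read only.
    "analyst": {("gold", "read")},
    # PII access, masked values only.
    "compliance": {("pii", "read_masked")},
}


def is_allowed(role: str, resource: str, action: str) -> bool:
    """Return True if the role holds a grant for this resource and action."""
    return (resource, action) in GRANTS.get(role, set())
```

Unknown roles fall through to an empty grant set, so the check denies by default.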
The refactored architecture significantly improved performance, reliability, and operational efficiency across the data platform.
Faster Data Ingestion
Improved Query Performance
Pipeline Success Rate
Reduction in Manual Effort
Automation and better orchestration reduced repetitive manual work across ingestion monitoring and data preparation workflows.
Clearer lineage, governance controls, and standardized transformations improved confidence in downstream reporting outputs.
The redesigned lakehouse foundation positioned the platform to absorb future growth without repeating legacy performance bottlenecks.
Fourteen enterprise and third-party systems feed the LastPass data platform, each with a distinct retrieval protocol, format, and ingestion cadence before landing in the Bronze layer.
Total Sources
REST APIs
S3 Buckets
FTP Transfer
Internal Table
Structured content exports pulled from a remote FTP server and landed in AWS S3 for Bronze ingestion.
Account, contact, and opportunity records extracted via SOQL over REST API with OAuth 2.0 and incremental LastModifiedDate filtering.
Transaction, refund, and billing events via OAuth 2.0 webhooks in real time, with daily batch reconciliation. PII masked at Bronze.
Health scores, NPS, and engagement metrics via paginated API queries. Scores merged with Salesforce at the Silver layer.
IP-to-region mapping called synchronously during Bronze→Silver enrichment. Results cached in Delta Lake to minimize repeat calls.
Subscription, invoice, and transaction events via API Key webhooks with batch backfill. PII masked at Bronze ingestion.
Operational exports delivered to date-partitioned S3 buckets, pulled via boto3 into the Bronze landing zone with incremental partition scanning.
Real-time and historical FX rates called during Silver-layer enrichment. Rate table cached in Delta Lake to avoid duplicate calls.
Authorization, settlement, and risk-decision records via HTTP Signature auth. Fraud scores joined with PayPal and Stripe at the Gold layer.
Feature usage, page views, and guidance events via paginated Integration Key queries. Joined with Salesforce at Silver for adoption metrics.
Financial transaction exports delivered to S3, pulled via boto3 into Bronze. Date-partitioned with Delta Lake incremental path scanning.
LastPass SSO authentication events, session records, and identity data extracted via JDBC SQL queries with timestamp-based incremental loading.
NPS, CSAT, and open-text survey responses via OAuth 2.0 paginated pull. Scores merged with Gainsight health data at the Gold layer.
Lead, contact, and campaign activity records extracted from Marketo internal tables via scheduled SQL queries with incremental timestamp-based loading.
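Two of the incremental patterns mentioned in the source list above, `LastModifiedDate` filtering for Salesforce and date-partition scanning for S3 exports, can be sketched as small helpers. This is a minimal illustration under stated assumptions: the object and field names, the `ingest_date=YYYY-MM-DD` prefix layout, and both function names are hypothetical, and the watermark is assumed to be a timezone-aware datetime. The prefixes returned by the second helper would then be listed and read with an S3 client such as boto3.

```python
from datetime import date, datetime, timedelta, timezone


def build_incremental_soql(sobject, fields, watermark):
    """Build a SOQL query for rows modified after a timezone-aware watermark."""
    # SOQL datetime literals are unquoted ISO 8601 values, e.g. 2024-01-31T00:00:00Z.
    ts = watermark.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        f"SELECT {', '.join(fields)} FROM {sobject} "
        f"WHERE LastModifiedDate > {ts} ORDER BY LastModifiedDate"
    )


def pending_partition_prefixes(base_prefix, last_loaded, today):
    """List date-partition prefixes after last_loaded, up to and including today."""
    prefixes = []
    day = last_loaded + timedelta(days=1)
    while day <= today:
        # ASSUMPTION: hypothetical partition layout; the real layout is not given.
        prefixes.append(f"{base_prefix}/ingest_date={day.isoformat()}/")
        day += timedelta(days=1)
    return prefixes


query = build_incremental_soql(
    "Account", ["Id", "Name"], datetime(2024, 1, 31, tzinfo=timezone.utc)
)
prefixes = pending_partition_prefixes(
    "exports/ops", date(2024, 1, 30), date(2024, 2, 1)
)
```

In both cases the watermark (last modified timestamp or last loaded date) would be persisted after each successful run, so a failed run can be retried without reprocessing earlier data.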
The transformation of the LastPass data warehouse marked a significant shift from a fragmented and performance-heavy system to a scalable, governed, and high-performing lakehouse architecture.
By implementing the Medallion Architecture and leveraging Databricks with DBT, the platform now supports reliable data ingestion, efficient transformations, and faster analytics delivery.
The introduction of governance controls, data lineage, and PII protection ensures compliance while maintaining flexibility for future growth.
This modernization not only improved performance and reduced operational complexity but also enabled data teams to focus on delivering meaningful insights rather than managing pipeline inefficiencies.
Client feedback reflects not just satisfaction with the refactored data warehouse, but confidence in the platform's improved scalability, governance, and reporting performance.
"Closeloop helped us bring structure and reliability to a complex data environment. The Databricks and DBT refactor improved pipeline performance, gave us much better visibility into our data models, and created a stronger foundation for future analytics initiatives."
Head of Data & Integrations, LastPass
Databricks and dbt help businesses modernize outdated data pipelines by improving scalability, automating transformations, and streamlining analytics workflows. Legacy systems often struggle with slow processing, inconsistent reporting, and growing maintenance costs. By implementing cloud-native architectures and structured transformation frameworks, organizations can process large volumes of data more efficiently, improve reporting accuracy, and reduce operational bottlenecks. Closeloop Technologies helps enterprises build modern, scalable data platforms that support long-term business growth and advanced analytics capabilities.
Refactoring existing data architectures allows businesses to improve performance and scalability without completely disrupting current operations. Rebuilding from scratch can be expensive, time-consuming, and risky for organizations that rely heavily on continuous data availability. Refactoring helps optimize existing workflows, remove inefficiencies, and integrate modern technologies such as Databricks and dbt while preserving critical business processes. Closeloop Technologies follows a strategic modernization approach that minimizes downtime and maximizes operational efficiency.
Modern data engineering solutions improve reporting by organizing and processing data in a structured and scalable way. Businesses can generate faster dashboards, improve data consistency, and reduce delays caused by fragmented systems or manual processes. Technologies like Databricks and dbt help automate data transformations and create analytics-ready datasets that support real-time decision-making. Closeloop Technologies builds high-performance reporting systems that help organizations gain faster and more reliable business insights.
Legacy data systems often create challenges such as slow query performance, inconsistent data quality, complex maintenance requirements, and limited scalability. As data volumes grow, these systems become harder to manage and can delay business-critical reporting and analytics. Organizations also face difficulties integrating modern tools and supporting real-time data processing. Closeloop Technologies helps businesses overcome these limitations by implementing scalable cloud-native data engineering solutions designed for long-term efficiency and growth.
dbt simplifies and standardizes the data transformation process by allowing teams to create modular, reusable, and version-controlled transformation logic. It helps businesses improve data quality, automate testing, and maintain better visibility into data workflows. By organizing transformations more efficiently, companies can reduce manual effort and accelerate analytics delivery. Closeloop Technologies uses dbt to build maintainable and scalable transformation pipelines that improve operational efficiency and business intelligence capabilities.
Real-time and near real-time data processing allow businesses to make faster decisions based on the most current information available. Delayed reporting can impact operational efficiency, customer experience, and strategic planning. Modern data engineering platforms enable organizations to process incoming data continuously and generate insights more quickly. Closeloop Technologies helps businesses implement high-performance data architectures that support faster analytics, improved responsiveness, and better decision-making.
Optimized data pipelines reduce operational costs by improving processing efficiency, minimizing redundant workflows, and decreasing the amount of manual intervention required to manage data systems. Businesses can process data faster, reduce infrastructure waste, and improve resource utilization. Automated transformation frameworks also reduce maintenance overhead and improve overall reliability. Closeloop Technologies designs optimized data engineering solutions that help organizations lower operational expenses while improving system performance.
Yes, modern data architectures are designed to integrate seamlessly with existing enterprise systems, applications, and analytics platforms. Businesses do not always need to replace their entire infrastructure to modernize data operations. Cloud-native solutions can connect with CRMs, ERPs, BI tools, APIs, and third-party platforms while improving overall performance and accessibility. Closeloop Technologies specializes in building integrated data ecosystems that enhance operational continuity and maximize technology investments.