Databricks Data Platform Modernization – LastPass

Data Engineering & Analytics

Project Overview

LastPass manages large volumes of sensitive credential, billing, and user-activity data across numerous enterprise systems. As the platform scaled, its Databricks environment began to suffer from pipeline instability, fragmented datasets, and governance gaps around sensitive information. Closeloop’s data engineering team refactored the architecture around a Medallion data model (Bronze, Silver, Gold) and introduced dbt for transformation, documentation, and lineage tracking. The modernized platform improved ingestion reliability, secured PII handling, standardized pipelines, and enabled scalable analytics across integrated systems such as Salesforce, Marketo, and AWS S3.

Business Challenges

  • Fragmented datasets and inconsistent catalog structures across multiple enterprise systems.
  • High ingestion latency and frequent pipeline failures during peak data loads.
  • Lack of segregation and governance controls for PII.
  • Limited documentation and unclear ownership of existing data pipelines.
  • Performance bottlenecks caused by inefficient transformations and schema inconsistencies.

Solution

  • Implemented a scalable Medallion architecture (Bronze, Silver, Gold) within Databricks.
  • Introduced dbt for structured transformations, schema documentation, and lineage tracking.
  • Segregated PII and non-PII datasets with role-based access controls.
  • Rebuilt ingestion pipelines using PySpark with validation checkpoints.
  • Standardized catalogs, schemas, and naming conventions across datasets.
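
As an illustration of what a validation checkpoint between layers might look like, here is a minimal sketch in plain Python. It is not the project's PySpark code: the field names (user_id, event_type, event_ts), the CheckpointResult type, and the 5% rejection threshold are all hypothetical, chosen only to show the idea of splitting a batch into valid and rejected rows and then gating promotion to the next layer.

```python
from dataclasses import dataclass, field


@dataclass
class CheckpointResult:
    """Outcome of one validation checkpoint over a batch of records."""
    valid: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (record, reason) pairs


# Hypothetical required fields for an illustrative user-activity feed.
REQUIRED_FIELDS = ("user_id", "event_type", "event_ts")


def validate_batch(records, required=REQUIRED_FIELDS):
    """Split a batch into valid rows and rejected rows with a reason."""
    result = CheckpointResult()
    seen_keys = set()
    for rec in records:
        # Reject rows where a required field is absent or empty.
        missing = [f for f in required if rec.get(f) in (None, "")]
        if missing:
            result.rejected.append((rec, "missing: " + ", ".join(missing)))
            continue
        # Reject exact repeats of a row already accepted in this batch.
        key = tuple(rec[f] for f in required)
        if key in seen_keys:
            result.rejected.append((rec, "duplicate"))
            continue
        seen_keys.add(key)
        result.valid.append(rec)
    return result


def passes_checkpoint(result, max_reject_ratio=0.05):
    """Gate promotion to the next layer on the share of rejected rows."""
    total = len(result.valid) + len(result.rejected)
    if total == 0:
        return False  # an empty batch is treated as a failure in this sketch
    return len(result.rejected) / total <= max_reject_ratio
```

In a real pipeline the same pattern would typically be expressed as DataFrame filters, with rejected rows written to a quarantine table and the gate result feeding the alerting described in this case study.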

Approach

  • Conducted gap analysis to identify ingestion and transformation bottlenecks.
  • Reorganized legacy datasets into Medallion layers aligned with best practices.
  • Rebuilt ETL pipelines using PySpark and SQL on Databricks clusters.
  • Implemented SCD Type 2 (SCD-II) logic through dbt snapshots for historical data tracking.
  • Added monitoring, validation checkpoints, and automated alerts for pipeline reliability.
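
The SCD-II behavior mentioned above can be sketched in plain Python to show what the history table ends up containing. This is only an illustration of the technique, not the snapshot configuration used in the project: the function name apply_scd2, the account_id and plan fields, and the valid_from / valid_to column names are assumptions (dbt snapshots maintain comparable dbt_valid_from and dbt_valid_to columns).

```python
from datetime import datetime


def apply_scd2(history, incoming, key, tracked, as_of):
    """Return a new history list with SCD Type 2 changes applied.

    history:  list of dicts carrying valid_from / valid_to (None = current)
    incoming: list of dicts holding the latest source state
    key:      name of the business-key field
    tracked:  fields whose change opens a new version
    """
    updated = [dict(row) for row in history]  # do not mutate the input
    current = {row[key]: row for row in updated if row["valid_to"] is None}

    for rec in incoming:
        existing = current.get(rec[key])
        if existing is not None:
            if all(existing[c] == rec[c] for c in tracked):
                continue  # nothing changed, keep the open version
            existing["valid_to"] = as_of  # close the superseded version
        # Open a new current version for a new key or a changed record.
        new_row = {key: rec[key], **{c: rec[c] for c in tracked}}
        new_row["valid_from"] = as_of
        new_row["valid_to"] = None
        updated.append(new_row)
        current[rec[key]] = new_row
    return updated


# Usage: an account is first seen on a free plan, then upgrades a month later.
history = apply_scd2([], [{"account_id": 1, "plan": "free"}],
                     "account_id", ["plan"], datetime(2024, 1, 1))
history = apply_scd2(history, [{"account_id": 1, "plan": "business"}],
                     "account_id", ["plan"], datetime(2024, 2, 1))
```

After the second call the history holds two rows for the account: the closed free-plan version and the open business-plan version, which is what makes point-in-time reporting possible.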

Result & Benefits

  • 50% Faster Data Ingestion: Optimized pipelines reduced ingestion time through parallel processing and improved job orchestration.
  • 47% Improved Query Performance: Refactored transformations and structured data layers accelerated analytics queries and reporting.
  • 99.8% Pipeline Success Rate: Validation checkpoints and monitoring improved pipeline reliability and reduced manual intervention.
