Work

Production AI systems, data products, and governance frameworks that pass audits, generate revenue, and maintain stakeholder trust.

Transforming Federal AI

U.S. Food & Drug Administration
Lead Data Scientist | 2025-Present | Security Clearance

GenAI • Regulated AI • Zero to One • FedRAMP High • Leadership • Governance

The Challenge

When I joined FDA's CDRH office, AI was seen as too risky for production use in regulatory workflows. The perception was that compliance requirements (FedRAMP High, HIPAA, audit trails) would slow innovation to a crawl.

Device submission reviewers were spending hours manually linking related submissions across systems—a perfect use case for AI, but one that required absolute trust, explainability, and regulatory compliance.

The Discovery

The real blocker wasn't regulation—it was architecture. By designing governance into the system from day one rather than bolting it on later, we could move fast and stay compliant.

I discovered that AWS GovCloud had the security and compliance infrastructure we needed, but we had to architect for:

  • Provenance tracking (where did this data come from?)
  • Grounding citations (what evidence supports this answer?)
  • Bias detection and drift monitoring
  • Blue-green deployments with automated rollback
  • Complete audit trails for every prediction
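To make the provenance, citation, and audit-trail requirements concrete, here is a minimal sketch of what one audit record per prediction could look like. It is illustrative only and not the production design: the `AuditRecord` fields, the hash chaining, and `verify_chain` are all assumptions introduced for this example.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict


@dataclass
class AuditRecord:
    """One prediction plus the evidence needed to reconstruct it later."""
    prediction_id: str
    model_version: str
    source_documents: list   # provenance: where the input data came from
    citations: list          # grounding: passages that support the answer
    output: str
    prev_hash: str = ""      # hash of the previous record (tamper evidence)
    record_hash: str = field(default="", init=False)

    def seal(self) -> "AuditRecord":
        # Hash every field except the hash itself, using a stable key order.
        payload = {k: v for k, v in asdict(self).items() if k != "record_hash"}
        encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
        self.record_hash = hashlib.sha256(encoded).hexdigest()
        return self


def verify_chain(records: list) -> bool:
    """Return True if no record in the chain was altered or reordered."""
    prev = ""
    for rec in records:
        # Recompute the hash from the record's current field values.
        expected = AuditRecord(
            rec.prediction_id, rec.model_version, rec.source_documents,
            rec.citations, rec.output, rec.prev_hash,
        ).seal().record_hash
        if rec.prev_hash != prev or rec.record_hash != expected:
            return False
        prev = rec.record_hash
    return True
```

Each record points at the hash of the one before it, so editing any past prediction breaks verification for the rest of the chain. A real deployment would write these records to append-only storage rather than keep them in memory.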

The Solution

I architected an agentic RAG system on AWS Bedrock, built in four parts:

1. Unified Data Architecture

Migrated Databricks medallion layers to production S3 GovCloud data lake with Glue Data Catalog, Lake Formation governance, and Redshift Serverless analytics

2. Production GenAI

Built entity-linking service using SageMaker Feature Store, OpenSearch Serverless (vector search), and Lambda inference endpoints

3. MLOps Governance

Implemented SageMaker Pipelines, Model Registry, Model Monitor (drift), and Clarify (bias detection) with multi-variant endpoints and automated rollback

4. Event-Driven Architecture

Delivered microservices with EventBridge, SNS/SQS FIFO, API Gateway, and ECS Fargate—idempotent consumers reduced MTTR ~30%
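For readers unfamiliar with the term, an idempotent consumer is one that can receive the same message twice without doing the work twice, which makes replaying a queue after an incident safe. The sketch below shows the pattern in plain Python under simplifying assumptions: the `IdempotentConsumer` class, the message shape (`id`, `body`), and the in-memory set are hypothetical stand-ins, and a real service would track processed IDs in a durable store.

```python
from typing import Callable


class IdempotentConsumer:
    """Wraps a handler so that redelivered messages are skipped."""

    def __init__(self, handler: Callable[[dict], None]):
        self._handler = handler
        self._processed: set = set()  # assumption: durable store in production

    def handle(self, message: dict) -> bool:
        """Process a message once; return False if it was a duplicate."""
        message_id = message["id"]
        if message_id in self._processed:
            return False
        self._handler(message["body"])
        # Mark as processed only after the handler succeeds, so a message
        # whose handler raised can be retried on redelivery.
        self._processed.add(message_id)
        return True
```

Because a failed message is never marked as processed, recovery can be as simple as redelivering everything since the failure and letting the consumer discard what it has already handled.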

The Impact

  • 85% business alignment achieved within 3 months
  • 15-25% precision/recall improvement in submission linking
  • <500ms P50 latency at scale serving 200+ reviewers
  • 30% faster incident resolution through event-driven architecture
  • Security clearance granted for production GenAI in federal environment
  • Framework adopted agency-wide as standard for AI deployments
  • 20% cost reduction in data pipelines while improving freshness

The Lesson

Compliance isn't a barrier to AI innovation—it's a design constraint that forces you to build better systems. Governance as code, not as review committees. Observability from day one, not retrofitted later.

This became the template for how federal agencies can safely deploy production AI.

Technical Stack:

AWS Bedrock • SageMaker • Lake Formation • OpenSearch Serverless • EventBridge • Lambda • ECS Fargate • Step Functions • CloudTrail • KMS • Macie • Redshift Serverless

Launching a $5M ARR Data Product

Agora.io
Director of Data Engineering | 2021-2023

Data Product • Zero to One • Team Building • Revenue Impact • Salesforce Integration

The Challenge

Agora had no data engineering function when I joined. Product telemetry (video/voice/messaging usage) and Salesforce CRM data lived in silos across AWS S3, Lambda functions, and disconnected databases.

The company had ~200 enterprise customers but no unified view of customer health, churn risk, usage patterns, cost optimization opportunities, revenue drivers, or fraud patterns.

The Discovery

While unifying these data sources into a platform on AWS + Databricks, I asked a question that changed the direction: "What if we made this a product customers could use, not just internal dashboards?"

Our customers had the same problem we did—fragmented data about their own usage. If we could surface insights about their usage patterns, quality metrics, and cost optimization, that would be a product they'd pay for.

The Solution

I built the data function from scratch:

1. Team Building

Hired and led 4-person engineering team; established sprint cadences, code review standards, dbt conventions, and on-call practices

2. Unified Data Platform

Integrated Salesforce CRM and AWS product telemetry (S3, Lambda, Glue, Redshift, Lake Formation) with Databricks Unity Catalog governance

3. Event-Driven CDC

Built near-real-time ingestion using Salesforce Change Data Capture and Platform Events into S3; reduced integration defects 40%

4. Agora Analytics Product

Partnered with Sales, Marketing, and Product leadership to define KPIs customers cared about; launched configurable dashboards in 90 days

5. ML-Powered Fraud Detection

Built streaming fraud detection pipeline using SageMaker endpoints and Step Functions orchestration

The Impact

  • $5M+ ARR expansion opportunity created through Agora Analytics product
  • 30% faster fraud incident response, 15% fewer false positives
  • 40% reduction in integration defects through data contracts
  • 20% faster SLAs for go-to-market reporting
  • 18% cloud spend reduction while improving performance
  • 35% fewer data defects escaping production through dbt testing
  • 50% faster security questionnaire turnaround for enterprise deals
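A data contract, as mentioned in the list above, is an agreed schema that producers and consumers both validate against, so a breaking change is caught at the boundary instead of downstream. Here is a minimal sketch of the idea, assuming a contract expressed as a field-to-type mapping; the field names (`account_id`, `event_type`, `minutes_used`) and the `validate` helper are hypothetical and not the actual Agora schema.

```python
# Hypothetical contract for a usage event: field name -> required type.
CONTRACT = {
    "account_id": str,
    "event_type": str,
    "minutes_used": float,
}


def validate(record: dict, contract: dict = CONTRACT) -> list:
    """Return a list of contract violations; an empty list means valid."""
    errors = []
    for name, expected in contract.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            actual = type(record[name]).__name__
            errors.append(f"{name}: expected {expected.__name__}, got {actual}")
    # Fields the contract does not know about are reported too, so schema
    # drift on the producer side is visible rather than silently accepted.
    for name in sorted(record.keys() - contract.keys()):
        errors.append(f"unexpected field: {name}")
    return errors
```

Running a check like this in the ingestion path and in CI is one common way to stop malformed records before they reach reporting tables.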

The Lesson

Data teams can be revenue drivers, not just cost centers. The mindset shift from "internal dashboards" to "external products" unlocked a $5M opportunity that was hiding in plain sight.

The key: Partner with Sales and Product early. Define KPIs that matter to customers, not just internal stakeholders. Launch quickly, then iterate based on usage.

Technical Stack:

Databricks • Unity Catalog • AWS (S3, Lambda, Glue, Redshift, Lake Formation) • Salesforce CDC • SageMaker • Step Functions • dbt • Python

Building a CDP Featured by Google

SoundCommerce
Data Analytics Architect | 2018-2020

CDP • GCP • BigQuery • Data Monetization • Looker

The Challenge

E-commerce retailers were drowning in data across Salesforce CRM, Shopify/Magento platforms, logistics systems, marketing tools, and customer service platforms.

They had critical questions but no unified way to answer them: Which customers have the highest lifetime value? What's our true inventory position? Where should we spend our next marketing dollar? Why did we lose that high-value customer?

The Solution

I led a 6-person team architecting a Customer Data Platform on GCP:

1. Canonical Data Models

Standardized models for Customers, Orders, Products, Inventory, Marketing Spend—unified across all source systems

2. Event-Driven Ingestion

Built microservices using Cloud Functions and Pub/Sub to ingest data from Salesforce, e-commerce platforms, and marketing tools in near-real-time

3. BigQuery Data Warehouse

Implemented medallion architecture with partitioning, clustering, and right-sizing—achieved sub-second query times under peak loads

4. ML-Powered Insights

Delivered propensity scoring, churn prediction, and next-best-action models; integrated back to Salesforce via reverse-ETL

5. Configurable Dashboards

Built self-serve Looker Studio dashboards retailers could customize without engineering support

The Impact

  • Featured on Google Data Analytics blog for innovative CDP architecture
  • $20M+ GMV enabled across retail partners through better inventory allocation and marketing attribution
  • 30% reduction in manual reporting workload for retailers
  • 40% fewer integration defects through data contracts and schema validation
  • Double-digit compute cost savings while maintaining sub-second performance
  • Accelerated partner onboarding through governed data sharing via Analytics Hub

The Lesson

The hardest part of building a CDP isn't the technology—it's the data modeling. Retailers all think they're unique, but 80% of the data model is the same.

We invested heavily in canonical models upfront (Customers, Orders, Products) and made them extensible through metadata. This let us onboard new retailers in weeks instead of months.
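One way to picture "canonical but extensible through metadata" is a fixed core model plus a per-retailer field mapping, with anything unmapped preserved in an extensions bag. The sketch below is illustrative only: `CanonicalOrder`, `RETAILER_A_MAPPING`, and the source field names are hypothetical, not the actual SoundCommerce models.

```python
from dataclasses import dataclass, field


@dataclass
class CanonicalOrder:
    """Core fields shared by every retailer, plus retailer-specific extras."""
    order_id: str
    customer_id: str
    total: float
    currency: str
    extensions: dict = field(default_factory=dict)


# Hypothetical per-retailer metadata: canonical field -> source field.
RETAILER_A_MAPPING = {
    "order_id": "OrderNo",
    "customer_id": "CustId",
    "total": "Amount",
    "currency": "Curr",
}


def to_canonical(raw: dict, mapping: dict) -> CanonicalOrder:
    """Map a raw source record onto the canonical model."""
    core = {canon: raw[src] for canon, src in mapping.items()}
    mapped_sources = set(mapping.values())
    # Anything the mapping does not cover is kept, not dropped.
    extras = {k: v for k, v in raw.items() if k not in mapped_sources}
    return CanonicalOrder(
        order_id=str(core["order_id"]),
        customer_id=str(core["customer_id"]),
        total=float(core["total"]),
        currency=core["currency"],
        extensions=extras,
    )
```

Under this design, onboarding a new retailer mostly means writing a new mapping rather than a new model, which is consistent with the weeks-not-months onboarding described above.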

Technical Stack:

BigQuery • Pub/Sub • Cloud Functions • Dataflow • Looker • dbt • Reverse-ETL • Salesforce API • Python • Terraform

View more detailed case studies including clinical AI for Pfizer and AI-powered marketing at scale.
