Work
Production AI systems, data products, and governance frameworks that pass audits, generate revenue, and maintain stakeholder trust.
Transforming Federal AI
U.S. Food & Drug Administration
Lead Data Scientist | 2025-Present | Security Clearance
The Challenge
When I joined the FDA's Center for Devices and Radiological Health (CDRH), AI was seen as too risky for production use in regulatory workflows. The perception was that compliance requirements (FedRAMP High, HIPAA, audit trails) would slow innovation to a crawl.
Device submission reviewers were spending hours manually linking related submissions across systems—a perfect use case for AI, but one that required absolute trust, explainability, and regulatory compliance.
The Discovery
The real blocker wasn't regulation—it was architecture. By designing governance into the system from day one rather than bolting it on later, we could move fast and stay compliant.
I found that AWS GovCloud provided the security and compliance infrastructure we needed, but we still had to architect for (see the audit-record sketch after this list):
- Provenance tracking (where did this data come from?)
- Grounding citations (what evidence supports this answer?)
- Bias detection and drift monitoring
- Blue-green deployments with automated rollback
- Complete audit trails for every prediction
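To make those requirements concrete, here's a minimal sketch of the kind of record each generated answer can carry to satisfy provenance, grounding, and audit-trail requirements at once. The `Citation` and `AuditedAnswer` names and fields are illustrative, not the actual FDA schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

@dataclass(frozen=True)
class Citation:
    """Evidence backing one claim in a generated answer."""
    source_uri: str   # provenance: where the document lives (e.g. an S3 key)
    excerpt: str      # grounding: the supporting passage, quoted verbatim
    ingested_at: str  # provenance: when the record entered the data lake

@dataclass(frozen=True)
class AuditedAnswer:
    """One generated answer plus everything an auditor needs to trace it."""
    question: str
    answer: str
    citations: List[Citation]  # grounding rule: no citation, no claim
    model_id: str              # exact model version that produced the answer
    prompt_hash: str           # verify inputs without storing them verbatim
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
```

Freezing the record and hashing the prompt keeps the trail tamper-evident without retaining sensitive inputs.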
The Solution
I architected an agentic RAG system on AWS Bedrock, built on four pillars:
1. Unified Data Architecture
Migrated Databricks medallion layers to a production S3 data lake in GovCloud, with Glue Data Catalog, Lake Formation governance, and Redshift Serverless analytics
2. Production GenAI
Built entity-linking service using SageMaker Feature Store, OpenSearch Serverless (vector search), and Lambda inference endpoints
3. MLOps Governance
Implemented SageMaker Pipelines, Model Registry, Model Monitor (drift), and Clarify (bias detection) with multi-variant endpoints and automated rollback
4. Event-Driven Architecture
Delivered microservices with EventBridge, SNS/SQS FIFO, API Gateway, and ECS Fargate; idempotent consumers (sketched below) reduced MTTR by ~30%
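A minimal sketch of the idempotent-consumer pattern from item 4, assuming a hypothetical DynamoDB deduplication table; the table name and message shape are illustrative:

```python
"""Idempotent consumer sketch: a DynamoDB conditional write acts as a
deduplication gate, so replayed SQS deliveries are processed at most once."""
import json
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")
DEDUP_TABLE = "processed-messages"  # hypothetical table, PK: message_id (S)

def handle(message: dict) -> None:
    message_id = message["message_id"]
    try:
        # Claim the message id; fails atomically if we've seen it before.
        dynamodb.put_item(
            TableName=DEDUP_TABLE,
            Item={"message_id": {"S": message_id}},
            ConditionExpression="attribute_not_exists(message_id)",
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return  # duplicate delivery: safe to acknowledge and move on
        raise
    process(message["body"])  # business logic runs once per message id

def process(body: str) -> None:
    print("processing:", json.loads(body))
```

The conditional write is the whole trick: replayed deliveries fail the condition and are acknowledged without re-running the business logic, which is what makes retries and redelivery cheap rather than dangerous.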
The Impact
- ✓ 85% business alignment achieved within 3 months
- ✓ 15-25% precision/recall improvement in submission linking
- ✓ <500ms P50 latency at scale, serving 200+ reviewers
- ✓ 30% faster incident resolution through event-driven architecture
- ✓ Security clearance granted for production GenAI in federal environment
- ✓ Framework adopted agency-wide as standard for AI deployments
- ✓ 20% cost reduction in data pipelines while improving freshness
The Lesson
Compliance isn't a barrier to AI innovation—it's a design constraint that forces you to build better systems. Governance as code, not as review committees. Observability from day one, not retrofitted later.
This became the template for how federal agencies can safely deploy production AI.
Technical Stack:
AWS Bedrock • SageMaker • Lake Formation • OpenSearch Serverless • EventBridge • Lambda • ECS Fargate • Step Functions • CloudTrail • KMS • Macie • Redshift Serverless
Launching a $5M ARR Data Product
Agora.io
Director of Data Engineering | 2021-2023
The Challenge
Agora had no data engineering function when I joined. Product telemetry (video/voice/messaging usage) and Salesforce CRM data lived in silos across AWS S3, Lambda functions, and disconnected databases.
The company had ~200 enterprise customers but no unified view of customer health, churn risk, usage patterns, cost optimization opportunities, revenue drivers, or fraud patterns.
The Discovery
While unifying these data sources into a platform on AWS + Databricks, I asked a question that changed our direction: "What if we made this a product customers could use, not just internal dashboards?"
Our customers had the same problem we did—fragmented data about their own usage. If we could surface insights about their usage patterns, quality metrics, and cost optimization, that would be a product they'd pay for.
The Solution
I built the data function from scratch:
1. Team Building
Hired and led 4-person engineering team; established sprint cadences, code review standards, dbt conventions, and on-call practices
2. Unified Data Platform
Integrated Salesforce CRM and AWS product telemetry (S3, Lambda, Glue, Redshift, Lake Formation) with Databricks Unity Catalog governance
3. Event-Driven CDC
Built near-real-time ingestion using Salesforce Change Data Capture and Platform Events into S3; data contracts on incoming events (sketched after this list) reduced integration defects 40%
4. Agora Analytics Product
Partnered with Sales, Marketing, and Product leadership to define KPIs customers cared about; launched configurable dashboards in 90 days
5. ML-Powered Fraud Detection
Built streaming fraud detection pipeline using SageMaker endpoints and Step Functions orchestration
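A minimal sketch of the data-contract check referenced in item 3, applied to events before they land in S3. This assumes pydantic v2, and the `UsageEvent` fields are illustrative, not Agora's actual contract:

```python
"""Data-contract sketch: CDC payloads are validated against a versioned
schema on ingestion; failures are quarantined instead of landing in S3."""
from datetime import datetime
from pydantic import BaseModel, ValidationError, field_validator

class UsageEvent(BaseModel):
    account_id: str
    event_type: str          # e.g. "video_minutes", "voice_minutes"
    quantity: float
    occurred_at: datetime
    schema_version: int = 1  # contracts are versioned, not silently changed

    @field_validator("quantity")
    @classmethod
    def non_negative(cls, v: float) -> float:
        if v < 0:
            raise ValueError("usage quantity cannot be negative")
        return v

def validate_batch(raw_events: list[dict]) -> tuple[list[UsageEvent], list[dict]]:
    """Split a CDC batch into contract-passing events and quarantined rejects."""
    valid, rejected = [], []
    for raw in raw_events:
        try:
            valid.append(UsageEvent(**raw))
        except ValidationError as err:
            rejected.append({"payload": raw, "errors": err.errors()})
    return valid, rejected
```

Failing rows are quarantined with their validation errors rather than silently flowing downstream, so integration defects surface at the boundary instead of in dashboards.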
The Impact
- ✓ $5M+ ARR expansion opportunity created through Agora Analytics product
- ✓ 30% faster fraud incident response, 15% fewer false positives
- ✓ 40% reduction in integration defects through data contracts
- ✓ 20% faster delivery against go-to-market reporting SLAs
- ✓ 18% cloud spend reduction while improving performance
- ✓ 35% fewer data defects escaping production through dbt testing
- ✓ 50% faster security questionnaire turnaround for enterprise deals
The Lesson
Data teams can be revenue drivers, not just cost centers. The mindset shift from "internal dashboards" to "external products" unlocked a $5M opportunity that was hiding in plain sight.
The key: Partner with Sales and Product early. Define KPIs that matter to customers, not just internal stakeholders. Launch quickly, iterate based on usage.
Technical Stack:
Databricks • Unity Catalog • AWS (S3, Lambda, Glue, Redshift, Lake Formation) • Salesforce CDC • SageMaker • Step Functions • dbt • Python
Building a CDP Featured by Google
SoundCommerce
Data Analytics Architect | 2018-2020
The Challenge
E-commerce retailers were drowning in data across Salesforce CRM, Shopify/Magento platforms, logistics systems, marketing tools, and customer service platforms.
They had critical questions but no unified way to answer them: Which customers are highest lifetime value? What's our true inventory position? Where should we spend our next marketing dollar? Why did we lose that high-value customer?
The Solution
I led a 6-person team architecting a Customer Data Platform on GCP:
1. Canonical Data Models
Standardized models for Customers, Orders, Products, Inventory, Marketing Spend—unified across all source systems
2. Event-Driven Ingestion
Built microservices using Cloud Functions and Pub/Sub to ingest data from Salesforce, e-commerce platforms, and marketing tools in near-real-time
3. BigQuery Data Warehouse
Implemented medallion architecture with partitioning, clustering, and right-sizing (DDL sketched after this list); achieved sub-second query times under peak load
4. ML-Powered Insights
Delivered propensity scoring, churn prediction, and next-best-action models; integrated back to Salesforce via reverse-ETL
5. Configurable Dashboards
Built self-serve Looker Studio dashboards retailers could customize without engineering support
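A sketch of the partition/cluster pattern from item 3, using the BigQuery Python client; the dataset, table, and column names are illustrative:

```python
"""Partition facts by event date and cluster by the keys dashboards filter
on, so queries scan only the slices they actually touch."""
from google.cloud import bigquery

client = bigquery.Client()

ddl = """
CREATE TABLE IF NOT EXISTS `cdp.silver_orders` (
  order_id     STRING NOT NULL,
  customer_id  STRING NOT NULL,
  retailer_id  STRING NOT NULL,
  order_ts     TIMESTAMP NOT NULL,
  total_amount NUMERIC
)
PARTITION BY DATE(order_ts)          -- prunes scans to the dates queried
CLUSTER BY retailer_id, customer_id  -- co-locates each retailer's rows
"""
client.query(ddl).result()  # run the DDL and wait for completion
```

Partition pruning plus clustering means a dashboard query scans only the dates and retailers it asks about, which is how query cost and latency stay flat as the warehouse grows.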
The Impact
- ✓ Featured on Google Data Analytics blog for innovative CDP architecture
- ✓ $20M+ GMV enabled across retail partners through better inventory allocation and marketing attribution
- ✓ 30% reduction in manual reporting workload for retailers
- ✓ 40% fewer integration defects through data contracts and schema validation
- ✓ Double-digit compute cost savings while maintaining sub-second performance
- ✓ Accelerated partner onboarding through governed data sharing via Analytics Hub
The Lesson
The hardest part of building a CDP isn't the technology—it's the data modeling. Retailers all think they're unique, but 80% of the data model is the same.
We invested heavily in canonical models upfront (Customers, Orders, Products) and made them extensible through metadata. This let us onboard new retailers in weeks instead of months.
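A minimal sketch of that "canonical core plus metadata extensions" idea; the `Customer` fields here are illustrative, not the platform's actual model:

```python
"""Every retailer shares the canonical core fields; retailer-specific
attributes live in an extension map declared in onboarding metadata,
not in per-retailer schema forks."""
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class Customer:
    # Canonical core: identical for every retailer (the ~80%).
    customer_id: str
    email: str
    first_order_at: str
    lifetime_value: float
    # Extension point: retailer-specific attributes (the ~20%).
    extensions: Dict[str, Any] = field(default_factory=dict)

# Onboarding a new retailer becomes a metadata change, not a migration:
loyalty_customer = Customer(
    customer_id="c-123",
    email="jane@example.com",
    first_order_at="2020-03-14",
    lifetime_value=842.50,
    extensions={"loyalty_tier": "gold", "store_region": "PNW"},
)
```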
Technical Stack:
BigQuery • Pub/Sub • Cloud Functions • Dataflow • Looker • dbt • Reverse-ETL • Salesforce API • Python • Terraform
View more detailed case studies including clinical AI for Pfizer and AI-powered marketing at scale.
View Full Experience