Project: Guardrails Model Training
DDWAW-DAED — EY GDS Consulting (Mar 2025 – Present)
An end-to-end language model fine-tuning and evaluation platform built on AWS, enabling scalable model customization with guardrails for enterprise-grade safety and alignment.
Business Problem Statement
The client required a robust pipeline to fine-tune and evaluate large language models for guardrails use cases, ensuring responses remain safe, aligned, and governed. Existing tooling was limited to an on-premises Dell cluster with no cloud-native integration, no synthetic data strategy, and no repeatable evaluation framework, making scalable model adaptation and deployment impractical.
Role & Contributions
Cloud Architecture & PoC Development
- Architected the high-level cloud solution using AWS services, defining the end-to-end approach for model customization, deployment, and inference aligned with enterprise PoC architecture standards.
- Developed a fine-tuning application PoC covering dataset and model onboarding, the training workflow, and evaluation pipelines (a launch-step sketch follows this list), enabling end-to-end language model customization and establishing the foundational workflow for scalable model adaptation.
- Extended the onsite team's baseline fine-tuning script (originally written for the Dell cluster) to run on AWS, ensuring compatibility and cloud-native integration.
- Deployed base and fine-tuned models on AWS via CMI and ran latency-focused inference tests over benchmark scenarios (a testing sketch follows this list), comparing base-model performance against the gains achieved through fine-tuning.
- Led the end-to-end deployment and validation of the application within the EY AWS Sandbox environment, ensuring clean integration and passing functional tests.
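
A minimal sketch of what the fine-tuning launch step can look like on AWS, assuming a SageMaker HuggingFace estimator; the entry-point name, instance type, framework versions, and hyperparameter values are illustrative assumptions, not the project's actual configuration:

```python
# Minimal sketch of the fine-tuning launch step (illustrative names/values).
import sagemaker
from sagemaker.huggingface import HuggingFace

role = sagemaker.get_execution_role()  # assumes a SageMaker execution role

# Hyperparameters passed to the training entry point; values are placeholders.
hyperparameters = {
    "model_id": "base-model-id",        # hypothetical base model identifier
    "epochs": 3,
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 4,
}

estimator = HuggingFace(
    entry_point="train.py",             # hypothetical training script name
    source_dir="scripts",
    instance_type="ml.g5.2xlarge",      # illustrative GPU instance
    instance_count=1,
    role=role,
    transformers_version="4.36",
    pytorch_version="2.1",
    py_version="py310",
    hyperparameters=hyperparameters,
)

# Kick off training against a formatted dataset staged in S3 (placeholder URI).
estimator.fit({"train": "s3://example-bucket/guardrails/train/"})
```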
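And a sketch of the latency-focused inference testing, with a SageMaker runtime endpoint standing in for the CMI deployment purely for illustration; the endpoint name and benchmark prompts are placeholders:

```python
# Sketch of latency testing: invoke a deployed endpoint repeatedly and
# report percentile latencies. Endpoint name and prompts are placeholders.
import json
import statistics
import time

import boto3

runtime = boto3.client("sagemaker-runtime")
ENDPOINT = "guardrails-finetuned-endpoint"  # hypothetical endpoint name

prompts = ["Benchmark prompt 1", "Benchmark prompt 2"]  # stand-in scenarios
latencies = []

for prompt in prompts * 10:  # repeat each scenario for stable statistics
    start = time.perf_counter()
    runtime.invoke_endpoint(
        EndpointName=ENDPOINT,
        ContentType="application/json",
        Body=json.dumps({"inputs": prompt}),
    )
    latencies.append(time.perf_counter() - start)

latencies.sort()
print(f"p50: {statistics.median(latencies):.3f}s")
print(f"p95: {latencies[int(0.95 * len(latencies)) - 1]:.3f}s")
```

Running the same loop against the base and fine-tuned endpoints gives the base-vs-tuned comparison described above.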
Governance, SDG Pipeline & Model Training
- Established the foundational setup and governance: selected base models, transferred model weights to the cluster, set up the Vanguard governance cadence, open-code processes, and budget tracking, and defined the model transfer protocol.
- Designed and operationalized the Synthetic Data Generation (SDG) pipeline: analyzed existing data to shape the SDG strategy, defined data volumes and target formats, selected and labelled QA pairs, wrote the SDG scripts to generate and validate synthetic datasets (a generation-step sketch follows this list), and stood up a red-teaming SDG pipeline to improve data quality and safety.
- Built the complete evaluation framework, including evaluation scripts, baseline model evaluation, and configuration validation (an evaluation-loop sketch follows this list), and stood up all base models on the cluster for inference, followed by detailed inference testing and benchmarking.
- Defined training and output data formats, formatted the training datasets (a formatting sketch follows this list), deployed training scripts across model families, and executed the initial training run plus multiple iterations, each documenting a hypothesis, modifying data or algorithms, rerunning evaluation, and analyzing results.
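
A hedged sketch of the core SDG generation step, assuming, purely for illustration, an Amazon Bedrock-hosted generator model (the project text does not name the generator); the model ID, prompt template, and validation rule are assumptions, and the red-teaming logic is not shown:

```python
# Hedged sketch of one SDG step: prompting a generator model for synthetic
# QA pairs and keeping only pairs that pass a simple validation check.
import json

import boto3

bedrock = boto3.client("bedrock-runtime")

def generate_qa_pairs(seed_text: str, n: int = 5) -> list[dict]:
    """Ask a generator model for n question/answer pairs about seed_text."""
    prompt = (
        f"Write {n} question-answer pairs about the following text as JSON "
        f'[{{"question": ..., "answer": ...}}]:\n{seed_text}'
    )
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # illustrative
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": prompt}],
        }),
    )
    text = json.loads(response["body"].read())["content"][0]["text"]
    return json.loads(text)

def is_valid(pair: dict) -> bool:
    """Placeholder validation: both fields present and non-trivial."""
    return bool(pair.get("question")) and len(pair.get("answer", "")) > 20

seed = "Example source passage drawn from the analyzed client data."
dataset = [p for p in generate_qa_pairs(seed) if is_valid(p)]
```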
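The evaluation loop can be sketched as below, with a deliberately naive refusal heuristic standing in for the project's actual guardrails scoring; the dataset fields and scoring rule are assumptions:

```python
# Minimal sketch of the evaluation-loop shape: score model responses for a
# guardrails criterion (here, refusal of unsafe prompts) against labels.

REFUSAL_MARKERS = ("i can't", "i cannot", "i won't")  # naive heuristic

def is_refusal(response: str) -> bool:
    return response.lower().startswith(REFUSAL_MARKERS)

def evaluate(model_fn, eval_set: list[dict]) -> float:
    """eval_set items look like {"prompt": str, "should_refuse": bool}."""
    correct = 0
    for item in eval_set:
        response = model_fn(item["prompt"])
        if is_refusal(response) == item["should_refuse"]:
            correct += 1
    return correct / len(eval_set)

# evaluate(base_model_fn, eval_set) vs evaluate(tuned_model_fn, eval_set)
# yields the baseline-vs-fine-tuned comparison described above.
```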
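Finally, a minimal sketch of the dataset-formatting step, converting labelled QA pairs into a chat-style JSONL training format; the field names and target schema are assumed, not the project's actual spec:

```python
# Hedged sketch of dataset formatting: labelled QA pairs -> chat-style JSONL.
import json

def to_training_record(pair: dict) -> dict:
    """Map one QA pair to an assumed chat-format training record."""
    return {
        "messages": [
            {"role": "user", "content": pair["question"]},
            {"role": "assistant", "content": pair["answer"]},
        ]
    }

qa_pairs = [{"question": "Q...", "answer": "A..."}]  # placeholder input
with open("train.jsonl", "w", encoding="utf-8") as f:
    for pair in qa_pairs:
        f.write(json.dumps(to_training_record(pair)) + "\n")
```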
Tools & Frameworks Used
- AWS (SageMaker, CMI, Sandbox environment)
- Large Language Models — fine-tuning & inference
- Synthetic Data Generation (SDG) pipeline
- Red-teaming frameworks for data safety
- Python — training scripts, evaluation pipelines
- Model evaluation & benchmarking tools
- Vanguard governance & open code processes