Project: Guardrails Model Training
DDWAW-DAED — EY GDS Consulting (Mar 2025 – Present)
An end-to-end language model fine-tuning and evaluation platform built on AWS, enabling scalable model customization with guardrails for enterprise-grade safety and alignment.
Business Problem Statement
The client required a robust pipeline to fine-tune and evaluate large language models for guardrails use cases, ensuring responses remain safe, aligned, and governed. Existing tooling was limited to an on-premises Dell cluster with no cloud-native integration, no synthetic data strategy, and no repeatable evaluation framework, making scalable model adaptation and deployment impractical.
Role & Contributions
Cloud Architecture & PoC Development
- Architected the high-level cloud solution using AWS services, defining the end-to-end approach for model customization, deployment, and inference aligned with enterprise PoC architecture standards.
- Developed a fine-tuning application PoC covering dataset and model onboarding, the training workflow, and evaluation pipelines (a launch-step sketch follows this list), enabling end-to-end language model customization and establishing the foundational workflow for scalable model adaptation.
- Extended the onsite team's baseline fine-tuning script (originally written for the Dell cluster) to run on AWS, ensuring compatibility and cloud-native integration.
- Deployed base and fine-tuned models on AWS via CMI and ran latency-focused inference tests over benchmark scenarios (a testing sketch follows this list), comparing base-model performance against the gains achieved through fine-tuning.
- Led the end-to-end deployment and validation of the application within the EY AWS Sandbox environment, ensuring clean integration and passing functional tests.
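
A minimal sketch of what the fine-tuning launch step can look like on AWS, assuming a SageMaker HuggingFace estimator; the entry-point name, instance type, framework versions, and hyperparameter values are illustrative assumptions, not the project's actual configuration:

```python
# Minimal sketch of the fine-tuning launch step (illustrative names/values).
import sagemaker
from sagemaker.huggingface import HuggingFace

role = sagemaker.get_execution_role()  # assumes a SageMaker execution role

# Hyperparameters passed to the training entry point; values are placeholders.
hyperparameters = {
    "model_id": "base-model-id",        # hypothetical base model identifier
    "epochs": 3,
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 4,
}

estimator = HuggingFace(
    entry_point="train.py",             # hypothetical training script name
    source_dir="scripts",
    instance_type="ml.g5.2xlarge",      # illustrative GPU instance
    instance_count=1,
    role=role,
    transformers_version="4.36",
    pytorch_version="2.1",
    py_version="py310",
    hyperparameters=hyperparameters,
)

# Kick off training against a formatted dataset staged in S3 (placeholder URI).
estimator.fit({"train": "s3://example-bucket/guardrails/train/"})
```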
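And a sketch of the latency-focused inference testing, with a SageMaker runtime endpoint standing in for the CMI deployment purely for illustration; the endpoint name and benchmark prompts are placeholders:

```python
# Sketch of latency testing: invoke a deployed endpoint repeatedly and
# report percentile latencies. Endpoint name and prompts are placeholders.
import json
import statistics
import time

import boto3

runtime = boto3.client("sagemaker-runtime")
ENDPOINT = "guardrails-finetuned-endpoint"  # hypothetical endpoint name

prompts = ["Benchmark prompt 1", "Benchmark prompt 2"]  # stand-in scenarios
latencies = []

for prompt in prompts * 10:  # repeat each scenario for stable statistics
    start = time.perf_counter()
    runtime.invoke_endpoint(
        EndpointName=ENDPOINT,
        ContentType="application/json",
        Body=json.dumps({"inputs": prompt}),
    )
    latencies.append(time.perf_counter() - start)

latencies.sort()
print(f"p50: {statistics.median(latencies):.3f}s")
print(f"p95: {latencies[int(0.95 * len(latencies)) - 1]:.3f}s")
```

Running the same loop against the base and fine-tuned endpoints gives the base-vs-tuned comparison described above.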
Governance, SDG Pipeline & Model Training
- Established the foundational setup and governance: selected base models, transferred model weights to the cluster, set up the Vanguard governance cadence, open-code processes, and budget tracking, and defined the model transfer protocol.
- Designed and operationalized the Synthetic Data Generation (SDG) pipeline: analyzed existing data to shape the SDG strategy, defined data volumes and target formats, selected and labelled QA pairs, wrote the SDG scripts to generate and validate synthetic datasets (a generation-step sketch follows this list), and stood up a red-teaming SDG pipeline to improve data quality and safety.
- Built the complete evaluation framework, including evaluation scripts, baseline model evaluation, and configuration validation (an evaluation-loop sketch follows this list), and stood up all base models on the cluster for inference, followed by detailed inference testing and benchmarking.
- Defined training and output data formats, formatted the training datasets (a formatting sketch follows this list), deployed training scripts across model families, and executed the initial training run plus multiple iterations, each documenting a hypothesis, modifying data or algorithms, rerunning evaluation, and analyzing results.
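
A hedged sketch of the core SDG generation step, assuming, purely for illustration, an Amazon Bedrock-hosted generator model (the project text does not name the generator); the model ID, prompt template, and validation rule are assumptions, and the red-teaming logic is not shown:

```python
# Hedged sketch of one SDG step: prompting a generator model for synthetic
# QA pairs and keeping only pairs that pass a simple validation check.
import json

import boto3

bedrock = boto3.client("bedrock-runtime")

def generate_qa_pairs(seed_text: str, n: int = 5) -> list[dict]:
    """Ask a generator model for n question/answer pairs about seed_text."""
    prompt = (
        f"Write {n} question-answer pairs about the following text as JSON "
        f'[{{"question": ..., "answer": ...}}]:\n{seed_text}'
    )
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # illustrative
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": prompt}],
        }),
    )
    text = json.loads(response["body"].read())["content"][0]["text"]
    return json.loads(text)

def is_valid(pair: dict) -> bool:
    """Placeholder validation: both fields present and non-trivial."""
    return bool(pair.get("question")) and len(pair.get("answer", "")) > 20

seed = "Example source passage drawn from the analyzed client data."
dataset = [p for p in generate_qa_pairs(seed) if is_valid(p)]
```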
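The evaluation loop can be sketched as below, with a deliberately naive refusal heuristic standing in for the project's actual guardrails scoring; the dataset fields and scoring rule are assumptions:

```python
# Minimal sketch of the evaluation-loop shape: score model responses for a
# guardrails criterion (here, refusal of unsafe prompts) against labels.

REFUSAL_MARKERS = ("i can't", "i cannot", "i won't")  # naive heuristic

def is_refusal(response: str) -> bool:
    return response.lower().startswith(REFUSAL_MARKERS)

def evaluate(model_fn, eval_set: list[dict]) -> float:
    """eval_set items look like {"prompt": str, "should_refuse": bool}."""
    correct = 0
    for item in eval_set:
        response = model_fn(item["prompt"])
        if is_refusal(response) == item["should_refuse"]:
            correct += 1
    return correct / len(eval_set)

# evaluate(base_model_fn, eval_set) vs evaluate(tuned_model_fn, eval_set)
# yields the baseline-vs-fine-tuned comparison described above.
```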
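Finally, a minimal sketch of the dataset-formatting step, converting labelled QA pairs into a chat-style JSONL training format; the field names and target schema are assumed, not the project's actual spec:

```python
# Hedged sketch of dataset formatting: labelled QA pairs -> chat-style JSONL.
import json

def to_training_record(pair: dict) -> dict:
    """Map one QA pair to an assumed chat-format training record."""
    return {
        "messages": [
            {"role": "user", "content": pair["question"]},
            {"role": "assistant", "content": pair["answer"]},
        ]
    }

qa_pairs = [{"question": "Q...", "answer": "A..."}]  # placeholder input
with open("train.jsonl", "w", encoding="utf-8") as f:
    for pair in qa_pairs:
        f.write(json.dumps(to_training_record(pair)) + "\n")
```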
Tools & Frameworks Used
- AWS (SageMaker, CMI, Sandbox environment)
- Large Language Models — fine-tuning & inference
- Synthetic Data Generation (SDG) pipeline
- Red-teaming frameworks for data safety
- Python — training scripts, evaluation pipelines
- Model evaluation & benchmarking tools
- Vanguard governance & open code processes