Indhu Shree Prakash

Data Engineer

⏱ 1.5 years' experience
🏢 4 companies
📈 Production-scale analytics systems
  • ⚙️ I build pipelines.
    Not PowerPoints about pipelines.
  • 🇬🇧 UK commercial experience —
    Severn Trent Water (FTSE 100).

Spent 1.5 years designing ETL pipelines, optimising SQL and data models, and automating the workflows behind enterprise reporting.

Experience across Boomi, Severn Trent Water, Cognizant and Infosys.

SQL for breakfast, data models for lunch, pipelines all afternoon, and Python refactoring for dinner.

From Analytics to Engineering

How reporting turned into engineering

I started in reporting and analytics. The job was to help stakeholders understand data. The unintended consequence was that I kept ending up nose-deep in the pipelines feeding those reports — asking why they were slow, who broke them, and whether anyone had written documentation. (The answer was usually no.)

At Boomi, working with 20M+ records across a 32-table platform exposed me to the real cost of poor data modelling: a 6 GB report that took too long, cost too much, and broke too often. Fixing it meant restructuring data models, rewriting SQL, and automating workflows that had been done by hand for months.

At Severn Trent Water, I built a pipeline from scratch — ingesting 1,449 days of sensor data, surfacing 463 overflow events, and delivering a compliance dashboard for an executive audience operating under regulatory scrutiny.

Most people see dashboards.
I became more interested in the pipelines behind them.

How It Evolved

  1. Reporting & Analytics
  2. SQL Optimisation
  3. Data Modelling
  4. Data Quality
  5. Automation
  6. Cloud Systems
  7. Data Engineering

ETL Pipelines

Designing reliable data pipelines that move information from operational systems to analytics platforms. Built ingestion, transformation, and validation workflows handling production-scale datasets across enterprise and utility environments.

Tools:

Python · Pandas · AWS Lambda · S3 · Snowflake

Data Modelling

Building data models that make reporting faster, simpler, and more reliable. Restructured schemas, removed redundant joins, and improved data quality across analytics platforms serving business stakeholders.

Tools:

SQL · Snowflake · Power BI

SQL Optimisation

Optimising queries and reporting systems to reduce load times, lower costs, and improve maintainability. Reduced a reporting platform from 6 GB to 2.8 GB while cutting dashboard load times by 35%.

Tools:

SQL · Snowflake · PostgreSQL

Cloud Engineering

Building event-driven cloud architectures that scale automatically without server management. Developed serverless workflows using AWS services for ingestion, processing, and storage.

Tools:

AWS S3 · Lambda · DynamoDB · Docker · Python

Professional Experience

Where the work actually happened

Data Analyst, Engineering Team

🇬🇧 Severn Trent Water — United Kingdom

Feb 2026 – Apr 2026 Python · Pandas · Power BI
1,449 Days of sensor data
463 Overflow events identified
5 KPIs Executive dashboard
100% Compliance projected
  • Challenge: Evaluate the performance of a constructed wetland treatment solution using 1,449 days of unstructured wastewater flow and water-quality sensor readings — no existing schema, no defined baseline, and findings subject to regulatory scrutiny.
  • What I built: A Python and Pandas pipeline to clean, structure, and analyse the time-series sensor data, applying threshold-based detection to flag overflow events and surface seasonal flow patterns.
  • Impact: Identified 463 overflow events and recurring seasonal trends; delivered a 5-KPI executive dashboard across 6 visualisations in Power BI, automating a previously manual compliance assessment process and supporting a projected 100% regulatory compliance outcome.
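The threshold-based detection described above can be sketched roughly as follows. This is a dependency-free illustration of the idea, not the Pandas pipeline itself: the threshold value, the sample readings, and the names `Reading` and `find_overflow_events` are all hypothetical. It assumes that consecutive readings above the threshold belong to a single overflow event.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Reading:
    timestamp: datetime
    flow: float  # flow in arbitrary units (hypothetical)


def find_overflow_events(readings, threshold):
    """Group consecutive readings above `threshold` into (start, end) events."""
    events = []
    start = None
    last = None
    for r in sorted(readings, key=lambda r: r.timestamp):
        if r.flow > threshold:
            if start is None:
                start = r.timestamp  # a new event begins here
            last = r.timestamp
        elif start is not None:
            events.append((start, last))  # flow dropped back, close the event
            start = None
    if start is not None:
        events.append((start, last))  # series ended while still above threshold
    return events


# Hypothetical hourly readings, not real sensor data.
t0 = datetime(2024, 1, 1)
flows = [1.0, 5.2, 6.1, 2.0, 7.5, 1.5]
readings = [Reading(t0 + timedelta(hours=i), f) for i, f in enumerate(flows)]
events = find_overflow_events(readings, threshold=5.0)
print(len(events))
```

Grouping consecutive exceedances keeps one long overflow from being counted once per reading, which matters when the event count itself is a reported figure.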

Business Intelligence Reporting Analyst

Boomi Software India Pvt Ltd

Jan 2025 – Jul 2025 SQL Optimisation · Data Modelling · Automation
20M+ Records in platform
6 → 2.8 GB Report size reduced
−35% Dashboard load time
−40% Analyst reporting effort
  • Challenge: A 6 GB Power BI platform spanning 32 source tables and ~20 million records had become slow to load, costly to query, and dependent on manual refreshes by analysts.
  • What I did: Restructured the underlying data model to remove redundant joins, rewrote inefficient SQL queries, and built automated refresh and validation workflows in Snowflake to catch data quality issues before they reached the dashboard.
  • Impact: Reduced platform size from 6 GB to 2.8 GB (53%), improved dashboard load times by 35%, cut query costs by 20%, reduced analyst reporting effort by 40%, and lowered data inconsistency incidents by 50%.
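A validation step of the kind described above might look something like this. SQLite stands in for Snowflake purely so the sketch is self-contained. The `orders` table, its columns, and the two checks are hypothetical examples, not the rules used on the actual platform.

```python
import sqlite3

# Each check is a query that counts offending rows; zero means the check passes.
CHECKS = {
    "null_customer_id": "SELECT COUNT(*) FROM orders WHERE customer_id IS NULL",
    "duplicate_order_id": (
        "SELECT COUNT(*) FROM ("
        "SELECT order_id FROM orders GROUP BY order_id HAVING COUNT(*) > 1)"
    ),
}


def run_checks(conn):
    """Return {check_name: offending_count} for every failing check."""
    failures = {}
    for name, sql in CHECKS.items():
        (count,) = conn.execute(sql).fetchone()
        if count:
            failures[name] = count
    return failures


# Hypothetical sample data: one NULL customer and one duplicated order ID.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10), (2, None), (2, 11), (3, 12)],
)
failures = run_checks(conn)
print(failures)
```

A refresh job could skip the dashboard update whenever `run_checks` returns a non-empty result, so bad data is caught before it reaches the report.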

Earlier roles

AI and Analytics Analyst

Cognizant Technology Solutions

May – Jun 2024 SQL · Python · Flask · Data Validation
626K+ Records processed
14 Relational tables
25% Pre-processing error reduction
SQL Transformation models
  • Challenge: Data preparation for reporting across 626K+ records and 14 relational tables was handled manually and inconsistently, increasing the risk of errors reaching downstream dashboards.
  • What I did: Built reusable SQL transformation models to standardise data preparation, added automated validation checks to catch inconsistencies before they reached reporting layers, and exposed the cleaned data through a query-based backend for non-technical users.
  • Impact: Reduced data pre-processing errors by 25% and gave non-technical stakeholders direct, validated access to 626K+ records across 14 tables — without analyst involvement.

AI Analyst

Infosys Springboard

May – Aug 2024 Python · Embeddings · Qdrant · RAG
50+ Job descriptions benchmarked
92% Retrieval accuracy
+4–8pp vs. ML approaches
Qdrant + MiniLM embeddings
  • Challenge: Candidate-to-job matching relied on manual review across a large volume of job descriptions, with no benchmarked method for evaluating match quality.
  • What I did: Benchmarked vector search (Qdrant with MiniLM embeddings) against machine learning-based matching approaches across 50+ job descriptions, scoring each method's retrieval accuracy against a set of labelled relevant matches.
  • Impact: Vector search achieved 92% retrieval accuracy, outperforming machine learning approaches by 4–8 percentage points, and was adopted as the basis for a semantic matching pipeline using ranked retrieval.
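One way to score a matching method against labelled matches is sketched below. How retrieval accuracy was defined in the benchmark is not stated here, so this assumes a top-k hit rate: a query counts as correct when at least one labelled match appears in the method's top k results. The IDs, rankings, and the `hit_rate_at_k` name are made up for illustration.

```python
def hit_rate_at_k(rankings, relevant, k):
    """Fraction of queries whose top-k results contain a labelled match."""
    hits = 0
    for query_id, ranked_ids in rankings.items():
        if set(ranked_ids[:k]) & relevant.get(query_id, set()):
            hits += 1
    return hits / len(rankings)


# Hypothetical output of one matching method: job ID -> ranked candidate IDs.
rankings = {
    "job_1": ["c3", "c1", "c7"],
    "job_2": ["c2", "c9", "c4"],
    "job_3": ["c5", "c8", "c6"],
    "job_4": ["c1", "c2", "c3"],
}
# Hypothetical labels: job ID -> candidates judged relevant.
relevant = {
    "job_1": {"c1"},
    "job_2": {"c4"},
    "job_3": {"c5", "c6"},
    "job_4": {"c9"},
}
score = hit_rate_at_k(rankings, relevant, k=2)
print(score)
```

Running the same function over each method's rankings gives directly comparable scores, which is what a percentage-point comparison between approaches needs.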

Selected Projects

Things I built (and what I learned fixing them)

Serverless Data Pipeline — AWS

Event-driven cloud pipeline with zero infrastructure management

GitHub →
Problem
Building a scalable ingestion-to-storage pipeline without standing infrastructure — no servers to provision, patch, or babysit.
Why it was difficult
Event-driven design requires each stage to be independently reliable, with failures caught without a central coordinator. Getting IAM permissions, trigger chains, and DynamoDB writes to work together cleanly took architectural discipline.
Solution
Architected a fully serverless pipeline: S3 ingestion triggers Lambda for stateless processing, Lambda writes structured records to DynamoDB. Each stage is decoupled and scales automatically.
Impact
25+ assets processed through fully automated trigger-based workflows, with zero standing infrastructure — demonstrating an event-driven architecture: S3 → Lambda → DynamoDB.
AWS S3 · AWS Lambda · DynamoDB · Python · LocalStack
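The S3 → Lambda → DynamoDB flow can be sketched as a single handler along these lines. This is an illustrative outline, not the project's code: the table name `assets`, the item fields, and the optional `table` argument (added so the function can be exercised without AWS access) are all assumptions.

```python
from urllib.parse import unquote_plus


def build_item(record):
    """Turn one S3 event record into a flat item for DynamoDB."""
    s3 = record["s3"]
    return {
        # Object keys arrive URL-encoded in S3 event notifications.
        "object_key": unquote_plus(s3["object"]["key"]),
        "bucket": s3["bucket"]["name"],
        "size_bytes": s3["object"].get("size", 0),
        "event_time": record.get("eventTime", ""),
    }


def handler(event, context, table=None):
    """Lambda entry point: write one item per uploaded object."""
    if table is None:
        # Imported lazily so the sketch can run where boto3 is not installed.
        import boto3

        table = boto3.resource("dynamodb").Table("assets")  # hypothetical table name
    items = [build_item(r) for r in event.get("Records", [])]
    for item in items:
        table.put_item(Item=item)
    return {"written": len(items)}
```

Keeping `build_item` free of AWS calls means the transformation can be unit-tested on its own, while the handler stays a thin, stateless wrapper around it.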

BI Reporting Using Chat Interface

Self-service analytics layer on a 14-table relational database

GitHub →
Problem
Business stakeholders needed access to 626K+ records across 14 tables, but had no SQL knowledge. Every report request went through an analyst, creating a bottleneck in decision-making.
Why it was difficult
Natural-language questions are often ambiguous, and translating them into correct SQL across 14 joined tables required careful query construction to avoid returning incorrect or misleading results.
Solution
Built a Flask backend with a structured query generation layer, input validation pipelines, and two export formats — so stakeholders could extract data without analyst involvement.
Impact
626K+ records queryable by non-technical users. Pre-processing errors reduced by 25%. Analyst reporting bottleneck eliminated.
Python · Flask · SQL · Power BI · Pandas
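A structured query generation layer can be sketched as below. The idea is that user input never becomes SQL text directly: table and column names are checked against an allow-list, and filter values travel as bound parameters. The `ALLOWED` schema and the `build_query` function are hypothetical, and the Flask routing is left out.

```python
ALLOWED = {
    # Hypothetical schema: table name -> columns users may select or filter on.
    "customers": {"id", "name", "region"},
    "orders": {"id", "customer_id", "total"},
}


def build_query(table, columns, filters=None, limit=100):
    """Return (sql, params) for a validated SELECT, or raise ValueError."""
    if table not in ALLOWED:
        raise ValueError(f"unknown table: {table}")
    if not columns:
        raise ValueError("select at least one column")
    filters = filters or {}
    unknown = (set(columns) | set(filters)) - ALLOWED[table]
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    params = []
    if filters:
        clauses = []
        for col, value in filters.items():
            clauses.append(f"{col} = ?")  # the value is bound, never interpolated
            params.append(value)
        sql += " WHERE " + " AND ".join(clauses)
    sql += " LIMIT ?"
    params.append(int(limit))
    return sql, params


sql, params = build_query("customers", ["name", "region"], {"region": "EMEA"}, limit=10)
print(sql, params)
```

Only identifiers that passed the allow-list are ever formatted into the statement, so a malformed or malicious column name is rejected before any query runs.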

AI StudyMate — RAG Learning Assistant

Retrieval-augmented generation system for academic study support

GitHub →
Problem
Students spend significant time searching across multiple disconnected resources to find relevant study materials, with no single interface able to surface topic-relevant answers.
Why it was difficult
Combining semantic retrieval, external service APIs, and AI-generated responses into a cohesive system required careful orchestration, with tolerance for retrieval latency and occasional misses.
Solution
Built a 6-module RAG system integrating 4 external services for semantic retrieval and AI-generated study content — unified through a single interface.
Impact
Topic-relevant study assistance across multiple content types, automated resource generation, and single-interface access to semantically retrieved materials.
Python · RAG · Vector Search · LLMs · REST APIs
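The retrieve-then-generate pattern behind a RAG system can be sketched as follows. Everything here is illustrative: toy three-dimensional vectors stand in for real embeddings, the passages are invented, and the call to a language model is omitted, so the sketch stops at the assembled prompt.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, top_k=2):
    """Return the top_k passages ranked by similarity to the query vector."""
    ranked = sorted(index, key=lambda doc: cosine(query_vec, doc["vector"]), reverse=True)
    return [doc["text"] for doc in ranked[:top_k]]


def build_prompt(question, passages):
    """Assemble the context-plus-question prompt for a language model."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical index: toy vectors, not real embeddings.
index = [
    {"text": "Photosynthesis converts light into chemical energy.", "vector": [0.9, 0.1, 0.0]},
    {"text": "Mitochondria produce ATP.", "vector": [0.7, 0.6, 0.1]},
    {"text": "The French Revolution began in 1789.", "vector": [0.0, 0.1, 0.9]},
]
passages = retrieve([1.0, 0.2, 0.0], index, top_k=2)
prompt = build_prompt("How do plants make energy?", passages)
print(prompt)
```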

NewsSwarm — Agentic News Automation

Agentic AI workflow for news ingestion, summarisation, and automated publishing

Live Demo →
Problem
Manually sourcing, summarising, and publishing news content is time-consuming and does not scale. Each stage of the workflow (fetch, summarise, format, publish) is typically handled separately.
Why it was difficult
Integrating multiple external APIs into a coordinated multi-step workflow — where each stage depends on the output of the previous — required robust error handling, orchestration logic, and state management.
Solution
Developed an agentic system using Python, Groq LLM, NewsAPI, and social media APIs to automate the full pipeline: ingestion → summarisation → formatting → publishing.
Impact
Fully automated news workflow covering ingestion, summarisation, and publishing, enabling real-time aggregation and distribution without manual intervention.
Python · Groq LLM · NewsAPI · REST APIs
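The orchestration problem described above, where each stage depends on the previous stage's output, can be sketched with a small runner like this one. It is a simplified illustration under stated assumptions: the stage functions are stand-ins for the real API calls, and the retry policy and the `run_pipeline` name are hypothetical.

```python
import time


def run_pipeline(stages, payload, retries=2, delay=0.0):
    """Run stages in order, feeding each output to the next; retry failures."""
    state = {"completed": [], "payload": payload}
    for name, stage in stages:
        for attempt in range(retries + 1):
            try:
                state["payload"] = stage(state["payload"])
                state["completed"].append(name)
                break
            except Exception as exc:
                if attempt == retries:
                    # Out of retries: record where it stopped and halt.
                    state["failed"] = (name, str(exc))
                    return state
                time.sleep(delay)
    return state


# Hypothetical stand-ins for the real ingestion, LLM, and publishing calls.
def fetch(_):
    return ["Markets rallied on Tuesday after a surprise rate cut."]


def summarise(articles):
    return [a.split(" after ")[0] for a in articles]


def format_post(summaries):
    return [f"NEWS: {s}." for s in summaries]


published = []


def publish(posts):
    published.extend(posts)
    return posts


stages = [("fetch", fetch), ("summarise", summarise), ("format", format_post), ("publish", publish)]
result = run_pipeline(stages, None)
print(result["completed"], published)
```

Recording completed stages and the failure point in `state` makes it possible to see where a run stopped without re-running the earlier, already successful steps by hand.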

Let's Talk

Available September 2026 under the UK Graduate Route visa. No employer sponsorship required until 2028.