Retail Sales ETL Pipeline
Automated ETL pipeline ingesting multi-format data sources (CSV, Excel, JSON), applying normalization and cleaning transforms, and loading into a PostgreSQL schema optimized for downstream analytics and dashboards.
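As a rough illustration of the extract and normalize stages described above, here is a minimal standard-library sketch. The function names (`read_csv_records`, `read_json_records`, `normalize`) and the sample columns are hypothetical and not taken from the project; Excel ingestion and the PostgreSQL load step are left out because they need third-party libraries.

```python
import csv
import io
import json


def read_csv_records(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def read_json_records(text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(text)


def normalize(record):
    """Trim and snake_case keys, trim string values, map empty strings to None."""
    clean = {}
    for key, value in record.items():
        key = key.strip().lower().replace(" ", "_")
        if isinstance(value, str):
            value = value.strip() or None
        clean[key] = value
    return clean


# Hypothetical sample inputs with inconsistent spacing and a missing value.
csv_text = "Order ID, Amount \n1001, 19.99 \n1002,\n"
json_text = '[{"Order ID": "1003", "Amount": "5.00"}]'

# Both sources end up as one list of records with a shared key convention.
rows = [normalize(r) for r in read_csv_records(csv_text) + read_json_records(json_text)]
```

After normalization every record uses the same keys regardless of source format, which is what would let a single load step write them to one table.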
Production-style projects focused on data pipelines, information retrieval, and machine learning systems.
Vertical search engine that crawls academic publications, builds an inverted index, and ranks results with weighted TF-IDF scoring. Results are served through a RESTful Flask API that supports structured queries.
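The inverted index and TF-IDF ranking mentioned above can be sketched in a few lines of standard-library Python. This is an illustrative toy, not the project's code: `build_index`, `search`, the whitespace tokenizer, and the sample documents are all hypothetical, and the per-term `weights` argument is only one assumed reading of "weighted" (the source does not say what the weights apply to).

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted index mapping each term to {doc_id: term count}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(text.lower().split()).items():
            index[term][doc_id] = count
    return index


def search(index, n_docs, query, weights=None):
    """Rank documents by the sum of weight * tf * idf over the query terms."""
    weights = weights or {}  # assumed per-term weights, default 1.0
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += weights.get(term, 1.0) * tf * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical mini-corpus standing in for crawled publication titles.
docs = {
    "p1": "neural ranking models",
    "p2": "ranking with inverted index",
    "p3": "graph databases",
}
index = build_index(docs)
results = search(index, len(docs), "inverted ranking")
```

Here `p2` ranks first because it matches both query terms, and the rarer term "inverted" contributes a larger IDF than "ranking", which appears in two of the three documents.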
Text classification pipeline that crawls news articles and categorises them into Business, Entertainment, and Health domains. Achieved a macro F1-score exceeding 0.97 across all classes using a scalable preprocessing pipeline.
Processed a corpus of 45,000+ articles with NLP cleaning and feature engineering. Trained and evaluated multiple classifiers; deployed the best-performing model as a real-time inference API.
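For readers unfamiliar with the macro F1-score quoted above, it is the unweighted mean of the per-class F1 scores, so each category counts equally regardless of how many articles it has. The sketch below computes it from scratch; the `macro_f1` helper and the six toy labels are hypothetical and unrelated to the project's actual data or results.

```python
def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); define it as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy labels for illustration only.
y_true = ["Business", "Business", "Entertainment", "Health", "Health", "Health"]
y_pred = ["Business", "Entertainment", "Entertainment", "Health", "Health", "Business"]
score = macro_f1(y_true, y_pred)
```

On this toy input the per-class F1 values are 0.5 (Business), 2/3 (Entertainment), and 0.8 (Health), and `score` is their plain average.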
Core skills developed through academic coursework and independent project work.
Academic background aligned with data engineering and computational systems.
Softwarica College · Coventry University, UK
Focus areas: data engineering, cloud systems, machine learning for analytics, distributed data processing.
NAMI College · University of Northampton, UK
Strong foundation in backend systems, relational databases, algorithms, and software architecture.
Happy to discuss projects, answer questions, or arrange a demo.