Open to Opportunities

Krishna Sathvik

Senior Data Engineer · AI/ML & RAG Builder

I'm Krishna Sathvik Mantripragada, a Senior Data Engineer who loves turning complex, messy data into fast, reliable, and usable products. I design and build cloud data platforms, scalable ELT pipelines, and real-time streaming systems using Databricks, PySpark, Kafka, and Azure.

Beyond data engineering, I explore the intersection of AI, GenAI, and RAG — building production-ready applications using LangChain, vector databases, and LLM APIs. I also ship full-stack apps end-to-end, from backend APIs to polished frontend experiences.

I build real, production-grade data and AI systems — from streaming pipelines and analytics foundations to RAG chatbots and full-stack web applications.

Azure Databricks Python Apache Kafka Apache Spark Snowflake dbt Power BI Machine Learning SQL Apache Airflow Tableau AWS Delta Lake Apache Flink Scala LangChain OpenAI HuggingFace MLflow Vector Databases Feature Store Azure Data Factory PostgreSQL Docker Azure DevOps RAG LLM PySpark Java

Featured Projects

Production-Ready
TrailVerse: National Parks Explorer
AI-Powered Platform

TrailVerse

Role: Founder · Full-Stack Developer · Data Engineer

TrailVerse is an AI-powered national parks exploration and trip planning platform for 470+ U.S. park units. It unifies NPS data, interactive maps, real-time weather, events, reviews, and dual LLM trip planning (OpenAI + Claude) into a single production-ready experience.

React 18.3 Node.js MongoDB OpenAI GPT-4
View Live Site
Job Tracking App

ApplyTrak - Enterprise Job Application Tracker

Role: Full-Stack Developer · Automation Engineer

ApplyTrak is a production-ready job application tracking platform that helps modern job seekers manage unlimited applications, goals, and analytics with real-time sync across devices. Built with React, TypeScript, Supabase, and Tailwind, it includes achievements, rich analytics, and a local-to-cloud migration system.

React 19 TypeScript Supabase
View Live App

Other Projects

LLM Engineer

RAG Chatbot - Advanced Interview Preparation Assistant

Role: LLM Engineer · RAG Developer

A dual-persona Retrieval-Augmented Generation (RAG) chatbot for interview preparation that serves 557+ curated knowledge chunks through a FastAPI backend and a modern React frontend. It routes questions across AI/ML, Data Engineering, BI, and Analytics Engineering profiles to deliver structured, interview-ready answers.

React 19 FastAPI RAG
GitHub
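
As a sketch of the retrieval step: the snippet below routes a question to one persona's chunk pool and ranks chunks by similarity. It uses a toy bag-of-words cosine in place of the project's real embeddings and vector store, and the chunks themselves are invented for illustration.

```python
from collections import Counter
import math

# Toy stand-in for the chatbot's retrieval step: a handful of knowledge
# chunks tagged by persona, scored with bag-of-words cosine similarity.
# (The real project uses embeddings and a vector store; all data here is illustrative.)
CHUNKS = [
    {"persona": "data_engineering", "text": "spark partitions shuffle joins tuning"},
    {"persona": "data_engineering", "text": "kafka topics consumer groups offsets"},
    {"persona": "ai_ml", "text": "rag retrieval embeddings vector search"},
    {"persona": "ai_ml", "text": "llm prompt engineering context window"},
]

def _vec(text):
    return Counter(text.lower().split())

def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, persona, k=2):
    """Return the top-k chunks for the routed persona."""
    q = _vec(question)
    pool = [c for c in CHUNKS if c["persona"] == persona]
    return sorted(pool, key=lambda c: _cosine(q, _vec(c["text"])), reverse=True)[:k]

top = retrieve("how does kafka manage consumer offsets", "data_engineering", k=1)
print(top[0]["text"])  # → kafka topics consumer groups offsets
```

Routing first and then retrieving keeps each persona's answers grounded in its own knowledge pool instead of mixing profiles.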
Data Engineer

Real-time Fraud Detection

Role: Data Engineer · ML Engineer (Real-time Streaming · Anomaly Detection)

ML-powered fraud detection pipeline processing millions of transactions with sub-second latency. Built with Kafka for real-time event streaming, Spark for distributed processing, and machine learning models for anomaly detection and fraud classification.

Python Kafka Spark
GitHub
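
The anomaly-scoring idea can be sketched in a few lines. This is a hypothetical rolling z-score check standing in for the pipeline's ML models; the window size and threshold are illustrative, not the project's actual values.

```python
from collections import deque
import statistics

# Minimal per-account anomaly check of the kind a streaming fraud job might
# run: flag a transaction whose amount sits far outside the account's recent
# history. Window size and z-threshold are illustrative assumptions.
class RollingAnomalyDetector:
    def __init__(self, window=50, z_threshold=3.0):
        self.window = deque(maxlen=window)
        self.z_threshold = z_threshold

    def score(self, amount):
        """Return (z_score, is_anomaly) for one transaction amount."""
        if len(self.window) < 5:          # not enough history yet
            self.window.append(amount)
            return 0.0, False
        mean = statistics.fmean(self.window)
        stdev = statistics.pstdev(self.window) or 1e-9
        z = abs(amount - mean) / stdev
        self.window.append(amount)
        return z, z > self.z_threshold

detector = RollingAnomalyDetector()
for amt in [20, 22, 19, 21, 20, 23, 18, 22]:   # typical card activity
    detector.score(amt)
z, flagged = detector.score(5000)               # wildly out-of-pattern amount
print(flagged)  # → True
```

In a real Kafka + Spark deployment this state would be keyed per account and checkpointed, so sub-second scoring survives restarts.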
Data Engineer

Finance Tracker Pipeline

Role: Data Engineer · ETL Developer (Python · Pandas · SQLite)

A personal finance tracking pipeline that ingests CSV transaction data, cleans and categorizes expenses, stores them in SQLite, and exposes interactive summaries through a Streamlit dashboard. It generates monthly breakdowns, category views, and savings trends from raw bank exports.

Python pandas SQLite
GitHub
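
The ingest, categorize, store, and summarize flow condenses to a few lines of stdlib Python. The category rules and sample CSV below are made up; the real project reads bank exports and renders the results in Streamlit.

```python
import csv, io, sqlite3

# Condensed ingest -> categorize -> store -> summarize sketch.
# RULES and SAMPLE are invented for illustration.
RULES = {"WHOLEFDS": "groceries", "NETFLIX": "subscriptions", "SHELL": "fuel"}

SAMPLE = """date,description,amount
2024-01-03,WHOLEFDS MARKET,-54.20
2024-01-10,NETFLIX.COM,-15.49
2024-02-05,SHELL OIL,-40.00
"""

def categorize(description):
    for key, cat in RULES.items():
        if key in description.upper():
            return cat
    return "other"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txns (date TEXT, description TEXT, amount REAL, category TEXT)")
for row in csv.DictReader(io.StringIO(SAMPLE)):
    conn.execute(
        "INSERT INTO txns VALUES (?, ?, ?, ?)",
        (row["date"], row["description"], float(row["amount"]), categorize(row["description"])),
    )

# Monthly spend per category, like the dashboard's breakdown view.
monthly = conn.execute(
    "SELECT substr(date, 1, 7) AS month, category, SUM(amount) "
    "FROM txns GROUP BY month, category ORDER BY month"
).fetchall()
print(monthly)
```

Keyword rules are a deliberately simple categorizer; swapping in fuzzy matching or a small classifier changes only `categorize`, not the pipeline shape.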
ML Engineer

Stock Price Prediction Pipeline

Role: ML Engineer (LSTM · Time-Series Modeling)

An end-to-end machine learning project for forecasting stock prices using traditional ML (Linear Regression, XGBoost) and deep learning (LSTM), along with time-series forecasting via Facebook Prophet. An interactive Streamlit dashboard makes model outputs, metrics, and visualizations easy to explore.

Python LSTM Streamlit
GitHub
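
The baseline regression idea can be illustrated with a closed-form one-lag fit, a toy stand-in for the project's actual models (Linear Regression, XGBoost, LSTM, Prophet); the price series is invented.

```python
# Toy sketch of lag-feature regression: fit price[t] ≈ a * price[t-1] + b
# by ordinary least squares in closed form, stdlib only.
def fit_lag1(prices):
    """Return (slope, intercept) of the one-lag least-squares fit."""
    x = prices[:-1]          # yesterday's prices
    y = prices[1:]           # today's prices
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    var = sum((xi - mx) ** 2 for xi in x)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = cov / var
    b = my - a * mx
    return a, b

prices = [100, 101, 103, 102, 105, 107, 106, 109]   # illustrative closes
a, b = fit_lag1(prices)
next_price = a * prices[-1] + b                      # one-step-ahead forecast
print(round(next_price, 2))
```

The LSTM and Prophet models in the project generalize this same setup: past values in, next value out, evaluated walk-forward rather than on a random split.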
Data Engineer

Market Basket Analysis Pipeline

Role: Data Engineer · ML Engineer (FP-Growth · Association Rules)

An end-to-end Market Basket Analysis pipeline that ingests retail transactions, cleans and filters them, and uses the FP-Growth algorithm to mine frequent itemsets and association rules. A Streamlit dashboard lets users filter by confidence/lift, search by product, and explore top co-occurring items.

Python Streamlit FP-Growth
GitHub
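
The support, confidence, and lift math behind the rules can be shown brute-force on invented transactions; FP-Growth computes the same frequent itemsets without enumerating every candidate.

```python
from itertools import combinations

# Brute-force illustration of frequent itemsets and rule metrics.
# Transactions are invented; FP-Growth would mine the same itemsets efficiently.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
    {"bread", "milk"},
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_metrics(antecedent, consequent):
    """Confidence and lift for the rule antecedent -> consequent."""
    conf = support(antecedent | consequent) / support(antecedent)
    lift = conf / support(consequent)
    return conf, lift

items = sorted({i for t in transactions for i in t})
frequent = [set(p) for p in combinations(items, 2) if support(set(p)) >= 0.4]

conf, lift = rule_metrics({"bread"}, {"milk"})
print(round(conf, 2), round(lift, 2))  # → 0.75 0.94
```

A lift below 1.0, as here, means the pairing is actually slightly anti-correlated, which is exactly why the dashboard lets users filter rules by confidence and lift.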
Data Engineer

Real-Time Vehicle Telemetry Pipeline

Role: Data Engineer · IoT Streaming Developer

This project simulates and processes real-time vehicle telemetry (GPS, speed, fuel level, engine temperature) using Kafka, Spark Structured Streaming, Cassandra, and Streamlit. It detects anomalies like overspeeding, overheating, and low fuel, and visualizes live metrics and alerts on a real-time dashboard.

Kafka Spark Cassandra
GitHub
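
The rule checks reduce to plain predicates over a single event. Thresholds and the sample event below are illustrative assumptions; in the project these checks run inside Spark Structured Streaming over Kafka topics.

```python
# Stand-in for the streaming job's rule checks (overspeed, overheat, low fuel).
# Threshold values and the sample event are illustrative.
THRESHOLDS = {"speed_kmh": 120, "engine_temp_c": 110, "fuel_pct_min": 10}

def check_event(event):
    """Return the list of alert names triggered by one telemetry event."""
    alerts = []
    if event["speed_kmh"] > THRESHOLDS["speed_kmh"]:
        alerts.append("overspeed")
    if event["engine_temp_c"] > THRESHOLDS["engine_temp_c"]:
        alerts.append("overheat")
    if event["fuel_pct"] < THRESHOLDS["fuel_pct_min"]:
        alerts.append("low_fuel")
    return alerts

event = {"vehicle_id": "v-42", "speed_kmh": 134.0, "engine_temp_c": 96.0, "fuel_pct": 7.5}
print(check_event(event))  # → ['overspeed', 'low_fuel']
```

Keeping the rules stateless per event is what lets them run as a simple map over the stream, with Cassandra holding history and Streamlit rendering the live alerts.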

AI/ML & GenAI

Personal Research & Projects

Beyond my professional data engineering work, I actively explore AI/ML, GenAI, and RAG technologies through hands-on projects. I build proof-of-concepts with LangChain, vector databases, and LLMs to understand how these tools work in practice and stay current with the AI landscape.

RAG System

RAG Chatbot - Advanced Interview Preparation Assistant

Role: LLM Engineer · RAG Developer

A dual-persona Retrieval-Augmented Generation (RAG) chatbot for interview preparation that serves 557+ curated knowledge chunks through a FastAPI backend and a modern React frontend. It routes questions across AI/ML, Data Engineering, BI, and Analytics Engineering profiles to deliver structured, interview-ready answers.

RAG OpenAI GPT-4 Claude Vector DB LangChain
View Project
GenAI

Generative AI Applications

Building production applications with GPT-4, Claude, and other LLMs. Exploring prompt engineering, fine-tuning, and agent-based architectures for real-world use cases.

OpenAI API Anthropic Claude Prompt Engineering LLM Agents

Ongoing exploration and experimentation

Machine Learning

ML & Deep Learning Projects

Personal ML projects including stock prediction with LSTM networks, fraud detection systems, and time series forecasting. Focus on production-ready implementations and model optimization.

LSTM XGBoost TensorFlow PyTorch
View Projects
Research

AI Research & Continuous Learning

Staying current with latest AI/ML research, experimenting with new architectures, and contributing to open-source AI projects. Regularly building proof-of-concepts and sharing learnings.

Research Papers Open Source Experimentation Knowledge Sharing

Active learning and contribution

AI Powered

Ask my AI Assistant

Query my background, tech stack, or availability. It reads directly from my resume data.

Try asking: "What is your experience with Azure?"

Career Timeline

I'm a Senior Data Engineer focused on building scalable, reliable data platforms that power analytics, machine learning, and real-time decision-making. My experience spans Azure and AWS, large-scale processing with Databricks and PySpark, and streaming architectures using Kafka and event-driven pipelines.

At Walgreens Boots Alliance, I design and operate enterprise data platforms that process terabytes of retail and healthcare data each month. I work across Azure Databricks, Synapse, ADF, and Kafka to build high-availability ETL/ELT workflows, improve data quality, and deliver trusted datasets consumed across finance, supply chain, and product teams.

Outside of work, I explore the intersection of Data Engineering and AI—building real, production-ready applications using GenAI, RAG, LangChain, vector databases, and LLM APIs. I also enjoy shipping full-stack applications end-to-end, which helps me bridge backend data systems with real user experiences.

What drives me: creating systems that scale for tomorrow, remain invisible to end users, and deliver value through clean design, automation, and long-term reliability.

Education

Master of Science in Computer Science

University of North Texas • 2021

Bachelor of Technology in IT

GITAM University • 2019

Senior Data Engineer

Walgreens Boots Alliance / Feb 2022 — Present

  • Designed large-scale data pipelines in Azure Databricks, Synapse, and ADF processing 10TB+ monthly across retail and healthcare systems, enabling reliable analytics and ML workflows for enterprise teams
  • Built scalable PySpark ETL and ELT pipelines integrating 15+ enterprise data sources, delivering high-availability curated datasets consumed by business, engineering, and data science partners
  • Implemented data quality frameworks using Python and SQL with automated validation and remediation, reducing data incidents 45% and eliminating over 200 hours of manual resolution work monthly
  • Partnered with data science teams to deliver feature-ready datasets and streamline ML training workflows by improving data consistency, availability, and lineage tracking across critical pipelines
  • Developed end-to-end pipeline observability with SLA monitoring, lineage metadata, and automated alerting, increasing critical pipeline reliability from 92% to 99.8% across production workloads
  • Automated CI/CD workflows using Azure DevOps to standardize deployments, reduce operational overhead, and improve delivery speed and stability for data engineering releases

Analytics Engineer

CVS Health / Oct 2020 — Dec 2021

  • Built Python and SQL ETL pipelines processing 7M+ daily records across claims, retail, and care data systems, enabling analytics teams to perform forecasting, planning, and operational reporting at scale
  • Automated ingestion workflows using Oracle Cloud and Python, reducing data refresh time from 8 hours to 2 hours and improving reporting freshness for business and analytics stakeholders
  • Designed data validation frameworks with rule-based checks and monitoring, reducing manual intervention by 30% and improving accuracy and trust in enterprise reporting datasets
  • Integrated ML model outputs into production ETL pipelines and delivered clean feature datasets, increasing model accuracy by 16% and enabling more reliable forecasting workflows
  • Delivered analytics-ready datasets consumed in Tableau and Power BI, accelerating leadership reporting cycles and improving decision-making for finance, operations, and care management teams

Data Science Intern

McKesson Corporation / Mar 2020 — Sep 2020

  • Built Python and Spark pipelines processing 2M+ prescription and utilization records, improving data readiness by 22% and enabling accurate analytics, forecasting, and downstream ML modeling
  • Created reusable feature datasets for XGBoost and time-series forecasting models, reducing model training time 40% and improving reliability and consistency across multiple model runs
  • Developed Tableau dashboards monitoring pipeline health and operational KPIs, reducing issue detection time from hours to minutes and improving system transparency for analytics teams
  • Delivered clean and well-documented datasets supporting supply chain optimization, enabling accurate demand forecasting and contributing to a 12% reduction in excess inventory costs

Software Developer

Inditek Pioneer Solutions / Jun 2018 — Dec 2019

  • Built responsive React and TypeScript applications improving page load times by ~30% and enhancing usability for internal healthcare tools used across multiple teams
  • Developed reusable UI components, shared API integration layers, and state management patterns that reduced frontend development time for new features by 25%
  • Implemented UI testing, debugging workflows, and CI/CD practices with backend and QA teams, reducing production UI defects by 20% and improving release stability

React.js · TypeScript / JavaScript (ES6+) · REST API Integration · UI Component Architecture · HTML5 / CSS3 / Responsive Design

Core Competencies

Data Architecture & Design

Enterprise data platform design, Medallion/Lakehouse architecture, dimensional modeling, cloud-native data solutions

ETL/ELT Pipeline Engineering

Real-time streaming with Kafka, batch ELT with PySpark/Databricks, orchestration with Airflow and ADF, scalable transformation layers using dbt-style modeling

Data Quality & Governance

Validation frameworks, automated monitoring, metadata management, lineage tracking, quality rules, SLA enforcement

ML/AI Data Infrastructure

Feature-ready dataset engineering, ML data pipelines, model input/output integration, automation for data used in training and inference

Cloud Data Platforms

Azure (Databricks, Synapse, Data Lake, ADF), AWS (S3), Snowflake (projects), dbt-style transformations, Delta Lakehouse architectures

DataOps & Automation

CI/CD for data pipelines, automated testing and deployment, monitoring and alerting, DevOps workflows for data engineering teams

Technical Skills

Data Engineering & Streaming

Databricks · PySpark · Apache Spark · Delta Lake · Kafka · dbt · Airflow · Azure Data Factory

AI, ML & RAG

Python (ML) · scikit-learn · LSTM · RAG · LangChain · OpenAI API · FAISS · Pinecone

Cloud & Warehouses

Azure · AWS · Snowflake · Azure Synapse · Delta Lakehouse

Programming & Scripting

Python · SQL · Scala · Java · PowerShell

BI & Analytics Tools

Power BI · Tableau · Streamlit

DevOps & Tooling

Git/GitHub · CI/CD · Docker · Azure DevOps

Certifications

Microsoft Azure Data Engineer Associate

Active • Credential ID: 2CA6D7588001CC9F

Designing and implementing data storage, data processing, and data security solutions using Azure services.

Microsoft Azure AI Engineer Associate

Active • Credential ID: 61B6FE700A01EC6

Designing and implementing AI solutions using Azure Cognitive Services, Azure Machine Learning, and Azure Bot Service.

SnowPro Core Certification

In Progress

Validates expertise in Snowflake data warehousing, administration, and analytics.

Databricks Certified Data Engineer

In Progress

Covers Databricks lakehouse architecture, Spark, and data engineering best practices.

dbt Analytics Engineering Certification

Planned

Production-grade dbt transformations, testing, documentation, and analytics engineering.

O'Reilly ChatGPT Data Analysis

Active • 2025

Advanced techniques for using ChatGPT and AI tools for data analysis and business intelligence.

Publications

AI for Electricity Market Design

Book Chapter — Handbook of Smart Energy Systems, Springer (2023)

Published chapter on artificial intelligence applications in electricity market design and optimization.

Status: Available

Let's Work Together

I'm currently exploring roles as a Senior Data Engineer or AI/Data Engineer (DE + GenAI/RAG). If you'd like to chat about data platforms, streaming, or AI products, feel free to reach out.

Location

Jefferson City, MO

Remote (USA)

Stack

Azure / Databricks / Snowflake

Python / SQL / dbt