Available May 2026 · Open to relocation across the US

Data & AI systems engineer.

I build pipelines that don't break and AI systems that ship. From clinical knowledge graphs to multi-agent platforms in production — I care about what happens after the demo.

Based: Boston, MA
Role: Data / AI / Analytics Engineer
Program: MS Information Systems · Northeastern
Now: RA + TA under one prof, job hunting hard
Python · LangGraph · Snowflake · Airflow · Claude API · dbt · PySpark · GCP · Qdrant · MCP · FastAPI · Databricks
3M+
Patient records processed
26,985
Qdrant vectors in production
80+
Grad students taught
97%
Schema compliance
85%
Ingest error reduction
10×
Query speedup (IMDb)
80%
Research time cut (ORBIT)
8.17/10
LLM-as-judge score
1.6M+
Indexed chunks (TechScope)
01 — The Story

Who I am.

I grew up in Gujarat, India, and was elected class president four years straight at Dharmsinh Desai University. Not because it looked good on paper. Because I'm wired to build structure where there isn't any.

I moved to Boston in 2024 to do my Master's at Northeastern. Within a year I was doing two jobs at once under the same professor: building healthcare AI pipelines in production on GCP, and designing two graduate data engineering courses from scratch for 80+ students.

My philosophy is simple. I build systems that work after the demo ends. Every project has validation, observability, and a clear answer to "what happens when this breaks at 2am?" That's the line between data engineering and data science theater.

I graduate May 2026. I'm in an intensive job search right now — looking for a team that takes data infrastructure or applied AI seriously. If that's you, let's talk.

I.
Ship before polish
A deployed pipeline with rough edges beats a perfect notebook. I bias toward production — Cloud Run, Airflow DAGs, Kubernetes. Not localhost demos.
II.
Validation is not optional
Schema validation, PII controls, audit trails, DVC versioning. Data without quality guarantees is just expensive noise.
III.
Teach what you know
I designed and taught two grad courses from scratch. Teaching forces clarity — if I can't explain it to 80 engineers, I don't understand it well enough.
IV.
Context-first systems
My AI systems carry state forward and generate follow-up questions, not one-shot answers. That's the difference between a chatbot and a tool.
03 — Selected Work

Projects that ship.


AI · Agents 81 COMMITS · 3 CONTRIBUTORS

ORBIT AI-50 Intelligence

Agentic LLM platform for private-equity-grade due diligence on the Forbes AI 50.

Airflow DAGs ingest 500+ documents per run into GCS, chunked at 800 tokens / 100 overlap into 26,985 vectors (384-dim, all-MiniLM-L6-v2) in Qdrant. A 5-node LangGraph workflow (Planner → Data Generator → Evaluator → Risk Detector → HITL) uses the ReAct pattern with full execution traces. An MCP server exposes the intelligence as tools. RAG and Structured (Pydantic) pipelines run side by side and are scored head-to-head by an LLM judge, with a reported score of 8.17/10.

50
Forbes AI-50 companies
26,985
Qdrant vectors
500+
Docs per run
8.17/10
LLM-as-judge score
96%
Pipeline success rate
80%
Research time cut
LangGraph · MCP · ReAct · HITL · Qdrant · Airflow · Cloud Composer · Cloud Run · FastAPI · Instructor · Pydantic
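A minimal sketch of the 800-token / 100-overlap windowing mentioned for ORBIT. It assumes whitespace-split words stand in for real tokens; the function name `chunk_tokens` and the sample text are illustrative, not taken from the project.

```python
from typing import List


def chunk_tokens(tokens: List[str], size: int = 800, overlap: int = 100) -> List[List[str]]:
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


# Whitespace splitting is a stand-in for a real tokenizer (assumption).
tokens = ("lorem ipsum " * 1000).split()  # 2,000 pseudo-tokens
chunks = chunk_tokens(tokens)
print(len(chunks), [len(c) for c in chunks])
```

Each window starts 700 tokens after the previous one, so consecutive chunks share their boundary context and a sentence cut at a chunk edge still appears whole in one of them.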
AI · Agents 64 COMMITS

TechScopeAI

Multi-agent intelligence for technical startup founders.

Seven specialized agents (Pitch, Competitive, Marketing, Patent, Policy, Team + Coordinator) on a shared LangGraph state. Weaviate Cloud with HNSW indexing. GPT-4 ↔ Gemini failover. MCP tools: DuckDuckGo, Pexels, USPTO, web extraction. Pitch agent integrates Gamma.ai across 5 themes. React + TypeScript + Vite frontend, FastAPI backend.

7
Specialized agents
1.6M+
Indexed chunks
7
Weaviate collections
2×
LLM failover
LangGraph · MCP · Weaviate · React · TypeScript · FastAPI · Gamma.ai
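The GPT-4 ↔ Gemini failover could look roughly like the sketch below. The provider callables are hypothetical stand-ins for the real clients, and the error handling is deliberately broad; this is one plausible shape, not the project's actual code.

```python
from typing import Callable, Sequence, Tuple

# A provider is any callable that takes a prompt and returns text.
Provider = Callable[[str], str]


def complete_with_failover(prompt: str, providers: Sequence[Tuple[str, Provider]]) -> Tuple[str, str]:
    """Try providers in order; return (name, response) from the first that succeeds."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real system would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Hypothetical stand-ins for the real GPT-4 and Gemini clients.
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary timed out")


def healthy_secondary(prompt: str) -> str:
    return f"answer to: {prompt}"


used, answer = complete_with_failover(
    "Summarize the competitive landscape",
    [("primary", flaky_primary), ("secondary", healthy_secondary)],
)
print(used, "->", answer)
```

Returning the provider name with the response lets downstream agents record which model actually produced each answer.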
AI · RAG 69 COMMITS

Aurelia Financial RAG

Cloud-native RAG over a 4,000-page Financial Toolbox.

Code-aware chunking with custom separator priority (1200-char chunks, 200 overlap) — 180/180 MATLAB code blocks preserved intact, 100% metadata preservation for citations. text-embedding-3-large at 3072 dimensions, Instructor-validated Pydantic outputs, Wikipedia fallback when concepts aren't in the PDF. Cloud Composer weekly refresh at $0.0024 per query.

4,000pg
Financial corpus
580
Embeddings
2-3s
Cached latency
65%
QA cycle cut
LangChain · ChromaDB · Instructor · Pydantic · FastAPI · App Engine
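Separator-priority chunking can be sketched in plain Python as below. This is a simplified illustration, not the Aurelia implementation: the separator list is an assumption, the 200-character overlap is omitted, and the sample text is made up. The idea is that text is split on the coarsest separator first, so a code block that sits between blank lines stays in one piece whenever it fits the size limit.

```python
from typing import List, Sequence

# Assumed priority order: paragraph break, line break, sentence end, space.
SEPARATORS = ("\n\n", "\n", ". ", " ")


def split_by_priority(text: str, max_chars: int = 1200,
                      separators: Sequence[str] = SEPARATORS) -> List[str]:
    """Split on the coarsest separator first, recursing only into oversized pieces."""
    if len(text) <= max_chars:
        return [text]
    if not separators:
        # No separators left: fall back to a hard cut.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    sep, rest = separators[0], separators[1:]
    pieces = text.split(sep)
    if len(pieces) == 1:
        return split_by_priority(text, max_chars, rest)
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_chars:
            current = candidate  # keep merging small neighbours
            continue
        if current:
            chunks.append(current)
        if len(piece) <= max_chars:
            current = piece
        else:
            chunks.extend(split_by_priority(piece, max_chars, rest))
            current = ""
    if current:
        chunks.append(current)
    return chunks


# Made-up sample: a two-line code block between two prose paragraphs.
code = "rate = 0.05;\nprice = face / (1 + rate);"
doc = "Intro paragraph about bond pricing.\n\n" + code + "\n\nClosing paragraph."
chunks = split_by_priority(doc, max_chars=60)
for c in chunks:
    print(repr(c))
```

Because the line-break separator is only tried on pieces that are still too large after the paragraph-level split, the two-line code block is never cut in the middle here.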
AI · MLOps In Progress

Project Polaris

End-to-end MLOps for real-time sentiment at production scale.

Complete MLOps lifecycle. Logistic Regression, Random Forest, XGBoost tracked via MLflow — winning model served by FastAPI on Kubernetes. Kafka live streaming, Prometheus + Grafana for latency / throughput / drift, GitHub Actions CI/CD.

3
Models tracked
K8s
Served on cluster
Live
Kafka streaming
CI/CD
Auto-deploy
MLflow · Kubernetes · Kafka · FastAPI · XGBoost · Prometheus · Grafana · Docker
Analytics 63 COMMITS

IMDb Warehouse

Star schema over 90M rows, 10× query speedup.

Fact_Title_Ratings at proper grain + 5 dimensions + 2 bridge tables. Eliminates row explosion and metric inflation on many-to-many joins. ADF + Alteryx + Snowflake + Power BI.

90M
Rows
14GB
Raw data
10×
Query speedup
5+2
Dims + bridges
Snowflake · ADF · dbt · Power BI · Alteryx
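The metric-inflation problem that the bridge tables address can be shown on a toy example. The sketch below uses in-memory SQLite as a stand-in for Snowflake; only the `Fact_Title_Ratings` name comes from the project, and the column names, the genre dimension, and the data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_title_ratings (title_id INTEGER PRIMARY KEY, num_votes INTEGER);
CREATE TABLE dim_genre (genre_id INTEGER PRIMARY KEY, genre TEXT);
CREATE TABLE bridge_title_genre (title_id INTEGER, genre_id INTEGER);

INSERT INTO fact_title_ratings VALUES (1, 1000), (2, 500);
INSERT INTO dim_genre VALUES (10, 'Drama'), (20, 'Crime');
-- title 1 has two genres, title 2 has one
INSERT INTO bridge_title_genre VALUES (1, 10), (1, 20), (2, 10);
""")

# Grand total from the fact table alone: each title is counted once.
true_total = conn.execute(
    "SELECT SUM(num_votes) FROM fact_title_ratings"
).fetchone()[0]

# Grand total after joining through the bridge: title 1 is counted once per genre.
inflated_total = conn.execute("""
    SELECT SUM(f.num_votes)
    FROM fact_title_ratings f
    JOIN bridge_title_genre b ON b.title_id = f.title_id
""").fetchone()[0]

# A per-genre breakdown is the legitimate use of the bridge.
by_genre = dict(conn.execute("""
    SELECT g.genre, SUM(f.num_votes)
    FROM fact_title_ratings f
    JOIN bridge_title_genre b ON b.title_id = f.title_id
    JOIN dim_genre g ON g.genre_id = b.genre_id
    GROUP BY g.genre
""").fetchall())

print(true_total, inflated_total, by_genre)
```

Keeping the fact table at one row per title means totals stay correct, while the bridge is joined in only when a many-to-many attribute is the grouping key.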
Data Engineering 108 COMMITS

Food Inspections

First unified Chicago + Dallas health-safety model.

Two cities. Incompatible schemas (17 cols vs 114 cols). Different risk systems — categorical labels vs numeric scores. City-aware Alteryx ETL resolves the semantic mismatch into one trustworthy fact table.

200K+
Records per city
2
Cities unified
17→114
Column spread
6+1
Dims + bridge
Snowflake · Alteryx · Power BI · SQL
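One way to picture the label-versus-score reconciliation is a small mapping into a shared risk tier. Everything specific in this sketch is hypothetical: the tier names, the score bands, the assumption that a higher score means lower risk, and which city supplies which encoding. The real logic lives in the Alteryx workflow.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from categorical labels to a shared tier.
LABEL_TO_TIER = {"high": "high", "medium": "medium", "low": "low"}


def tier_from_score(score: float) -> str:
    """Map a 0-100 inspection score to a tier (assumes higher score = lower risk)."""
    if score >= 90:
        return "low"
    if score >= 80:
        return "medium"
    return "high"


@dataclass
class UnifiedInspection:
    city: str
    risk_tier: str


def unify(city: str, risk_label: Optional[str] = None,
          score: Optional[float] = None) -> UnifiedInspection:
    """Normalize one inspection record from either encoding into the shared tier."""
    if risk_label is not None:
        tier = LABEL_TO_TIER[risk_label.strip().lower()]
    elif score is not None:
        tier = tier_from_score(score)
    else:
        raise ValueError("need either a risk label or a score")
    return UnifiedInspection(city=city, risk_tier=tier)


print(unify("Chicago", risk_label=" High "))
print(unify("Dallas", score=85))
```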
Analytics PRIVATE

NYPD Arrest Analytics

5M+ records, MERGE-based incremental refresh.

Snowflake dimensional model. Power BI trends by demographics, offense type, precinct, time. Designed for ongoing policy evaluation, not static reports.

5M+
Arrest records
4
Analysis axes
MERGE
Incremental load
Star schema
Snowflake · Power BI · Alteryx · ADF
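The incremental refresh pattern can be illustrated with an upsert. The sketch below swaps Snowflake's `MERGE` for SQLite's `INSERT ... ON CONFLICT DO UPDATE` (available in SQLite 3.24 and later) so it runs anywhere; the table name, columns, and rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_arrests (arrest_id INTEGER PRIMARY KEY, offense TEXT, precinct INTEGER)"
)
conn.executemany(
    "INSERT INTO fact_arrests VALUES (?, ?, ?)",
    [(1, "THEFT", 14), (2, "ASSAULT", 75)],
)

# Incoming batch: arrest 2 arrives with a corrected precinct, arrest 3 is new.
batch = [(2, "ASSAULT", 73), (3, "FRAUD", 1)]

# Matched keys are updated in place, unmatched keys are inserted.
conn.executemany("""
    INSERT INTO fact_arrests (arrest_id, offense, precinct)
    VALUES (?, ?, ?)
    ON CONFLICT(arrest_id) DO UPDATE SET
        offense = excluded.offense,
        precinct = excluded.precinct
""", batch)

rows = conn.execute("SELECT * FROM fact_arrests ORDER BY arrest_id").fetchall()
print(rows)
```

Only the batch is touched on each refresh, so the load cost scales with the number of new or changed records rather than with the full 5M+ row history.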
Data Engineering 91 COMMITS

Report Intelligence

SEC 10-K / 10-Q parsing with dual open-source engines + cloud benchmark.

Two complementary parsers: pdfplumber for speed, Docling for layout-aware extraction with reading-order and bounding-box provenance. Optional Google Document AI benchmark. XBRL cross-checks catch scaling mismatches (e.g., "in millions"). Full DVC reproducibility with metrics tracked.

2+1
Parser engines
XBRL
Validation layer
DVC
Reproducible
WER/F1
Quality metrics
Docling · pdfplumber · DVC · Document AI · Tesseract · FAISS
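A scaling cross-check of the kind described for the XBRL layer might look like the following. The function name, the tolerance, and the list of candidate scales are assumptions; the sketch only shows the idea of testing whether a parsed table value and the XBRL fact differ by a clean power-of-a-thousand factor.

```python
import math
from typing import Optional

# Candidate scale factors: thousands, millions, billions (assumed set).
KNOWN_SCALES = (1_000, 1_000_000, 1_000_000_000)


def detect_scale(parsed: float, xbrl: float, rel_tol: float = 0.005) -> Optional[int]:
    """Return the factor that reconciles `parsed` with `xbrl`.

    1 means the values already agree, a larger factor means the table was
    reported in that unit, and None means no known scale explains the gap.
    """
    if math.isclose(parsed, xbrl, rel_tol=rel_tol):
        return 1
    for scale in KNOWN_SCALES:
        if math.isclose(parsed * scale, xbrl, rel_tol=rel_tol):
            return scale
    return None


# A table headed "in millions" shows 1,234.5 while XBRL stores the full amount.
print(detect_scale(1234.5, 1_234_500_000))
print(detect_scale(394_328, 394_328))
print(detect_scale(12.0, 500.0))
```

A `None` result is the interesting case for review: the two sources disagree in a way that a unit header cannot explain.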
Data Engineering 34 COMMITS

Project Lantern

Automated Dow-30 earnings pipeline with 3-tier Selenium scraping.

100% IR discovery across all 30 Dow Jones companies via hybrid strategy (subdomain patterns, homepage anchors, pattern guessing, DuckDuckGo fallback). Three-tier scraping: Requests → Selenium navigation → aggressive DOM manipulation. Docling parses top pages; Instructor + GPT-4 extracts 26 metadata fields per doc. Airflow to GCS.

30/30
IR discovery
92
Tables extracted
26
Metadata fields
95%
Filing accuracy
Airflow · Selenium · Docling · Instructor · GPT-4 · GCS
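The pattern-guessing part of IR discovery can be sketched as candidate generation followed by a liveness check. The URL patterns below are hypothetical (the pipeline's real patterns are not listed), and `is_live` is an injected placeholder for whatever HTTP check is used, so the example runs without network access.

```python
from typing import Callable, List, Optional

# Hypothetical patterns; the actual ones used by the pipeline are not given.
SUBDOMAIN_PREFIXES = ("investor", "investors", "ir")
PATH_GUESSES = ("/investors", "/investor-relations")


def candidate_ir_urls(domain: str) -> List[str]:
    """Build candidate investor-relations URLs for a domain, subdomains first."""
    domain = domain.lower().strip()
    if domain.startswith("www."):
        domain = domain[4:]
    urls = [f"https://{prefix}.{domain}" for prefix in SUBDOMAIN_PREFIXES]
    urls += [f"https://www.{domain}{path}" for path in PATH_GUESSES]
    return urls


def discover_ir_url(domain: str, is_live: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate that passes the liveness check, else None."""
    for url in candidate_ir_urls(domain):
        if is_live(url):
            return url
    return None  # a search-engine fallback would take over here


# Fake liveness check backed by a fixed set, standing in for real HTTP requests.
live = {"https://www.example.com/investors"}
print(discover_ir_url("WWW.Example.com", lambda url: url in live))
```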
04 — Experience

The journey.

Four roles. Two countries. One throughline — systems that work at scale.

Sep 2025 — Present
Boston, MA

Research Assistant & Graduate Teaching Assistant

Northeastern University · D'Amore-McKim School of Business

Research: Building production GCP pipelines (Cloud Composer, BigQuery, PySpark) processing 867 clinical records + 1,200+ EMR cases. Multi-stage LLM extraction with schema validation and PII controls — 97% structured output compliance, 85% error reduction.

Teaching: Designed Database Management (SQL/Oracle) and Data Integration courses from scratch — curriculum, labs, assignments — then delivered both to 80+ graduate students. Both roles under the same professor.

Airflow · BigQuery · PySpark · Claude API · Snowflake · dbt
Dual Role
Healthcare AI
Curriculum Design
80+ Students
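A figure like structured output compliance can be computed by validating every LLM response against a schema and taking the pass rate. The sketch below uses only the standard library; the schema fields and sample responses are invented for illustration and contain no real patient data.

```python
import json
from typing import List, Tuple

# Hypothetical schema: field name -> required Python type.
SCHEMA = {"diagnosis": str, "onset": str, "medications": list}


def validate(raw: str) -> Tuple[bool, str]:
    """Check that a response is a JSON object with exactly the typed fields in SCHEMA."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"invalid JSON: {exc.msg}"
    if not isinstance(record, dict):
        return False, "top level is not an object"
    if set(record) != set(SCHEMA):
        return False, "missing or unexpected fields"
    for field, expected in SCHEMA.items():
        if not isinstance(record[field], expected):
            return False, f"{field} is not {expected.__name__}"
    return True, "ok"


def compliance_rate(responses: List[str]) -> float:
    """Share of responses that pass validation (0.0 for an empty batch)."""
    results = [validate(r)[0] for r in responses]
    return sum(results) / len(results) if results else 0.0


# Made-up responses: one valid, three failing for different reasons.
responses = [
    '{"diagnosis": "asthma", "onset": "2021", "medications": ["albuterol"]}',
    '{"diagnosis": "asthma", "onset": "2021"}',
    "not json at all",
    '{"diagnosis": "flu", "onset": "2024", "medications": "none"}',
]
print(compliance_rate(responses))
```

Returning a reason string alongside the pass/fail flag makes it possible to group failures by cause when deciding which prompt or parsing stage to fix first.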
Jan 2024 — May 2024
Gandhinagar, India

Research Intern — Cross-Lingual NLP

Dhirubhai Ambani Institute of Information & Communication Technology (DAIICT)

Built a cross-lingual information retrieval system translating English queries into Hindi, Gujarati, and Bengali. NLP pipelines for query translation, semantic matching, and cross-language relevance ranking. Genuine research — no existing solution to benchmark against.

NLP · Python · Information Retrieval · Multilingual Systems
Research
NLP
Multilingual
May 2023 — Jul 2023
Vadodara, India

Technical Intern

Nifty Solutions

Built SQL reporting tables processing 30GB of sales data with automated weekly refresh — standardizing KPIs across 6 Power BI and Tableau dashboards. Refactored join logic, eliminated duplicate-driven overcounting, cut dashboard failures by 40%. Built an internal real-time monitoring tool that improved decision-making efficiency by 35%.

Power BI · Tableau · SQL · MS SQL Server
Internship
35% Efficiency Gain
40% Fewer Failures
05 — Capabilities

Technical skills.

From raw ingestion to deployed AI — and all the infrastructure in between.

Data Engineering
01
Python · SQL · PySpark · Pandas · Apache Airflow · dbt · Azure Data Factory · Databricks · Delta Lake · Kafka · DVC · Selenium
AI, RAG & Agents
02
LangGraph · LangChain · Claude API · OpenAI GPT-4o · Gemini · MCP Server · Instructor · Pydantic · ChromaDB · Qdrant · Weaviate · FAISS · Knowledge Graphs
Warehousing & DBs
03
Snowflake · Snowflake Cortex · BigQuery · PostgreSQL · pgvector · MySQL · Oracle · Dimensional Modeling · Star Schema
Cloud & Infra
04
GCP · Cloud Run · Cloud Composer · Azure · AWS · Docker · Kubernetes · Terraform · CI/CD · GitHub Actions
Apps & BI
05
FastAPI · React · TypeScript · Streamlit · Power BI · Tableau · DAX · Alteryx
Dev Tools & ML
06
Claude Code · Cursor · GitHub Copilot · Git · MLflow · Prometheus · Grafana · XGBoost · PyTorch
06 — Education

Academic background.

MS, Information Systems
Northeastern University · College of Engineering · Boston, MA
Concentration in data engineering, AI systems, and cloud architecture. Currently taking Generative AI with LLMs.
Expected May 2026
GPA 3.57 / 4.0
BTech, Information Technology
Dharmsinh Desai University · Gujarat, India
CS fundamentals in data structures, algorithms, and software engineering. Class President all four years. Foundation in Python, SQL, full-stack development.
May 2024
GPA 3.75 / 4.0
07 — Beyond the Code

Leadership & community.

2022 — 2023

Director of Execution & Finance

Samvaad — DDU Cultural Org

Led execution and finance for large-scale cultural events — budgeting, logistics, on-ground coordination. Also served as Head of Photography & Video, managing event coverage teams. Grew from Associate (2021) into dual-director role.

2020 — 2024

Class President · 4 Years

Dharmsinh Desai University

Elected every year of my undergrad. Primary liaison between students and faculty — coordinating academics, advocating for student needs, building trust across a large peer group. Not a one-time win. A four-year record.

2024 — Present

Volunteer

NU Sanskriti — Northeastern

Volunteer with Northeastern's South Asian cultural org — event planning, cultural programming, and community building for the South Asian student community in Boston.

08 — Let's Connect

Let's build something real.

Available for full-time Data Engineering, Analytics Engineering, and AI/ML Engineering roles from May 2026. Open to anywhere in the US.

talati.ak@northeastern.edu
Boston, MA · (857) 867-2050 · github.com/akshtalati