AI Engineering, 2025

RAG Knowledge Assistant

A retrieval-augmented assistant that answers questions over private business documents with citations, built on pgvector and Claude.

Problem

Teams lose hours searching wikis, PDFs, and past tickets. The goal is an assistant that answers questions from private company knowledge accurately, cites its sources, and says "I don't know" instead of hallucinating.

Architecture

An ingestion pipeline (parsing, semantic chunking, embedding into pgvector) feeds a FastAPI retrieval service that combines hybrid search (vector + keyword) with reranking. Claude generates answers constrained to the retrieved context and streams them into a Next.js chat UI with inline citations.
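One common way to merge a vector ranking and a keyword ranking is reciprocal rank fusion (RRF). The sketch below is an illustration only: the write-up does not say which fusion method the service uses, and the function name, the chunk IDs, and the constant k=60 are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into a single ranking.

    Each list contributes 1 / (k + rank) per chunk, so a chunk that
    ranks well in both vector and keyword search rises to the top.
    k=60 is a conventional default, not a value taken from this project.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical results from the two retrievers (best match first).
vector_hits = ["c3", "c1", "c7"]
keyword_hits = ["c1", "c9", "c3"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Rank-based fusion avoids having to normalize cosine distances against keyword scores, which live on different scales. The fused list would then go to the reranker.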

Challenges

Performance

High answer-acceptance rate in internal testing with strict citation coverage. (Draft metrics — replace with real numbers.)
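Citation coverage could be measured as the share of answer sentences that carry at least one citation marker. This is a minimal sketch under stated assumptions: the `[n]` marker format, the placement of markers before the closing punctuation, and the naive sentence splitter are all hypothetical and not taken from the project.

```python
import re

# Assumed marker format: a bracketed number such as [1] or [12].
CITATION = re.compile(r"\[\d+\]")


def citation_coverage(answer: str) -> float:
    """Return the fraction of sentences that contain a citation marker."""
    # Naive split on sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    if not sentences:
        return 0.0
    cited = sum(1 for s in sentences if CITATION.search(s))
    return cited / len(sentences)


sample = "Refunds take five days [1]. Contact support for exceptions."
coverage = citation_coverage(sample)
print(coverage)
```

A strict policy would require a coverage of 1.0 before an answer is shown, or would fall back to "I don't know" otherwise.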

Lessons Learned

Retrieval quality dominates model quality: most "LLM errors" were actually search errors.
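One way to separate search errors from generation errors is to score retrieval on its own, before any model call. The sketch below computes a hit rate at k over a small labeled question set. The function name, the document IDs, and the stand-in retriever are hypothetical.

```python
def hit_rate_at_k(golden, retrieve, k=5):
    """Fraction of questions whose top-k results include a relevant doc.

    golden:   list of (question, set_of_relevant_doc_ids) pairs
    retrieve: callable mapping a question to a ranked list of doc IDs
    """
    if not golden:
        return 0.0
    hits = 0
    for question, relevant_ids in golden:
        top_k = set(retrieve(question)[:k])
        if top_k & set(relevant_ids):
            hits += 1
    return hits / len(golden)


# Stand-in for the real retrieval service (assumption for illustration).
fake_index = {
    "How do I reset my password?": ["doc-12", "doc-40"],
    "What is the refund window?": ["doc-7", "doc-3"],
}
golden = [
    ("How do I reset my password?", {"doc-12"}),
    ("What is the refund window?", {"doc-99"}),
]

rate = hit_rate_at_k(golden, lambda q: fake_index.get(q, []), k=2)
print(rate)
```

If an answer is wrong and the relevant document never appears in the top k, the failure belongs to search rather than to the model.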

Future Improvements

Multi-source connectors, conversational memory, and an automated eval suite on a golden question set.