Production-ready Retrieval-Augmented Generation (RAG) system with multi-format document parsing (PDF, DOCX, CSV with link extraction), intelligent semantic chunking, and vector search using Qdrant. Features Azure OpenAI integration with conversation history, content filtering, adaptive garbage collection, and API monitoring with performance alerts. Built with an async Quart backend, multiple embedding providers (SentenceTransformers, Azure, FastEmbed), and comprehensive lifecycle management.
Python
Quart
Qdrant
Azure OpenAI
RAG
Vector Search
NLP
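The chunking step can be illustrated with a minimal sketch. The project's actual strategy is not shown here, so this assumes a simple approach: split on sentence boundaries and pack sentences into chunks under a character budget, optionally repeating trailing sentences as overlap. The function name `chunk_text` and its parameters are hypothetical.

```python
import re


def chunk_text(text: str, max_chars: int = 200, overlap_sentences: int = 1) -> list[str]:
    """Pack sentences into chunks, closing a chunk when the next sentence
    would push it past max_chars.

    Assumption: sentences end with '.', '!' or '?' followed by whitespace.
    A single long sentence, or the overlap itself, can still exceed the budget.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            chunks.append(" ".join(current))
            # Carry the last few sentences over so neighbouring chunks share context.
            current = current[-overlap_sentences:] if overlap_sentences > 0 else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each resulting chunk would then be embedded and stored in the vector index. Overlap trades some storage for better recall when an answer straddles a chunk boundary.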
Local document processing pipeline for PDF, DOCX, and scanned images: OCR (Tesseract), text extraction, table detection and extraction, key-value parsing, entity recognition (spaCy), layout analysis, and summarization. Results are produced as structured JSON suitable for downstream ingestion (data lakes, BI, search indexes). The repository is private; contact guch79@gmail.com for access and commercial options.
Python
Quart
OCR
Tesseract
spaCy
PDF processing
NLP
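As an illustration of the key-value parsing stage, here is a minimal sketch that assumes the extracted text contains `Label: value` lines. The private repository's real rules are not known; the names `KV_LINE` and `parse_key_values` are hypothetical, and only the standard library is used.

```python
import json
import re

# Assumption: a key is a short label (up to 40 characters) before the first colon.
KV_LINE = re.compile(r"^\s*([^:\n]{1,40}?)\s*:\s*(.+?)\s*$")


def parse_key_values(text: str) -> dict[str, str]:
    """Collect 'Label: value' lines from extracted text into a dictionary."""
    pairs: dict[str, str] = {}
    for line in text.splitlines():
        match = KV_LINE.match(line)
        if match:
            pairs[match.group(1)] = match.group(2)
    return pairs


def to_json(text: str) -> str:
    """Serialize the parsed pairs as JSON for downstream ingestion."""
    return json.dumps({"key_values": parse_key_values(text)}, indent=2)
```

Lines without a colon are ignored, and a value may itself contain colons (for example a time such as `10:30`), because the key is cut at the first colon only.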
Fully asynchronous Quart web application with an auto-generated PDF CV built from dynamic content. Features intelligent HTML-to-PDF conversion with structured data extraction, professional formatting, and clickable links. Built with modern async Python patterns.
Quart
Uvicorn
FPDF2
Async Python
Jinja2
A live Model Context Protocol (MCP) server that provides context-aware crisis communication for AI agents. Test it live right here with the 'Generate Live Apology' button, or download the config to connect your own agent. Features multiple severity levels, styles (including Haiku), and SSE support.
Python
MCP Protocol
SSE
Docker
Async
FastMCP
Complete ETL pipeline for extracting data from Microsoft Dataverse, transforming it with business logic, and loading it into SQL Server. Includes fake data generation with Faker for testing before production deployment. Features parallel processing, connection pooling, and circuit breakers.
Python
Pandas
SQLAlchemy
Dataverse
Faker
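The circuit breaker pattern mentioned above can be sketched as follows. This is a generic, standard-library illustration, not the pipeline's own code: the class names, thresholds, and the injectable `clock` parameter are all assumptions made for the example.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Stop calling a failing dependency until a cool-down period has passed."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock  # injectable so the timing can be tested
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; call rejected")
            # Cool-down elapsed: fall through and allow one trial call (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

In an ETL setting, a breaker like this would typically wrap the extract or load call, so that a source or target outage fails fast instead of tying up worker threads and pooled connections.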