Python · Supabase pgvector
RAG Ingestion
Python pipeline for ingesting documents into pgvector.
About this template
Python script that chunks documents (PDF, TXT, MD), generates embeddings via the OpenAI API, and stores the vectors in Supabase pgvector. It reads configuration from a .env file and accepts either a directory or a single file as input. Hash-based deduplication and idempotent upserts let documents be reprocessed without creating duplicates. The stored vectors can be used in RAG pipelines built with frameworks such as LangChain, LlamaIndex, or n8n AI nodes.
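To make the deduplication idea concrete, here is a minimal sketch of chunking plus hash-keyed upserts. It is not the code in ingest.py: the function names (chunk_text, content_hash, upsert_chunks), the character-based chunking with overlap, and the default sizes are all assumptions chosen for illustration. An in-memory dict stands in for the pgvector table, and the embedding call is left out so the example runs offline.

```python
import hashlib


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap (hypothetical helper)."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def content_hash(chunk: str) -> str:
    """Stable identifier for a chunk: SHA-256 of its UTF-8 bytes."""
    return hashlib.sha256(chunk.encode("utf-8")).hexdigest()


def upsert_chunks(store: dict[str, str], chunks: list[str]) -> int:
    """Write chunks keyed by their hash; return how many keys were new.

    The dict is a stand-in for a table with a unique hash column:
    writing the same chunk twice overwrites the row instead of adding one.
    """
    added = 0
    for chunk in chunks:
        key = content_hash(chunk)
        if key not in store:
            added += 1
        store[key] = chunk
    return added
```

Calling upsert_chunks twice with the same chunks leaves the store unchanged on the second call, which is the property that makes reprocessing safe. Against a real database the same effect would typically come from a unique constraint on the hash column combined with an upsert.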
What you get
- ingest.py — main ingestion script
- requirements.txt with pinned dependencies
- SQL schema for embeddings table
- README in PT-BR and EN with usage examples
- ARCHITECTURE.md describing the pipeline
- .env.example with all variables
- Commercial LICENSE
Prerequisites
- Python 3.10+ with pip
- Supabase project with pgvector extension enabled
- OpenAI API key for embedding generation
- Dependencies listed in requirements.txt (included)
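Before running the ingestion, it can help to check that the settings implied by these prerequisites are present. The sketch below parses .env-style text with the standard library and reports missing keys. The variable names (OPENAI_API_KEY, SUPABASE_URL, SUPABASE_KEY) are assumptions for illustration; the names the template actually expects are the ones listed in .env.example.

```python
# Hypothetical setting names; check .env.example for the real ones.
REQUIRED = ("OPENAI_API_KEY", "SUPABASE_URL", "SUPABASE_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_settings(values: dict[str, str], required=REQUIRED) -> list[str]:
    """Return the required names that are absent or empty."""
    return [name for name in required if not values.get(name)]
```

A caller could read the file with pathlib.Path(".env").read_text(), pass the result to parse_env, and stop with a clear message if missing_settings returns anything.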
Built on
Python · OpenAI · Supabase · pgvector