One Weird Trick from 1990s DB Research to Slash Your RAG/Agent Eval Costs
Learn a 1990s database trick for comparing dozens of RAG/agent configurations live on your own data, with evaluation metrics that update incrementally and carry confidence intervals.
This demo shows a confidence-interval-aware eval engine for comparing dozens of RAG/agent configs concurrently, with dynamic real-time control over running configs. It adapts a 1990s database technique called “online aggregation” so you can compare configs live on your eval data instead of waiting for full batch processing to finish. The demo runs on a public benchmark, BEIR’s SciFact, with OpenAI models. No GPUs needed: just your laptop, an OpenAI API key, and free Colab.
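To make the idea of online aggregation concrete, here is a minimal sketch in Python. It is not the engine from the talk. It only illustrates the general technique: process eval examples in random order, keep a running mean and confidence interval per config, and stop spending calls on configs that are clearly behind. The names `RunningEstimate`, `compare_online`, `score_fns`, and `min_samples` are hypothetical, and the interval uses a simple normal approximation (z = 1.96) as an assumption.

```python
import math
import random


class RunningEstimate:
    """Running mean of a metric with a normal-approximation confidence interval."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the mean (Welford's method)

    def update(self, x):
        """Fold one new per-example score into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def interval(self, z=1.96):
        """Return (low, high); unbounded until at least two samples are seen."""
        if self.n < 2:
            return (-math.inf, math.inf)
        variance = self.m2 / (self.n - 1)
        half = z * math.sqrt(variance / self.n)
        return (self.mean - half, self.mean + half)


def compare_online(score_fns, examples, min_samples=30, seed=0):
    """Score configs over examples in random order, pruning clear losers.

    score_fns: dict of config name -> callable(example) -> float.
               Hypothetical stand-in for running one RAG/agent config
               on one eval example and scoring the result.
    Returns (estimates, active), where active holds the surviving configs.
    """
    rng = random.Random(seed)
    order = list(examples)
    rng.shuffle(order)  # random order makes every prefix a random sample

    estimates = {name: RunningEstimate() for name in score_fns}
    active = set(score_fns)

    for example in order:
        # Only configs that are still active consume (costly) evaluations.
        for name in active:
            estimates[name].update(score_fns[name](example))

        # Prune a config once its upper bound is below the best lower bound.
        best_low = max(estimates[name].interval()[0] for name in active)
        for name in list(active):
            est = estimates[name]
            if est.n >= min_samples and est.interval()[1] < best_low:
                active.discard(name)

    return estimates, active
```

In use, each entry in `score_fns` would wrap a real pipeline call, and the estimates could be displayed after every example rather than only at the end. Note that checking intervals after every example without any correction makes premature pruning more likely than the nominal 95% level suggests, so a real system would likely need wider intervals or a sequential-testing correction.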