One Weird Trick from 1990s DB Research to Slash Your RAG/Agent Eval Costs

Learn how a 1990s database technique lets you compare dozens of RAG/agent configurations live on your own data, with evaluation metrics that update incrementally and carry confidence intervals.

Overview

A confidence-interval-aware eval engine that compares dozens of RAG/agent configs concurrently and gives you dynamic, real-time control over the running configs. It adapts a 1990s database technique called “online aggregation”: metrics are estimated incrementally as examples are processed, each with a confidence interval, so you can compare configs live on your eval data instead of waiting for a full batch run to finish. The demo uses a public benchmark, BEIR’s SciFact, with OpenAI models. No GPUs needed: just your laptop, an OpenAI API key, and free Colab.
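To make the online-aggregation idea concrete, here is a minimal Python sketch. It is not the engine's actual code. It assumes each config yields one numeric score per eval example through a hypothetical score_fn(config, example), processes examples in random order so that every prefix is an unbiased sample, and reports a normal-approximation confidence interval after each batch. The names RunningEstimate and online_compare and the rule for stopping a config are illustrative assumptions.

```python
import math
import random


class RunningEstimate:
    """Running mean and variance (Welford's method) with a normal-approximation CI."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def interval(self, z=1.96):
        # With fewer than two samples the interval is unbounded.
        if self.n < 2:
            return (float("-inf"), float("inf"))
        std_err = math.sqrt(self.m2 / (self.n - 1) / self.n)
        return (self.mean - z * std_err, self.mean + z * std_err)


def online_compare(configs, examples, score_fn, batch_size=50, z=1.96, seed=0):
    """Yield (examples_seen, active, stopped) after every batch.

    score_fn(config, example) -> float is a hypothetical per-example metric.
    Assumed stopping rule: a config is stopped once its upper bound falls
    below the best lower bound among the configs still running.
    """
    order = list(examples)
    random.Random(seed).shuffle(order)  # random order keeps each prefix unbiased
    active = {c: RunningEstimate() for c in configs}
    stopped = {}
    seen = 0
    for start in range(0, len(order), batch_size):
        batch = order[start:start + batch_size]
        for example in batch:
            for config, estimate in active.items():
                estimate.update(score_fn(config, example))
        seen += len(batch)
        best_lower = max(est.interval(z)[0] for est in active.values())
        losers = [c for c, est in active.items() if est.interval(z)[1] < best_lower]
        for config in losers:
            stopped[config] = active.pop(config)
        yield seen, dict(active), dict(stopped)
```

Each snapshot can drive a live table of estimates, and a stopped config costs no further API calls. Note that checking the intervals after every batch weakens the nominal 95% guarantee, so a production engine would likely need a correction for repeated looks.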

Video

Links

Tech stack