RAG A/B Testing: A Scientific Approach to Comparing Pipeline Configurations
"Adding a Cross-Encoder feels better" is not a scientific evaluation. A/B testing tells you whether a change actually works, how much it helps, and which query types benefit.
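As a rough illustration of those three questions, the sketch below compares two pipeline configurations on the same query set. It assumes each query already has a per-query quality score (for example recall@k or nDCG) under both configurations. The function names, the record layout, and the choice of a sign-flip permutation test are assumptions made for this example, not a method prescribed by this article.

```python
import random
from collections import defaultdict


def paired_permutation_test(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Two-sided sign-flip permutation test on paired per-query scores.

    Returns (mean_lift, p_value), where mean_lift is mean(B - A).
    Hypothetical helper: scores_a[i] and scores_b[i] must belong to the same query.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")

    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = sum(diffs) / n

    rng = random.Random(seed)  # fixed seed so the result is reproducible
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis (no difference), each sign is arbitrary.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        # Small tolerance guards against floating-point ties.
        if abs(flipped) >= abs(observed) - 1e-12:
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


def lift_by_query_type(records):
    """Mean lift (B - A) per query type.

    records: iterable of (query_type, score_a, score_b) tuples (assumed layout).
    """
    by_type = defaultdict(list)
    for query_type, score_a, score_b in records:
        by_type[query_type].append(score_b - score_a)
    return {qt: sum(d) / len(d) for qt, d in by_type.items()}


if __name__ == "__main__":
    # Toy numbers for illustration only, not measured results.
    toy = [
        ("factoid", 0.60, 0.65),
        ("factoid", 0.70, 0.70),
        ("multi_hop", 0.30, 0.50),
        ("multi_hop", 0.40, 0.55),
    ]
    lift, p = paired_permutation_test([r[1] for r in toy], [r[2] for r in toy])
    print(f"mean lift={lift:.3f}, p={p:.3f}")
    print(lift_by_query_type(toy))
```

The p-value addresses whether the change works, the mean lift how much it helps, and the per-type breakdown which queries benefit. A paired test is used because both configurations answer the same queries, which removes between-query variance from the comparison.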