writing · 2024-02-27

The feedback loop is the product

A search system you can't measure is a search system you can't improve. Notes on building human-in-the-loop feedback for an enterprise RAG pipeline.

The first version of any retrieval system is a guess. The second version is only better if you measured the first, and most teams never set up the machinery to measure at all.

When I worked on ML recommendation and search pipelines, one realization reorganized everything for me. The feedback loop is not a feature you bolt onto the product. It is the product. Everything else is just the current snapshot of what the loop has taught you.

Three grains of feedback

Not all feedback has the same resolution, and pretending it does throws away signal. I ended up designing for three grains at once.

The first is binary: was this result useful, yes or no? It is cheap to collect and it gives you a trend line. The second is categorical: it tells you why a result failed (wrong document, right document but wrong passage, stale, off-topic). This is what turns “quality went down” into a fixable ticket. The third is fine-grained: ratings and ranked comparisons, the expensive signal you collect sparingly to calibrate the cheap signal.
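One way to make the three grains concrete is a single feedback record in which the binary vote is always present and the other two grains are optional. This is a minimal sketch under my own assumptions: the names `Feedback`, `FailureReason`, `Comparison`, `useful_rate` and `failure_breakdown`, and the 1 to 5 rating scale, are hypothetical and not taken from any production schema.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class FailureReason(Enum):
    # Categorical grain: the failure modes named in the post (hypothetical codes).
    WRONG_DOCUMENT = "wrong_document"
    WRONG_PASSAGE = "wrong_passage"  # right document, wrong passage
    STALE = "stale"
    OFF_TOPIC = "off_topic"


@dataclass(frozen=True)
class Feedback:
    """Hypothetical record: one user judgment on one served result."""
    request_id: str
    useful: bool                            # binary grain: always collected
    reason: Optional[FailureReason] = None  # categorical grain: negative votes only
    rating: Optional[int] = None            # fine-grained grain: sparse, assumed 1..5

    def __post_init__(self):
        if self.useful and self.reason is not None:
            raise ValueError("a failure reason only makes sense for a negative vote")
        if self.rating is not None and not 1 <= self.rating <= 5:
            raise ValueError("rating must be between 1 and 5")


@dataclass(frozen=True)
class Comparison:
    """Fine-grained grain: a reviewer prefers one result over another for a query."""
    query: str
    preferred_doc: str
    other_doc: str


def useful_rate(events: Sequence[Feedback]) -> float:
    """Binary grain -> the trend line: share of results marked useful."""
    return sum(e.useful for e in events) / len(events) if events else 0.0


def failure_breakdown(events: Sequence[Feedback]) -> Counter:
    """Categorical grain -> how often each failure mode occurs, to pick what to fix."""
    return Counter(e.reason for e in events if e.reason is not None)
```

Tying the reason to a negative vote keeps the cheap signal cheap: a user who only clicks thumbs-down still moves the trend line, and the reason, when given, says which ticket to open.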

None of this is exotic. These are the same feedback shapes that Perplexity, ChatGPT, and Bing already rely on. Borrowing a proven shape beats inventing your own.

Plumbing is the hard part

The interesting modeling is maybe a fifth of the work. The rest is an API and a schema that can ingest feedback without slowing the product, store it so it stays queryable months later, and join it back to the incidents that produced it. Boring infrastructure, and the reason the whole thing compounds instead of evaporating.
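A minimal sketch of that plumbing, assuming Python's standard-library `sqlite3`, `queue` and `threading` as stand-ins for whatever database and queue a real deployment would use. The request path only enqueues, so recording feedback never waits on a database write; a background thread persists the rows; and every feedback row carries the `request_id` of the retrieval that produced it, so joining back is a single SQL query. The table layout, the `FeedbackSink` class and its method names are hypothetical.

```python
import queue
import sqlite3
import threading
import time

# Hypothetical schema: one row per retrieval served, one row per piece of feedback.
SCHEMA = """
CREATE TABLE IF NOT EXISTS retrievals (
    request_id TEXT PRIMARY KEY,
    query      TEXT NOT NULL,
    doc_id     TEXT NOT NULL,
    created_at REAL NOT NULL
);
CREATE TABLE IF NOT EXISTS feedback (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    request_id TEXT NOT NULL REFERENCES retrievals(request_id),
    useful     INTEGER,   -- binary grain: 1 or 0
    reason     TEXT,      -- categorical grain, e.g. 'stale'
    rating     INTEGER,   -- fine-grained grain, sparse
    created_at REAL NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_feedback_request ON feedback(request_id);
"""

INSERT_RETRIEVAL = "INSERT INTO retrievals VALUES (?, ?, ?, ?)"
INSERT_FEEDBACK = (
    "INSERT INTO feedback (request_id, useful, reason, rating, created_at) "
    "VALUES (?, ?, ?, ?, ?)"
)


class FeedbackSink:
    """Hypothetical ingest path: callers enqueue, a background thread writes."""

    def __init__(self, path=":memory:"):
        self._queue = queue.Queue()
        self._lock = threading.Lock()  # serializes use of the shared connection
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._conn.executescript(SCHEMA)
        self.dropped = 0  # rows that failed to insert
        threading.Thread(target=self._drain, daemon=True).start()

    def log_retrieval(self, request_id, query, doc_id):
        self._queue.put((INSERT_RETRIEVAL, (request_id, query, doc_id, time.time())))

    def submit(self, request_id, useful=None, reason=None, rating=None):
        # Returns immediately: the product never waits on the database.
        self._queue.put((INSERT_FEEDBACK, (request_id, useful, reason, rating, time.time())))

    def _drain(self):
        while True:
            sql, row = self._queue.get()
            try:
                with self._lock, self._conn:  # commit on success, roll back on error
                    self._conn.execute(sql, row)
            except sqlite3.Error:
                self.dropped += 1  # a real system would log or dead-letter this row
            finally:
                self._queue.task_done()

    def flush(self):
        """Block until everything enqueued so far has been written."""
        self._queue.join()

    def failures(self):
        """Join negative feedback back to the query and document that earned it."""
        self.flush()
        with self._lock:
            return self._conn.execute(
                "SELECT r.query, r.doc_id, f.reason "
                "FROM feedback f JOIN retrievals r ON r.request_id = f.request_id "
                "WHERE f.useful = 0 ORDER BY f.id"
            ).fetchall()
```

In production the in-process queue would more likely be a durable message queue, since rows still waiting in this one are lost if the process dies. The part worth keeping is the shared `request_id`: it is what makes months-old feedback joinable to the result it judged.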

A model improves once. A feedback loop improves every week you keep it alive.

#search #rag #evaluation #human-in-the-loop
