Learnings & Thoughts on RAG

I enjoyed this essay on X about RAG and learned a lot. 

Part of what is advancing LLMs is a technique called RAG (retrieval-augmented generation). RAG is an evolving approach to information retrieval for LLMs, but it has problems. 

  1. For large models, the freshness of source data for time-sensitive queries is a limiter right now. An LLM needs to update its data sources, feed the updated data back through the retrieval pipeline, and know to cite the new source when a user writes a prompt or query. 
  2. RAG has a similar problem for smaller models too. Among a corpus of internal documents that may contain conflicting information (for example, multiple drafts of the same document), how does the RAG system know which one to use? 
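The second problem can be made concrete with a toy retrieval sketch. This is an illustrative assumption, not any real RAG library: all names here (`Doc`, `retrieve`, `build_prompt`) are hypothetical, and the idea is simply that retrieval can tie-break conflicting documents by "final beats draft" and then by freshness.

```python
from dataclasses import dataclass
from datetime import date

# Toy in-memory corpus; a real system would use a vector store and embeddings.
@dataclass
class Doc:
    text: str
    updated: date   # last-modified date, used to prefer fresher sources
    is_draft: bool  # lets retrieval deprioritize drafts over final versions

def retrieve(query: str, corpus: list[Doc], k: int = 1) -> list[Doc]:
    """Naive keyword overlap, tie-broken by non-draft status, then recency."""
    words = set(query.lower().split())
    scored = [
        (sum(w in d.text.lower() for w in words), not d.is_draft, d.updated, d)
        for d in corpus
    ]
    # Sort by (relevance, non-draft, recency), best first;
    # drop documents with zero keyword overlap.
    scored.sort(key=lambda t: t[:3], reverse=True)
    return [t[3] for t in scored[:k] if t[0] > 0]

def build_prompt(query: str, sources: list[Doc]) -> str:
    """Serve the model only the retrieved text, with dates for citation."""
    context = "\n".join(f"- {d.text} (updated {d.updated})" for d in sources)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"
```

With two conflicting versions of a policy document, the non-draft, fresher one wins the tie-break and is the only text placed in the prompt, which is one crude way to keep the "underlying information you serve it" clean.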

I think one important quote from Aaron that ties to a post from Google is this: “The AI’s answer is only as good as the underlying information that you serve it in the prompt.” Google’s Liz Reid wrote: “With our custom Gemini model’s multi-step reasoning capabilities, AI Overviews will help with increasingly complex questions. Rather than breaking your question into multiple searches, you can ask your most complex questions, with all the nuances and caveats you have in mind, all in one go.” 

This is something we started talking about 12+ months ago: the better the prompt, the better the answer. It's clear search engines and LLMs are trying to nudge humans to change their search behavior in this way to improve answers and reduce hallucinations. 

The question on my mind is: will consumers change their search behavior toward more complex queries, or will tech companies need to find other ways to improve source identification during RAG to improve output?