Inference Scaling for Long-Context Retrieval Augmented Generation

scroll ↓ to Resources

Note

  • the paper investigates how RAG performance scales as inference compute increases
  • the authors consider two advanced RAG variants:
    • Demonstration-Based RAG (DRAG) combines RAG with few-shot examples; its inference compute scales with both the number of retrieved documents and the number of in-context demonstrations
    • Iterative Demonstration-Based RAG (IterDRAG)
      • Decomposes the query into simpler sub-queries.
      • For each sub-query, performs retrieval and uses fetched context to generate intermediate answers.
      • After all sub-queries are resolved, the retrieved context, sub-queries, and their answers are combined to synthesize the final answer.
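The IterDRAG steps above can be sketched as a short loop. This is only an illustration of the control flow as described in this note, not the authors' implementation: `decompose`, `retrieve`, and `generate` are hypothetical callables standing in for a query decomposer, a retriever, and an LLM call, and the prompt layout in `format_prompt` is an assumption. Few-shot demonstrations are left out to keep it short.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, str]  # (sub-query, intermediate answer)


def format_prompt(question: str, documents: List[str], steps: List[Step]) -> str:
    """Lay out retrieved documents, resolved sub-queries, then the open question.

    The layout is an assumed format chosen for illustration only.
    """
    lines = [f"Document: {d}" for d in documents]
    for sub_query, answer in steps:
        lines.append(f"Sub-query: {sub_query}")
        lines.append(f"Intermediate answer: {answer}")
    lines.append(f"Question: {question}")
    lines.append("Answer:")
    return "\n".join(lines)


def iter_drag(
    question: str,
    decompose: Callable[[str], List[str]],      # hypothetical: query -> sub-queries
    retrieve: Callable[[str, int], List[str]],  # hypothetical: (query, k) -> documents
    generate: Callable[[str], str],             # hypothetical: prompt -> model output
    k: int = 2,
) -> Tuple[str, List[Step]]:
    """Answer `question` by resolving its sub-queries one at a time."""
    context: List[str] = []  # documents accumulated across all sub-queries
    steps: List[Step] = []   # resolved (sub-query, intermediate answer) pairs

    for sub_query in decompose(question):
        # Retrieve for this sub-query and add unseen documents to the context.
        for doc in retrieve(sub_query, k):
            if doc not in context:
                context.append(doc)
        # Generate an intermediate answer from the context gathered so far.
        answer = generate(format_prompt(sub_query, context, steps)).strip()
        steps.append((sub_query, answer))

    # Combine context, sub-queries, and their answers to synthesize the final answer.
    final_answer = generate(format_prompt(question, context, steps)).strip()
    return final_answer, steps
```

Each sub-query adds one retrieval call and one generation call, so in this sketch inference compute grows with both `k` and the number of sub-queries.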

Resources


table file.inlinks, file.outlinks from [[]] and !outgoing([[]])  AND -"Changelog"