chunking strategy

scroll ↓ to Resources

Note

  • ==chunking strategy== can have a huge impact on RAG performance. ^c77646
    • small chunks ⇒ limited context ⇒ incomplete answers
    • large chunks ⇒ noise in data ⇒ poor recall
      • general, but not universal, advice: use larger chunks for fixed-output queries (e.g. extracting a specific answer/number) and smaller chunks for expanding-output queries (e.g. summarize, list all…).
    • Split by symbols, sentences, or semantic meaning, using a dedicated model or an LLM call
    • semantic chunking: detect where a change of topic happens (see the similarity-drop sketch after this list)
    • Consider inference latency and the maximum number of tokens the embedding model was trained on (chunks longer than its input limit are typically truncated)
    • Overlapping chunks or not? (a sliding-window chunker with overlap is sketched after this list)
    • Use small chunks at the embedding stage and a larger size during inference, by appending adjacent chunks before feeding them to the LLM (see the small-to-big sketch after this list)
    • page-size chunks, because we answer the question “on which page can I find this?”
    • sub-chunks with links to a parent chunk that provides larger context
  • Shuffling context chunks creates randomness in the outputs, which is comparable to increasing the diversity of the downstream output (an alternative to tuning the softmax temperature) - e.g. previously purchased items are provided in random order to make a recommendation engine's output more creative ^447647
    • shuffle the order of retrieved sources to prevent position bias (see the shuffle sketch after this list)
      • unless sources are sorted by relevance (the model assumes that the 1st chunk is the most relevant)
      • newer models with large context windows are less prone to the Lost in the Middle effect and have improved recall across the whole context window
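
A minimal sliding-window chunker, sketching the size/overlap trade-offs noted above. Splitting on whitespace "tokens" and the default `chunk_size`/`overlap` values are simplifying assumptions; a real pipeline would count tokens with the embedding model's own tokenizer.

```python
# Sketch: fixed-size chunking with optional overlap (whitespace "tokens" only).
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Slide a window of `chunk_size` tokens over the text, stepping by
    `chunk_size - overlap`, so adjacent chunks share `overlap` tokens."""
    tokens = text.split()
    step = max(chunk_size - overlap, 1)
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):  # last window reached the end
            break
    return chunks
```

Per the advice above, a larger `chunk_size` tends to suit fixed-output queries and a smaller one expanding-output queries.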
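
A sketch of semantic chunking by topic-change detection: embed consecutive sentences and start a new chunk when the similarity between neighbours drops. The `embed` callable and the `threshold` value are assumptions (any embedding model or API can be plugged in), and the naive regex sentence split stands in for a proper sentence tokenizer.

```python
import re

import numpy as np

def semantic_chunks(text: str, embed, threshold: float = 0.75) -> list[str]:
    """Split `text` where the topic appears to change: start a new chunk when
    the cosine similarity between adjacent sentence embeddings drops below
    `threshold`. `embed` is assumed to map a list of strings to vectors."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if len(sentences) <= 1:
        return sentences
    vectors = np.asarray(embed(sentences), dtype=float)
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)  # unit-normalize
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        if float(vectors[i - 1] @ vectors[i]) < threshold:  # topic change
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks
```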
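
A sketch of the small-to-big idea: small chunks are embedded and retrieved, then each hit is expanded with its neighbouring chunks before being passed to the LLM. The `index.search(query, k)` call is a hypothetical vector-store API that returns the positions of the top-k small chunks.

```python
def retrieve_with_expansion(query: str, index, chunks: list[str],
                            k: int = 5, window: int = 1) -> list[str]:
    """Retrieve small chunks, then append `window` adjacent chunks on each
    side so the LLM sees more context than was embedded."""
    hit_positions = index.search(query, k)  # hypothetical: top-k chunk indices
    expanded, seen = [], set()
    for pos in hit_positions:
        lo = max(pos - window, 0)
        hi = min(pos + window, len(chunks) - 1)
        if (lo, hi) not in seen:  # avoid feeding duplicate context
            seen.add((lo, hi))
            expanded.append(" ".join(chunks[lo:hi + 1]))
    return expanded
```

The same pattern covers sub-chunks linked to a parent chunk: store a parent id alongside each small chunk and return the parent's text instead of the neighbouring window.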
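
A small sketch of shuffling retrieved chunks before building the prompt, to counter position bias or add output diversity; skip the shuffle when the chunks are already ordered by relevance.

```python
import random

def build_context(retrieved_chunks: list[str], shuffle: bool = True) -> str:
    """Join retrieved chunks into a prompt context, optionally in random order."""
    chunks = list(retrieved_chunks)
    if shuffle:
        random.shuffle(chunks)
    return "\n\n".join(chunks)
```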

Resources


table file.inlinks, file.outlinks from [[]] and !outgoing([[]])  AND -"Changelog"