chunking strategy
scroll ↓ to Resources
Note
- ==chunking strategy== can have a huge impact on RAG performance. ^c77646
- small chunks → limited context → incomplete answers
- large chunks → noise in data → poor recall
- general, but not universal, advice: use larger chunks for fixed-output queries (e.g. extracting a specific answer/number) and smaller chunks for expanding-output queries (e.g. summarize, list all…).
- Split by symbols, by sentences, or by semantic meaning, using a dedicated model or an LLM call
- semantic chunking detects where a change of topic has happened (a minimal sketch follows after this list)
- Consider inference latency and the maximum number of tokens the embedding model was trained on
- Overlapping or not?
- Use small chunks at the embedding stage and a larger size at inference, by appending adjacent chunks before feeding them to the LLM (see the small-to-big sketch after this list)
- page-size chunks, because we answer the question "on which page can I find this?"
- sub-chunks with links to a parent chunk that carries larger context
- Shuffling context chunks creates randomness in the output, which is comparable to increasing the diversity of the downstream output (an alternative to tuning the softmax temperature) - e.g. previously purchased items are passed in random order to make a recommendation engine's output more creative ^447647
- shuffle the order of retrieved sources to prevent position bias (see the shuffling sketch after this list)
- unless sources are sorted by relevance (the model assumes that the 1st chunk is the most relevant)
- newer models with large context windows are less prone to the Lost in the Middle effect and have improved recall across the whole context window
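
A minimal sketch of semantic chunking by topic change, assuming a sentence-transformers embedding model; the model name and similarity threshold are illustrative choices, not recommendations from the note above:

```python
# Sketch: split where the topic appears to change, detected as a drop in
# cosine similarity between adjacent sentence embeddings.
import numpy as np
from sentence_transformers import SentenceTransformer

def semantic_chunks(sentences: list[str], threshold: float = 0.6) -> list[str]:
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
    emb = model.encode(sentences, normalize_embeddings=True)  # unit vectors
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        sim = float(np.dot(emb[i - 1], emb[i]))  # cosine similarity
        if sim < threshold:                      # likely topic change -> new chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks
```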
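A minimal sketch of the small-to-big idea: small child chunks are embedded for precise matching, but the larger parent chunks they belong to are what gets handed to the LLM. The in-memory index, `embed` helper, and child size are assumptions for illustration, not a specific library API:

```python
# Sketch: embed small child chunks, retrieve by them, return their parent chunks.
import numpy as np
from sentence_transformers import SentenceTransformer

_model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice

def embed(texts: list[str]) -> np.ndarray:
    return _model.encode(texts, normalize_embeddings=True)

def build_index(parents: list[str], child_size: int = 200):
    """Split each parent into small children and remember which parent they came from."""
    children, parent_ids = [], []
    for pid, parent in enumerate(parents):
        for start in range(0, len(parent), child_size):
            children.append(parent[start:start + child_size])
            parent_ids.append(pid)
    return embed(children), parent_ids

def retrieve_parents(query, child_vecs, parent_ids, parents, k: int = 3) -> list[str]:
    scores = child_vecs @ embed([query])[0]      # cosine similarity on unit vectors
    best_children = np.argsort(-scores)[:k]
    seen, context = set(), []
    for c in best_children:                      # several hits may share a parent
        pid = parent_ids[c]
        if pid not in seen:
            seen.add(pid)
            context.append(parents[pid])
    return context                               # feed these larger chunks to the LLM
```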
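A minimal sketch of shuffling retrieved sources before prompting to counter position bias, skipped when the prompt tells the model the sources are ranked by relevance; the function name and seed handling are illustrative:

```python
# Sketch: shuffle retrieved chunks unless the ranking itself carries meaning.
import random

def order_context(chunks: list[str], sorted_by_relevance: bool = False,
                  seed: int | None = None) -> list[str]:
    if sorted_by_relevance:
        return chunks            # keep the ranking the model is told to expect
    rng = random.Random(seed)
    shuffled = chunks[:]         # do not mutate the caller's list
    rng.shuffle(shuffled)
    return shuffled
```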
Resources
Links to this File
table file.inlinks, file.outlinks from [[]] and !outgoing([[]]) AND -"Changelog"