chunking strategy

scroll ↓ to Resources

Note

  • ==chunking strategy== can have a huge impact on RAG performance. ^c77646
    • small chunks ⇒ limited context ⇒ incomplete answers
    • large chunks ⇒ noise in data ⇒ poor recall
      • general, but not universal, advice: use larger chunks for fixed-output queries (e.g. extracting a specific answer/number) and smaller chunks for expanding-output queries (e.g. summarize, list all…).
    • Split by symbols, sentences, or semantic meaning, using a dedicated model or an LLM call
    • semantic chunking: detect where a change of topic happens (see the similarity-drop sketch after this list)
    • Consider inference latency and the maximum number of tokens the embedding model was trained on (chunks longer than its input limit are typically truncated)
    • Overlapping chunks or not? (a sliding-window chunker with overlap is sketched after this list)
    • Use small chunks at the embedding stage and a larger size during inference, by appending adjacent chunks before feeding them to the LLM (see the small-to-big sketch after this list)
    • page-size chunks, because we answer the question “on which page can I find this?”
    • sub-chunks with links to a parent chunk that provides larger context
  • Shuffling context chunks creates randomness in the outputs, which is comparable to increasing the diversity of the downstream output (an alternative to tuning the softmax temperature) - e.g. previously purchased items are provided in random order to make a recommendation engine's output more creative ^447647
    • shuffle the order of retrieved sources to prevent position bias (see the shuffle sketch after this list)
      • unless sources are sorted by relevance (the model assumes that the 1st chunk is the most relevant)
      • newer models with large context windows are less prone to the Lost in the Middle effect and have improved recall across the whole context window
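
A minimal sliding-window chunker, sketching the size/overlap trade-offs noted above. Splitting on whitespace "tokens" and the default `chunk_size`/`overlap` values are simplifying assumptions; a real pipeline would count tokens with the embedding model's own tokenizer.

```python
# Sketch: fixed-size chunking with optional overlap (whitespace "tokens" only).
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Slide a window of `chunk_size` tokens over the text, stepping by
    `chunk_size - overlap`, so adjacent chunks share `overlap` tokens."""
    tokens = text.split()
    step = max(chunk_size - overlap, 1)
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):  # last window reached the end
            break
    return chunks
```

Per the advice above, a larger `chunk_size` tends to suit fixed-output queries and a smaller one expanding-output queries.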
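
A sketch of semantic chunking by topic-change detection: embed consecutive sentences and start a new chunk when the similarity between neighbours drops. The `embed` callable and the `threshold` value are assumptions (any embedding model or API can be plugged in), and the naive regex sentence split stands in for a proper sentence tokenizer.

```python
import re

import numpy as np

def semantic_chunks(text: str, embed, threshold: float = 0.75) -> list[str]:
    """Split `text` where the topic appears to change: start a new chunk when
    the cosine similarity between adjacent sentence embeddings drops below
    `threshold`. `embed` is assumed to map a list of strings to vectors."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if len(sentences) <= 1:
        return sentences
    vectors = np.asarray(embed(sentences), dtype=float)
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)  # unit-normalize
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        if float(vectors[i - 1] @ vectors[i]) < threshold:  # topic change
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks
```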
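
A sketch of the small-to-big idea: small chunks are embedded and retrieved, then each hit is expanded with its neighbouring chunks before being passed to the LLM. The `index.search(query, k)` call is a hypothetical vector-store API that returns the positions of the top-k small chunks.

```python
def retrieve_with_expansion(query: str, index, chunks: list[str],
                            k: int = 5, window: int = 1) -> list[str]:
    """Retrieve small chunks, then append `window` adjacent chunks on each
    side so the LLM sees more context than was embedded."""
    hit_positions = index.search(query, k)  # hypothetical: top-k chunk indices
    expanded, seen = [], set()
    for pos in hit_positions:
        lo = max(pos - window, 0)
        hi = min(pos + window, len(chunks) - 1)
        if (lo, hi) not in seen:  # avoid feeding duplicate context
            seen.add((lo, hi))
            expanded.append(" ".join(chunks[lo:hi + 1]))
    return expanded
```

The same pattern covers sub-chunks linked to a parent chunk: store a parent id alongside each small chunk and return the parent's text instead of the neighbouring window.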
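
A small sketch of shuffling retrieved chunks before building the prompt, to counter position bias or add output diversity; skip the shuffle when the chunks are already ordered by relevance.

```python
import random

def build_context(retrieved_chunks: list[str], shuffle: bool = True) -> str:
    """Join retrieved chunks into a prompt context, optionally in random order."""
    chunks = list(retrieved_chunks)
    if shuffle:
        random.shuffle(chunks)
    return "\n\n".join(chunks)
```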

Resources


table file.inlinks, file.outlinks from [[]] and !outgoing([[]])  AND -"Changelog"