hard negative

Note

  • Each training sample consists of three objects (images, transactions, etc.): an anchor, a positive, and a negative.
  • The idea is to make the positive a hard positive and the negative a hard negative.
  • This approach forces the model to learn the meaningful boundaries between similar but distinct concepts.
    • for a given anchor, pick positives that are far away in the embedding space (hard positives)
    • for a given anchor, pick negatives that are as close as possible (hard negatives)
      • use embedding search to find the most similar object/image/transaction that belongs to a different class than the anchor and the positive
  • Actual user data is of utmost importance for creating hard negatives: the most valuable negative examples come from user interactions that signal a mismatch between what the system considered relevant and what the user actually found useful.
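
The mining idea above can be sketched with plain nearest-neighbor search over precomputed embeddings (a minimal numpy sketch; the function name and Euclidean distance are my assumptions, not from the note — in practice an ANN index such as FAISS would replace the brute-force distance computation):

```python
import numpy as np

def mine_hard_negatives(embeddings, labels, anchor_idx):
    """For a given anchor, rank same-class points far-to-near (hard
    positives first) and different-class points near-to-far (hard
    negatives first). Brute-force Euclidean distances; illustrative only."""
    anchor = embeddings[anchor_idx]
    dists = np.linalg.norm(embeddings - anchor, axis=1)

    same = labels == labels[anchor_idx]
    pos_idx = np.where(same)[0]
    pos_idx = pos_idx[pos_idx != anchor_idx]   # exclude the anchor itself
    neg_idx = np.where(~same)[0]

    # hard positives: same class, largest distance first
    hard_pos = pos_idx[np.argsort(-dists[pos_idx])]
    # hard negatives: different class, smallest distance first
    hard_neg = neg_idx[np.argsort(dists[neg_idx])]
    return hard_pos, hard_neg
```

The first entry of `hard_neg` is exactly the "most similar object from a different class" described above.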

Examples

- AI-generated content: if users delete part of an AI-generated artifact (email, blog post, citations), the query paired with the deleted part makes a good hard negative
- recommender systems: skipping a song is a weaker negative signal than deleting it from a playlist
- e-commerce: items that were bought and then returned
- synthetic data: hard negatives can also be generated synthetically, e.g. examples of the same abbreviation used in different contexts
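
Why these signals matter can be seen from the standard triplet margin loss, max(0, d(a,p) - d(a,n) + margin): an easy (distant) negative contributes zero loss and no gradient, while a hard (nearby) negative actually trains the model. A small numpy sketch (the vectors and margin are made-up illustration values):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.5):
    # standard triplet margin loss: max(0, d(a,p) - d(a,n) + margin)
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])
p = np.array([0.3, 0.0])
easy_neg = np.array([4.0, 0.0])   # far from the anchor: loss clips to 0
hard_neg = np.array([0.4, 0.0])   # close to the anchor: non-zero loss
```

With these values, `triplet_loss(a, p, easy_neg)` is 0 while `triplet_loss(a, p, hard_neg)` is positive, which is the whole point of mining hard negatives.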

Resources


table file.inlinks, file.outlinks from [[]] and !outgoing([[]])  AND -"Changelog"