Particle.news

WinnowRAG and TeaRAG Debut With Promises to Cut RAG Noise and Token Load

New preprints outline techniques to make RAG answers cleaner with fewer tokens.

Overview

  • WinnowRAG proposes a two-stage approach: it clusters retrieved documents by query, assigns an LLM agent to each cluster, then uses a critic model to winnow out noisy content, merging clusters strategically to retain useful evidence.
  • TeaRAG targets efficiency on two fronts: it compresses retrieved content into a graph of concise triplets, applying Personalized PageRank to surface key facts, and it trims reasoning steps via Iterative Process-aware Direct Preference Optimization.
  • The authors report that TeaRAG improved average Exact Match by 4% on Llama3-8B-Instruct and 2% on Qwen2.5-14B-Instruct while cutting output tokens by 61% and 59% respectively, across six datasets, and they have released code on GitHub.
  • WinnowRAG is described as model-agnostic and as requiring no fine-tuning, with experiments reported to outperform state-of-the-art baselines on multiple realistic datasets.
  • A separate community guide demonstrates a minimal local RAG stack built with Go, Ollama, and Postgres with pgvector. Both research papers are new on arXiv, and their claims await independent validation.
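
To make the WinnowRAG summary concrete, here is a minimal sketch of the cluster-then-winnow flow. Everything in it is a stand-in: `cluster_docs`, `critic_score`, and the keyword heuristics are illustrative substitutes for the paper's embedding-based clusters, per-cluster LLM agents, and critic model, not the actual method.

```python
# Hypothetical sketch of a WinnowRAG-style two-stage flow; all heuristics are toys.
from collections import defaultdict

def cluster_docs(docs):
    """Group retrieved documents by a crude topic key (first token).
    A real system would cluster by embedding similarity."""
    clusters = defaultdict(list)
    for doc in docs:
        clusters[doc.split()[0].lower()].append(doc)
    return list(clusters.values())

def critic_score(cluster, query):
    """Toy critic: fraction of cluster tokens that overlap the query."""
    terms = set(query.lower().split())
    hits = sum(1 for doc in cluster for w in doc.lower().split() if w in terms)
    return hits / max(1, sum(len(d.split()) for d in cluster))

def winnow(docs, query, keep=2):
    """Stage 1: cluster and score; stage 2: winnow low-scoring clusters,
    merging the survivors into a single evidence pool."""
    clusters = cluster_docs(docs)
    ranked = sorted(clusters, key=lambda c: critic_score(c, query), reverse=True)
    return [doc for cluster in ranked[:keep] for doc in cluster]
```

With three documents and `keep=1`, an off-topic document lands in a low-scoring cluster and is winnowed out while the on-topic cluster is retained whole.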
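
TeaRAG's triplet-graph idea can likewise be sketched. The sketch below builds a graph from (subject, relation, object) triplets and runs a plain power-iteration implementation of Personalized PageRank seeded on query entities; the triplets, damping value, and ranking rule are illustrative assumptions, not the paper's pipeline.

```python
# Hypothetical sketch of triplet ranking via Personalized PageRank (power iteration).
def personalized_pagerank(edges, seeds, damping=0.85, iters=50):
    """Power iteration over an undirected graph; random restarts land on seed nodes."""
    nodes = {n for e in edges for n in e}
    neighbors = {n: [] for n in nodes}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: (1 - damping) * restart[n] for n in nodes}
        for n in nodes:
            share = damping * rank[n] / max(1, len(neighbors[n]))
            for m in neighbors[n]:
                nxt[m] += share
        rank = nxt
    return rank

def top_triplets(triplets, query_entities, k=2):
    """Rank (subject, relation, object) triplets by the PPR mass of their endpoints."""
    edges = [(s, o) for s, _, o in triplets]
    rank = personalized_pagerank(edges, set(query_entities))
    scored = sorted(triplets, key=lambda t: rank.get(t[0], 0) + rank.get(t[2], 0),
                    reverse=True)
    return scored[:k]
```

Seeding the restart distribution on query entities concentrates rank mass near them, so triplets touching those entities surface first, which is the "key facts" effect the bullet describes.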
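
The community guide's retrieval loop can also be sketched, here in Python rather than the guide's Go: a toy bag-of-words embedder stands in for Ollama embeddings, and a brute-force cosine scan plays the nearest-neighbor role that pgvector's distance operators serve in SQL. None of this is the guide's actual code.

```python
# Toy retrieval sketch: bag-of-words vectors plus brute-force cosine similarity.
import math

def build_vocab(texts):
    """Shared vocabulary mapping each word to a vector index."""
    return {w: i for i, w in enumerate(sorted({w for t in texts for w in t.lower().split()}))}

def embed(text, vocab):
    """Toy embedding: word-count vector over the shared vocabulary."""
    vec = [0.0] * len(vocab)
    for w in text.lower().split():
        if w in vocab:
            vec[vocab[w]] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(corpus, query, k=1):
    """Brute-force top-k scan; a vector index (e.g. pgvector) replaces this at scale."""
    vocab = build_vocab(corpus + [query])
    qv = embed(query, vocab)
    return sorted(corpus, key=lambda d: cosine(embed(d, vocab), qv), reverse=True)[:k]
```

The same shape maps onto the guide's stack: the embedder becomes a call to a local model, and the `sorted` scan becomes an ORDER BY over a pgvector distance column.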