RAG: Unleashing the Full Potential of Large Language Models in Question Answering

Abdul Jaleel Kavungal
Nov 3


Retrieval-Augmented Generation (RAG) with large language models (LLMs) has ushered in a new era of question answering over documents. While RAG holds immense potential, its practical utility hinges on meticulous tweaking and tuning.

Here are a few practical tips that can help unleash the full potential of LLMs when using RAG:

Optimized Retrieval

Improve the retrieval process for coherence and relevance. Use techniques such as entity linking to connect the entities mentioned in a question with those in the documents, and generate clarifying questions to home in on the most supportive text. For instance, if a user asks about “Einstein’s theories”, entity linking resolves “Einstein” to “Albert Einstein”, so documents about him are prioritized.
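
As a rough illustration of the entity-linking idea, here is a minimal sketch that uses a hand-written alias table (the `ALIASES` dictionary and both function names are hypothetical) to map mentions to canonical entity IDs, then boosts documents that share an entity with the query. A production linker would rely on a knowledge base and a trained model instead of string matching.

```python
import re

# Hypothetical alias table mapping surface mentions to canonical entity IDs.
# A production system would draw on a knowledge base and a trained entity linker.
ALIASES = {
    "einstein": "Albert_Einstein",
    "albert einstein": "Albert_Einstein",
    "newton": "Isaac_Newton",
    "isaac newton": "Isaac_Newton",
}


def link_entities(text):
    """Return the canonical IDs of entities whose aliases appear as whole words in text."""
    lowered = text.lower()
    return {
        entity
        for alias, entity in ALIASES.items()
        if re.search(rf"\b{re.escape(alias)}\b", lowered)
    }


def rerank_by_entities(query, docs, boost=0.5):
    """Add `boost` to a document's score for every entity it shares with the query.

    docs: list of (text, retriever_score) pairs; returns them sorted best first.
    """
    query_entities = link_entities(query)
    rescored = [
        (text, score + boost * len(query_entities & link_entities(text)))
        for text, score in docs
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


docs = [
    ("Theories of economic growth.", 0.6),
    ("Albert Einstein developed general relativity.", 0.4),
]
print(rerank_by_entities("Einstein's theories", docs)[0][0])
# Albert Einstein developed general relativity.
```

The boost value of 0.5 is an arbitrary assumption; in practice it would be tuned alongside the retriever's own score scale.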

Hyperparameter Tuning

The number of documents retrieved is a critical hyperparameter. Over-retrieval introduces noise that dilutes relevance. Starting with 10–20 documents can strike a reasonable balance, giving the RAG system enough material to work with; from there, tune the number experimentally for each use case.
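
One way to run that experiment is to sweep candidate values of k over a small labeled evaluation set and keep the best one. The sketch below assumes you have, for each test query, a ranked list of retrieved document IDs and a set of IDs judged relevant; mean F1 is just one possible objective (downstream answer quality is another), and the toy data is invented for illustration.

```python
def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """Precision and recall of the top-k retrieved IDs against a set of relevant IDs."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    precision = hits / k
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall


def best_k(eval_set, candidate_ks=(5, 10, 15, 20)):
    """Return (k, mean F1) for the candidate k with the highest mean F1.

    eval_set: list of (ranked_ids, relevant_ids) pairs, one per labeled query.
    """
    best, best_f1 = None, -1.0
    for k in candidate_ks:
        f1s = []
        for ranked_ids, relevant_ids in eval_set:
            p, r = precision_recall_at_k(ranked_ids, relevant_ids, k)
            f1s.append(2 * p * r / (p + r) if p + r else 0.0)
        mean_f1 = sum(f1s) / len(f1s)
        if mean_f1 > best_f1:
            best, best_f1 = k, mean_f1
    return best, best_f1


# Toy data: one query whose first 10 retrieved documents are the relevant ones.
ranked = [f"doc{i}" for i in range(30)]
relevant = {f"doc{i}" for i in range(10)}
print(best_k([(ranked, relevant)]))  # (10, 1.0)
```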

Leverage Document Structure

Utilize the inherent structure and metadata of documents. Give precedence to information from headings, tables, and captions, which tend to be more succinct and focused. This strategy can significantly enhance the precision of retrieved information.
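
A simple way to give structural elements precedence is to multiply each chunk's retrieval score by a weight for its type. This sketch assumes your document parser already labels chunks as headings, tables, captions, or body text; the `STRUCTURE_WEIGHTS` values are hypothetical starting points, not recommendations.

```python
from dataclasses import dataclass

# Hypothetical boost factors per structural element; tune them on your own corpus.
STRUCTURE_WEIGHTS = {"heading": 1.5, "table": 1.3, "caption": 1.2, "body": 1.0}


@dataclass
class Chunk:
    text: str
    kind: str     # "heading", "table", "caption", or "body" (from the document parser)
    score: float  # similarity score assigned by the base retriever


def structure_aware_rank(chunks):
    """Sort chunks by retriever score multiplied by a weight for their structural type."""
    def weighted(chunk):
        return chunk.score * STRUCTURE_WEIGHTS.get(chunk.kind, 1.0)
    return sorted(chunks, key=weighted, reverse=True)


chunks = [
    Chunk("A long narrative paragraph mentioning fever in passing.", "body", 0.60),
    Chunk("Table 2: Common flu symptoms and their frequency", "table", 0.55),
]
print(structure_aware_rank(chunks)[0].kind)  # table (0.55 * 1.3 > 0.60 * 1.0)
```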

Segmentation for Focus

Rather than grappling with entire documents, segment them into coherent passages of 3–5 sentences. This helps the RAG system to generate more focused and precise answers. For example, extracting a passage about “Newton’s laws of motion” from a physics textbook can provide a direct and clear response to a relevant query.
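
Here is a minimal sketch of sentence-window segmentation. The regex-based sentence splitter is deliberately naive (a real pipeline would typically use a proper sentence tokenizer), and the one-sentence overlap between consecutive passages is an added assumption, intended to keep a thought that straddles a boundary from being cut in half.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def segment(text, sentences_per_passage=4, overlap=1):
    """Group sentences into passages of `sentences_per_passage`, sharing `overlap` sentences."""
    if not 0 <= overlap < sentences_per_passage:
        raise ValueError("overlap must be smaller than sentences_per_passage")
    sentences = split_sentences(text)
    step = sentences_per_passage - overlap
    passages = []
    for start in range(0, len(sentences), step):
        passages.append(" ".join(sentences[start:start + sentences_per_passage]))
        if start + sentences_per_passage >= len(sentences):
            break  # the last window already reaches the end of the text
    return passages


print(segment("A. B. C. D. E. F.", sentences_per_passage=4, overlap=1))
# ['A. B. C. D.', 'D. E. F.']
```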

Continuous Monitoring

Deploying a RAG system demands vigilant monitoring. For instance, if a user asks about “symptoms of flu” and the system retrieves documents about “bird flu”, that is a sign the retrieval algorithm needs refinement. Tracking metrics such as answer coherence, along with the quality of document segmentation, is pivotal to improving the system’s accuracy and relevance.
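
As a starting point for monitoring, the sketch below logs basic retrieval statistics per query and flags low-confidence retrievals for human review. The `MIN_TOP_SCORE` threshold is a hypothetical value that would need calibrating, and a score check alone will not catch every off-topic result (a "bird flu" document can still score highly), so it complements rather than replaces reviewing actual answers.

```python
import logging
from statistics import mean

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("rag.monitor")

# Hypothetical threshold; calibrate it against queries you have labeled good or bad.
MIN_TOP_SCORE = 0.5


def monitor_retrieval(query, results):
    """Log retrieval stats for one query and return True if it should be reviewed.

    results: (doc_title, similarity_score) pairs returned by the retriever.
    """
    if not results:
        log.warning("no documents retrieved for %r", query)
        return True
    scores = [score for _, score in results]
    top = max(scores)
    log.info("query=%r top=%.2f mean=%.2f n=%d", query, top, mean(scores), len(scores))
    if top < MIN_TOP_SCORE:
        log.warning("low-confidence retrieval for %r (top=%.2f)", query, top)
        return True
    return False


flagged = monitor_retrieval(
    "symptoms of flu",
    [("Bird flu outbreaks in poultry farms", 0.42), ("H5N1 transmission routes", 0.38)],
)
print(flagged)  # True
```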

Dynamic Retrieval

A RAG system in a news aggregation service should prioritize recent articles for a query about “latest space missions”, while historical queries should pull from established archives. This demonstrates the need for temporal awareness in retrieval systems.
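
One way to add temporal awareness is to detect recency-seeking queries and, only for those, scale relevance by an exponential decay on document age. In this sketch the cue-word pattern and the 30-day half-life are hypothetical choices; a real system might instead use a trained query classifier and tune the decay per domain.

```python
import re
from datetime import datetime, timezone

# Hypothetical cue words suggesting the user wants recent results.
RECENCY_PATTERN = re.compile(r"\b(latest|recent|current|today|this week)\b")
HALF_LIFE_DAYS = 30.0  # assumption: the recency factor halves every 30 days


def wants_recent(query):
    return RECENCY_PATTERN.search(query.lower()) is not None


def time_aware_score(query, relevance, published, now):
    """Scale relevance by exponential recency decay, but only for recency-seeking queries."""
    if not wants_recent(query):
        return relevance  # historical or evergreen query: rank on relevance alone
    age_days = max((now - published).total_seconds() / 86400, 0.0)
    return relevance * 0.5 ** (age_days / HALF_LIFE_DAYS)


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
fresh = datetime(2024, 1, 30, tzinfo=timezone.utc)
archived = datetime(2019, 7, 20, tzinfo=timezone.utc)
print(time_aware_score("latest space missions", 0.7, fresh, now) >
      time_aware_score("latest space missions", 0.9, archived, now))  # True
print(time_aware_score("history of the Apollo program", 0.9, archived, now))  # 0.9
```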

Cross-Lingual Retrieval

For culturally nuanced queries like “What’s the most popular food in Japan?”, retrieving documents in the native language might yield more authentic responses, highlighting the importance of cross-lingual capabilities in RAG systems.

Semantic Search Enhancements

Queries like “How to drive growth in a company?” require semantic understanding rather than mere keyword matching. A semantically enhanced RAG system would retrieve documents discussing comprehensive strategies for business growth, showcasing the importance of conceptual understanding in retrieval systems.
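
The contrast between keyword matching and semantic matching can be shown with cosine similarity over embedding vectors. The three-dimensional vectors below are hand-written stand-ins for the output of a real embedding model (their dimension labels are purely illustrative), chosen so that keyword overlap favors the wrong document while vector similarity favors the right one.

```python
import math
import re


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_overlap(query, doc):
    """Number of distinct lowercase word tokens shared by query and doc."""
    def tokens(s):
        return set(re.findall(r"\w+", s.lower()))
    return len(tokens(query) & tokens(doc))


query = "How to drive growth in a company?"
# Hand-written toy vectors standing in for embedding-model output.
# Dimensions (illustrative only): [business, growth, vehicles]
query_vec = [0.9, 0.8, 0.0]
docs = {
    "Strategies for scaling a business through expansion": [0.8, 0.9, 0.1],
    "How to drive a car in snow": [0.05, 0.0, 0.9],
}

by_keywords = max(docs, key=lambda d: keyword_overlap(query, d))
by_meaning = max(docs, key=lambda d: cosine(query_vec, docs[d]))
print(by_keywords)  # How to drive a car in snow
print(by_meaning)   # Strategies for scaling a business through expansion
```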

Contextual Passage Retrieval

For historical queries, it’s crucial that the RAG system understands the broader context. Retrieving passages about specific events without this context could lead to incomplete or disjointed answers.
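
One common way to supply that context is to expand each retrieved passage with its neighbors from the same section and prefix the section heading. The sketch below assumes passages are stored in document order with their section heading attached; the `Passage` class, the `with_context` function, and the one-passage window are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    section: str  # heading of the section the passage came from
    text: str


def with_context(passages, hit, window=1):
    """Return passage `hit` plus up to `window` neighbours on each side,
    restricted to the same section and prefixed with the section heading."""
    section = passages[hit].section
    lo = hit
    while lo > 0 and hit - lo < window and passages[lo - 1].section == section:
        lo -= 1
    hi = hit
    while hi < len(passages) - 1 and hi - hit < window and passages[hi + 1].section == section:
        hi += 1
    body = " ".join(p.text for p in passages[lo:hi + 1])
    return f"[{section}] {body}"


passages = [
    Passage("Causes of WWI", "Alliances bound the major powers together."),
    Passage("Causes of WWI", "Archduke Franz Ferdinand was assassinated in 1914."),
    Passage("Causes of WWI", "Austria-Hungary declared war on Serbia."),
    Passage("Aftermath", "The Treaty of Versailles was signed in 1919."),
]
print(with_context(passages, hit=2))
# [Causes of WWI] Archduke Franz Ferdinand was assassinated in 1914. Austria-Hungary declared war on Serbia.
```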

Feedback Loops

Incorporating user feedback can significantly refine the effectiveness of RAG systems. For instance, a language learning app using RAG might adapt its responses based on user ratings, prioritizing more comprehensive grammar guides or examples in future queries.
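
A feedback loop can be as simple as aggregating user ratings per document and nudging future rankings with them. This sketch assumes 1–5 star ratings and uses a smoothed (Bayesian) average so a single rating cannot dominate; the `FeedbackStore` class, the prior values, the blend weight, and the document IDs are all hypothetical.

```python
class FeedbackStore:
    """Tracks 1-5 user ratings per document and turns them into a ranking adjustment.

    The smoothed rating pulls each document's mean toward `prior_mean`,
    as if `prior_weight` ratings of that value had already been recorded.
    """

    def __init__(self, prior_mean=3.0, prior_weight=5):
        self.prior_mean = prior_mean
        self.prior_weight = prior_weight
        self.totals = {}
        self.counts = {}

    def record(self, doc_id, rating):
        if not 1 <= rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        self.totals[doc_id] = self.totals.get(doc_id, 0.0) + rating
        self.counts[doc_id] = self.counts.get(doc_id, 0) + 1

    def smoothed_rating(self, doc_id):
        total = self.totals.get(doc_id, 0.0)
        n = self.counts.get(doc_id, 0)
        return (self.prior_mean * self.prior_weight + total) / (self.prior_weight + n)

    def rerank(self, results, weight=0.1):
        """results: (doc_id, relevance) pairs; adds weight * (smoothed rating - prior)."""
        def adjusted(pair):
            doc_id, relevance = pair
            return relevance + weight * (self.smoothed_rating(doc_id) - self.prior_mean)
        return sorted(results, key=adjusted, reverse=True)


store = FeedbackStore()
for _ in range(5):
    store.record("grammar_guide_full", 5)
    store.record("grammar_tip_short", 2)
print(store.rerank([("grammar_tip_short", 0.80), ("grammar_guide_full", 0.75)])[0][0])
# grammar_guide_full
```

With these numbers the comprehensive guide's smoothed rating is 4.0 and the short tip's is 2.5, which is enough to reorder two documents whose relevance scores are close.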

RAG is not just a powerful new technique; it’s a complex system that thrives on engineering and experimentation. By applying these strategies to real-world use cases, RAG can evolve from a simple question-answering machine into a sophisticated, context-aware, and dynamically adaptive AI tool that answers real-world questions with finesse and accuracy.
