Advanced RAG 06: Low-relevance generation results? Try Query Rewriting optimization techniques


In practice, it is difficult for ordinary users to write appropriate prompts to instruct an LLM to complete a desired task. User queries often suffer from imprecise vocabulary and missing semantic information, making it hard for the LLM to understand them and generate relevant responses. Therefore, optimizing queries so that the LLM can accurately understand them is an important problem to solve.

This article explores several mainstream Query Rewriting techniques for optimizing the performance of Retrieval-Augmented Generation (RAG) systems.

This article introduces and analyzes the following Query Rewriting technologies in detail:

  • Hypothetical Document Embeddings (HyDE): Uses an LLM to generate hypothetical documents that are highly relevant to the query, so that they share similar semantic properties with the user's query in the embedding space.
  • Rewrite-Retrieve-Read: Rewrites the query first, then retrieves content to generate the model response.
  • Step-Back Prompting: Guides the LLM to abstract high-level concepts from specific problems, helping the model reason correctly.
  • Query2doc: Combines the query with hypothetical documents generated by an LLM to construct a new query representation at the semantic level.
  • ITER-RETGEN: Iterative retrieval-generation, which uses the previous round of LLM output to guide a new round of retrieval.

Query Rewriting brings new optimization directions to RAG systems, but it also faces significant challenges (such as the high cost of LLM calls). Choosing the appropriate combination of optimization methods based on the specific application scenario is the guiding principle of RAG system optimization.

In Retrieval-Augmented Generation (RAG) systems, a user's original query often has problems such as imprecise vocabulary or missing semantic information, making it difficult for the RAG system to understand. For example, consider a question like "The 2020 NBA champions are the Los Angeles Lakers! Please tell me what is the langchain framework?" Searching with this question directly may cause the LLM to give a wrong answer or an unanswerable response.

Therefore, it is important to align the semantic space of user queries with the semantic space of the documents stored in the system. Query rewriting (Translator's Note: the process of reconstructing or rewriting user queries, attempting to correct the wrong, vague or inaccurate parts they may contain) can effectively solve this problem. Its role in RAG is shown in Figure 1:

Figure 1: Query rewriting technology in RAG (marked by red dashed box). Image provided by the author.

From the perspective of its place in the RAG system, query rewriting is a pre-retrieval method (Translator's Note: the query is rewritten or improved before document retrieval). This diagram roughly illustrates the position of query rewriting in the RAG system, and below we will introduce some algorithms that can improve this process.

Query rewriting is a key technology for aligning the semantics of queries and documents stored in the system. For example:

  • Hypothetical Document Embeddings (HyDE): Aligns the query with the semantic space of the stored documents through hypothetical documents (not actually existing documents, but fictitious ones generated for this alignment).
  • Rewrite-Retrieve-Read: Proposes a framework that differs from the traditional retrieve-then-read order, focusing on query rewriting.
  • Step-Back Prompting: Allows the LLM to reason and retrieve from abstract concepts or high-level information, rather than only specific details or low-level information.
  • Query2Doc: Uses few-shot prompting to let the LLM create pseudo-documents (Translator's Note: not real documents; used to help the system understand user queries during information retrieval). These are then merged with the user's query to build a new query.
  • ITER-RETGEN: Proposes combining the results generated in the previous round with the query, retrieving related documents, and generating new results. This process is repeated multiple times until the final result is obtained.

Let’s dig into the details of these methods.

01 Hypothetical Document Embeddings (HyDE)

The paper “Precise Zero-Shot Dense Retrieval without Relevance Labels” [1] proposes a method based on Hypothetical Document Embeddings (HyDE). Its main process is shown in Figure 2.

Figure 2: Schematic of the HyDE model, showing some document fragments. HyDE can serve all types of user-submitted queries without changing the underlying GPT-3 and Contriever/mContriever models. Source: Precise Zero-Shot Dense Retrieval without Relevance Labels

The process is mainly divided into four steps:

  1. Use an LLM to generate k hypothetical documents based on the user's query (Translator's Note: fake documents generated by a model such as an LLM to simulate documents related to the query). These generated documents may not correspond to actual facts and may contain errors, but they should resemble the truly relevant documents. The purpose of this step is to let the LLM interpret the user's query.
  2. Each generated hypothetical document is fed into an encoder, which maps it to a dense vector f(dk). We assume the encoder acts as a filter that removes the noise in the hypothetical documents. Here, dk represents the k-th generated document and f the encoder operation.
  3. Calculate the average of the k vectors:

v̂ = (1/k) · [f(d1) + f(d2) + … + f(dk)]

We can also treat the user's query q itself as one more reasonable hypothesis and include it in the average:

v = (1/(k+1)) · [f(d1) + … + f(dk) + f(q)]

  4. Use the vector v to retrieve relevant content from the document library. As mentioned in step 3, this vector contains both the user's query and the expected information, which can improve recall.
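The four steps above reduce to embedding the hypothetical documents together with the query and averaging. Below is a minimal, self-contained sketch; `embed` is a stand-in for a real encoder such as Contriever, and `hyde_query_vector` is a hypothetical helper name, not part of any library:

```python
import hashlib

def embed(text: str) -> list[float]:
    # Placeholder encoder f(.): in practice this would be Contriever or
    # another dense embedding model. Here we derive a fake 8-dim vector
    # from a hash so the sketch runs without external dependencies.
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest[:8]]

def hyde_query_vector(query: str, hypothetical_docs: list[str]) -> list[float]:
    """Average the embeddings of the k hypothetical documents and the query:
    v = (1/(k+1)) * (f(d1) + ... + f(dk) + f(q))."""
    vectors = [embed(d) for d in hypothetical_docs]  # f(d1) ... f(dk)
    vectors.append(embed(query))                     # treat q as one more hypothesis
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]

docs = [
    "Paul Graham co-founded Viaweb after RISD ...",
    "After RISD, Paul Graham returned to New York ...",
]
v = hyde_query_vector("what did paul graham do after going to RISD", docs)
print(len(v))  # 8
```

The resulting vector v, not the raw query embedding, is what gets compared against the document index.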

My understanding of HyDE is shown in Figure 3. The goal of HyDE is to generate hypothetical documents (Translator's Note: fake documents generated by a model such as an LLM based on the query, simulating documents related to it), so that the final query vector v is as close to and aligned with the actual documents in the vector space as possible.

Figure 3: From my understanding, the goal of HyDE is to generate hypothetical documents. In this way, the final query vector v will be as close and aligned as possible to the actual document in the vector space. Picture provided by the original author.

LlamaIndex[2] and Langchain[3] both implement HyDE. The following uses LlamaIndex as an example.

Place this file [4] in YOUR_DIR_PATH . The test code is as follows (the LlamaIndex version I installed is 0.10.12):

import os

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.indices.query.query_transform import HyDEQueryTransform
from llama_index.core.query_engine import TransformQueryEngine

# Load documents, build the VectorStoreIndex
dir_path = "YOUR_DIR_PATH"
documents = SimpleDirectoryReader(dir_path).load_data()
index = VectorStoreIndex.from_documents(documents)

query_str = "what did paul graham do after going to RISD"

# Query without transformation: The same query string is used for embedding lookup and also summarization.
query_engine = index.as_query_engine()
response = query_engine.query(query_str)

print('-' * 100)
print("Base query:")
print(response)

# Query with HyDE transformation
hyde = HyDEQueryTransform(include_original=True)
hyde_query_engine = TransformQueryEngine(query_engine, hyde)
response = hyde_query_engine.query(query_str)

print('-' * 100)
print("After HyDEQueryTransform:")
print(response)

First, take a look at the default HyDE prompt in LlamaIndex[5]:

HYDE_TMPL = (
    "Please write a passage to answer the question\n"
    "Try to include as many key details as possible.\n"
    "\n"
    "\n"
    "{context_str}\n"
    "\n"
    "\n"
    'Passage:"""\n'
)

DEFAULT_HYDE_PROMPT = PromptTemplate(HYDE_TMPL, prompt_type=PromptType.SUMMARY)

The code of the HyDEQueryTransform class [6] is as follows. The purpose of the _run method is to generate a hypothetical document; three debugging statements have been added to it to inspect the contents of the hypothetical document.

class HyDEQueryTransform(BaseQueryTransform):
    """Hypothetical Document Embeddings (HyDE) query transform.

    It uses an LLM to generate hypothetical answer(s) to a given query,
    and use the resulting documents as embedding strings.

    As described in `[Precise Zero-Shot Dense Retrieval without Relevance Labels]
    (https://arxiv.org/abs/2212.10496)`_.
    """

    def __init__(
        self,
        llm: Optional[LLMPredictorType] = None,
        hyde_prompt: Optional[BasePromptTemplate] = None,
        include_original: bool = True,
    ) -> None:
        """Initialize HyDEQueryTransform.

        Args:
            llm_predictor (Optional[LLM]): LLM for generating
                hypothetical documents
            hyde_prompt (Optional[BasePromptTemplate]): Custom prompt for HyDE
            include_original (bool): Whether to include original query
                string as one of the embedding strings
        """
        super().__init__()
        self._llm = llm or Settings.llm
        self._hyde_prompt = hyde_prompt or DEFAULT_HYDE_PROMPT
        self._include_original = include_original

    def _get_prompts(self) -> PromptDictType:
        """Get prompts."""
        return {"hyde_prompt": self._hyde_prompt}

    def _update_prompts(self, prompts: PromptDictType) -> None:
        """Update prompts."""
        if "hyde_prompt" in prompts:
            self._hyde_prompt = prompts["hyde_prompt"]

    def _run(self, query_bundle: QueryBundle, metadata: Dict) -> QueryBundle:
        """Run query transform."""
        # TODO: support generating multiple hypothetical docs
        query_str = query_bundle.query_str
        hypothetical_doc = self._llm.predict(self._hyde_prompt, context_str=query_str)
        embedding_strs = [hypothetical_doc]
        if self._include_original:
            embedding_strs.extend(query_bundle.embedding_strs)

        # The following three lines are the added debug statements.
        print('-' * 100)
        print("Hypothetical doc:")
        print(embedding_strs)

        return QueryBundle(
            query_str=query_str,
            custom_embedding_strs=embedding_strs,
        )
Run the test code as follows:

(llamaindex_010) Florian:~ Florian$ python /Users/Florian/Documents/ 
Base query:
Paul Graham resumed his old life in New York after attending RISD. He became rich and continued his old patterns, but with new opportunities such as being able to easily hail taxis and dine at charming restaurants. He also started experimenting with a new kind of still life painting technique.
Hypothetical doc:
["After attending the Rhode Island School of Design (RISD), Paul Graham went on to co-found Viaweb, an online store builder that was later acquired by Yahoo for $49 million. Following the success of Viaweb, Graham became an influential figure in the tech industry, co-founding the startup accelerator Y Combinator in 2005. Y Combinator has since become one of the most prestigious and successful startup accelerators in the world, helping launch companies like Dropbox, Airbnb, and Reddit. Graham is also known for his prolific writing on technology, startups, and entrepreneurship, with his essays being widely read and respected in the tech community. Overall, Paul Graham's career after RISD has been marked by innovation, success, and a significant impact on the startup ecosystem.", 'what did paul graham do after going to RISD']
After HyDEQueryTransform:
After going to RISD, Paul Graham resumed his old life in New York, but now he was rich. He continued his old patterns but with new opportunities, such as being able to easily hail taxis and dine at charming restaurants. He also started to focus more on his painting, experimenting with a new technique. Additionally, he began looking for an apartment to buy and contemplated the idea of building a web app for making web apps, which eventually led him to start a new company called Aspra.

embedding_strs is a list containing two elements: the first is the generated hypothetical document, and the second is the original query. They are combined into one list for the vector computation.

In this case, HyDE accurately imagined what Paul Graham did after RISD (Translator's Note: abbreviation of Rhode Island School of Design, an art and design school in Providence, Rhode Island, USA; ranked 3rd in the 2023 QS world art and design school rankings and 1st in the United States), as the hypothetical document above shows. This improves the quality of the embeddings and the final model output.

Of course, HyDE has its share of failures. Readers who are interested in this can test it by visiting this link [7].

The HyDE method is unsupervised. No model is trained on labeled data, neither the generative model (Translator's Note: a model whose main task is to learn the distribution of the data so it can generate new samples resembling the training data; commonly used to generate images, text, audio and other types of data) nor the contrastive encoder (Translator's Note: encodes data into vector representations with contrastive features; similar samples are pulled closer in the vector space while dissimilar samples are pushed apart).

In summary, although HyDE introduces a new query rewriting method, it also has limitations. Instead of relying on query-to-document embedding similarity, it emphasizes the similarity of one document to another. If the language model is unfamiliar with the particular vertical, it may not always produce good hypothetical documents, potentially increasing erroneous content.

02 Rewrite-Retrieve-Read

This technique is proposed in the paper "Query Rewriting for Retrieval-Augmented Large Language Models" [8]. The paper argues that in real-world scenarios, the original queries submitted by users may not always be well-suited for direct retrieval.

Therefore, the paper suggests first rewriting the query with an LLM, rather than retrieving content and generating a model response directly from the original query; retrieval and response generation then proceed from the rewritten query, as shown in Figure 4(b).

Figure 4: From left to right, (a) the standard "retrieve-then-read" method, (b) using an LLM as a query rewriter in the "rewrite-retrieve-read" pipeline, and (c) using a trainable rewriter. Source: Query Rewriting for Retrieval-Augmented Large Language Models[8].

To illustrate how query rewriting affects context retrieval and prediction performance, consider this prompt: "The 2020 NBA champion is the Los Angeles Lakers! Please tell me what the langchain framework is?" The distracting statement at the start can be handled accurately by rewriting the query.

This process can be implemented using Langchain[9]. The basic libraries required for installation are as follows:

pip install langchain
pip install openai
pip install langchainhub
pip install duckduckgo-search
pip install langchain_openai

Environment configuration and Python library import:

import os

from langchain_community.utilities import DuckDuckGoSearchAPIWrapper
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

Build a process chain that handles queries and execute some simple queries to test the function and effect of the process chain:

def june_print(msg, res):
    print('-' * 100)
    print(msg)
    print(res)

base_template = """Answer the users question based only on the following context:

<context>
{context}
</context>

Question: {question}
"""

base_prompt = ChatPromptTemplate.from_template(base_template)

model = ChatOpenAI(temperature=0)

search = DuckDuckGoSearchAPIWrapper()

def retriever(query):
    return search.run(query)

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | base_prompt
    | model
    | StrOutputParser()
)

query = "The NBA champion of 2020 is the Los Angeles Lakers! Tell me what is langchain framework?"

june_print(
    'The result of query:', 
    chain.invoke(query)
)

june_print(
    'The result of the searched contexts:', 
    retriever(query)
)

The running results are as follows:

(langchain) Florian:~ Florian$ python /Users/Florian/Documents/ 
The result of query:
I'm sorry, but the context provided does not mention anything about the langchain framework.
The result of the searched contexts:
The Los Angeles Lakers are the 2020 NBA Champions!Watch their championship celebration here!Subscribe to the NBA: Full Game Highli... Aug 4, 2023. The 2020 Los Angeles Lakers were truly one of the most complete teams over the decade. LeBron James' fourth championship was one of the biggest moments of his career. Only two players from the 2020 team remain on the Lakers. In the storied history of the NBA, few teams have captured the imagination of fans and left a lasting ... James had 28 points, 14 rebounds and 10 assists, and the Lakers beat the Miami Heat 106-93 on Sunday night to win the NBA finals in six games. James was also named Most Valuable Player of the NBA ... Portland Trail Blazers star Damian Lillard recently spoke about the 2020 NBA "bubble" playoffs and had an interesting perspective on the criticism the eventual winners, the Los Angeles Lakers, faced. But perhaps none were more surprising than Adebayo's opinion on the 2020 NBA Finals. The Heat were defeated by LeBron James and the Los Angeles Lakers in six games. Miller asked, "Tell me about ...

The results show that there is very little information about “langchain” in the searched context.

Now start building a rewriter to rewrite the search query (Translator’s Note: The question the user wants the system to answer or the keyword to provide information.).

rewrite_template = """Provide a better search query for \
web search engine to answer the given question, end \
the queries with '**'. Question: \
{x} Answer:"""
rewrite_prompt = ChatPromptTemplate.from_template(rewrite_template)

def _parse(text):
    # Strip the trailing '**' marker from the rewritten query
    return text.strip("**")

rewriter = rewrite_prompt | ChatOpenAI(temperature=0) | StrOutputParser() | _parse
june_print(
    'Rewritten query:', 
    rewriter.invoke({"x": query})
)

The running results are as follows:

Rewritten query:
What is langchain framework and how does it work?

Build rewrite_retrieve_read_chain and invoke it with the rewritten query.

rewrite_retrieve_read_chain = (
    {
        "context": {"x": RunnablePassthrough()} | rewriter | retriever,
        "question": RunnablePassthrough(),
    }
    | base_prompt
    | model
    | StrOutputParser()
)

june_print(
    'The result of the rewrite_retrieve_read_chain:', 
    rewrite_retrieve_read_chain.invoke(query)
)

The running results are as follows:

The result of the rewrite_retrieve_read_chain:
LangChain is a Python framework designed to help build AI applications powered by language models, particularly large language models (LLMs). It provides a generic interface to different foundation models, a framework for managing prompts, and a central interface to long-term memory, external data, other LLMs, and more. It simplifies the process of interacting with LLMs and can be used to build a wide range of applications, including chatbots that interact with users naturally.

At this point, by rewriting the query, we have successfully obtained the correct answer.


03 Step-Back Prompting

STEP-BACK PROMPTING[10] is a simple prompting technique that enables LLMs to perform abstraction: extracting high-level concepts and fundamental principles from questions that contain many specific details. The idea is to define a "step-back question" as a more abstract question derived from the original concrete question.

For example, if a query contains many details, it is difficult for the LLM to retrieve relevant facts to help solve the task. As the first example in Figure 5 shows, for the physics question "What happens to the pressure, P, of an ideal gas if the temperature is increased by a factor of 2 and the volume is increased by a factor of 8?", when reasoning about this problem directly, the LLM's response may deviate from the first principle of the ideal gas law.
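The first-principles answer the step-back is meant to recover is just the ideal gas law, PV = nRT: with n and R fixed, pressure scales as T/V, so the arithmetic works out as follows:

```python
# Ideal gas law: PV = nRT. With n and R fixed, P2/P1 = (T2/T1) / (V2/V1).
temperature_factor = 2                # temperature doubles
volume_factor = 8                     # volume increases 8x
pressure_factor = temperature_factor / volume_factor
print(pressure_factor)                # 0.25: pressure drops to a quarter
```

A model that first abstracts to the gas law can do this two-line computation; one that reasons directly over the details often cannot.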

Likewise, the question "Estella Leopold went to which school between Aug 1954 and Nov 1954?" is extremely challenging to answer directly because of the constraint of a specific time frame.

Figure 5: Schematic diagram of the STEP-BACK PROMPTING technical process, using concepts and principles to guide the two steps of abstraction and reasoning. Top: Example of MMLU high school physics course content, abstracting to first principles of the ideal gas law. Bottom: Some specific information (such as details of someone’s educational experience) is abstracted and summarized by the TimeQA system into the higher-level concept of “education history”. Left: PaLM-2L failed to answer the original question posed by the user. Chain-of-Thought prompting errors occur (marked in red) during intermediate steps of the reasoning process. Right: PaLM-2L successfully answered the question via STEP-BACK PROMPTING technology. Source: TAKE A STEP BACK: EVOKING REASONING VIA ABSTRACTION IN LARGE LANGUAGE MODELS[10].

In both cases, asking a broader question can help the model answer the specific query efficiently. Instead of directly asking "Which school did Estella Leopold attend at a specific time," we can ask about "Estella Leopold's educational history."

This broader question subsumes the original question asked by the user and provides all the information necessary to infer which school Estella Leopold attended at a specific time. Notably, these broader questions are often easier to answer than the original specific question.

The chain of thought derived from these abstractions helps prevent errors in the intermediate “chain of thought” steps shown in Figure 5 (left).

In summary, STEP-BACK PROMPTING consists of two basic steps:

  • Abstraction: First, prompt the LLM to pose a broad question about a high-level concept or principle instead of answering the query directly. Then retrieve facts related to that concept or principle.
  • Reasoning: The LLM derives the answer to the user's original question based on these facts about high-level concepts or principles. We call this abstract reasoning.
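Stripped of framework code, the two steps reduce to the following sketch; `llm` and `retrieve` are placeholder callables (a chat-model call and a search tool), and `step_back` is a hypothetical helper name, not a library function:

```python
def step_back(question: str, llm, retrieve) -> str:
    # Step 1: Abstraction - ask the LLM for a more generic question,
    # then retrieve context for both the original and step-back question.
    step_back_q = llm(
        "Paraphrase this question into a more generic, "
        f"easier step-back question: {question}"
    )
    normal_context = retrieve(question)
    step_back_context = retrieve(step_back_q)

    # Step 2: Reasoning - answer the original question grounded in both contexts.
    return llm(
        f"{normal_context}\n{step_back_context}\n\n"
        f"Using the context above, answer: {question}"
    )

# Toy stand-ins so the sketch runs without any external services:
fake_llm = lambda prompt: (
    "ChatGPT launched in November 2022." if "answer:" in prompt
    else "When did ChatGPT become available?"
)
fake_retrieve = lambda q: f"[search results for: {q}]"
print(step_back("was chatgpt around while trump was president?", fake_llm, fake_retrieve))
```

The LangChain implementation below follows exactly this shape, with prompts and runnables in place of the plain functions.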

To illustrate how step-back prompting affects the performance of context retrieval and prediction, this step is implemented using Langchain [11].

Environment configuration and import of related libraries:

import os

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate
from langchain_core.runnables import RunnableLambda
from langchain_openai import ChatOpenAI
from langchain_community.utilities import DuckDuckGoSearchAPIWrapper

Build a process chain that handles queries and execute some simple queries to test the function and effect of the process chain:

def june_print(msg, res):
    print('-' * 100)
    print(msg)
    print(res)

question = "was chatgpt around while trump was president?"

base_prompt_template = """You are an expert of world knowledge. I am going to ask you a question. Your response should be comprehensive and not contradicted with the following context if they are relevant. Otherwise, ignore them if they are not relevant.

{normal_context}

Original Question: {question}
Answer:"""

base_prompt = ChatPromptTemplate.from_template(base_prompt_template)

search = DuckDuckGoSearchAPIWrapper(max_results=4)
def retriever(query):
    return search.run(query)

base_chain = (
    {
        # Retrieve context using the normal question
        "normal_context": RunnableLambda(lambda x: x["question"]) | retriever,
        # Pass on the question
        "question": lambda x: x["question"],
    }
    | base_prompt
    | ChatOpenAI(temperature=0)
    | StrOutputParser()
)

june_print('The searched contexts of the original question:', retriever(question))
june_print('The result of base_chain:', base_chain.invoke({"question": question}))

The running results are as follows:

(langchain) Florian:~ Florian$ python /Users/Florian/Documents/ 
The searched contexts of the original question:
While impressive in many respects, ChatGPT also has some major flaws. ... [President's Name]," refused to write a poem about ex-President Trump, but wrote one about President Biden ... The company said GPT-4 recently passed a simulated law school bar exam with a score around the top 10% of test takers. By contrast, the prior version, GPT-3.5, scored around the bottom 10%. The ... These two moments show how Twitter's choices helped former President Trump. ... With ChatGPT, which launched to the public in late November, users can generate essays, stories and song lyrics ... Donald Trump is asked a question—say, whether he regrets his actions on Jan. 6—and he answers with something like this: " Let me tell you, there's nobody who loves this country more than me ...
The result of base_chain:
Yes, ChatGPT was around while Trump was president. ChatGPT is an AI language model developed by OpenAI and was launched to the public in late November. It has the capability to generate essays, stories, and song lyrics. While it may have been used to write a poem about President Biden, it also has the potential to be used in various other contexts, including generating responses from hypothetical scenarios involving former President Trump.

The result is obviously incorrect. Start building step_back_question_chain and step_back_chain to get correct model output.

# Few Shot Examples
examples = [
    {
        "input": "Could the members of The Police perform lawful arrests?",
        "output": "what can the members of The Police do?",
    },
    {
        "input": "Jan Sindel’s was born in what country?",
        "output": "what is Jan Sindel’s personal history?",
    },
]
# We now transform these to example messages
example_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}"),
        ("ai", "{output}"),
    ]
)
few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

step_back_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are an expert at world knowledge. Your task is to step back and paraphrase a question to a more generic step-back question, which is easier to answer. Here are a few examples:""",
        ),
        # Few shot examples
        few_shot_prompt,
        # New question
        ("user", "{question}"),
    ]
)
step_back_question_chain = step_back_prompt | ChatOpenAI(temperature=0) | StrOutputParser()
june_print('The step-back question:', step_back_question_chain.invoke({"question": question}))
june_print('The searched contexts of the step-back question:', retriever(step_back_question_chain.invoke({"question": question})))

response_prompt_template = """You are an expert of world knowledge. I am going to ask you a question. Your response should be comprehensive and not contradicted with the following context if they are relevant. Otherwise, ignore them if they are not relevant.

{normal_context}
{step_back_context}

Original Question: {question}
Answer:"""
response_prompt = ChatPromptTemplate.from_template(response_prompt_template)

step_back_chain = (
    {
        # Retrieve context using the normal question
        "normal_context": RunnableLambda(lambda x: x["question"]) | retriever,
        # Retrieve context using the step-back question
        "step_back_context": step_back_question_chain | retriever,
        # Pass on the question
        "question": lambda x: x["question"],
    }
    | response_prompt
    | ChatOpenAI(temperature=0)
    | StrOutputParser()
)

june_print('The result of step_back_chain:', step_back_chain.invoke({"question": question}))

The running results are as follows:

The step-back question:
When did ChatGPT become available?
The searched contexts of the step-back question:
OpenAI released an early demo of ChatGPT on November 30, 2022, and the chatbot quickly went viral on social media as users shared examples of what it could do. Stories and samples included ... March 14, 2023 - Anthropic launched Claude, its ChatGPT alternative. March 20, 2023 - A major ChatGPT outage affects all users for several hours. March 21, 2023 - Google launched Bard, its ... The same basic models had been available on the API for almost a year before ChatGPT came out. In another sense, we made it more aligned with what humans want to do with it. A paid ChatGPT Plus subscription is available. (Image credit: OpenAI) ChatGPT is based on a language model from the GPT-3.5 series, which OpenAI says finished its training in early 2022.
The result of step_back_chain:
No, ChatGPT was not around while Trump was president. ChatGPT was released to the public in late November, after Trump's presidency had ended. The references to ChatGPT in the context provided are all dated after Trump's presidency, such as the release of an early demo on November 30, 2022, and the launch of ChatGPT Plus subscription. Therefore, it is safe to say that ChatGPT was not around during Trump's presidency.

We can see that by "stepping back" the user's initial query (Translator's Note: moving the question from a specific, detailed level to a more abstract, general level) into a more abstract question, and using both the abstracted and original queries for retrieval, the LLM improves its ability to solve the problem along the correct reasoning path.

As Edsger W. Dijkstra said: "The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise."

04 Query2doc

"Query2doc: Query Expansion with Large Language Models" [12] introduces the query2doc method. An LLM is prompted to generate pseudo-documents, which are then combined with the original query to create a new query, as shown in Figure 6:

Figure 6. Schematic diagram of query2doc’s few-shot prompting technology. Due to space reasons, the figure does not show all the content in the context. Source: Query2doc: Query Expansion with Large Language Models[12].

In dense retrieval (Translator's Note: compared with traditional sparse retrieval methods, this approach uses pre-trained language models such as BERT or RoBERTa to encode documents and queries into vectors and computes the similarity between them), the new query, denoted q+, is a simple concatenation of the original query q and the pseudo-document d', separated by [SEP]: q+ = concat(q, [SEP], d').

Query2doc argues that HyDE implicitly assumes that the groundtruth documents and the pseudo-documents express the same semantics in different words, which may not hold for some queries.

Another difference between Query2doc and HyDE is that Query2doc trains a supervised dense retriever (Translator's Note: a retriever trained on annotated data to learn how to match queries with related documents; supervised learning helps the model better capture the semantic relationship between query and document, improving retrieval accuracy and efficiency), as described in the paper [12].

Currently, there are no similar techniques or tools found for query2doc in Langchain or LlamaIndex.
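Although neither framework ships an implementation, the core construction is easy to sketch. Per the paper, sparse retrieval repeats the original query n times (n = 5 in the paper) before appending the pseudo-document, to rebalance term weights against the much longer pseudo-document, while dense retrieval joins them with [SEP]. The function names below are hypothetical:

```python
def query2doc_sparse(query: str, pseudo_doc: str, n: int = 5) -> str:
    # For sparse retrieval (e.g. BM25), repeat the query n times to
    # rebalance term weights against the much longer pseudo-document.
    return " ".join([query] * n + [pseudo_doc])

def query2doc_dense(query: str, pseudo_doc: str) -> str:
    # For dense retrieval, query and pseudo-document are simply
    # concatenated with a [SEP] token: q+ = concat(q, [SEP], d').
    return f"{query} [SEP] {pseudo_doc}"

q = "what is the langchain framework?"
d = "LangChain is a framework for building LLM-powered applications ..."
print(query2doc_dense(q, d))
```

The expanded string q+ then replaces q as the input to the (sparse or dense) retriever.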


05 ITER-RETGEN

ITER-RETGEN[13] uses the generated content to guide the retrieval process. It iteratively applies "retrieval-augmented generation" and "generation-augmented retrieval" in a Retrieve-Read-Retrieve-Read loop.

Figure 7: ITER-RETGEN iteratively performs retrieval and generation. In each iteration, ITER-RETGEN uses the model output of the previous iteration as context to help retrieve more relevant knowledge. This approach helps improve the response generated by the model (such as the corrected Hesse Hogan’s height case in this figure ). For the sake of brevity, only two iterations are shown in this figure. The solid arrows connect the query and the retrieved knowledge, and the dashed arrows represent the retrieval enhancement generation process. Source: Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy[13].

As shown in Figure 7, for a given question q and a retrieval corpus D = {d} (where d represents a document paragraph), ITER-RETGEN performs T iterations of retrieval-generation.

In each iteration t, the output y_{t-1} from the previous iteration is first concatenated with q and used to retrieve the top-k paragraphs. The LLM M is then prompted with the retrieved paragraphs (denoted D_{y_{t-1}||q}) and q to generate the output y_t. Each iteration can therefore be expressed as:

y_t = M(prompt(D_{y_{t-1}||q}, q))

The output of the final iteration, y_T, is used as the final model response.

Similar to Query2doc, no similar techniques or tools for ITER-RETGEN are currently found in Langchain or LlamaIndex.
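The iteration itself is nonetheless simple to sketch; `retrieve` and `generate` below are placeholders for a search tool and an LLM call, and `iter_retgen` is a hypothetical helper name:

```python
def iter_retgen(question: str, retrieve, generate, iterations: int = 2) -> str:
    """Iterative retrieval-generation: each round retrieves with the previous
    answer concatenated to the question, then regenerates the answer."""
    answer = ""
    for _ in range(iterations):
        retrieval_query = f"{answer} {question}".strip()  # y_{t-1} || q
        passages = retrieve(retrieval_query)              # top-k paragraphs D
        answer = generate(passages, question)             # y_t = M(prompt(D, q))
    return answer                                         # final response y_T

# Toy stand-ins so the sketch runs end-to-end:
fake_retrieve = lambda q: [f"[passage about: {q}]"]
fake_generate = lambda passages, q: f"answer({len(passages)} passages, {q})"
print(iter_retgen("how tall is X?", fake_retrieve, fake_generate))
```

Because each round retrieves with the previous answer appended, later rounds can surface documents that the bare question alone would have missed.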

06 Conclusion

This article introduces various query rewriting techniques and provides some code demonstrations.

In practice, all of these query rewriting methods can be tried; which method or combination to use depends on the effects observed in your specific scenario.

However, whichever query rewriting method is used, calling an LLM introduces some performance costs (Translator's Note: for example, slower response times and higher resource consumption), which need to be considered in practice.

In addition, there are methods such as query routing and decomposing a query into multiple sub-questions. These do not belong to query rewriting but are likewise pre-retrieval methods; I will introduce them in subsequent articles.

If you are interested in RAG technology, feel free to read the other articles in this series! If you have any questions, please ask in the comments section.

Thanks for reading!
