SearchEngine
A component that lets you search the web. You can specify which provider to use for the search, such as Google or Bing. You can also combine it with a PromptNode to build a full-fledged search system.
A search engine can act as one of the Agent's tools, but you can also use it independently or in your pipelines.
Position in a Pipeline | Replaces a Retriever in a query pipeline |
Input | Query |
Output | Documents |
Classes | WebSearch |
SearchEngine searches the web to find the results that best answer the query.
- It uses page snippets to find the answers, not the whole pages. (Snippets are pieces of text displayed under the page title in search results.) If you want your search to run on entire pages, use WebRetriever in the documents or preprocessed_documents mode instead.
- SearchEngine doesn't filter or sort the results. If you want to make sure it fetches the best results, use SearchEngine together with TopPSampler.
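To give a feel for what top-p filtering does to a list of search results, here is a minimal sketch in plain Python. It is not TopPSampler's implementation: the function name `top_p_filter`, the snippets, and the scores are all made up for illustration, and the real sampler computes its own relevance scores. The sketch only shows the general idea of keeping the best-scoring results until their cumulative probability reaches `top_p`.

```python
import math
from typing import List, Tuple


def top_p_filter(scored: List[Tuple[str, float]], top_p: float) -> List[Tuple[str, float]]:
    """Keep the highest-scoring items until their cumulative softmax probability reaches top_p."""
    if not scored:
        return []
    # Rank the items from the highest score to the lowest.
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    # Turn scores into softmax weights (subtract the max for numerical stability).
    max_score = ranked[0][1]
    weights = [math.exp(score - max_score) for _, score in ranked]
    total = sum(weights)

    kept = []
    cumulative = 0.0
    for item, weight in zip(ranked, weights):
        kept.append(item)
        cumulative += weight / total
        if cumulative >= top_p:
            break
    return kept


# Hypothetical snippets with made-up relevance scores.
snippets = [
    ("Snippet about topic A", 4.0),
    ("Snippet about topic B", 3.5),
    ("Loosely related snippet", 0.5),
    ("Unrelated snippet", -2.0),
]
selected = top_p_filter(snippets, top_p=0.9)
for text, score in selected:
    print(f"{score:>5.1f}  {text}")
```

A lower `top_p` keeps fewer results, while a value of 1.0 keeps everything.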
WebSearch
Searches the Internet using a search engine provider you specify. You can choose providers such as GoogleAPI, SerpAPI, SerperDev, or BingAPI. You must have an active API key for the search engine provider to be able to use it.
WebSearch returns multiple results that fit the query. It doesn't arrive at one final answer. To narrow down the results it returns, use a Sampler, such as TopPSampler.
Usage
You can use the search engine as an Agent's tool, stand-alone, or in a pipeline.
As the Agent's Tool
import os

from haystack.agents import Agent, Tool
from haystack.nodes import PromptNode, WebSearch

# Let's configure the web search to use SerperDev as the search engine provider
# SerperDev is the default provider, so we just need the API key
search = WebSearch(api_key=os.environ["SERPERDEV_API_KEY"])

# The Agent needs a PromptNode. The model and stop words below are example values;
# you could configure the PromptNode to use a custom model and stop words instead
prompt_node = PromptNode(
    "gpt-3.5-turbo-instruct",
    api_key=os.environ["OPENAI_API_KEY"],
    stop_words=["Observation:"],
)
agent = Agent(prompt_node=prompt_node)
agent.add_tool(
    Tool(
        name="WebSearch",
        pipeline_or_node=search,
        description="Useful when you need to find answers to questions on the internet.",
    )
)
For more information, see Agent.
Stand-Alone
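import os
from haystack.nodes.search_engine import WebSearch
# This search uses the default SerperDev provider, so we just need the API key
ws = WebSearch(api_key=os.environ["SERPERDEV_API_KEY"])
# run() returns a tuple: the node's output dictionary and the name of the outgoing edge
output, _ = ws.run(query="What's the meaning of life?")
documents = output["documents"]  # a list of Document objects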
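Once you have the documents, you may want a quick look at what came back. The sketch below is one way to do that. The helper name `summarize` is hypothetical, and it assumes each document exposes `content` and `meta` attributes and that the provider stores the page URL under a `link` key in `meta`; check the documents your provider returns before relying on that key. Stand-in objects are used here so the example runs without an API key.

```python
from types import SimpleNamespace


def summarize(documents, max_chars=80):
    """Return one short, numbered line per document: trimmed snippet plus its source."""
    lines = []
    for position, doc in enumerate(documents, start=1):
        # Collapse whitespace so every snippet fits on a single line.
        text = " ".join(doc.content.split())
        if len(text) > max_chars:
            text = text[: max_chars - 3] + "..."
        # "link" is an assumed meta key; fall back to a placeholder if it's missing.
        source = doc.meta.get("link", "unknown source")
        lines.append(f"{position}. {text} ({source})")
    return lines


# Stand-ins for the documents returned by the search (not real results).
fake_documents = [
    SimpleNamespace(content="A  short\nsnippet", meta={"link": "https://example.com"}),
    SimpleNamespace(content="x" * 100, meta={}),
]
for line in summarize(fake_documents, max_chars=20):
    print(line)
```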
In a Pipeline
SearchEngine can replace the Retriever in a pipeline:
import os
from haystack import Pipeline
from haystack.nodes import PromptNode, PromptTemplate, Shaper, WebSearch
search_key = os.environ.get("SERPERDEV_API_KEY")
if not search_key:
    raise ValueError("Set the SERPERDEV_API_KEY environment variable")
openai_key = os.environ.get("OPENAI_API_KEY")
if not openai_key:
    raise ValueError("Set the OPENAI_API_KEY environment variable")
ws = WebSearch(api_key=search_key)
prompt_text = """
Synthesize a comprehensive answer from the following most relevant paragraphs and the given question.
Provide a clear and concise response that summarizes the key points and information presented in the paragraphs.
Your answer should be in your own words and be no longer than 50 words.
\n\n Paragraphs: $documents \n\n Question: $query \n\n Answer:
"""
pn = PromptNode(
    "gpt-3.5-turbo-instruct",
    default_prompt_template=PromptTemplate("lfqa", prompt_text=prompt_text),
    api_key=openai_key,
    max_length=256,
)
# Shaper helps us concatenate most relevant docs that we want to use as the context for the generated answer
shaper = Shaper(func="join_documents", inputs={"documents": "documents"}, outputs=["documents"])
pipe = Pipeline()
pipe.add_node(component=ws, name="web_search", inputs=["Query"])
pipe.add_node(component=shaper, name="shaper", inputs=["web_search"])
pipe.add_node(component=pn, name="prompt_node", inputs=["shaper"])
questions = ["What is the meaning of life?"]
for q in questions:
    response = pipe.run(query=q)
    print(f"{q} - {response['results'][0]}")
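To make the data flow in this pipeline more concrete, the sketch below imitates in plain Python what happens between the search and the PromptNode: the snippets are joined into one text, which then fills the `$documents` placeholder of the prompt, while the question fills `$query`. It uses Python's `string.Template` as a stand-in for the prompt templating and a hypothetical `build_prompt` helper with made-up snippets; it is not how Shaper or PromptNode are implemented.

```python
from string import Template

# A shortened version of the prompt text used in the pipeline above.
prompt_text = (
    "Synthesize a comprehensive answer from the following most relevant paragraphs "
    "and the given question."
    "\n\n Paragraphs: $documents \n\n Question: $query \n\n Answer:"
)


def build_prompt(snippets, query, delimiter=" "):
    """Join the snippets into one context string and fill in the prompt placeholders."""
    joined = delimiter.join(snippet.strip() for snippet in snippets)
    return Template(prompt_text).substitute(documents=joined, query=query)


# Made-up snippets standing in for the web search results.
prompt = build_prompt(
    ["First snippet.", "Second snippet."],
    "What is the meaning of life?",
)
print(prompt)
```

Joining the snippets first means the model sees a single context block instead of one prompt per search result.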
Changing the Provider
By default, the provider in WebSearch is SerperDev. However, you can change it to one of the other supported providers.
Here's an example of how to use GoogleAPI:
from haystack.nodes import WebSearch
from haystack.nodes.search_engine.providers import GoogleAPI
# See https://github.com/deepset-ai/haystack/blob/main/haystack/nodes/search_engine/providers.py#L305
search_engine = GoogleAPI(api_key="my_key", top_k=5, engine_id="engineID")
ws = WebSearch(api_key="my_key", search_engine_provider=search_engine, top_k=3)