OllamaTextEmbedder
This component computes the embeddings of a string using embedding models compatible with the Ollama Library.
Name | OllamaTextEmbedder |
Source | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/ollama |
Most common position in a pipeline | Before an embedding Retriever in a query/RAG pipeline |
Mandatory input variables | “text”: A string |
Output variables | “embedding”: A list of float numbers (the embedding vector); “meta”: A dictionary of metadata strings |
OllamaDocumentEmbedder computes the embeddings of a list of documents and stores the obtained vectors in the embedding field of each document. It uses embedding models compatible with the Ollama Library.
The vectors computed by this component are necessary to perform embedding retrieval on a collection of documents. At retrieval time, the vector that represents the query is compared with those of the documents to find the most similar or relevant documents.
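As a rough illustration of that comparison, the sketch below ranks a few toy document vectors against a query vector by cosine similarity. The vectors, document IDs, and helper names (`cosine_similarity`, `rank_documents`) are made up for illustration; real embeddings come from the embedder, and the retriever's actual implementation may differ.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def rank_documents(query_embedding, doc_embeddings):
    """Sort (doc_id, score) pairs from most to least similar to the query."""
    scored = [
        (doc_id, cosine_similarity(query_embedding, vector))
        for doc_id, vector in doc_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors standing in for real embeddings (illustrative only)
toy_docs = {
    "berlin": [0.9, 0.1, 0.0],
    "horse": [0.0, 0.2, 0.9],
    "cities": [0.6, 0.6, 0.1],
}
toy_query = [1.0, 0.0, 0.0]

ranking = rank_documents(toy_query, toy_docs)
print(ranking)
```

The document whose vector points in nearly the same direction as the query vector ends up first in the ranking.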
Overview
OllamaTextEmbedder should be used to embed a string. For embedding a list of documents, use the OllamaDocumentEmbedder.
The component uses http://localhost:11434/api/embeddings as the default URL, as most available setups (Mac, Linux, Docker) default to port 11434.
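To make the role of that URL more concrete, here is a minimal sketch that prepares an HTTP request for it using only the Python standard library. It is not the component's implementation: the helper name `build_embedding_request` is hypothetical, and the JSON body with `model` and `prompt` keys is an assumption about what the Ollama embeddings endpoint expects. The request is only built, not sent, so no running server is needed.

```python
import json
import urllib.request

# Default URL mentioned above; change host or port if your Ollama setup differs
DEFAULT_URL = "http://localhost:11434/api/embeddings"


def build_embedding_request(text, model="orca-mini", url=DEFAULT_URL):
    """Prepare (but do not send) a POST request asking for an embedding.

    The body shape {"model": ..., "prompt": ...} is an assumption made
    for this sketch.
    """
    payload = {"model": model, "prompt": text}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_embedding_request("Who lives in Berlin?")
print(request.get_method(), request.full_url)

# Sending it requires a running Ollama server, for example:
# with urllib.request.urlopen(request) as response:
#     body = json.loads(response.read())
```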
Compatible Models
Unless specified otherwise while initializing this component, the default embedding model is "orca-mini". See other possible pre-built models in Ollama's library. To load your own custom model, follow the instructions from Ollama.
Installation
To start using this integration with Haystack, install the package with:
pip install ollama-haystack
Make sure that you have a running Ollama model (either through a Docker container or locally hosted). No other configuration is necessary, as Ollama has the embedding API built in.
Embedding Metadata
The metadata returned alongside an embedding mostly contains information about the model, such as its name and type. You can pass optional arguments, such as temperature and top_p, to the Ollama generation endpoint.
The name of the model used is automatically appended to the metadata. An example payload using the orca-mini model looks like this:
{'meta': {'model': 'orca-mini'}}
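The sketch below shows one way optional arguments and the model name could travel through a request and its result. Both helpers (`build_payload`, `shape_result`) are hypothetical, the `options` key is an assumption about how the Ollama API accepts model parameters, and the embedding values are placeholders rather than real model output.

```python
def build_payload(text, model="orca-mini", generation_kwargs=None):
    """Assemble a request body; optional arguments go under "options" (assumed key)."""
    return {
        "model": model,
        "prompt": text,
        "options": dict(generation_kwargs or {}),
    }


def shape_result(embedding, model):
    """Attach the model name as metadata next to the embedding vector."""
    return {"embedding": list(embedding), "meta": {"model": model}}


payload = build_payload(
    "What do llamas say?",
    generation_kwargs={"temperature": 0.0, "top_p": 0.9},
)

# Placeholder vector standing in for a real server response
result = shape_result([0.1, 0.2, 0.3], model=payload["model"])
print(result["meta"])
```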
Usage
On its own
from haystack_integrations.components.embedders.ollama import OllamaTextEmbedder
embedder = OllamaTextEmbedder()
result = embedder.run(text="What do llamas say once you have thanked them? No probllama!")
print(result['embedding'])
In a pipeline
from haystack import Document
from haystack import Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.ollama import OllamaDocumentEmbedder, OllamaTextEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")
documents = [
    Document(content="My name is Wolfgang and I live in Berlin"),
    Document(content="I saw a black horse running"),
    Document(content="Germany has many big cities"),
]
# Embed the documents at indexing time and write them, with their vectors, to the store
document_embedder = OllamaDocumentEmbedder()
documents_with_embeddings = document_embedder.run(documents)['documents']
document_store.write_documents(documents_with_embeddings)
# At query time, the text embedder turns the query into a vector for the retriever
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", OllamaTextEmbedder())
query_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
query = "Who lives in Berlin?"
result = query_pipeline.run({"text_embedder":{"text": query}})
print(result['retriever']['documents'][0])