API Reference

Extracts predefined entities out of a piece of text.

Module named_entity_extractor

NamedEntityExtractorBackend

class NamedEntityExtractorBackend(Enum, metaclass=_BackendEnumMeta)

NLP backend to use for Named Entity Recognition.

HUGGING_FACE

Uses a Hugging Face model and pipeline.

SPACY

Uses a spaCy model and pipeline.
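Since the constructor documented further down accepts the backend either as a string or as an enum member, the following minimal sketch shows one way such a value could be normalized. The `Backend` enum and `resolve_backend` helper are hypothetical stand-ins, not the library's implementation; the string value "hugging_face" is taken from the usage example, while "spacy" is an assumption.

```python
from enum import Enum
from typing import Union


class Backend(Enum):
    """Hypothetical stand-in for the backend enum; string values are assumed."""

    HUGGING_FACE = "hugging_face"
    SPACY = "spacy"  # assumed value, not confirmed by this reference


def resolve_backend(backend: Union[str, Backend]) -> Backend:
    """Accept either an enum member or its string value and return the member."""
    if isinstance(backend, Backend):
        return backend
    try:
        return Backend(backend)
    except ValueError:
        valid = ", ".join(member.value for member in Backend)
        raise ValueError(
            f"Unknown backend '{backend}'. Valid values: {valid}"
        ) from None


from_string = resolve_backend("hugging_face")
from_member = resolve_backend(Backend.SPACY)
```

Rejecting unknown strings with a message that lists the valid values keeps configuration mistakes easy to diagnose.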

NamedEntityAnnotation

@dataclass
class NamedEntityAnnotation()

Describes a single NER annotation.

Arguments:

  • entity: Entity label.
  • start: Start index of the entity in the document.
  • end: End index of the entity in the document.
  • score: Score calculated by the model.
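As a minimal sketch of how these fields relate to the text, the snippet below slices a document's content with `start` and `end`. The `Annotation` dataclass is a local stand-in that only mirrors the four documented fields, the offsets and scores are hand-written for illustration, and treating `end` as exclusive is an assumption that this reference does not confirm.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Annotation:
    """Local stand-in mirroring the documented fields (not the library class)."""

    entity: str
    start: int
    end: int
    score: float


def entity_texts(content: str, annotations: List[Annotation]) -> List[str]:
    """Return the substring covered by each annotation (end treated as exclusive)."""
    return [content[a.start:a.end] for a in annotations]


content = "My name is Clara and I live in Berkeley, California."
# Hand-written offsets and scores for illustration; a real model would produce these.
annotations = [
    Annotation(entity="PER", start=11, end=16, score=0.99),
    Annotation(entity="LOC", start=31, end=39, score=0.99),
    Annotation(entity="LOC", start=41, end=51, score=0.99),
]
spans = entity_texts(content, annotations)
```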

NamedEntityExtractor

@component
class NamedEntityExtractor()

Annotates named entities in a collection of documents.

The component supports two backends: Hugging Face and spaCy. The Hugging Face backend can be used with any token classification model from the Hugging Face model hub, and the spaCy backend with any spaCy model that contains an NER component. Annotations are stored as metadata in the documents.

Usage example:

from haystack import Document
from haystack.components.extractors.named_entity_extractor import NamedEntityExtractor

documents = [
    Document(content="I'm Merlin, the happy pig!"),
    Document(content="My name is Clara and I live in Berkeley, California."),
]
extractor = NamedEntityExtractor(backend="hugging_face", model="dslim/bert-base-NER")
# Initialize the component before calling run().
extractor.warm_up()
results = extractor.run(documents=documents)["documents"]
# Read back the annotations that run() stored in each document's metadata.
annotations = [NamedEntityExtractor.get_stored_annotations(doc) for doc in results]
print(annotations)

NamedEntityExtractor.__init__

def __init__(*,
             backend: Union[str, NamedEntityExtractorBackend],
             model: str,
             pipeline_kwargs: Optional[Dict[str, Any]] = None,
             device: Optional[ComponentDevice] = None) -> None

Create a Named Entity extractor component.

Arguments:

  • backend: Backend to use for NER.
  • model: Name of the model or a path to the model on the local disk. Dependent on the backend.
  • pipeline_kwargs: Keyword arguments passed to the pipeline. The pipeline can override these arguments. Dependent on the backend.
  • device: The device on which the model is loaded. If None, the default device is automatically selected. If a device or device map is specified in pipeline_kwargs, it overrides this parameter (only applicable to the Hugging Face backend).
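The precedence between `device` and `pipeline_kwargs` described above can be sketched as a small helper. This is an illustration only, not the library's code: `resolve_device` is a hypothetical function, the key names `device` and `device_map` are assumed to be the ones consulted, and the `"cpu"` fallback merely stands in for the automatic default selection.

```python
from typing import Any, Dict, Optional


def resolve_device(
    pipeline_kwargs: Optional[Dict[str, Any]],
    device: Optional[str],
    default: str = "cpu",  # placeholder for the automatically selected default
) -> Any:
    """Pick the effective device: pipeline_kwargs first, then device, then default."""
    kwargs = pipeline_kwargs or {}
    # A device or device map given in pipeline_kwargs takes priority (assumed key names).
    for key in ("device", "device_map"):
        if key in kwargs:
            return kwargs[key]
    # Otherwise use the explicit device argument, falling back to the default.
    return device if device is not None else default


from_kwargs = resolve_device({"device_map": "auto"}, device="cuda:0")
from_argument = resolve_device({"aggregation_strategy": "simple"}, device="cuda:0")
from_default = resolve_device(None, device=None)
```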

NamedEntityExtractor.warm_up

def warm_up()

Initialize the component.

Raises:

  • ComponentError: If the backend fails to initialize successfully.

NamedEntityExtractor.run

@component.output_types(documents=List[Document])
def run(documents: List[Document], batch_size: int = 1) -> Dict[str, Any]

Annotate named entities in each document and store the annotations in the document's metadata.

Arguments:

  • documents: Documents to process.
  • batch_size: Batch size used for processing the documents.

Raises:

  • ComponentError: If the backend fails to process a document.

Returns:

Processed documents.
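To illustrate what `batch_size` means in general terms, the sketch below splits a list of texts into consecutive chunks of at most `batch_size` items. How each backend actually batches its input is not described in this reference, so the `batched` helper is a generic, hypothetical illustration.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int = 1) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


texts = ["first text", "second text", "third text"]
# With batch_size=2, three texts are split into one full and one partial batch.
batches = list(batched(texts, batch_size=2))
```

A larger batch size usually means fewer model calls at the cost of more memory per call.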

NamedEntityExtractor.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

NamedEntityExtractor.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "NamedEntityExtractor"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.
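The `to_dict` and `from_dict` pair supports a save-and-restore round trip. The sketch below shows that pattern with a hypothetical `ExtractorConfig` stand-in that keeps only the constructor arguments; the dictionary layout is an assumption and is not the layout the real component produces.

```python
import json
from typing import Any, Dict, Optional


class ExtractorConfig:
    """Hypothetical stand-in that stores constructor arguments only."""

    def __init__(
        self,
        *,
        backend: str,
        model: str,
        pipeline_kwargs: Optional[Dict[str, Any]] = None,
    ) -> None:
        self.backend = backend
        self.model = model
        self.pipeline_kwargs = pipeline_kwargs or {}

    def to_dict(self) -> Dict[str, Any]:
        """Serialize to a plain dictionary (assumed layout)."""
        return {
            "backend": self.backend,
            "model": self.model,
            "pipeline_kwargs": dict(self.pipeline_kwargs),
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ExtractorConfig":
        """Rebuild an instance from the dictionary produced by to_dict."""
        return cls(**data)


original = ExtractorConfig(backend="hugging_face", model="dslim/bert-base-NER")
# Round trip through JSON, as one might when saving a configuration to disk.
payload = json.dumps(original.to_dict())
restored = ExtractorConfig.from_dict(json.loads(payload))
```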

NamedEntityExtractor.initialized

@property
def initialized() -> bool

Returns whether the extractor is ready to annotate text.
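A common pattern behind a readiness flag like this is lazy initialization: construction is cheap, the expensive load happens in `warm_up()`, and the flag reports whether that step has run. The `LazyExtractor` class below is a hypothetical stand-in for that pattern, and raising an error from `run()` before warm-up is an assumption rather than documented behavior.

```python
from typing import Callable, List, Optional


class LazyExtractor:
    """Hypothetical stand-in showing lazy initialization (not the library class)."""

    def __init__(self, model: str) -> None:
        self.model = model
        # Nothing is loaded at construction time.
        self._pipeline: Optional[Callable[[str], List[str]]] = None

    @property
    def initialized(self) -> bool:
        """True once warm_up() has prepared the pipeline."""
        return self._pipeline is not None

    def warm_up(self) -> None:
        """Prepare the pipeline once; repeated calls do nothing."""
        if self._pipeline is None:
            # Placeholder for an expensive model load: returns no entities.
            self._pipeline = lambda text: []

    def run(self, texts: List[str]) -> List[List[str]]:
        """Process texts, refusing to run before warm_up() (assumed behavior)."""
        if self._pipeline is None:
            raise RuntimeError("Call warm_up() before run().")
        return [self._pipeline(text) for text in texts]


extractor = LazyExtractor(model="some-model")
ready_before = extractor.initialized
extractor.warm_up()
ready_after = extractor.initialized
```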

NamedEntityExtractor.get_stored_annotations

@classmethod
def get_stored_annotations(
        cls, document: Document) -> Optional[List[NamedEntityAnnotation]]

Returns the document's named entity annotations stored in its metadata, if any.

Arguments:

  • document: Document whose annotations are to be fetched.

Returns:

The stored annotations.
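Because the return type is `Optional`, calling code should be prepared for `None`. The sketch below counts labels above a score threshold and treats `None` as "no annotations". The `Annotation` dataclass is a local stand-in for the documented fields, and `count_confident_labels`, its `min_score` parameter, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Annotation:
    """Local stand-in mirroring the documented fields (not the library class)."""

    entity: str
    start: int
    end: int
    score: float


def count_confident_labels(
    annotations: Optional[List[Annotation]], min_score: float = 0.5
) -> Dict[str, int]:
    """Count entity labels whose score reaches min_score; None yields no counts."""
    counts: Dict[str, int] = {}
    for ann in annotations or []:  # treat None as an empty list
        if ann.score >= min_score:
            counts[ann.entity] = counts.get(ann.entity, 0) + 1
    return counts


# Hand-written sample values for illustration.
stored = [
    Annotation(entity="PER", start=11, end=16, score=0.99),
    Annotation(entity="LOC", start=31, end=39, score=0.97),
    Annotation(entity="LOC", start=41, end=51, score=0.42),
]
counts = count_confident_labels(stored, min_score=0.5)
empty = count_confident_labels(None)
```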