Module named_entity_extractor

NamedEntityExtractorBackend

class NamedEntityExtractorBackend(Enum, metaclass=_BackendEnumMeta)

NLP backend to use for Named Entity Recognition.

HUGGING_FACE

Uses a Hugging Face model and pipeline.

SPACY

Uses a spaCy model and pipeline.

NamedEntityAnnotation

@dataclass
class NamedEntityAnnotation()

Describes a single NER annotation.

Arguments:

entity: Entity label.
start: Start index of the entity in the document.
end: End index of the entity in the document.
score: Score calculated by the model.
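
To illustrate how the start and end fields relate to the document text, the sketch below slices a string using annotation offsets. The `Annotation` dataclass and the `entity_spans` and `make_annotation` helpers are hypothetical stand-ins that mirror the fields listed above, so the snippet runs without Haystack installed. The offsets are computed from the string rather than by a model, and the sketch assumes that `end` is exclusive (Python slice style), which should be checked against the backend in use.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Annotation:
    """Stand-in mirroring the documented fields of NamedEntityAnnotation."""

    entity: str
    start: int
    end: int
    score: Optional[float] = None  # left unset here; no model is run


def entity_spans(text: str, annotations: List[Annotation]) -> List[Tuple[str, str]]:
    """Return (label, surface text) pairs, assuming `end` is exclusive."""
    return [(a.entity, text[a.start:a.end]) for a in annotations]


text = "My name is Clara and I live in Berkeley, California."


def make_annotation(label: str, surface: str) -> Annotation:
    """Build an annotation by locating `surface` in `text` (illustration only)."""
    start = text.index(surface)
    return Annotation(entity=label, start=start, end=start + len(surface))


annotations = [make_annotation("PER", "Clara"), make_annotation("LOC", "Berkeley")]
spans = entity_spans(text, annotations)
print(spans)
```

With real annotations, the same slicing would be applied to the content of the document that the annotations were stored on.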

NamedEntityExtractor

@component
class NamedEntityExtractor()

Annotates named entities in a collection of documents.

The component supports two backends: Hugging Face and spaCy. The former
can be used with any token classification model from the Hugging Face
model hub, while the latter can be used with any spaCy model that
contains an NER component. Annotations are stored as metadata in the
documents.

Usage example:

from haystack import Document
from haystack.components.extractors.named_entity_extractor import NamedEntityExtractor

documents = [
    Document(content="I'm Merlin, the happy pig!"),
    Document(content="My name is Clara and I live in Berkeley, California."),
]
extractor = NamedEntityExtractor(backend="hugging_face", model="dslim/bert-base-NER")
extractor.warm_up()
results = extractor.run(documents=documents)["documents"]
annotations = [NamedEntityExtractor.get_stored_annotations(doc) for doc in results]
print(annotations)

NamedEntityExtractor.__init__

def __init__(*,
             backend: Union[str, NamedEntityExtractorBackend],
             model: str,
             pipeline_kwargs: Optional[Dict[str, Any]] = None,
             device: Optional[ComponentDevice] = None) -> None

Create a Named Entity extractor component.

Arguments:

backend: Backend to use for NER.
model: Name of the model or a path to the model on
the local disk. Dependent on the backend.
pipeline_kwargs: Keyword arguments passed to the pipeline. The
pipeline can override these arguments. Dependent on the backend.
device: The device on which the model is loaded. If None,
the default device is automatically selected. If a
device/device map is specified in pipeline_kwargs,
it overrides this parameter (only applicable to the
Hugging Face backend).
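
Because backend accepts either a string or an enum member, a parameter of this shape is commonly normalized to the enum before use. The following is a minimal sketch of that pattern, not Haystack's implementation: `Backend` and `resolve_backend` are hypothetical names, and the string value "spacy" is an assumption made by analogy with the "hugging_face" value used in the usage example.

```python
from enum import Enum
from typing import Union


class Backend(Enum):
    """Stand-in for NamedEntityExtractorBackend; the string values are assumed."""

    HUGGING_FACE = "hugging_face"
    SPACY = "spacy"


def resolve_backend(backend: Union[str, Backend]) -> Backend:
    """Accept either an enum member or its string value and return the member."""
    if isinstance(backend, Backend):
        return backend
    try:
        # Enum lookup by value raises ValueError for unknown strings.
        return Backend(backend)
    except ValueError:
        valid = ", ".join(member.value for member in Backend)
        raise ValueError(
            f"Unknown backend '{backend}'. Valid values: {valid}"
        ) from None


print(resolve_backend("hugging_face"))
print(resolve_backend(Backend.SPACY))
```

Normalizing once at construction time keeps the rest of the code free of string comparisons and surfaces typos in the backend name early.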

NamedEntityExtractor.warm_up

def warm_up()

Initialize the component.

Raises:

ComponentError: If the backend fails to initialize successfully.

NamedEntityExtractor.run

@component.output_types(documents=List[Document])
def run(documents: List[Document], batch_size: int = 1) -> Dict[str, Any]

Annotate named entities in each document and store the annotations in the document's metadata.

Arguments:

documents: Documents to process.
batch_size: Batch size used for processing the documents.

Raises:

ComponentError: If the backend fails to process a document.

Returns:

Processed documents.
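
To illustrate what a batch size means for a list of inputs, the sketch below splits a sequence into consecutive chunks of at most batch_size items. This reference does not specify how the backends batch internally, so `batched` is a hypothetical helper shown only to make the parameter concrete.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int = 1) -> Iterator[List[T]]:
    """Yield consecutive chunks of `items`, each with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


texts = ["doc one", "doc two", "doc three"]
batches = list(batched(texts, batch_size=2))
print(batches)
```

With the default of 1, every input forms its own batch; larger values trade memory for fewer model invocations.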

NamedEntityExtractor.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

NamedEntityExtractor.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "NamedEntityExtractor"

Deserializes the component from a dictionary.

Arguments:

data: Dictionary to deserialize from.

Returns:

Deserialized component.

NamedEntityExtractor.initialized

@property
def initialized() -> bool

Returns whether the extractor is ready to annotate text.

NamedEntityExtractor.get_stored_annotations

@classmethod
def get_stored_annotations(
        cls, document: Document) -> Optional[List[NamedEntityAnnotation]]

Returns the document's named entity annotations stored in its metadata, if any.

Arguments:

document: Document whose annotations are to be fetched.

Returns:

The stored annotations, or None if the document has none.
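
Since the return type is Optional, calling code has to cope with a missing result. The sketch below groups entity text by label, treats None as an empty list, and drops low-scoring annotations. `Annotation` is a hypothetical stand-in for NamedEntityAnnotation, `group_by_label` and `min_score` are illustrative names, and the scores are placeholder values rather than model output.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Annotation:
    """Stand-in mirroring the documented fields of NamedEntityAnnotation."""

    entity: str
    start: int
    end: int
    score: float


def group_by_label(
    text: str,
    annotations: Optional[List[Annotation]],
    min_score: float = 0.0,
) -> Dict[str, List[str]]:
    """Map each label to the entity strings scoring at least `min_score`."""
    grouped: Dict[str, List[str]] = {}
    for ann in annotations or []:  # None is treated as "no annotations"
        if ann.score >= min_score:
            grouped.setdefault(ann.entity, []).append(text[ann.start:ann.end])
    return grouped


text = "Clara lives in Berkeley."
loc_start = text.index("Berkeley")
stored = [
    Annotation("PER", 0, len("Clara"), 0.9),  # placeholder score
    Annotation("LOC", loc_start, loc_start + len("Berkeley"), 0.4),  # placeholder score
]

confident = group_by_label(text, stored, min_score=0.5)
nothing = group_by_label(text, None)
print(confident, nothing)
```

In real use, `stored` would be the value returned by get_stored_annotations for a processed document, and `text` would be that document's content.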
