Extracts predefined entities out of a piece of text.
Module named_entity_extractor
NamedEntityExtractorBackend
class NamedEntityExtractorBackend(Enum, metaclass=_BackendEnumMeta)
NLP backend to use for Named Entity Recognition.
HUGGING_FACE
Uses a Hugging Face model and pipeline.
SPACY
Uses a spaCy model and pipeline.
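The constructor accepts the backend either as an enum member or as a string such as "hugging_face". The following is a minimal sketch of that string-to-member lookup. It assumes the string values "hugging_face" and "spacy", and uses a hypothetical Backend enum with a hypothetical from_str helper in place of the real _BackendEnumMeta machinery.

```python
from enum import Enum
from typing import Union


class Backend(Enum):
    """Hypothetical stand-in for NamedEntityExtractorBackend."""

    HUGGING_FACE = "hugging_face"  # assumed string value
    SPACY = "spacy"  # assumed string value

    @classmethod
    def from_str(cls, value: str) -> "Backend":
        # Look the member up by its string value and fail with a readable message.
        for member in cls:
            if member.value == value:
                return member
        valid = ", ".join(m.value for m in cls)
        raise ValueError(f"Unknown backend '{value}'. Valid values: {valid}")


def resolve_backend(backend: Union[str, Backend]) -> Backend:
    # Accept either an enum member or its string form.
    return backend if isinstance(backend, Backend) else Backend.from_str(backend)


print(resolve_backend("hugging_face") is Backend.HUGGING_FACE)  # True
```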
NamedEntityAnnotation
@dataclass
class NamedEntityAnnotation()
Describes a single NER annotation.
Arguments:
entity
: Entity label.
start
: Start index of the entity in the document.
end
: End index of the entity in the document.
score
: Score calculated by the model.
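To make the four fields concrete, here is a hypothetical mirror of the dataclass. It assumes that start and end are character offsets into the document content with an exclusive end (so ordinary Python slicing recovers the entity text) and that score may be absent; both are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    """Hypothetical mirror of NamedEntityAnnotation."""

    entity: str  # entity label, e.g. "PER"
    start: int  # assumed: character offset where the entity begins
    end: int  # assumed: exclusive character offset where the entity ends
    score: Optional[float] = None  # assumed optional model confidence


text = "My name is Clara and I live in Berkeley, California."
ann = Annotation(entity="PER", start=11, end=16, score=0.99)

# With exclusive end offsets, slicing the content yields the entity's surface text.
print(ann.entity, text[ann.start:ann.end])  # PER Clara
```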
NamedEntityExtractor
@component
class NamedEntityExtractor()
Annotates named entities in a collection of documents.
The component supports two backends: Hugging Face and spaCy. The former can be used with any token classification model from the Hugging Face model hub, while the latter can be used with any spaCy model that contains an NER component. Annotations are stored as metadata in the documents.
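Storing annotations as metadata can be pictured with plain dictionaries. In this sketch the metadata key "named_entities", the dict-based metadata, and the store_annotations helper are all hypothetical; they illustrate the idea, not the component's internals.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Annotation:
    entity: str
    start: int
    end: int
    score: float


META_KEY = "named_entities"  # hypothetical metadata key


def store_annotations(meta: Dict[str, Any], annotations: List[Annotation]) -> Dict[str, Any]:
    # Return a copy of the metadata with the annotations attached; the input dict is not modified.
    updated = dict(meta)
    updated[META_KEY] = annotations
    return updated


meta = {"source": "example.txt"}
new_meta = store_annotations(meta, [Annotation("LOC", 31, 39, 0.98)])
print(new_meta[META_KEY])
```

Returning a copy keeps existing metadata entries intact and avoids surprising callers that still hold the original dictionary.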
Usage example:
from haystack import Document
from haystack.components.extractors.named_entity_extractor import NamedEntityExtractor
documents = [
Document(content="I'm Merlin, the happy pig!"),
Document(content="My name is Clara and I live in Berkeley, California."),
]
extractor = NamedEntityExtractor(backend="hugging_face", model="dslim/bert-base-NER")
extractor.warm_up()
results = extractor.run(documents=documents)["documents"]
annotations = [NamedEntityExtractor.get_stored_annotations(doc) for doc in results]
print(annotations)
NamedEntityExtractor.__init__
def __init__(*,
backend: Union[str, NamedEntityExtractorBackend],
model: str,
pipeline_kwargs: Optional[Dict[str, Any]] = None,
device: Optional[ComponentDevice] = None) -> None
Create a Named Entity extractor component.
Arguments:
backend
: Backend to use for NER.
model
: Name of the model or a path to the model on the local disk. Dependent on the backend.
pipeline_kwargs
: Keyword arguments passed to the pipeline. The pipeline can override these arguments. Dependent on the backend.
device
: The device on which the model is loaded. If None, the default device is automatically selected. If a device or device map is specified in pipeline_kwargs, it overrides this parameter (only applicable to the Hugging Face backend).
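The precedence rule for device can be sketched as follows. This is a hypothetical helper, resolve_pipeline_kwargs, that models devices as plain strings instead of ComponentDevice and uses "cpu" as a placeholder default; it only illustrates the override rule described above for the Hugging Face backend.

```python
from typing import Any, Dict, Optional

DEFAULT_DEVICE = "cpu"  # placeholder; the real component selects a default automatically


def resolve_pipeline_kwargs(
    device: Optional[str], pipeline_kwargs: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Hypothetical helper: decide which device setting reaches the pipeline."""
    kwargs = dict(pipeline_kwargs or {})  # copy so the caller's dict is not mutated
    if "device" in kwargs or "device_map" in kwargs:
        # A device or device map given in pipeline_kwargs overrides the device argument.
        return kwargs
    kwargs["device"] = device if device is not None else DEFAULT_DEVICE
    return kwargs


print(resolve_pipeline_kwargs("cuda:0", {"device_map": "auto"}))
```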
NamedEntityExtractor.warm_up
def warm_up()
Initialize the component.
Raises:
ComponentError
: If the backend fails to initialize successfully.
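warm_up defers the expensive model loading until the component is actually needed, and initialized reports whether that has happened. A minimal sketch of this lazy-initialization pattern follows; LazyExtractor and its loader callable are hypothetical, and ComponentError is a local stand-in for Haystack's exception of the same name.

```python
from typing import Any, Callable, Optional


class ComponentError(Exception):
    """Local stand-in for Haystack's ComponentError."""


class LazyExtractor:
    """Hypothetical sketch of the warm_up()/initialized pattern."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader  # builds the NLP pipeline; not called until warm_up()
        self._pipeline: Optional[Any] = None

    @property
    def initialized(self) -> bool:
        return self._pipeline is not None

    def warm_up(self) -> None:
        if self.initialized:
            return  # loading is expensive, so it happens at most once
        try:
            self._pipeline = self._loader()
        except Exception as e:
            # Wrap backend-specific failures in a single, predictable error type.
            raise ComponentError(f"Backend failed to initialize: {e}") from e


extractor = LazyExtractor(lambda: "loaded-pipeline")  # dummy loader for illustration
extractor.warm_up()
print(extractor.initialized)  # True
```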
NamedEntityExtractor.run
@component.output_types(documents=List[Document])
def run(documents: List[Document], batch_size: int = 1) -> Dict[str, Any]
Annotate named entities in each document and store
the annotations in the document's metadata.
Arguments:
documents
: Documents to process.
batch_size
: Batch size used for processing the documents.
Raises:
ComponentError
: If the backend fails to process a document.
Returns:
A dictionary whose "documents" key holds the processed documents, with their annotations stored in the metadata.
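run processes the documents in groups of batch_size (default 1). The helper below is a hypothetical sketch of splitting a list into such batches; the component's actual batching logic may differ.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for i in range(0, len(items), batch_size):
        yield list(items[i : i + batch_size])


texts = ["doc one", "doc two", "doc three"]
print([len(b) for b in batched(texts, 2)])  # [2, 1]
```

Larger batches usually improve throughput on accelerators at the cost of memory; the final batch is simply shorter when the count is not a multiple of batch_size.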
NamedEntityExtractor.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
NamedEntityExtractor.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "NamedEntityExtractor"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
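to_dict and from_dict form a round trip. The sketch below uses a hypothetical ExtractorConfig class and assumes a layout with "type" and "init_parameters" keys; in this sketch only constructor arguments are stored, so a restored object would still need to load its model separately.

```python
from typing import Any, Dict, Optional


class ExtractorConfig:
    """Hypothetical component that holds only its init parameters."""

    def __init__(self, backend: str, model: str, pipeline_kwargs: Optional[Dict[str, Any]] = None) -> None:
        self.backend = backend
        self.model = model
        self.pipeline_kwargs = pipeline_kwargs or {}

    def to_dict(self) -> Dict[str, Any]:
        # Record the class path plus the arguments needed to rebuild the object.
        return {
            "type": f"{type(self).__module__}.{type(self).__qualname__}",
            "init_parameters": {
                "backend": self.backend,
                "model": self.model,
                "pipeline_kwargs": dict(self.pipeline_kwargs),
            },
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ExtractorConfig":
        # Rebuild the object by passing the stored arguments back to __init__.
        return cls(**data["init_parameters"])


original = ExtractorConfig("hugging_face", "dslim/bert-base-NER")
restored = ExtractorConfig.from_dict(original.to_dict())
print(restored.model)
```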
NamedEntityExtractor.initialized
@property
def initialized() -> bool
Returns whether the extractor is ready to annotate text.
NamedEntityExtractor.get_stored_annotations
@classmethod
def get_stored_annotations(
cls, document: Document) -> Optional[List[NamedEntityAnnotation]]
Returns the document's named entity annotations stored
in its metadata, if any.
Arguments:
document
: Document whose annotations are to be fetched.
Returns:
The stored annotations, or None if the document has none.
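Reading annotations back and mapping them to the entity text they cover can be sketched with plain dictionaries. The metadata key "named_entities", the helpers get_stored_annotations and entity_spans, and the exclusive-end offset convention are all assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple

META_KEY = "named_entities"  # hypothetical metadata key


@dataclass
class Annotation:
    entity: str
    start: int
    end: int
    score: Optional[float] = None


def get_stored_annotations(meta: Dict[str, Any]) -> Optional[List[Annotation]]:
    # In this sketch, None means no annotations were stored for the document.
    return meta.get(META_KEY)


def entity_spans(content: str, meta: Dict[str, Any]) -> List[Tuple[str, str]]:
    # Pair each label with the substring it covers (end offset assumed exclusive).
    annotations = get_stored_annotations(meta) or []
    return [(a.entity, content[a.start:a.end]) for a in annotations]


content = "My name is Clara and I live in Berkeley, California."
meta = {META_KEY: [Annotation("PER", 11, 16), Annotation("LOC", 31, 39)]}
print(entity_spans(content, meta))
```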