🏷️ Named Entity Recognition model
Use the default Named Entity Recognition model
Need a general-purpose Named Entity Recognition model? Artifex's default Named Entity Recognition model is trained to recognize fourteen named entity types out of the box:
PERSON, ORG, LOCATION, DATE, TIME, PERCENT, NUMBER, FACILITY, PRODUCT, WORK_OF_ART, LANGUAGE, NORP (nationalities or religious/political groups), ADDRESS, PHONE_NUMBER
from artifex import Artifex
ner = Artifex().named_entity_recognition
print(ner("John landed in Barcelona at 15:45."))
# >>> [[{'entity_group': 'PERSON', 'score': 0.92174554, 'word': 'John', 'start': 0, 'end': 4}, {'entity_group': 'LOCATION', 'score': 0.9853817, 'word': ' Barcelona', 'start': 15, 'end': 24}, {'entity_group': 'TIME', 'score': 0.98645407, 'word': ' 15:45.', 'start': 28, 'end': 34}]]
Learn more about the default Named Entity Recognition model on our Named Entity Recognition Hugging Face model page.
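The output shown above is a nested list of dictionaries, and some `word` values carry a leading space. The sketch below shows one possible way to post-process such a result using the `start` and `end` offsets instead of `word`. It does not call Artifex: `sample_output` is copied from the example above, `clean_entities` and `min_score` are hypothetical names introduced here, and it assumes the outer list holds one inner list per input text.

```python
# Sample output copied from the example above; in practice this would be
# the value returned by ner("John landed in Barcelona at 15:45.").
text = "John landed in Barcelona at 15:45."
sample_output = [[
    {"entity_group": "PERSON", "score": 0.92174554, "word": "John", "start": 0, "end": 4},
    {"entity_group": "LOCATION", "score": 0.9853817, "word": " Barcelona", "start": 15, "end": 24},
    {"entity_group": "TIME", "score": 0.98645407, "word": " 15:45.", "start": 28, "end": 34},
]]


def clean_entities(entities, source_text, min_score=0.5):
    """Hypothetical helper (not part of Artifex).

    Keeps entities whose score is at least min_score and recovers the
    surface text from the character offsets, which avoids the leading
    space that may appear in the 'word' field.
    """
    cleaned = []
    for entity in entities:
        if entity["score"] < min_score:
            continue
        cleaned.append({
            "label": entity["entity_group"],
            "text": source_text[entity["start"]:entity["end"]],
            "score": float(entity["score"]),
        })
    return cleaned


# Assumption: the first inner list corresponds to the single input text.
entities = clean_entities(sample_output[0], text)
for entity in entities:
    print(entity["label"], repr(entity["text"]))
```

Slicing by offsets keeps whatever the model marked as part of the span, so the trailing period of the TIME entity is preserved; strip punctuation separately if your application needs it.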
Create & use a custom Named Entity Recognition model
Need more control over which named entities are recognized, or want to tailor the model to your specific domain for better results? Fine-tune your own Named Entity Recognition model, run it locally on CPU, and keep it forever:
from artifex import Artifex
ner = Artifex().named_entity_recognition
model_output_path = "./output_model/"
ner.train(
    domain="medical documents",  # change to your desired domain
    named_entities={  # define your custom named entities
        "PERSON": "Individual people, fictional characters",
        "ORG": "Companies, institutions, agencies",
        "LOCATION": "Geographical areas",
        "DATE": "Absolute or relative dates, including years, months and/or days",
        "TIME": "Specific time of the day",
        "NUMBER": "Numeric measurements or expressions",
        "WORK_OF_ART": "Titles of creative works",
        "LANGUAGE": "Natural or programming languages",
        "NORP": "National, religious or political groups",
        "ADDRESS": "full addresses",
        "PHONE_NUMBER": "telephone numbers",
    },
    output_path=model_output_path,
)
ner.load(model_output_path)
print(ner("The patient John Doe visited New York on 12th March 2023 at 10:30 AM."))
# >>> [[{'entity_group': 'PERSON', 'score': 0.9456123, 'word': 'John Doe', 'start': 12, 'end': 20}, {'entity_group': 'LOCATION', 'score': 0.9783456, 'word': ' New York', 'start': 29, 'end': 37}, {'entity_group': 'DATE', 'score': 0.9654321, 'word': ' 12th March 2023', 'start': 41, 'end': 56}, {'entity_group': 'TIME', 'score': 0.9543210, 'word': ' 10:30 AM.', 'start': 60, 'end': 69}]]
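Because each entity carries `start` and `end` character offsets, the result can drive simple downstream tasks such as masking sensitive spans in medical text. The following is a minimal sketch of that idea, not an Artifex feature: `redact` is a hypothetical helper, and the entity list is hand-written for the sentence above, with offsets computed directly from that sentence (scores omitted).

```python
text = "The patient John Doe visited New York on 12th March 2023 at 10:30 AM."

# Hand-written sample entities for the sentence above; a real run would
# take these from the model's output instead.
sample_entities = [
    {"entity_group": "PERSON", "start": 12, "end": 20},
    {"entity_group": "LOCATION", "start": 29, "end": 37},
    {"entity_group": "DATE", "start": 41, "end": 56},
    {"entity_group": "TIME", "start": 60, "end": 69},
]


def redact(source_text, entities, labels):
    """Hypothetical helper (not part of Artifex).

    Replaces every entity whose label is in `labels` with a "[LABEL]"
    placeholder. Spans are processed from right to left so that earlier
    offsets stay valid while the string changes length.
    """
    redacted = source_text
    for entity in sorted(entities, key=lambda e: e["start"], reverse=True):
        if entity["entity_group"] in labels:
            placeholder = f"[{entity['entity_group']}]"
            redacted = redacted[:entity["start"]] + placeholder + redacted[entity["end"]:]
    return redacted


masked = redact(text, sample_entities, {"PERSON", "DATE"})
print(masked)
# The patient [PERSON] visited New York on [DATE] at 10:30 AM.
```

This sketch assumes the entity spans do not overlap; overlapping spans would need to be merged before redaction.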