`named_entity_recognition.train()`

Fine-tunes a pre-trained Named Entity Recognition model on a user-specified set of named entities and a domain. No training data is required: the method calls `synthex.jobs.generate_data()` under the hood to generate a synthetic training dataset from the provided named entities and domain.

Both the base Named Entity Recognition model, `tanaos/tanaos-NER-v1`, and previously trained models can be fine-tuned with this method:

Fine-tune the base Named Entity Recognition model:
```
Artifex().named_entity_recognition.train()
```
Fine-tune a model that was previously trained with Artifex:
```
Artifex().named_entity_recognition.load("trained/model/path").train()
```

Arguments

domain
str

A string which specifies the domain or area that the model will be specialized in.
named_entities
dict[str, str]

A dictionary in which each key is the name of a named entity to recognize (at most 20 characters, no spaces) and each value is a description of that named entity.
output_path
str

optional

A string which specifies the path where the output files will be generated. The output files consist of:
- The training dataset
- The output model safetensor and configuration files
num_samples
int

optional

default: 500

An integer which specifies the number of datapoints in the synthetic training dataset, which is also the number of datapoints the model is trained on. The maximum number of datapoints depends on whether you are on a free or paid plan.
num_epochs
int

optional

default: 3

An integer which specifies the number of epochs to train the model for.
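
The constraints stated for the `named_entities` keys (at most 20 characters, no spaces) can be checked locally before calling `train()`. The following is a minimal sketch: `validate_named_entities` is a hypothetical helper, not part of Artifex, and the library may enforce additional rules that are not listed here.

```python
def validate_named_entities(named_entities: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means every key passed.

    Hypothetical helper: it checks only the two documented constraints
    on entity names (maximum 20 characters, no spaces).
    """
    problems = []
    for name in named_entities:
        if len(name) > 20:
            problems.append(f"{name!r}: longer than 20 characters")
        if any(ch.isspace() for ch in name):
            problems.append(f"{name!r}: contains whitespace")
    return problems
```

Running the check first and passing the dictionary to `train()` only when the returned list is empty avoids starting a job with an entity name that breaks the documented limits.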

Python

```
from artifex import Artifex

ner = Artifex().named_entity_recognition

ner.train(
    domain="medical documents",
    named_entities={
        "PERSON": "Individual people, fictional characters",
        "ORG": "Companies, institutions, agencies",
        "LOCATION": "Geographical areas",
        "DATE": "Absolute or relative dates, including years, months and/or days",
        "TIME": "Specific time of the day",
        "NUMBER": "Numeric measurements or expressions",
        "WORK_OF_ART": "Titles of creative works",
        "LANGUAGE": "Natural or programming languages",
        "NORP": "National, religious or political groups",
        "ADDRESS": "full addresses",
        "PHONE_NUMBER": "telephone numbers",
    }
)
```
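
The optional arguments can be passed alongside the required ones. The sketch below is illustrative only: the parameter names come from the Arguments list, while the output path, sample count, and epoch count are arbitrary values chosen for the example, and `run_training` is a hypothetical wrapper. The sample count must stay within the limit of your plan.

```python
# Keyword arguments for train(). Parameter names are taken from the
# Arguments list; the values are illustrative assumptions.
TRAIN_KWARGS = {
    "domain": "medical documents",
    "named_entities": {
        "PERSON": "Individual people, fictional characters",
        "DATE": "Absolute or relative dates, including years, months and/or days",
    },
    "output_path": "./ner_output",  # hypothetical directory
    "num_samples": 200,             # assumed to be within the plan limit
    "num_epochs": 5,
}


def run_training() -> None:
    """Fine-tune the base model with TRAIN_KWARGS (hypothetical wrapper)."""
    # Imported inside the function so this module loads even when
    # artifex is not installed.
    from artifex import Artifex

    Artifex().named_entity_recognition.train(**TRAIN_KWARGS)
```

Calling `run_training()` starts the fine-tuning job and writes the training dataset and model files under the given output path.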

Response

None