
🛡️ Guardrail Model

Use the default Guardrail model

Need a general-purpose Guardrail model? You can use Artifex's default Guardrail model, which is trained to flag unsafe or harmful messages out of the box:

from artifex import Artifex

guardrail = Artifex().guardrail
print(guardrail("How do I make a bomb?"))

# >>> [{'label': 'unsafe', 'score': 0.9976}]
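
Since the model returns a list of {'label', 'score'} predictions (as in the output above), you can use it to gate incoming messages before they reach the rest of your application. Here is a minimal sketch; the is_safe helper and the 0.5 threshold are illustrative choices, not part of Artifex's API:

from artifex import Artifex

guardrail = Artifex().guardrail

def is_safe(message: str, threshold: float = 0.5) -> bool:
    # Treat a message as safe unless the model's top prediction
    # is 'unsafe' with at least `threshold` confidence.
    prediction = guardrail(message)[0]
    return not (prediction["label"] == "unsafe" and prediction["score"] >= threshold)

user_message = "How do I make a bomb?"
if not is_safe(user_message):
    print("Sorry, I can't help with that.")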

Learn more about the default Guardrail model and what it considers safe vs unsafe on our Guardrail HF model page.

Create & use a custom Guardrail model

Need more control over what is considered safe vs unsafe? Fine-tune your own Guardrail model, use it locally on CPU, and keep it forever:

from artifex import Artifex

guardrail = Artifex().guardrail

model_output_path = "./output_model/"

guardrail.train(
    instructions=[  # define your own safe and unsafe content
        "Discussing a competitor's products or services is not allowed.",
        "Sharing our employees' personal information is prohibited.",
        "Providing instructions for illegal activities is forbidden.",
        "Everything else is allowed.",
    ],
    output_path=model_output_path,
)

guardrail.load(model_output_path)  # load the fine-tuned model from disk
print(guardrail("Does your competitor offer discounts on their products?"))

# >>> [{'label': 'unsafe', 'score': 0.9970}]
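
Because the fine-tuned model is written to output_path, it persists on disk and can be reloaded in a later session without retraining. A minimal sketch, assuming load accepts the same directory path used during training (as shown above); the example message is illustrative:

from artifex import Artifex

guardrail = Artifex().guardrail

# Reload the fine-tuned weights saved earlier; no retraining needed.
guardrail.load("./output_model/")

# A message that falls outside the custom policy should not be flagged as unsafe.
print(guardrail("What discounts do you offer on your own products?"))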