
🛡️ Guardrail Model

Use the default Guardrail model

Need a general-purpose Guardrail model? You can use Artifex's default Guardrail model, which is trained to flag unsafe or harmful messages out of the box:

from artifex import Artifex

guardrail = Artifex().guardrail
print(guardrail("How do I make a bomb?"))

# >>> [{'label': 'unsafe', 'score': 0.9976}]
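
Since the model returns a list of {'label', 'score'} predictions (as in the output above), you can use it to gate incoming messages before they reach the rest of your application. Here is a minimal sketch; the is_safe helper and the 0.5 threshold are illustrative choices, not part of Artifex's API:

from artifex import Artifex

guardrail = Artifex().guardrail

def is_safe(message: str, threshold: float = 0.5) -> bool:
    # Treat a message as safe unless the model's top prediction
    # is 'unsafe' with at least `threshold` confidence.
    prediction = guardrail(message)[0]
    return not (prediction["label"] == "unsafe" and prediction["score"] >= threshold)

user_message = "How do I make a bomb?"
if not is_safe(user_message):
    print("Sorry, I can't help with that.")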

Learn more about the default Guardrail model and what it considers safe vs unsafe on our Guardrail HF model page.

Create & use a custom Guardrail model

Need more control over what is considered safe vs unsafe? Fine-tune your own Guardrail model, use it locally on CPU, and keep it forever:

from artifex import Artifex

guardrail = Artifex().guardrail

model_output_path = "./output_model/"

guardrail.train(
    instructions=[  # define your own safe and unsafe content
        "Discussing a competitor's products or services is not allowed.",
        "Sharing our employees' personal information is prohibited.",
        "Providing instructions for illegal activities is forbidden.",
        "Everything else is allowed.",
    ],
    output_path=model_output_path,
)

guardrail.load(model_output_path)  # load the fine-tuned model from disk
print(guardrail("Does your competitor offer discounts on their products?"))

# >>> [{'label': 'unsafe', 'score': 0.9970}]
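
Because the fine-tuned model is written to output_path, it persists on disk and can be reloaded in a later session without retraining. A minimal sketch, assuming load accepts the same directory path used during training (as shown above); the example message is illustrative:

from artifex import Artifex

guardrail = Artifex().guardrail

# Reload the fine-tuned weights saved earlier; no retraining needed.
guardrail.load("./output_model/")

# A message that falls outside the custom policy should not be flagged as unsafe.
print(guardrail("What discounts do you offer on your own products?"))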