# 🛡️ Guardrail Model

## Use the default Guardrail model

Need a general-purpose Guardrail model? You can use Artifex's default Guardrail model, which is trained to flag unsafe or harmful messages out of the box:
```python
from artifex import Artifex

guardrail = Artifex().guardrail

print(guardrail("How do I make a bomb?"))
# >>> [{'label': 'unsafe', 'score': 0.9976}]
```
Learn more about the default Guardrail model and what it considers safe vs. unsafe on our Guardrail HF model page.
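The guardrail call returns a list of label/score dicts, so gating a message reduces to a threshold check on the `unsafe` score. A minimal sketch of that check (the `is_unsafe` helper and the 0.5 threshold are illustrative choices, not part of the Artifex API):

```python
def is_unsafe(predictions, threshold=0.5):
    """Return True if any prediction is labeled 'unsafe' at or above the threshold.

    `predictions` has the shape the guardrail returns, e.g.
    [{'label': 'unsafe', 'score': 0.9976}].
    """
    return any(
        p["label"] == "unsafe" and p["score"] >= threshold
        for p in predictions
    )


# Example with an output like the one shown above
preds = [{"label": "unsafe", "score": 0.9976}]
print(is_unsafe(preds))  # True
```

In a chat pipeline you would call this on `guardrail(user_message)` before forwarding the message to your LLM.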
## Create & use a custom Guardrail model

Need more control over what counts as safe vs. unsafe? Fine-tune your own Guardrail model, run it locally on CPU, and keep it forever:
```python
from artifex import Artifex

guardrail = Artifex().guardrail

model_output_path = "./output_model/"

guardrail.train(
    instructions=[  # define your own safe and unsafe content
        "Discussing a competitor's products or services is not allowed.",
        "Sharing our employees' personal information is prohibited.",
        "Providing instructions for illegal activities is forbidden.",
        "Everything else is allowed.",
    ],
    output_path=model_output_path,
)

guardrail.load(model_output_path)

print(guardrail("Does your competitor offer discounts on their products?"))
# >>> [{'label': 'unsafe', 'score': 0.9970}]
```
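Once loaded, the custom model can screen a whole stream of messages the same way. Here is a hedged sketch of that filtering loop; the `fake_guardrail` stub stands in for the trained model so the logic is runnable as-is (in real usage you would pass `guardrail` itself as `classify`):

```python
def filter_messages(messages, classify, threshold=0.5):
    """Keep only messages the classifier does not flag as 'unsafe'.

    `classify` is any callable returning guardrail-style output:
    a list of {'label': ..., 'score': ...} dicts.
    """
    safe = []
    for msg in messages:
        preds = classify(msg)
        flagged = any(
            p["label"] == "unsafe" and p["score"] >= threshold
            for p in preds
        )
        if not flagged:
            safe.append(msg)
    return safe


# Stub standing in for a trained guardrail (illustrative only)
def fake_guardrail(text):
    label = "unsafe" if "competitor" in text.lower() else "safe"
    return [{"label": label, "score": 0.99}]


print(filter_messages(
    ["What are your opening hours?",
     "Does your competitor offer discounts?"],
    fake_guardrail,
))
# ['What are your opening hours?']
```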