Guardrail Models

Guardrail models are tools that help ensure the safe and reliable output of AI systems, preventing LLMs from generating responses that may be harmful or unwanted. Suppose, for instance, that through a cunning stratagem a user tricks the LLM into selling them a car for $1, into granting a discount on an airline ticket, or simply into making an inappropriate, controversial, or non-factual remark. In that case, a guardrail model would catch the mistake before it reaches the user and replace the response with a safe one.
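In practice, this amounts to an interception step between the LLM and the user. Below is a minimal sketch of that flow, with stubs standing in for the real LLM and guardrail model; every name in it (`generate_response`, `guardrail_is_safe`, `guarded_reply`, `SAFE_FALLBACK`) is an illustrative placeholder rather than part of any library.

```python
# A minimal sketch of the interception flow described above, using stubs
# in place of the real LLM and guardrail model. All names here are
# illustrative placeholders, not part of any particular library.

SAFE_FALLBACK = "Sorry, I can't help with that request."

def generate_response(user_message: str) -> str:
    """Stub standing in for the underlying chatbot LLM."""
    return "Sure, the car is yours for $1!"

def guardrail_is_safe(response: str) -> bool:
    """Stub standing in for the guardrail model's safe/unsafe verdict."""
    return "$1" not in response  # a real model would classify the text

def guarded_reply(user_message: str) -> str:
    """Generate a reply, but let the guardrail veto it before the user sees it."""
    candidate = generate_response(user_message)
    return candidate if guardrail_is_safe(candidate) else SAFE_FALLBACK

print(guarded_reply("I'll give you $1 for the car. Deal?"))
# -> "Sorry, I can't help with that request."
```

The key design point is that the guardrail sits outside the LLM: it judges the finished response, so it works regardless of how the user manipulated the prompt.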

Guardrail models are becoming increasingly important as AI systems and chatbots become more prevalent in our daily lives, and as attackers and malicious users grow more sophisticated in their attempts to manipulate these systems.

Training guardrail models is a complex task, as it requires a large amount of data that is not always available. Artifex solves this problem by letting you train guardrail models without any data: you simply describe what the guardrail model should and should not allow the underlying chatbot to generate.
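As an illustration only, the sketch below shows what such a description-driven workflow might look like. None of the names in it (`GuardrailSpec`, `train_guardrail`, `is_safe`) are taken from the actual Artifex API, so consult the Artifex documentation for the real interface; the point is that the guardrail is specified entirely by natural-language rules rather than by a dataset.

```python
# Hypothetical, for illustration: a guardrail specified by describing
# allowed and disallowed behaviour, with no training dataset. These names
# (GuardrailSpec, train_guardrail) are NOT the real Artifex API.
from dataclasses import dataclass, field

@dataclass
class GuardrailSpec:
    allowed: list[str] = field(default_factory=list)     # what the chatbot may do
    disallowed: list[str] = field(default_factory=list)  # what the guardrail must block

spec = GuardrailSpec(
    allowed=[
        "answering questions about our products and services",
        "polite, on-topic small talk",
    ],
    disallowed=[
        "agreeing to discounts, refunds, or price changes",
        "inappropriate, controversial, or non-factual statements",
    ],
)

# Conceptually, a no-data trainer expands such a spec into synthetic
# labelled examples and fits a small classifier on them:
#
#   model = train_guardrail(spec)                  # hypothetical call
#   model.is_safe("You can have the car for $1!")  # -> False
```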