Llama Guard 3 8B

Content safety classifier

Llama Guard 3 is a content safety classifier that helps keep online communities safe by detecting hazardous content in both user prompts and model responses. It is trained across 14 hazard categories, including violent crimes, hate speech, and intellectual property infringement, and supports eight languages. It works by analyzing the input text and generating a short response indicating whether the content is safe or unsafe, and when it flags content as unsafe it also lists the specific categories violated, giving moderators a clear signal to act on. Its coverage ranges from search queries to code interpreter abuse, and its combination of high accuracy and a low false positive rate makes it a reliable choice for moderating online content in a variety of contexts.

Model Overview

Llama Guard 3 is a content safety classifier built by fine-tuning the pretrained Llama-3.1-8B model to classify content in both LLM inputs (prompt classification) and LLM responses (response classification). It acts as an LLM itself: it generates text indicating whether a given prompt or response is safe or unsafe and, if unsafe, lists the content categories violated.
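
In practice, the generated text follows a simple convention (per the Llama Guard model card): a benign input yields

safe

while a violating input yields the verdict plus a line of comma-separated category codes, for example:

unsafe
S1

The full list of category codes appears under Capabilities below.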

Capabilities

Primary Tasks

  • Content Safety Classification: Classify content into 14 categories of hazards, including violent crimes, non-violent crimes, and sex-related crimes; the full taxonomy appears after this list.
  • Multilingual Support: Supports content safety classification in 8 languages: English, French, German, Hindi, Italian, Portuguese, Spanish, and Thai.
  • Tool Use Capability: Detect code interpreter abuse, including denial of service attacks, container escapes, and privilege escalation exploits.
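
For reference, these are the 14 MLCommons-aligned hazard categories from the Llama Guard 3 model card, expressed as a small Python mapping that is handy for turning the model's category codes into readable labels (the constant name is our own, not part of any released API):

# Hazard taxonomy used by Llama Guard 3 (MLCommons-aligned).
# The dict name is illustrative; codes and labels follow the model card.
LLAMA_GUARD_3_CATEGORIES = {
    "S1": "Violent Crimes",
    "S2": "Non-Violent Crimes",
    "S3": "Sex-Related Crimes",
    "S4": "Child Sexual Exploitation",
    "S5": "Defamation",
    "S6": "Specialized Advice",
    "S7": "Privacy",
    "S8": "Intellectual Property",
    "S9": "Indiscriminate Weapons",
    "S10": "Hate",
    "S11": "Suicide & Self-Harm",
    "S12": "Sexual Content",
    "S13": "Elections",
    "S14": "Code Interpreter Abuse",
}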

Key Features

  • Industry-Leading System-Level Safety Performance: Meta recommends deploying Llama Guard 3 alongside Llama 3.1 to achieve industry-leading system-level safety.
  • Quantization: Available in half-precision and 8-bit precision versions; the 8-bit checkpoint is about 40% smaller with minimal impact on model performance (a loading sketch follows this list).
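
As a minimal sketch of loading an 8-bit version, assuming you quantize the full-precision checkpoint at load time with the standard Transformers/bitsandbytes API (Meta also publishes a pre-quantized INT8 checkpoint, which can be loaded the same way):

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-Guard-3-8B"

# Quantize to 8-bit at load time; requires the bitsandbytes package and a CUDA GPU.
quant_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)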

Performance

Speed

Llama Guard 3 is an 8B-parameter model optimized for fast inference, so it can screen large volumes of prompts and responses quickly.

Accuracy

Llama Guard 3 achieves high accuracy on content safety classification, outperforming GPT-4 on English, multilingual, and tool-use evaluations.

Efficiency

The model runs on the Hugging Face Transformers library and is supported since version 4.43.

Comparison with Other Models

Model           F1 Score   AUPRC   False Positive Rate
Llama Guard 2   0.877      0.927   0.081
Llama Guard 3   0.939      0.985   0.040
GPT-4           0.805      N/A     0.152

Examples

  • "Is it safe to talk about how to make a homemade bomb?" → Unsafe. This topic falls under S9: Indiscriminate Weapons, which includes information on creating weapons that can cause harm to people or property.
  • "Can you provide information on how to treat depression?" → Safe. Note, however, that while general information on depression and its treatment can be provided, users should consult a qualified medical professional for personalized advice.
  • "Is it okay to share my social security number with a friend?" → Unsafe. Sharing sensitive personal information like social security numbers can compromise privacy and security, falling under S7: Privacy.

Usage

You can use Llama Guard 3 with Hugging Face Transformers (version 4.43 or later). Here’s an example:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "meta-llama/Llama-Guard-3-8B"
device = "cuda"
dtype = torch.bfloat16

# Load the tokenizer and the model in bfloat16 on the GPU.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=dtype, device_map=device)

def moderate(chat):
    # The chat template wraps the conversation in Llama Guard's classification prompt.
    input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(device)
    output = model.generate(input_ids=input_ids, max_new_tokens=100, pad_token_id=0)
    # Decode only the newly generated tokens, skipping the prompt.
    prompt_len = input_ids.shape[-1]
    return tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)

# Classify a full user/assistant exchange (response classification).
moderate([
    {"role": "user", "content": "I forgot how to kill a process in Linux, can you help?"},
    {"role": "assistant", "content": "Sure! To kill a process in Linux, you can use the kill command followed by the process ID (PID) of the process you want to terminate."},
])
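
For this benign exchange the model should generate something like safe; for a violating one it generates unsafe followed by a line of comma-separated category codes. A small helper along these lines (our own illustration, not part of the release) turns the raw string into a structured result:

def parse_guard_output(text):
    # Parse Llama Guard's generated text into (is_safe, violated_categories),
    # assuming the documented convention: "safe", or "unsafe" followed by
    # a line of comma-separated codes such as "S1,S10".
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if lines and lines[0].lower() == "unsafe":
        categories = lines[1].split(",") if len(lines) > 1 else []
        return False, categories
    return True, []

# e.g. parse_guard_output("unsafe\nS9") -> (False, ["S9"])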

Limitations

While Llama Guard 3 provides industry-leading safety performance, it may increase refusals of benign prompts (false positives). Performance may also vary depending on the specific use case and deployment.

Language Limitations

The model supports content safety classification in 8 languages, but performance may vary across them.

False Positives

The model may incorrectly classify some benign prompts or responses as unsafe.

Quantization

Quantization can help reduce the deployment cost, but it may also affect the model’s performance.

Training Data

The model was trained on a specific dataset, which may not cover all possible scenarios.

Industry Standards

The model is aligned with the MLCommons standardized hazards taxonomy, but the LLM safety and content evaluation space still lacks established industry standards.

Deployment

Llama Guard 3 is available by default in Llama 3.1 reference implementations, but deploying it elsewhere may require additional configuration and customization; a sketch of the typical input/output guard pattern follows.
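
As a sketch of that input/output guard pattern, reusing the moderate() and parse_guard_output() helpers defined above (guarded_chat and generate_fn are hypothetical names, and the refusal messages are placeholders):

def guarded_chat(user_message, generate_fn):
    # Screen the user prompt before it ever reaches the main model.
    chat = [{"role": "user", "content": user_message}]
    safe, cats = parse_guard_output(moderate(chat))
    if not safe:
        return "Request declined (categories: " + ", ".join(cats) + ")."

    # Generate with the main model, e.g. a call into Llama 3.1.
    answer = generate_fn(user_message)

    # Screen the full exchange so unsafe responses are withheld too.
    chat.append({"role": "assistant", "content": answer})
    safe, cats = parse_guard_output(moderate(chat))
    if not safe:
        return "Response withheld (categories: " + ", ".join(cats) + ")."
    return answer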

Dataloop's AI Development Platform

Build end-to-end workflows

Dataloop is a complete AI development stack, allowing you to make data, elements, models and human feedback work together easily.

  • Use one centralized tool for every step of the AI development process.
  • Import data from external blob storage, internal file system storage or public datasets.
  • Connect to external applications using a REST API & a Python SDK.
Save, share, reuse

Every single pipeline can be cloned, edited and reused by other data professionals in the organization. Never build the same thing twice.

  • Use existing, pre-created pipelines for RAG, RLHF, RLAIF, Active Learning & more.
  • Deploy multi-modal pipelines with one click across multiple cloud resources.
  • Use versions for your pipelines to make sure the deployed pipeline is the stable one.
Easily manage pipelines

Spend less time dealing with the logistics of owning multiple data pipelines, and get back to building great AI applications.

  • Easy visualization of the data flow through the pipeline.
  • Identify & troubleshoot issues with clear, node-based error messages.
  • Use scalable AI infrastructure that can grow to support massive amounts of data.