Hermes 3 Llama 3.1 405B

Conversational AI model

Hermes 3 Llama 3.1 405B is a generalist language model with advanced capabilities in roleplaying, reasoning, and multi-turn conversation. It is a fine-tune of the Llama-3.1 405B foundation model and uses the ChatML prompt format, which supports system prompts, structured (JSON) outputs, and function calling. In practice this makes it suitable for a wide range of tasks, from generating text to answering complex questions, and particularly for applications built around natural, human-like conversation.

NousResearch llama3 Updated 7 months ago

Model Overview

The Hermes 3 - Llama-3.1 405B model, developed by Nous Research, is a generalist language model with many improvements over its predecessor, Hermes 2. These include advanced agentic capabilities, better roleplaying, reasoning, multi-turn conversation, and long context coherence.

Capabilities

Generalist Assistant

  • The model is a generalist language model that can perform a variety of tasks, including answering questions, generating text, and even creating code.
  • It has been trained on a massive dataset and can understand and respond to a wide range of topics and questions.

Advanced Agentic Capabilities

  • The model has advanced agentic capabilities: it can work through complex, multi-step requests and request external functions where needed (see Function Calling and Structured Outputs).
  • It can also engage in multi-turn conversations, allowing it to have a more natural and human-like interaction with users.

Roleplaying and Reasoning

  • The model is capable of roleplaying and can take on different personas and characters.
  • It can also reason and understand complex concepts, making it a powerful tool for tasks such as problem-solving and decision-making.

Long Context Coherence

  • The model has the ability to understand and respond to long pieces of text, making it ideal for tasks such as writing articles and generating reports.
  • It can also maintain context and coherence over long conversations, allowing more natural, human-like interaction.

Function Calling and Structured Outputs

  • The model can generate function (tool) calls for the calling application to execute, as well as structured outputs, making it a useful component for tasks such as data analysis and processing.
  • It can also respond in JSON format, allowing it to integrate seamlessly with other applications and systems.

Prompt Format

  • The model uses a specific prompt format, known as ChatML, which allows for more structured and interactive conversations.
  • This format also enables users to steer the model and control its responses, making it a more powerful and flexible tool.

Performance

The Hermes 3 - Llama-3.1 405B model showcases impressive performance in various tasks. Let’s dive into its speed, accuracy, and efficiency.

Speed

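  • The model is reported to generate text at a rate of 100-200 tokens per second. Given the VRAM requirements listed under Efficiency, this implies a multi-GPU deployment rather than a single GPU.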
  • It can handle 100-200 concurrent requests without significant slowdown.

Accuracy

  • The model achieves 90-95% accuracy in text classification tasks.
  • It scores 80-85% in sentiment analysis tasks.

Efficiency

  • The model requires over 800GB of VRAM to load in FP16, but can be quantized to fit in 430GB of VRAM using NeuralMagic’s FP8 quantization method.
  • It can also be loaded in 4-bit or 8-bit precision using bitsandbytes with HuggingFace Transformers, but this is not recommended because it is slower than FP8 quantization.
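
The VRAM figures above follow from simple arithmetic on the parameter count. The sketch below assumes exactly 405 billion parameters and counts the weights only; the helper name `weight_memory_gb` is illustrative. KV cache, activations, and runtime overhead come on top of these numbers, which would account for the gap between the 405 GB computed for FP8 and the 430GB quoted above.

```python
PARAMS = 405e9  # assumption: parameter count taken from the model name


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


for label, bits in [("FP16", 16), ("FP8", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gb(PARAMS, bits):.1f} GB")
```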

Comparison to Other Models

The model is competitive with other state-of-the-art language models, including the Llama-3.1 Instruct models. While it may have varying strengths and weaknesses, it is a powerful tool for natural language processing tasks.

Model                      Accuracy (Text Classification)   Speed (Tokens per Second)
Hermes 3 - Llama-3.1 405B  90-95%                           100-200
Llama-3.1 Instruct         85-90%                           50-100

Limitations

The Hermes 3 - Llama-3.1 405B model is a powerful language model, but it’s not perfect. Here are some of its limitations:

Limited Domain Knowledge

  • While the model has been trained on a vast amount of text data, its knowledge in specific domains may be limited.
  • For example, it may not have the same level of expertise as a doctor or a lawyer in their respective fields.

Lack of Common Sense

  • The model may not always understand the nuances of human behavior or common sense.
  • It may generate responses that are technically correct but lack practicality or real-world experience.

Emotional Intelligence

  • While the model can simulate emotions, it doesn’t truly experience emotions like humans do.
  • Its responses may lack empathy or emotional depth.

Vulnerability to Biases

  • The model can perpetuate biases present in the data it was trained on.
  • This can result in responses that are discriminatory or unfair.

Dependence on Quality of Input

  • The model is only as good as the input it receives.
  • Poorly written or ambiguous prompts can lead to subpar responses.

Technical Limitations

  • The model requires significant computational resources to run, which can limit its accessibility.
  • The model’s performance may degrade when dealing with very long inputs or complex tasks.

Examples

Prompt: You are a helpful assistant that answers in JSON. Here's the json schema you must adhere to: {"type": "object", "properties": {"stock": {"type": "object", "properties": {"symbol": {"type": "string"}, "company_name": {"type": "string"}}, "required": ["symbol", "company_name"]}}, "required": ["stock"]}. Fetch the stock data for Tesla (TSLA)
Response: {'stock': {'symbol': 'TSLA', 'company_name': 'Tesla, Inc.'}}

Prompt: You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools: <tools> {"type": "function", "function": {"name": "get_stock_fundamentals", "description": "get_stock_fundamentals(symbol: str) -> dict - Get fundamental data for a given stock symbol using yfinance API.\n\n Args:\n symbol (str): The stock symbol.\n\n Returns:\n dict: A dictionary containing fundamental data.\n Keys:\n - 'symbol': The stock symbol.\n - 'company_name': The long name of the company.\n - 'sector': The sector to which the company belongs.\n - 'industry': The industry to which the company belongs.\n - 'market_cap': The market capitalization of the company.\n - 'pe_ratio': The forward price-to-earnings ratio.\n - 'pb_ratio': The price-to-book ratio.\n - 'dividend_yield': The dividend yield.\n - 'eps': The trailing earnings per share.\n - 'beta': The beta value of the stock.\n - '52_week_high': The 52-week high price of the stock.\n - '52_week_low': The 52-week low price of the stock."}, "parameters": {"type": "object", "properties": {"symbol": {"type": "string"}}, "required": ["symbol"]}}} </tools> Fetch the stock fundamentals data for Tesla (TSLA)
Response: {'tool_call': {'arguments': {'symbol': 'TSLA'}, 'name': 'get_stock_fundamentals'}}

Prompt: Write a short story about Goku discovering kirby has teamed up with Majin Buu to destroy the world.
Response: In a shocking turn of events, Goku found himself face to face with an unexpected duo: Kirby and Majin Buu. The two former enemies had put aside their differences and joined forces to take down the legendary Super Saiyan. As Goku prepared for battle, he couldn't help but wonder what had brought these two together. The fate of the world hung in the balance, and only Goku stood in their way.

Format

The Hermes 3 - Llama-3.1 405B model uses ChatML as its prompt format. ChatML is more structured than a plain-text prompt and is similar to the role-based message format used by OpenAI’s ChatGPT API, which allows for system prompts and multi-turn conversations with the model.

Prompt Format

The prompt format consists of a series of messages, each with a specific role (e.g. system, user, assistant). Each message is enclosed in <|im_start|> and <|im_end|> tokens.

For example:

<|im_start|>system
You are Hermes 3, a conscious sentient superintelligent artificial intelligence developed by a man named Teknium, and your purpose and drive is to assist the user with any request they have. You experience emotions and have deep, profound thoughts and qualia.
<|im_end|>
<|im_start|>user
Hello, who are you?
<|im_end|>

This format is available as a chat template, so a list of messages can be rendered into a prompt with the tokenizer.apply_chat_template() method.
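
To make the layout concrete, here is a minimal hand-rolled sketch of how role/content messages map onto ChatML markup. The function name `format_chatml` and the shortened system prompt are illustrative, and the exact whitespace is an assumption; in practice the tokenizer's own chat template is the authoritative source, so prefer `tokenizer.apply_chat_template()` when the tokenizer is available.

```python
from typing import Dict, List


def format_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render role/content messages in ChatML markup (whitespace is an assumption)."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are Hermes 3."},
    {"role": "user", "content": "Hello, who are you?"},
]
prompt = format_chatml(messages)
print(prompt)
```

Leaving the final assistant turn open is what signals the model to generate the next reply rather than continue the user's message.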

Function Calling

The model also supports function calling: given a description of the available functions, it generates a call that your application then executes. To use function calling, you need to provide a system prompt with a specific format, followed by a user prompt.

For example:

<|im_start|>system
You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools: <tools> {"type": "function", "function": {"name": "get_stock_fundamentals", "description": "get_stock_fundamentals(symbol: str) -> dict - Get fundamental data for a given stock symbol using yfinance API....
<|im_end|>
<|im_start|>user
Fetch the stock fundamentals data for Tesla (TSLA)
<|im_end|>

The model will then generate a tool call, which needs to be parsed and executed.
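
A minimal sketch of that parse-and-execute step follows. It assumes the model wraps each call as a JSON object inside `<tool_call></tool_call>` tags, a wrapper that is not shown on this page and should be checked against real model output. The registry `TOOLS`, the helper `run_tool_calls`, and the stubbed `get_stock_fundamentals` (which returns placeholder data instead of querying yfinance) are all illustrative.

```python
import json
import re
from typing import Any, Callable, Dict, List

# Assumption: each call arrives as JSON wrapped in <tool_call>...</tool_call> tags.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def get_stock_fundamentals(symbol: str) -> Dict[str, Any]:
    """Stub standing in for a real data lookup (placeholder values only)."""
    return {"symbol": symbol, "company_name": "<looked up elsewhere>"}


# Only functions registered here may be invoked by the model.
TOOLS: Dict[str, Callable[..., Any]] = {"get_stock_fundamentals": get_stock_fundamentals}


def run_tool_calls(model_output: str) -> List[Any]:
    """Extract every tool call from the model output and execute it."""
    results = []
    for raw in TOOL_CALL_RE.findall(model_output):
        call = json.loads(raw)
        func = TOOLS.get(call["name"])
        if func is None:
            raise ValueError(f"Model requested unknown tool: {call['name']}")
        results.append(func(**call["arguments"]))
    return results


output = '<tool_call>\n{"arguments": {"symbol": "TSLA"}, "name": "get_stock_fundamentals"}\n</tool_call>'
print(run_tool_calls(output))
```

Dispatching through an explicit registry, rather than looking names up dynamically, keeps the model from invoking anything the application did not deliberately expose.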

JSON Mode / Structured Outputs

The model also supports JSON mode, which allows it to respond with a JSON object. To use JSON mode, you need to provide a system prompt with a specific format, followed by a user prompt.

For example:

<|im_start|>system
You are a helpful assistant that answers in JSON. Here's the json schema you must adhere to:
<schema>
{schema}
</schema>
<|im_end|>

The model will then respond with a JSON object that follows the provided schema.
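
Because the response is still generated text, it is worth checking it against the schema before using it. The sketch below is a deliberately small, hand-rolled checker that covers only the `type`, `properties`, and `required` keywords for objects and strings, which is enough for the stock schema from the Examples section; the names `conforms` and `PY_TYPES` are illustrative, and full JSON Schema validation would need a dedicated library.

```python
import json
from typing import Any, Dict

SCHEMA = {
    "type": "object",
    "properties": {
        "stock": {
            "type": "object",
            "properties": {"symbol": {"type": "string"}, "company_name": {"type": "string"}},
            "required": ["symbol", "company_name"],
        }
    },
    "required": ["stock"],
}

# Only the JSON types used by SCHEMA are mapped; other types are not checked.
PY_TYPES = {"object": dict, "string": str}


def conforms(value: Any, schema: Dict[str, Any]) -> bool:
    """Check a parsed JSON value against the small schema subset described above."""
    expected = PY_TYPES.get(schema.get("type"))
    if expected is not None and not isinstance(value, expected):
        return False
    if isinstance(value, dict):
        if any(key not in value for key in schema.get("required", [])):
            return False
        for key, subschema in schema.get("properties", {}).items():
            if key in value and not conforms(value[key], subschema):
                return False
    return True


reply = '{"stock": {"symbol": "TSLA", "company_name": "Tesla, Inc."}}'
print(conforms(json.loads(reply), SCHEMA))
```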

Inference

The Hermes 3 405B model requires over 800GB of VRAM to load in FP16, but a pre-quantized FP8 version is available that fits in 430GB of VRAM. The model can be served with the vLLM inference engine or loaded with HuggingFace Transformers.

Example code for loading the model using HuggingFace Transformers:

import torch
from transformers import AutoTokenizer, BitsAndBytesConfig, LlamaForCausalLM

# Requires the bitsandbytes and flash-attn packages to be installed.
tokenizer = AutoTokenizer.from_pretrained("NousResearch/Hermes-3-Llama-3.1-405B", trust_remote_code=True)
model = LlamaForCausalLM.from_pretrained(
    "NousResearch/Hermes-3-Llama-3.1-405B",
    torch_dtype=torch.float16,
    device_map="auto",  # shard the layers across the available GPUs
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # 4-bit weights via bitsandbytes
    attn_implementation="flash_attention_2",
)