Hermes 3 Llama 3.2 3B GGUF

Conversational AI model

Hermes 3 Llama 3.2 3B GGUF is a generalist language model that excels at roleplaying, reasoning, multi-turn conversation, and maintaining coherence over long contexts. Its ChatML prompt format enables structured, steerable conversations, and the model is trained for function calling and structured (JSON) outputs, making it a versatile tool for various applications. Its benchmark performance is competitive with other models in its class, with particular strengths in logical deduction and reasoning. Overall, Hermes 3 Llama 3.2 3B GGUF is a reliable and efficient choice for users looking for a capable small language model.
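Because the weights ship in GGUF format, one common way to run the model locally is through llama.cpp bindings. Below is a minimal sketch using the llama-cpp-python package; the GGUF filename is hypothetical and depends on which quantization you download:

# Minimal sketch, assuming llama-cpp-python is installed
# (pip install llama-cpp-python); the filename below is hypothetical --
# substitute the quantization you actually downloaded.
from llama_cpp import Llama

llm = Llama(model_path="Hermes-3-Llama-3.2-3B.Q4_K_M.gguf", n_ctx=4096)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello, who are you?"}]
)
print(out["choices"][0]["message"]["content"])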

Model Overview

The Hermes 3 - Llama-3.2 3B model is a cutting-edge language model that boasts impressive capabilities. It’s a generalist model, meaning it can handle a wide range of tasks and topics. But what makes it special?

Capabilities

Capable of generating both text and code, this model outperforms many open-source chat models across common industry benchmarks. Here are some of its key features:

  • Advanced agentic capabilities: It can understand and respond to complex instructions and tasks.
  • Improved roleplaying and reasoning: It can engage in multi-turn conversations, understand context, and make logical connections.
  • Long context coherence: It can keep track of long conversations and maintain coherence throughout.
  • Powerful steering capabilities: Users have control over the model’s responses and can guide the conversation.

Performance

This model shows remarkable performance in various tasks, with a balance of speed, accuracy, and efficiency. Let’s dive into the details!

Speed

At just 3 billion parameters, the model is small enough to generate responses quickly, and the GGUF quantizations make local inference lighter still. It was fine-tuned from the Llama 3.2 foundation model on H100s on LambdaLabs GPU Cloud, and handles a wide range of tasks with ease.

Accuracy

The model was evaluated on a range of standard benchmarks, achieving the following accuracy scores:

  • arc_challenge: 0.4411
  • boolq: 0.8327
  • hellaswag: 0.5453
  • openbookqa: 0.3480
  • piqa: 0.7639
  • winogrande: 0.6590

These scores are competitive for a model of this size, with boolq and piqa standing out as its strongest results.
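These task names match EleutherAI's lm-evaluation-harness, so comparable numbers can in principle be reproduced with its Python entry point. A hedged sketch (the Hugging Face repo id is an assumption):

# Hedged sketch using EleutherAI's lm-evaluation-harness (pip install lm-eval);
# the pretrained repo id is an assumption.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=NousResearch/Hermes-3-Llama-3.2-3B",
    tasks=["arc_challenge", "boolq", "hellaswag", "openbookqa", "piqa", "winogrande"],
)
print(results["results"])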

Efficiency

The Hermes series focuses on aligning LLMs to the end user, and at 3B parameters this model is economical to run. Its agentic capabilities, roleplaying, reasoning, and multi-turn conversation abilities make it a practical choice for a wide range of applications.

Limitations

While this model is powerful, it’s not perfect. Let’s explore some of its limitations.

Weaknesses in Specific Tasks

While it performs well in many areas, it struggles with certain tasks. For example, in the arc_challenge task it achieves an accuracy of only 0.4411, and in the openbookqa task its accuracy is 0.3480. These results suggest it may not be the best choice for tasks that demand specialized scientific or domain-specific knowledge.

Weaker Quantitative Reasoning

The model also shows weaker performance on quantitative reasoning. For instance, in the agieval_aqua_rat task, which tests multi-step algebraic word problems, it achieves an accuracy of only 0.2283. This suggests it may struggle with problems that require careful numerical reasoning.

Format

This model uses a transformer architecture and accepts input as tokenized text sequences, using ChatML as the prompt format. ChatML provides a structured way to engage the LLM in multi-turn chat dialogue, with system prompts for steering the model's behavior.
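Concretely, ChatML wraps every turn in <|im_start|> and <|im_end|> tokens, with the role on the first line. A single exchange looks like this (the system message is illustrative):

<|im_start|>system
You are Hermes 3.<|im_end|>
<|im_start|>user
Hello, who are you?<|im_end|>
<|im_start|>assistant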

Supported Data Formats

  • Tokenized text sequences
  • ChatML prompts

Input Requirements

  • System prompts give the end user steerability and control over the model's behavior
  • User prompts can be formatted using the tokenizer.apply_chat_template() method
  • Function calling requires a specific system prompt containing the tool signatures and a Pydantic model's JSON schema

Output Requirements

  • Responses can be generated in natural language or JSON format
  • JSON format requires a specific system prompt and a Pydantic model's JSON schema
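For instance, the schema used in the JSON-mode example below can be generated from a Pydantic model. A minimal sketch (the class name is illustrative, and Pydantic v2's output also includes title fields beyond the schema shown later):

import json
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

# model_json_schema() yields a JSON Schema with "properties" and "required"
# keys like the one used in the JSON-mode example below.
schema = json.dumps(Person.model_json_schema())
system_prompt = (
    "You are a helpful assistant that answers in JSON. "
    f"Here's the json schema you must adhere to: {schema}"
)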

Examples

Here are some examples of how to use this model:

Stock fundamentals query (natural language response):

User: Tell me the stock fundamentals data for Tesla (TSLA)

Assistant: The stock fundamentals data for Tesla (TSLA) are as follows:
  • Symbol: TSLA
  • Company Name: Tesla, Inc.
  • Sector: Consumer Cyclical
  • Industry: Auto Manufacturers
  • Market Capitalization: $566,160,130,480
  • Forward Price-to-Earnings Ratio (PE Ratio): 42.73
  • Price-to-Book Ratio (PB Ratio): 9.04
  • Dividend Yield: N/A
  • Trailing Earnings Per Share (EPS): $4.3
  • Beta Value of the Stock: 2.42
  • 52-Week High Price of the Stock: $299.29
  • 52-Week Low Price of the Stock: $152.37

Function calling (system prompt with tool signatures, user query, and tool response):

System: You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools: <tools> {"type": "function", "function": {"name": "get_stock_fundamentals", "description": "get_stock_fundamentals(symbol: str) -> dict - Get fundamental data for a given stock symbol using yfinance API.\n\n Args:\n symbol (str): The stock symbol.\n\n Returns:\n dict: A dictionary containing fundamental data.\n Keys:\n - 'symbol': The stock symbol.\n - 'company_name': The long name of the company.\n - 'sector': The sector to which the company belongs.\n - 'industry': The industry to which the company belongs.\n - 'market_cap': The market capitalization of the company.\n - 'pe_ratio': The forward price-to-earnings ratio.\n - 'pb_ratio': The price-to-book ratio.\n - 'dividend_yield': The dividend yield.\n - 'eps': The trailing earnings per share.\n - 'beta': The beta value of the stock.\n - '52_week_high': The 52-week high price of the stock.\n - '52_week_low': The 52-week low price of the stock."}} </tools>

User: Fetch the stock fundamentals data for Tesla (TSLA)

Tool response: {'symbol': 'TSLA', 'company_name': 'Tesla, Inc.', 'sector': 'Consumer Cyclical', 'industry': 'Auto Manufacturers', 'market_cap': 611384164352, 'pe_ratio': 49.604652, 'pb_ratio': 9.762013, 'dividend_yield': None, 'eps': 4.3, 'beta': 2.427, '52_week_high': 299.29, '52_week_low': 152.37}

JSON mode (structured output against a schema):

System: You are a helpful assistant that answers in JSON. Here's the json schema you must adhere to: {"type": "object", "properties": {"name": {"type": "string"}, "age": {"type": "integer"}}, "required": ["name", "age"]}

User: What is the data of John who is 30 years old?

Assistant: {'name': 'John', 'age': 30}

Code Examples

  • Formatting messages using the tokenizer.apply_chat_template() method:
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model from the full-precision Hugging Face repo
tokenizer = AutoTokenizer.from_pretrained("NousResearch/Hermes-3-Llama-3.2-3B")
model = AutoModelForCausalLM.from_pretrained("NousResearch/Hermes-3-Llama-3.2-3B")

messages = [
    {"role": "system", "content": "You are Hermes 3."},
    {"role": "user", "content": "Hello, who are you?"}
]
gen_input = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output = model.generate(gen_input, max_new_tokens=256)
  • Utilizing the prompt format without a system prompt:
messages = [
    {"role": "user", "content": "Hello, who are you?"}
]
gen_input = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output = model.generate(gen_input, max_new_tokens=256)
  • Function calling example (the elided portion of the system prompt contains the <tools> signatures shown above):
# Raw ChatML strings must be tokenized before generation
system_prompt = "<|im_start|>system\nYou are a function calling AI model....</tool_call><|im_end|>"
user_prompt = "<|im_start|>user\nFetch the stock fundamentals data for Tesla (TSLA)<|im_end|>"
inputs = tokenizer(system_prompt + user_prompt + "<|im_start|>assistant\n", return_tensors="pt")
output = model.generate(inputs.input_ids, max_new_tokens=512)
  • JSON mode example (the elided portion of the system prompt contains the <schema> block shown above):
system_prompt = "<|im_start|>system\nYou are a helpful assistant that answers in JSON....</schema><|im_end|>"
user_prompt = "<|im_start|>user\nGet the stock fundamentals data for Tesla (TSLA)<|im_end|>"
inputs = tokenizer(system_prompt + user_prompt + "<|im_start|>assistant\n", return_tensors="pt")
output = model.generate(inputs.input_ids, max_new_tokens=512)
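Once the JSON-mode response is decoded, it can be validated against the same Pydantic model from the earlier sketch; here a literal string stands in for the decoded model output:

from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

raw = '{"name": "John", "age": 30}'  # stands in for the decoded assistant output
person = Person.model_validate_json(raw)  # raises ValidationError on malformed output
print(person.name, person.age)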
Dataloop's AI Development Platform

Build end-to-end workflows

Dataloop is a complete AI development stack, allowing you to make data, elements, models and human feedback work together easily.

  • Use one centralized tool for every step of the AI development process.
  • Import data from external blob storage, internal file system storage or public datasets.
  • Connect to external applications using a REST API & a Python SDK.

Save, share, reuse

Every single pipeline can be cloned, edited and reused by other data professionals in the organization. Never build the same thing twice.

  • Use existing, pre-created pipelines for RAG, RLHF, RLAIF, Active Learning & more.
  • Deploy multi-modal pipelines with one click across multiple cloud resources.
  • Use versions for your pipelines to make sure the deployed pipeline is the stable one.

Easily manage pipelines

Spend less time dealing with the logistics of owning multiple data pipelines, and get back to building great AI applications.

  • Easy visualization of the data flow through the pipeline.
  • Identify & troubleshoot issues with clear, node-based error messages.
  • Use scalable AI infrastructure that can grow to support massive amounts of data.