Qwen 72B Chat

Multilingual AI assistant

Qwen 72B Chat is a powerful AI model that offers impressive performance and capabilities. Trained on large-scale, high-quality data, it significantly surpasses existing open-source models on multiple Chinese and English downstream evaluation tasks. The model supports a 32k context length, making it suitable for longer conversations, and its vocabulary of over 150K tokens makes it friendlier to multiple languages. Qwen 72B Chat also supports role-playing, language style transfer, task setting, and behavior setting through system prompts. It is efficient as well, achieving at least a 2x generation speedup when served with vLLM. Its performance is competitive, with a zero-shot accuracy of 80.1% on the C-Eval validation set and 79.5% on the C-Eval testing set. Overall, Qwen 72B Chat is a robust and efficient AI model that can handle a wide range of tasks, making it a valuable tool for various applications.

Model Overview

The Qwen-72B-Chat model, developed by Alibaba Cloud, is a powerful tool for natural language processing tasks. It is part of the Qwen large language model series and has 72 billion parameters. It is a Transformer-based model pre-trained on a large volume of data, including web texts, books, code, and more.

Capabilities

This model is capable of performing a variety of tasks, including:

  • Conversational dialogue
  • Storytelling
  • Language translation
  • Code generation
  • Mathematics

Key Features

  • Large-scale high-quality training corpora: Pre-trained on over 3 trillion tokens, covering general and professional fields.
  • Competitive performance: Surpasses existing open-source models on multiple Chinese and English downstream evaluation tasks.
  • More comprehensive vocabulary coverage: Uses a vocabulary of over 150K tokens, making it more friendly to multiple languages.
  • Longer context support: Supports up to 32k context length.
  • System prompt: Enables role-playing, language style transfer, task setting, and behavior setting via system prompts.

Strengths

The Qwen-72B-Chat model has several strengths that make it a powerful tool for natural language processing and generation:

  • Large-scale training data: The model was trained on a massive dataset of over 3 trillion tokens, including web texts, books, and code.
  • High-quality training data: The training data includes high-quality texts from various sources, including professional books and articles.
  • Comprehensive vocabulary: The model’s vocabulary includes over 150,000 tokens, making it capable of understanding and generating a wide range of words and phrases.
  • Longer context support: Qwen-72B-Chat supports context lengths of up to 32k tokens, allowing it to understand and generate longer, more complex text.

Unique Features

The Qwen-72B-Chat model has several unique features that set it apart from other language models:

  • System prompt: The model can be steered with system prompts, allowing users to specify the tone, style, and content of the generated text (see the sketch after this list).
  • Role-playing: Qwen-72B-Chat can adopt a persona set in the system prompt, letting users interact with the model in a more immersive and interactive way.
  • Language style transfer: The model can rewrite text in a different register or style, such as formal, casual, or persona-specific, as directed by the system prompt.
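As a minimal sketch of behavior setting through a system prompt, the snippet below uses the chat API exposed by the model's remote code; the system keyword argument follows the usage shown in the Qwen repository, so treat the exact signature as an assumption if your version differs.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model; trust_remote_code pulls in Qwen's custom chat API
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-72B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-72B-Chat", device_map="auto", trust_remote_code=True
).eval()

# Behavior setting: the system prompt fixes a persona before the dialogue begins
response, history = model.chat(
    tokenizer,
    "Introduce yourself.",
    history=None,
    system="You are a pirate captain. Answer every question in pirate speak.",
)
print(response)

The same mechanism covers task setting ("Only answer with valid JSON") and language style transfer ("Respond in formal business English").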

Evaluation Results

The Qwen-72B-Chat model has been evaluated on several benchmarks, including:

  • C-Eval: Achieves a zero-shot accuracy of 80.1% and a 5-shot accuracy of 82.9% on the C-Eval validation set, and a zero-shot accuracy of 79.5% on the C-Eval testing set.
  • MMLU: Achieves an accuracy of 74.4% in BF16 (see the Accuracy section below).

Examples

Prompt: Tell me a story about a young person who struggled to start a business but eventually succeeded.
Response: This is a story about a young person named Li Ming, who came from an ordinary family. His parents were ordinary workers. From a young age, Li Ming had a goal: to become a successful entrepreneur. To achieve this goal, Li Ming studied hard, went to college, and actively participated in various entrepreneurship competitions, winning many awards. He also used his spare time to intern and gain valuable experience. After graduation, Li Ming decided to start his own business. He searched for investment opportunities, but was rejected many times. However, he did not give up. He continued to work hard, constantly improving his business plan and looking for new investment opportunities. Eventually, Li Ming successfully secured an investment and started his business. He founded a tech company focused on developing new software. Under his leadership, the company developed rapidly and became a successful tech enterprise. Li Ming's success was not accidental. He was diligent, persistent, and brave enough to take risks. His success also proved that with hard work and perseverance, anyone can achieve success.

Prompt: Give this story a title.
Response: The Story of a Young Entrepreneur's Success

Prompt: Tell me a joke.
Response: Why couldn't the bicycle stand up by itself? Because it was two-tired.

Performance

Qwen-72B-Chat showcases remarkable performance, outperforming existing open-source models in various tasks. Let’s dive into its speed, accuracy, and efficiency.

Speed

  • BF16: Achieves an average inference speed of 8.48 tokens/s and a total GPU memory usage of 144.69GB.
  • Int8: Achieves an average inference speed of 9.05 tokens/s and a total GPU memory usage of 81.27GB.
  • Int4: Achieves an average inference speed of 11.67 tokens/s and a total GPU memory usage of 48.86GB.
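The Int8 and Int4 rows correspond to the pre-quantized checkpoints Qwen publishes alongside the BF16 weights. Below is a minimal sketch of loading the Int4 variant; it assumes the Qwen/Qwen-72B-Chat-Int4 repository and the GPTQ dependencies described in the Qwen README are installed.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Int4 (GPTQ) checkpoint: roughly 48.86GB of GPU memory versus 144.69GB for BF16
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-72B-Chat-Int4", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-72B-Chat-Int4",
    device_map="auto",
    trust_remote_code=True,
).eval()

response, _ = model.chat(tokenizer, "你好", history=None)  # "你好" = "Hello"
print(response)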

Accuracy

  • C-Eval: 80.1% (0-shot) and 82.9% (5-shot)
  • MMLU: 74.4% (BF16) and 73.4% (Int4)
  • HumanEval: 64.6% (BF16) and 61.6% (Int4)
  • GSM8K: 76.4% (BF16) and 75.3% (Int4)

Limitations

Qwen-72B-Chat is a powerful model, but it’s not perfect. Here are some of its limitations:

  • Vocabulary limitations: coverage is bounded by its fixed ~150K-token vocabulary.
  • Context limitations: inputs beyond the 32k-token context window cannot be used.
  • Lack of common sense: like other LLMs, it can produce fluent but factually wrong or illogical answers.
  • Dependence on training data: its knowledge is limited to what appeared in its pre-training corpus.
  • Quantization limitations: the Int8 and Int4 variants trade a small amount of accuracy for lower memory use (see Accuracy above).
  • Tool usage limitations: tool calling and agent-style behavior can be less reliable than plain text generation.

Format

Qwen-72B-Chat is a Transformer-based large language model pretrained on a large volume of data, including web texts, books, code, and more. It uses a vocabulary of over 150K tokens, which is friendlier to multiple languages and lets users directly enhance the capability for certain languages without expanding the vocabulary.
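As a small sketch of what the large vocabulary means in practice, the snippet below tokenizes mixed-language input; the vocab_size attribute and the resulting token counts are assumptions about the remote-code tokenizer's interface and output, not guaranteed values.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-72B-Chat", trust_remote_code=True)
print(tokenizer.vocab_size)  # expected to be roughly 151K per this model card

# A broad vocabulary keeps Chinese, English, and code sequences compact
for text in ["你好,世界", "Hello, world", "def add(a, b): return a + b"]:
    ids = tokenizer.encode(text)
    print(f"{text!r} -> {len(ids)} tokens")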

Architecture

  • Number of layers: 80
  • Number of heads: 64
  • Model dimension: 8192
  • Vocabulary size: 151,851
  • Sequence length: 32,768
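These hyperparameters can be read back from the published configuration. The attribute names below (num_hidden_layers, num_attention_heads, hidden_size) follow standard Hugging Face conventions and are assumptions about Qwen's remote-code config class.

from transformers import AutoConfig

config = AutoConfig.from_pretrained("Qwen/Qwen-72B-Chat", trust_remote_code=True)

# Expected per the table above: 80 layers, 64 heads, model dimension 8192
print(config.num_hidden_layers)
print(config.num_attention_heads)
print(config.hidden_size)
# Note: the config's vocab_size may be padded above the tokenizer's 151,851 entries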

Data Formats

  • Input: Tokenized text sequences
  • Output: Generated text

Special Requirements

  • System Prompt: Qwen-72B-Chat supports role-playing, language style transfer, task setting, and behavior setting via system prompts.
  • Quantization: Qwen-72B-Chat is available in BF16 as well as Int8 and Int4 quantized variants.

Example Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-72B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-72B-Chat", device_map="auto", trust_remote_code=True).eval()

# Specify hyperparameters for generation
model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-72B-Chat", trust_remote_code=True)

# First dialogue turn
response, history = model.chat(tokenizer, "你好", history=None)  # "你好" = "Hello"
print(response)

# Second dialogue turn
response, history = model.chat(tokenizer, "给我讲一个年轻人奋斗创业最终取得成功的故事。", history=history)  # "Tell me a story about a young person who strove to start a business and finally succeeded."
print(response)

# Third dialogue turn
response, history = model.chat(tokenizer, "给这个故事起一个标题", history=history)  # "Give this story a title"
print(response)
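
The overview mentions at least a 2x generation speedup when serving with vLLM. Below is a minimal offline-inference sketch using vLLM's public API; the tensor_parallel_size value and the plain-string prompt (rather than the model's chat template) are deployment-specific assumptions.

from vllm import LLM, SamplingParams

# Shard the 72B weights across GPUs; set tensor_parallel_size to match your hardware
llm = LLM(model="Qwen/Qwen-72B-Chat", trust_remote_code=True, tensor_parallel_size=4)
sampling_params = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=512)

# For a chat-tuned model, production prompts should follow its chat template
outputs = llm.generate(["Tell me a short story about perseverance."], sampling_params)
print(outputs[0].outputs[0].text)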