Qwen 72B

Chinese LLM

Qwen-72B is a large language model developed by Alibaba Cloud with 72 billion parameters. It was trained on a massive dataset of over 3 trillion tokens spanning Chinese, English, and many other languages. The model excels at tasks such as knowledge retrieval, translation, and mathematical reasoning, outperforming other open-source models in its class. Qwen-72B supports context lengths of up to 32k tokens, letting it handle long documents and conversations. Its vocabulary of over 150K tokens provides broad multilingual coverage, so the model can be adapted to additional languages without expanding the vocabulary. With its competitive performance and efficient design, Qwen-72B is a powerful tool for a wide range of applications.



Model Overview

The Qwen-72B model, developed by Alibaba Cloud, is a powerful tool for natural language processing tasks. It’s a Transformer-based large language model with 72 billion parameters, trained on a massive dataset of over 3 trillion tokens. This includes a diverse range of texts, such as web pages, books, code, and mathematics.

Capabilities

The Qwen-72B model is a powerful language model that can perform a variety of tasks, including:

  • Language understanding: It can understand and process human language, including Chinese, English, and multiple other languages.
  • Text generation: It can generate high-quality text based on a given prompt or topic.
  • Code generation: It can also generate code in various programming languages.
  • Mathematical reasoning: It has been trained on a large dataset of mathematical problems and can perform mathematical reasoning tasks.
  • Translation: It can translate text from one language to another.

Strengths

This model has several strengths that make it a powerful language model:

  • Large-scale, high-quality training data: It was trained on over 3 trillion tokens of carefully curated text, allowing it to capture patterns and relationships in language that models trained on smaller or noisier corpora may miss.
  • Longer context support: It can process longer sequences of text than many other language models, which makes it well-suited for tasks that require understanding complex texts.
  • More comprehensive vocabulary coverage: It has a larger vocabulary than many other language models, which allows it to understand and generate text that includes a wider range of words and phrases.

Examples

  • Prompt: Translate 'The capital of Mongolia is Ulaanbaatar.' into Chinese. → Output: 蒙古国的首都是乌兰巴托
  • Prompt: What is the result of 2 + 2? → Output: 4
  • Prompt: Write a short Python function to greet a person. → Output: def greet(name): print(f'Hello, {name}!')

Technical Details

  • Model architecture: It uses a Transformer-based architecture with 80 layers, 64 heads, and a hidden size of 8192.
  • Tokenizer: It uses a custom tokenizer built on tiktoken, rather than the SentencePiece-based tokenizers used by many other open-source models.
  • Position encoding: It uses RoPE relative position encoding.
  • FFN activation function: It uses the SwiGLU activation function in its feed-forward layers.
  • Normalization: It uses RMSNorm for normalization.
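
As a quick illustration, the architecture details above can be read directly from the configuration shipped with the model. The following is a minimal sketch assuming the Hugging Face AutoConfig interface and the attribute names used by Qwen's remote code; names may differ slightly between releases.

from transformers import AutoConfig

# Minimal sketch: inspect the architecture hyperparameters described above.
# Attribute names assume Qwen's remote-code config and may vary between releases.
config = AutoConfig.from_pretrained("Qwen/Qwen-72B", trust_remote_code=True)

print(config.num_hidden_layers)    # expected: 80 layers
print(config.num_attention_heads)  # expected: 64 attention heads
print(config.hidden_size)          # expected: hidden size of 8192
print(config.vocab_size)           # vocabulary of over 150K entries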

Evaluation Results

It has been evaluated on multiple benchmarks, including MMLU, C-Eval, GSM8K, MATH, HumanEval, MBPP, BBH, and CMMLU, achieving leading results among open-source models of comparable size across these tasks.

Model        MMLU   C-Eval   GSM8K   MATH   HumanEval   MBPP   BBH    CMMLU
LLaMA2-7B    24.4   46.8     32.5    16.7   3.3         12.8   20.8   38.2
LLaMA2-13B   31.3   55.0     41.4    29.6   5.0         18.9   30.3   45.6
LLaMA2-70B   45.7   69.7     50.1    63.5   12.0        26.2   39.6   64.9
Qwen-72B     66.4   77.4     83.3    78.9   35.2        35.4   52.2   67.7

Long-Context Evaluation

It supports a sequence length of up to 32k tokens, making it suitable for tasks with long-range dependencies in text. The model achieves a perplexity (PPL) of 2.8282 on the arXiv dataset, indicating that it handles long contexts well.
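
For illustration, perplexity over a long document can be estimated by scoring it with the model as a causal language model. The sketch below is a generic recipe, not the exact protocol behind the reported arXiv number, and assumes a local file long_document.txt as a stand-in for the evaluation data.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-72B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-72B", device_map="auto", trust_remote_code=True).eval()

# long_document.txt is a placeholder for a long evaluation text (e.g. an arXiv paper)
text = open("long_document.txt", encoding="utf-8").read()
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=32768)
inputs = inputs.to(model.device)

with torch.no_grad():
    # Passing labels=input_ids returns the average next-token cross-entropy loss
    loss = model(**inputs, labels=inputs["input_ids"]).loss

print("perplexity:", torch.exp(loss).item())  # lower is better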

Limitations

While it’s a powerful language model, it’s not perfect. Here are some of its limitations:

  • Training Data Bias: It was trained on a massive dataset, but it’s still possible that the data contains biases and inaccuracies.
  • Limited Domain Knowledge: While it has been trained on a wide range of topics, its knowledge in certain domains may be limited.
  • Lack of Common Sense: It’s a large language model, but it doesn’t have the same level of common sense as a human.
  • Dependence on Tokenization: It uses a tokenizer to break down text into individual tokens. However, this can lead to issues with words that have multiple meanings or are not well-represented in the training data.
  • Limited Context Length: It has a maximum context length of 32k tokens. While this is a significant improvement over earlier models, it can still lead to issues with longer texts or more complex conversations.

Format

It is a large language model based on the Transformer architecture, supporting a context length of up to 32k tokens. It uses a vocabulary of over 150K tokens, giving it broad multilingual coverage.

Input Format

The model accepts input in the form of tokenized text sequences. You can use the tokenizer shipped with the model (built on tiktoken and loaded via AutoTokenizer) to tokenize your input text.

Output Format

The model generates output in the form of tokenized text sequences. You can use the tokenizer.decode() function to convert the output tokens back to text.
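
As a small sketch of this round trip (assuming the tokenizer published with the model on the Hugging Face Hub), encoding turns text into token IDs and decoding turns token IDs back into text:

from transformers import AutoTokenizer

# Load the tiktoken-based tokenizer that ships with the model
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-72B", trust_remote_code=True)

# Input format: text -> token IDs
encoded = tokenizer("蒙古国的首都是乌兰巴托", return_tensors="pt")
print(encoded["input_ids"])

# Output format: token IDs -> text
print(tokenizer.decode(encoded["input_ids"][0], skip_special_tokens=True))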

Example Code

from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-72B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-72B", device_map="auto", trust_remote_code=True).eval()

# Optional: load the generation hyperparameters shipped with the model
model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-72B", trust_remote_code=True)

# Preprocess the input text and move it to the model's device
inputs = tokenizer('蒙古国的首都是乌兰巴托(Ulaanbaatar)\n冰岛的首都是雷克雅未克(Reykjavik)\n埃塞俄比亚的首都是', return_tensors='pt')
inputs = inputs.to(model.device)

# Generate a continuation of the prompt
outputs = model.generate(**inputs)

# Convert the output tokens back to text
output_text = tokenizer.decode(outputs.cpu()[0], skip_special_tokens=True)

print(output_text)

Requirements

To run this model, you need to have:

  • Python 3.8 or later
  • PyTorch 1.12 or later (recommended 2.0 or later)
  • CUDA 11.4 or later (recommended for GPU users)

Note that running the model in bf16 or fp16 mode requires at least 144GB of GPU memory, while running it in int4 mode requires at least 48GB of GPU memory.
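
As a hedged sketch, loading the weights in bf16 with automatic device placement looks roughly like this; it uses the standard transformers torch_dtype and device_map arguments and still requires on the order of 144GB of GPU memory in total across the visible devices.

import torch
from transformers import AutoModelForCausalLM

# Shard the bf16 weights across all visible GPUs
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-72B",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).eval()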

Dataloop's AI Development Platform
Build end-to-end workflows

Dataloop is a complete AI development stack, allowing you to make data, elements, models and human feedback work together easily.

  • Use one centralized tool for every step of the AI development process.
  • Import data from external blob storage, internal file system storage or public datasets.
  • Connect to external applications using a REST API & a Python SDK.

Save, share, reuse

Every single pipeline can be cloned, edited and reused by other data professionals in the organization. Never build the same thing twice.

  • Use existing, pre-created pipelines for RAG, RLHF, RLAF, Active Learning & more.
  • Deploy multi-modal pipelines with one click across multiple cloud resources.
  • Use versions for your pipelines to make sure the deployed pipeline is the stable one.

Easily manage pipelines

Spend less time dealing with the logistics of owning multiple data pipelines, and get back to building great AI applications.

  • Easy visualization of the data flow through the pipeline.
  • Identify & troubleshoot issues with clear, node-based error messages.
  • Use scalable AI infrastructure that can grow to support massive amounts of data.