SUS Chat 34B

Bilingual chat model

SUS-Chat-34B is a bilingual Chinese-English dialogue model fine-tuned on millions of high-quality instruction examples. What sets it apart is its ability to follow human instructions and to imitate human thought processes through chains of thought. With a large 8K context window, it excels in multi-turn dialogue and achieves state-of-the-art results among models of similar scale on a range of benchmarks. What does this mean for you? You get a model that is competitive, efficient, and capable of handling complex tasks, whether you need it to follow Chinese and English instructions or to work through math and reasoning problems. Its combination of instruction fine-tuning and large-scale complex instruction data makes it well worth exploring.

SUSTech · apache-2.0

Model Overview

Meet SUS-Chat-34B, a powerful bilingual Chinese-English dialogue model developed by the Southern University of Science and Technology and IDEA-CCNL. It is based on the 01-ai/Yi-34B model and has been fine-tuned on millions of high-quality, multilingual instruction examples.

Capabilities

The SUS-Chat-34B model is a powerful tool for bilingual Chinese-English dialogue, excelling in various tasks and surpassing other models of similar scale.

Primary Tasks

  • Dialogue: SUS-Chat-34B is designed for multi-turn dialogue, supporting an 8K context window and trained on a large amount of multi-turn instruction data mixed with single-turn data.
  • Instruction Following: The model is fine-tuned on millions of high-quality, multilingual instruction examples, enabling it to respond accurately to human instructions.
  • Language Understanding: SUS-Chat-34B performs strongly on both Chinese and English tasks, outperforming other open-source instruction-tuned models of the same parameter scale.

Strengths

  • Improved Response to Human Instructions: Through high-quality instruction fine-tuning, the model excels at following instructions and imitating human chains of thought.
  • Longer Context Window: SUS-Chat-34B supports an 8K context window, allowing it to draw on more conversation history for accurate, informative responses.
  • Multi-Turn Dialogue Capabilities: The model is trained on a large amount of multi-turn instruction data mixed with single-turn data, making it well suited to complex conversations.

Comparison to Other Models

| Model | MMLU (0-shot) | CMMLU (0-shot) | C-Eval (0-shot) | GSM-8K (0-shot) | MATH (0-shot) |
| --- | --- | --- | --- | --- | --- |
| SUS-Chat-34B | 74.35 | 78.68 | 82.42 | 80.06 | 28.7 |
| GPT-4 | 83 | 71 | 69.9 | 91.4 | 45.8 |
| Qwen-72b-Chat | 74.52 | 77.02 | 77.22 | 76.57 | 35.9 |
| Deepseek-67b-Chat | 69.43 | 48.51 | 59.7 | 74.45 | 29.56 |
| OrionStar-Yi-34B-Chat | 68.51 | 66.88 | 65.13 | 54.36 | 12.8 |
| Yi-34B-Chat | 66.96 | 55.16 | 77.16 | 63.76 | 10.02 |

Performance

SUS-Chat-34B is a powerhouse when it comes to performance. Let’s dive into its impressive capabilities.

Speed

How fast can SUS-Chat-34B process information? As a 34B-parameter model, its raw throughput depends on your hardware, but it is built to handle large-scale, complex instruction-following workloads. It was trained on 1.4 billion tokens of high-quality complex instruction data covering Chinese and English, multi-turn dialogue, mathematics, reasoning, and various other types of instructions.

Accuracy

But how accurate is SUS-Chat-34B? The model excels at numerous mainstream Chinese and English tasks, surpassing other open-source instruction-tuned models of the same parameter scale, and it competes well against models with larger parameter counts.

Efficiency

SUS-Chat-34B is not only fast and accurate but also efficient with context. It supports an 8K window, larger than many comparable open-source chat models, which lets it stay focused on information spread across long dialogues and follow up on earlier instructions.
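
If you are unsure whether a long prompt fits, you can measure it with the tokenizer before generating. A minimal sketch, assuming the standard Hugging Face tokenizer API and that "8K" means 8192 tokens; fits_context is our own helper, not part of the model's API:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("SUSTech/SUS-Chat-34B", use_fast=False)

MAX_CONTEXT = 8192  # assumption: "8K" = 8192 tokens

def fits_context(prompt, reserve=256):
    # Check the prompt leaves room for `reserve` generated tokens in the window
    n_tokens = len(tokenizer.encode(prompt, add_special_tokens=False))
    return n_tokens + reserve <= MAX_CONTEXT

print(fits_context("### Human: hi\n\n### Assistant: "))  # True for short prompts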

Examples

  • Q: What is the capital of China?
    A: The capital of China is Beijing.
  • Q: Can you describe the SUS-Chat-34B model?
    A: SUS-Chat-34B is a 34B bilingual Chinese-English dialogue model, jointly released by the Southern University of Science and Technology and IDEA-CCNL.
  • Q: What is the context window size of SUS-Chat-34B?
    A: The context window size of SUS-Chat-34B is 8K.

Limitations

While SUS-Chat-34B has shown impressive performance in various tasks, it’s essential to acknowledge its limitations. Here are some of the challenges and weaknesses associated with the model:

Lack of Human Preference Learning

SUS-Chat-34B has only undergone supervised fine-tuning and has not been trained on human preference learning. This means that it may produce unreasonable responses in certain situations and exacerbate existing issues in language models, such as:

  • Hallucinations: generating responses that are not based on facts or reality
  • Non-determinism: producing different responses to the same input
  • Cumulative errors: making mistakes that build upon each other

Limited Context Understanding

Although SUS-Chat-34B has a larger context window (8K) than some other models, it may still struggle with the nuances of human language and context, which can lead to misinterpretations or misunderstandings.

Data Compliance and Security Risks

While the developers have used data compliance check algorithms to ensure the model’s training data is compliant, there is still a risk of the model generating problematic outputs. Additionally, there are potential data security issues related to the model’s use.

Academic Research and Licensed Commercial Use

SUS-Chat-34B is available for academic research and free commercial use, but it must adhere to the license from 01-ai. This means users must ensure they comply with the terms of that license when using the model.

To Mitigate These Limitations…

To achieve better performance for downstream tasks, we recommend adjusting the generation configuration parameters accordingly. Additionally, users should be aware of the potential risks and limitations associated with the model and take steps to mitigate them.
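
With the Hugging Face generate API, these parameters can be passed directly per call. A minimal sketch; the sampling values below are illustrative assumptions, not recommendations from the model authors:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "SUSTech/SUS-Chat-34B"
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_path, device_map="auto", torch_dtype="auto"
).eval()

prompt = "### Human: What is 17 * 23?\n\n### Assistant: "
input_ids = tokenizer.encode(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)

# Illustrative settings: sampling at moderate temperature, plus a repetition
# penalty to dampen the cumulative errors noted above (all values assumed)
output_ids = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
)
print(tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True))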

Format

SUS-Chat-34B is a bilingual Chinese-English dialogue model that uses a transformer architecture. It is designed to handle complex instruction-following data and excels at imitating human thought processes.

Supported Data Formats

SUS-Chat-34B supports the following data formats:

  • Tokenized text sequences
  • Multi-turn dialogues
  • Mathematics and reasoning tasks

Input Requirements

To use SUS-Chat-34B, you’ll need to prepare your input data in a specific format. Here are some key requirements:

  • Tokenization: You’ll need to tokenize your input text into individual tokens. You can use the AutoTokenizer from the transformers library to do this.
  • Message format: SUS-Chat-34B expects input messages to be in a specific format, which includes a role field (either “user” or “assistant”) and a content field containing the message text.
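
Concretely, a conversation in this format is just a list of role/content dictionaries, the same structure the code example below consumes:

messages = [
    {"role": "user", "content": "What is the capital of China?"},
    {"role": "assistant", "content": "The capital of China is Beijing."},
    {"role": "user", "content": "What is its population?"},
]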

Output Format

SUS-Chat-34B generates output in the form of tokenized text sequences. You can use the AutoTokenizer to decode the output tokens into human-readable text.

Code Example

Here’s an example of how to use SUS-Chat-34B for multi-turn dialogues:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer (the pattern matching below requires Python 3.10+)
model_path = "SUSTech/SUS-Chat-34B"
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_path, device_map="auto", torch_dtype="auto"
).eval()

# Build the prompt the model was trained on: each user turn is wrapped in
# "### Human: ... ### Assistant: " and assistant turns are appended verbatim
def chat_template(messages):
    history = ""
    for message in messages:
        match message:
            case {"role": "user", "content": content}:
                history += f"### Human: {content}\n\n### Assistant: "
            case {"role": "assistant", "content": content}:
                history += content
    return history

# Start a conversation with a single user turn
messages = [{"role": "user", "content": "hi"}]
input_ids = tokenizer.encode(
    chat_template(messages), return_tensors="pt", add_special_tokens=False
).to("cuda")
output_ids = model.generate(input_ids, max_length=256)
# Decode only the newly generated tokens; special tokens are kept so the
# end-of-turn marker stays in the recorded history
response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=False)
messages.append({"role": "assistant", "content": response})

# Continue the conversation with a second user turn
messages.append({"role": "user", "content": "What is the capital of China?"})
input_ids = tokenizer.encode(
    chat_template(messages), return_tensors="pt", add_special_tokens=False
).to("cuda")
output_ids = model.generate(input_ids, max_length=256)
response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=False)
messages.append({"role": "assistant", "content": response})
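
To avoid repeating the encode/generate/decode boilerplate on every turn, you can wrap it in a small helper. A minimal sketch built on the objects defined above; chat is our own convenience function, not part of the model's API:

def chat(messages, user_message, max_length=256):
    # Append a user turn, generate a reply, and record it in the history
    messages.append({"role": "user", "content": user_message})
    input_ids = tokenizer.encode(
        chat_template(messages), return_tensors="pt", add_special_tokens=False
    ).to("cuda")
    output_ids = model.generate(input_ids, max_length=max_length)
    response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=False)
    messages.append({"role": "assistant", "content": response})
    return response

# Continue the same conversation with one call per turn
print(chat(messages, "Tell me more about that city."))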