SUS-Chat-34B
SUS-Chat-34B is a bilingual Chinese-English dialogue model that has been fine-tuned on millions of high-quality instruction examples. What sets it apart is its ability to follow human instructions and imitate human thought processes through chains of thought. With an 8K context window, it excels in multi-turn dialogues and has achieved state-of-the-art results on various benchmarks. But what does this mean for you? It means you get a model that's highly competitive, efficient, and capable of handling complex tasks with ease. Whether you need a model that can understand and respond to Chinese and English instructions or one that can handle math and reasoning tasks, SUS-Chat-34B is a strong choice. Its combination of instruction fine-tuning and large-scale, complex training data makes it a model worth exploring.
Model Overview
Meet SUS-Chat-34B, a powerful bilingual Chinese-English dialogue model developed by the Southern University of Science and Technology and IDEA-CCNL. This model is based on the 01-ai/Yi-34B model and has been fine-tuned on millions of high-quality, multilingual instruction examples.
Capabilities
The SUS-Chat-34B model is a powerful tool for bilingual Chinese-English dialogue, excelling in various tasks and surpassing other models of similar scale.
Primary Tasks
- Dialogue: SUS-Chat-34B is designed for multi-turn dialogues, supporting an 8K context window and trained on a large amount of multi-turn instruction data and mixed single- and multi-turn data.
- Instruction Following: The model is fine-tuned on millions of high-quality, multilingual instruction examples, enabling it to respond accurately to human instructions.
- Language Understanding: SUS-Chat-34B demonstrates strong performance in both Chinese and English tasks, outperforming other open-source instruction fine-tuned models of the same parameter scale.
Strengths
- Improved Response to Human Instructions: Through high-quality instruction fine-tuning, the model excels at imitating human thought processes and chains of thought.
- Longer Context Window: SUS-Chat-34B supports an 8K context window, allowing for more accurate and informative responses.
- Multi-Turn Dialogue Capabilities: The model is trained on a large amount of multi-turn instruction data and mixed single- and multi-turn data, making it well-suited for complex conversations.
Comparison to Other Models
Model | MMLU (0-shot) | CMMLU (0-shot) | C-Eval (0-shot) | GSM-8K (0-shot) | MATH (0-shot) |
---|---|---|---|---|---|
SUS-Chat-34B | 74.35 | 78.68 | 82.42 | 80.06 | 28.7 |
GPT-4 | 83 | 71 | 69.9 | 91.4 | 45.8 |
Qwen-72b-Chat | 74.52 | 77.02 | 77.22 | 76.57 | 35.9 |
Deepseek-68b-Chat | 69.43 | 48.51 | 59.7 | 74.45 | 29.56 |
OrionStar-Yi-34B-Chat | 68.51 | 66.88 | 65.13 | 54.36 | 12.8 |
Yi-34B-Chat | 66.96 | 55.16 | 77.16 | 63.76 | 10.02 |
Performance
SUS-Chat-34B is a powerhouse when it comes to performance. Let’s dive into its impressive capabilities.
Speed
How much can SUS-Chat-34B handle? With its 34B parameters, it copes well with large-scale, complex instruction-following data. It has been trained on 1.4 billion tokens of high-quality, complex instruction data covering Chinese and English, multi-turn dialogues, mathematics, reasoning, and various other types of instructions.
Accuracy
But how accurate is SUS-Chat-34B? The model excels in numerous mainstream Chinese and English tasks, surpassing other open-source instruction fine-tuned models of the same parameter scale. It also competes well against models with larger parameter scales.
Efficiency
SUS-Chat-34B is not only accurate but also efficient with long inputs. It supports an 8K context window, larger than many comparable models, which lets it keep track of dialogue information and follow up on instructions across long, multi-turn conversations.
Limitations
While SUS-Chat-34B has shown impressive performance in various tasks, it’s essential to acknowledge its limitations. Here are some of the challenges and weaknesses associated with the model:
Lack of Human Preference Learning
SUS-Chat-34B has only undergone supervised fine-tuning and has not been trained on human preference learning. This means that it may produce unreasonable responses in certain situations and exacerbate existing issues in language models, such as:
- Hallucinations: generating responses that are not based on facts or reality
- Non-determinism: producing different responses to the same input
- Cumulative errors: making mistakes that build upon each other
Limited Context Understanding
Although SUS-Chat-34B has a larger context window (8K) than some other models, it still may struggle to understand the nuances of human language and context. This can lead to misinterpretations or misunderstandings.
Data Compliance and Security Risks
While the developers have used data compliance check algorithms to ensure the model’s training data is compliant, there is still a risk of the model generating problematic outputs. Additionally, there are potential data security issues related to the model’s use.
Academic Research and Licensed Commercial Use
SUS-Chat-34B is available for academic research and free commercial use, but commercial use must adhere to the license from 01-ai. Users must ensure they comply with the terms of that license when using the model.
To Mitigate These Limitations…
To achieve better performance for downstream tasks, we recommend adjusting the generation configuration parameters accordingly. Additionally, users should be aware of the potential risks and limitations associated with the model and take steps to mitigate them.
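As a minimal sketch of what "adjusting the generation configuration" can look like with the Hugging Face transformers API, the values below are illustrative assumptions rather than officially recommended settings:

```python
from transformers import GenerationConfig

# Illustrative values only; tune them for your own downstream task.
generation_config = GenerationConfig(
    max_new_tokens=512,      # cap the length of each generated reply
    do_sample=True,          # enable sampling instead of greedy decoding
    temperature=0.7,         # lower values give more deterministic answers
    top_p=0.9,               # nucleus-sampling cutoff
    repetition_penalty=1.1,  # discourage repeated phrases
)

# Pass it to generate(), e.g.:
# output_ids = model.generate(input_ids, generation_config=generation_config)
```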
Format
SUS-Chat-34B is a bilingual Chinese-English dialogue model that uses a transformer architecture. It’s designed to handle complex instruction following data and excels at imitating human thought processes.
Supported Data Formats
SUS-Chat-34B supports the following data formats:
- Tokenized text sequences
- Multi-turn dialogues
- Mathematics and reasoning tasks
Input Requirements
To use SUS-Chat-34B, you’ll need to prepare your input data in a specific format. Here are some key requirements:
- Tokenization: You’ll need to tokenize your input text into individual tokens. You can use the `AutoTokenizer` from the `transformers` library to do this.
- Message format: SUS-Chat-34B expects input messages in a specific format: a list of dictionaries, each with a `role` field (either “user” or “assistant”) and a `content` field containing the message text (see the sketch below).
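As a concrete illustration of this message format, here is a minimal sketch; the `### Human:` / `### Assistant:` prompt layout shown in the comments follows the chat template used in the code example further below:

```python
# A short conversation in the expected list-of-dictionaries format.
messages = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "Hello! How can I help you today?"},
    {"role": "user", "content": "What is the capital of China?"},
]

# The chat template in the code example below renders this into a prompt string:
#   "### Human: hi\n\n### Assistant: Hello! How can I help you today?"
# followed directly by
#   "### Human: What is the capital of China?\n\n### Assistant: "
```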
Output Format
SUS-Chat-34B generates output in the form of tokenized text sequences. You can use the `AutoTokenizer` to decode the output tokens into human-readable text.
Code Example
Here’s an example of how to use SUS-Chat-34B for multi-turn dialogues:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model_path = "SUSTech/SUS-Chat-34B"
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", torch_dtype="auto").eval()

# Define a chat template function
def chat_template(messages):
    history = ""
    for message in messages:
        match message:
            case {"role": "user", "content": message}:
                history += f"### Human: {message}\n\n### Assistant: "
            case {"role": "assistant", "content": message}:
                history += message
    return history

# Define a sample conversation
messages = [{"role": "user", "content": "hi"}]

input_ids = tokenizer.encode(chat_template(messages), return_tensors="pt", add_special_tokens=False).to("cuda")
output_ids = model.generate(input_ids.to("cuda"), max_length=256)
response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=False)
messages.append({"role": "assistant", "content": response})

# Continue the conversation
messages.append({"role": "user", "content": "What is the capital of China?"})

input_ids = tokenizer.encode(chat_template(messages), return_tensors="pt", add_special_tokens=False).to("cuda")
output_ids = model.generate(input_ids.to("cuda"), max_length=256)
response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=False)
messages.append({"role": "assistant", "content": response})
```