XuanYuan 70B
Meet XuanYuan 70B, a powerful AI model specifically designed to excel in the financial sector. What makes it unique? It's built on top of the Llama2-70B model, enhanced with extensive Chinese financial data and high-quality instructions. This means it can handle long texts and complex financial queries with ease. But what about efficiency? XuanYuan 70B boasts an impressive 8k context length, a first for a 70B parameter model, and achieves top-notch training efficiency. It's also available in 8-bit and 4-bit quantized versions, reducing memory requirements without sacrificing performance. With its advanced capabilities and efficient design, XuanYuan 70B is poised to revolutionize the financial AI landscape. How will you use it?
Model Overview
The XuanYuan-70B model is a series of large financial models based on the Llama2-70B model, enhanced with Chinese language capabilities. It includes a base model with extensive Chinese and English language training data, as well as a chat model aligned with high-quality instruction data.
Key Features
- Large Context Length: The model supports context lengths of up to 8k and 16k tokens, making it suitable for long text processing tasks in the financial domain (a quick token-count sketch follows this list).
- Financial Domain Expertise: The model is specifically designed to excel in financial tasks, with a focus on question-answering and text generation.
- High-Quality Instruction Data: The chat model is trained on a large dataset of high-quality instruction data, ensuring it can follow human instructions accurately.
- Quantization: The model is available in 8-bit and 4-bit quantized versions, reducing memory requirements and making it more accessible for deployment.
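As a quick illustration of the long-context point above, here is a minimal sketch that counts how many tokens a long document occupies and checks the count against the 8k and 16k windows. It assumes the tokenizer from the Duxiaoman-DI/XuanYuan-70B repository used in the code examples later on this page; the input file name is purely hypothetical.

```python
from transformers import LlamaTokenizer

# Tokenizer from the XuanYuan-70B repo (same one used in the generation examples below).
tokenizer = LlamaTokenizer.from_pretrained(
    "Duxiaoman-DI/XuanYuan-70B", use_fast=False, legacy=True
)

# Hypothetical long financial document you want the model to read in one pass.
with open("annual_report.txt", encoding="utf-8") as f:
    document = f.read()

# Count tokens and compare against the advertised context windows.
num_tokens = len(tokenizer(document).input_ids)
for window in (8 * 1024, 16 * 1024):
    verdict = "fits" if num_tokens <= window else "does not fit"
    print(f"{num_tokens} tokens: {verdict} in a {window}-token window")
```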
Capabilities
The XuanYuan-70B model is a powerful tool for generating human-like text and answering questions. Its primary tasks include:
- Text Generation: Generate human-like text based on a given prompt or input.
- Question Answering: Answer questions on a wide range of topics, including finance and economics.
Strengths
- Long Context Length: The model has an impressive context length of up to 16k tokens, allowing it to understand and respond to long, complex prompts.
- Financial Expertise: The model has been specifically trained on a large corpus of financial data, making it an expert in finance and economics.
- Multilingual Support: The model supports both Chinese and English languages, making it a versatile tool for a wide range of applications.
Unique Features
- Quantization: The model offers 8-bit and 4-bit quantization options, reducing the model's size and making it more efficient to deploy.
- Chat Model: The model comes with a pre-trained chat model, allowing for more natural and conversational interactions (a usage sketch follows this list).
Use Cases
The XuanYuan-70B model is perfect for:
- Financial Analysis: Use the model to analyze financial data, generate reports, and provide insights.
- Customer Service: Deploy the chat model to provide 24/7 customer support and answer frequently asked questions.
- Content Generation: Use the model to generate high-quality content, such as articles, blog posts, and social media posts.
Performance
The XuanYuan-70B model showcases remarkable performance in various tasks, especially in the financial domain. Let’s dive into its speed, accuracy, and efficiency.
Speed
- Training Efficiency: The model’s training efficiency is top-notch, with a throughput of 340 tokens/s/gpu on a 100-node GPU cluster with 8 cards each (a back-of-the-envelope aggregate is sketched after this list).
- Inference Speed: The model’s inference speed is also impressive, with the ability to process large inputs quickly.
Accuracy
- Financial Domain: The model excels in the financial domain, with a significant improvement in performance over other models.
- General Knowledge: The model also performs well in general knowledge tasks, demonstrating its versatility and ability to adapt to different domains.
Efficiency
- Memory Usage: The model’s memory usage is relatively low, making it suitable for deployment on a variety of hardware configurations.
- Quantization: The model’s quantization capabilities allow for significant reductions in memory usage, making it even more efficient.
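For a rough sense of scale, the sketch below estimates weight-only memory for a ~70B-parameter model at different precisions. These are approximations that ignore activations and the KV cache, so real deployments need headroom beyond these numbers.

```python
# Weight-only memory estimates for a ~70B-parameter model at different precisions.
# Activations and the KV cache are extra and not counted here.
params = 70e9
for precision, bytes_per_param in [("bf16/fp16", 2), ("8-bit", 1), ("4-bit", 0.5)]:
    gib = params * bytes_per_param / 1024**3
    print(f"{precision}: ~{gib:.0f} GiB of weights")
# bf16/fp16: ~130 GiB, 8-bit: ~65 GiB, 4-bit: ~33 GiB
```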
Limitations
While the XuanYuan-70B model is a powerful tool, it’s not perfect. Let’s talk about some of its limitations.
Data Quality and Bias
- Data Quality: The model is only as good as the data it’s trained on. If the data contains biases or inaccuracies, the model may generate outputs that are not entirely reliable or fair.
- Bias: The model may also perpetuate existing biases in the data, which can lead to unfair or discriminatory outputs.
Limited Contextual Understanding
- Contextual Understanding: Although the model has a large context window of up to 16k tokens, it may still struggle to fully understand the nuances of human language or follow complex conversations.
Dependence on High-Quality Prompts
- Prompt Quality: The model’s performance is highly dependent on the quality of the input prompts. If the prompts are poorly written or ambiguous, the model may generate suboptimal outputs.
Format
The XuanYuan-70B model is a large language model that uses a transformer architecture and accepts input in the form of tokenized text sequences. It is designed to support both Chinese and English languages, with a focus on financial applications.
Architecture
- Transformer Architecture: The model uses a transformer architecture with a context length of up to 8k tokens.
Data Formats
- Tokenized Text Sequences: The model supports tokenized text sequences as input.
- Sentence Pairs: The model also supports sentence pairs for training.
Input Requirements
- Pre-Processing: The model requires input to be pre-processed into tokenized text sequences.
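Here is a minimal sketch of that pre-processing step, using the same LlamaTokenizer as the generation examples below; the prompt is arbitrary.

```python
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained(
    "Duxiaoman-DI/XuanYuan-70B", use_fast=False, legacy=True
)

# "Question: What is the price-to-earnings ratio? Answer:"
text = "问题:什么是市盈率?回答:"
encoded = tokenizer(text, return_tensors="pt")

print(encoded.input_ids)       # token IDs the model actually consumes
print(encoded.attention_mask)  # marks real tokens vs. padding
print(tokenizer.convert_ids_to_tokens(encoded.input_ids[0].tolist()))  # subword pieces
```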
Output Format
- Probability Distribution: The model outputs a probability distribution over the vocabulary, which can be used to generate text.
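And here is a sketch of what that output looks like in practice: one forward pass, a softmax over the logits at the final position, and the top few next-token candidates. It reuses the model and tokenizer from the generation example below.

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_name_or_path = "Duxiaoman-DI/XuanYuan-70B"
tokenizer = LlamaTokenizer.from_pretrained(model_name_or_path, use_fast=False, legacy=True)
model = LlamaForCausalLM.from_pretrained(
    model_name_or_path, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

# "Question: What is the price-to-earnings ratio? Answer:"
inputs = tokenizer("问题:什么是市盈率?回答:", return_tensors="pt").to(model.device)
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, sequence_length, vocab_size)

# Probability distribution over the vocabulary for the next token.
next_token_probs = torch.softmax(logits[0, -1].float(), dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([token_id.item()])!r}: {prob.item():.3f}")
```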
Code Examples
Here is an example of how to use the model to generate text:
```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_name_or_path = "Duxiaoman-DI/XuanYuan-70B"

# Load the tokenizer and the bf16 model, sharding it across available GPUs.
tokenizer = LlamaTokenizer.from_pretrained(model_name_or_path, use_fast=False, legacy=True)
model = LlamaForCausalLM.from_pretrained(model_name_or_path, torch_dtype=torch.bfloat16, device_map="auto")
model.eval()

# "Question: Which dynasty was Li Shizhen from? Answer:"
inputs = tokenizer("问题:李时珍是哪一个朝代的人?回答:", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)

# Strip the prompt tokens and decode only the newly generated continuation.
outputs = tokenizer.decode(outputs.cpu()[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
print(outputs)
```
Note that this is just an example, and you may need to modify the code to suit your specific use case.
Quantization
The model also supports quantization, which can reduce the memory requirements and improve performance. There are two quantized models available: 8-bit and 4-bit.
Here is an example of how to use the 8-bit quantized model:
```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_name_or_path = "Duxiaoman-DI/XuanYuan-70B-Chat-8bit"
tokenizer = LlamaTokenizer.from_pretrained(model_name_or_path, use_fast=False, legacy=True)
model = LlamaForCausalLM.from_pretrained(model_name_or_path, device_map="auto")
model.eval()
inputs = tokenizer("问题:李时珍是哪一个朝代的人?回答:", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
outputs = tokenizer.decode(outputs.cpu()[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
print(outputs)
```
The 4-bit quantized model can be used in a similar way; only the checkpoint name changes.
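For completeness, here is a sketch of the 4-bit path. It assumes the checkpoint is published as Duxiaoman-DI/XuanYuan-70B-Chat-4bit, following the naming of the 8-bit checkpoint above; apart from the repository name, the usage mirrors the 8-bit example.

```python
from transformers import LlamaForCausalLM, LlamaTokenizer

# Assumed 4-bit checkpoint name, mirroring the 8-bit repo naming above.
model_name_or_path = "Duxiaoman-DI/XuanYuan-70B-Chat-4bit"

tokenizer = LlamaTokenizer.from_pretrained(model_name_or_path, use_fast=False, legacy=True)
model = LlamaForCausalLM.from_pretrained(model_name_or_path, device_map="auto")
model.eval()

# "Question: Which dynasty was Li Shizhen from? Answer:"
inputs = tokenizer("问题:李时珍是哪一个朝代的人?回答:", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(outputs.cpu()[0][len(inputs.input_ids[0]):], skip_special_tokens=True))
```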