XuanYuan 70B

Chinese finance model

Meet XuanYuan 70B, a powerful AI model designed specifically to excel in the financial sector. What makes it unique? It's built on top of the Llama2-70B model and enhanced with extensive Chinese financial data and high-quality instruction data, so it can handle long texts and complex financial queries with ease. But what about efficiency? XuanYuan 70B extends its context length to 8k tokens, with a 16k variant for even longer inputs, and achieves strong training throughput. It's also available in 8-bit and 4-bit quantized versions, cutting memory requirements with little loss in quality. With its advanced capabilities and efficient design, XuanYuan 70B is poised to make a real mark on the financial AI landscape. How will you use it?

Model Overview

The XuanYuan-70B model is a series of large financial models based on the Llama2-70B model, enhanced with Chinese language capabilities. It includes a base model trained on extensive Chinese and English language data, as well as a chat model aligned with high-quality instruction data.

Key Features

  • Large Context Length: The model supports a context length of 8k tokens, with a 16k variant, making it well suited to long-document processing tasks in the financial domain.
  • Financial Domain Expertise: The model is specifically designed to excel in financial tasks, with a focus on question-answering and text generation.
  • High-Quality Instruction Data: The chat model is aligned on a large set of high-quality instruction data, helping it follow human instructions accurately.
  • Quantization: The model is available in 8-bit and 4-bit quantized versions, reducing memory requirements and making it more accessible for deployment.

Capabilities

The XuanYuan-70B model is a powerful tool for generating human-like text and answering questions. Its primary tasks include:

  • Text Generation: Generate human-like text based on a given prompt or input.
  • Question Answering: Answer questions on a wide range of topics, including finance and economics.

Strengths

  • Long Context Length: With a context window of up to 16k tokens in its extended variant, the model can understand and respond to long, complex prompts.
  • Financial Expertise: The model has been trained on a large corpus of financial data, giving it strong domain knowledge in finance and economics.
  • Multilingual Support: The model supports both Chinese and English, making it a versatile tool for a wide range of applications.

Unique Features

  • Quantization: The model offers 8-bit and 4-bit quantization options, reducing the model’s size and making it more efficient to deploy.
  • Chat Model: The model comes in a pre-trained chat variant, allowing for more natural, conversational interactions (see the prompt sketch after this list).
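
The chat variant expects a dialogue-style prompt rather than a bare question. Here is a minimal sketch of building such a prompt; the Human/Assistant template and the build_chat_prompt helper are illustrative assumptions, so check the official XuanYuan-70B-Chat model card for the exact format the model was aligned with:

# Hypothetical prompt template for the chat variant; the role names and
# separators are assumptions, not the confirmed trained format.
def build_chat_prompt(query: str) -> str:
    return f"Human: {query} Assistant:"

# "Hi, how has the stock market been performing recently?"
print(build_chat_prompt("你好,最近的股市表现如何?"))
# -> Human: 你好,最近的股市表现如何? Assistant: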

Use Cases

The XuanYuan-70B model is perfect for:

  • Financial Analysis: Use the model to analyze financial data, generate reports, and provide insights.
  • Customer Service: Deploy the chat model to provide 24/7 customer support and answer frequently asked questions.
  • Content Generation: Use the model to generate high-quality content, such as articles, blog posts, and social media posts.

Examples

  • Q: 你好,最近的股市表现如何? (Hello, how has the stock market been performing recently?)
    A: 最近的股市表现比较波动,建议关注相关财经新闻和专业分析师的建议。 (The market has been fairly volatile lately; it's advisable to follow financial news and professional analysts' recommendations.)
  • Q: 五仁月饼和莲蓉月饼的区别是什么? (What is the difference between five-kernel mooncakes and lotus-paste mooncakes?)
    A: 五仁月饼以五种坚果为馅,莲蓉月饼以莲子为馅,口感和味道各异。 (Five-kernel mooncakes are filled with five kinds of nuts, while lotus-paste mooncakes are filled with lotus-seed paste; they differ in texture and flavor.)
  • Q: 有2块五仁月饼,3块莲蓉月饼,2块豆沙月饼,这些月饼的大小形状质量完全相同。从这7块月饼中,任意取出3块,那么三种月饼都取到的可能性是几分之几? (There are 2 five-kernel, 3 lotus-paste, and 2 red-bean mooncakes, identical in size, shape, and weight. If 3 are drawn at random from the 7, what is the probability of getting one of each kind?)
    A: 三种月饼都取到的可能性为12/35。 (The probability of getting all three kinds is 12/35.)
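
The last answer checks out: there are C(7,3) = 35 equally likely ways to draw 3 of the 7 mooncakes, and 2 × 3 × 2 = 12 of them contain one of each kind. A quick brute-force check (illustrative only, not part of the model):

from fractions import Fraction
from itertools import combinations

# 2 five-kernel (W), 3 lotus-paste (L), 2 red-bean (B) mooncakes
cakes = ["W", "W", "L", "L", "L", "B", "B"]
draws = list(combinations(range(7), 3))               # C(7,3) = 35 draws
hits = sum({cakes[i] for i in d} == {"W", "L", "B"} for d in draws)
print(Fraction(hits, len(draws)))                     # 12/35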

Performance

The XuanYuan-70B model showcases remarkable performance in various tasks, especially in the financial domain. Let’s dive into its speed, accuracy, and efficiency.

Speed

  • Training Efficiency: Training throughput reaches 340 tokens/s per GPU on a cluster of 100 nodes with 8 GPUs each; across those 800 GPUs, that works out to roughly 272,000 tokens/s in aggregate.
  • Inference Speed: The model's inference speed is also solid, handling large inputs quickly.

Accuracy

  • Financial Domain: The model excels in the financial domain, showing a marked improvement over comparable general-purpose models on financial tasks.
  • General Knowledge: The model also performs well in general knowledge tasks, demonstrating its versatility and ability to adapt to different domains.

Efficiency

  • Memory Usage: Like any 70B-parameter model, it needs substantial memory in full precision, but the released checkpoints are practical to deploy on common multi-GPU configurations.
  • Quantization: The quantized releases cut memory usage dramatically, making deployment even more efficient (see the rough estimate after this list).
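
As a rough back-of-envelope estimate (weights only; activations, KV cache, and runtime overhead are extra), memory scales with bytes per parameter:

# Approximate weight memory for a 70B-parameter model at different precisions
params = 70e9
for label, bytes_per_param in [("bf16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{label}: ~{params * bytes_per_param / 1e9:.0f} GB")
# bf16: ~140 GB, int8: ~70 GB, int4: ~35 GB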

Limitations

While the XuanYuan-70B model is a powerful tool, it’s not perfect. Let’s talk about some of its limitations.

Data Quality and Bias

  • Data Quality: The model is only as good as the data it’s trained on. If the data contains biases or inaccuracies, the model may generate outputs that are not entirely reliable or fair.
  • Bias: The model may also perpetuate existing biases in the data, which can lead to unfair or discriminatory outputs.

Limited Contextual Understanding

  • Contextual Understanding: Although the model has a large context window of up to 16k tokens, it may still struggle to fully understand the nuances of human language or follow complex conversations.

Dependence on High-Quality Prompts

  • Prompt Quality: The model’s performance is highly dependent on the quality of the input prompts. If the prompts are poorly written or ambiguous, the model may generate suboptimal outputs.

Format

The XuanYuan-70B model is a large language model that uses a transformer architecture and accepts input in the form of tokenized text sequences. It is designed to support both Chinese and English languages, with a focus on financial applications.

Architecture

  • Transformer Architecture: The model uses a decoder-only transformer architecture with a context length of 8k tokens (16k in the extended variant); the configured window can be confirmed from the model config, as sketched below.
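
To check the configured context window without downloading the weights, the standard Hugging Face config API can be used (a minimal sketch):

from transformers import AutoConfig

# Fetch only the model configuration and read the maximum context length
config = AutoConfig.from_pretrained("Duxiaoman-DI/XuanYuan-70B")
print(config.max_position_embeddings)  # context window in tokens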

Data Formats

  • Tokenized Text Sequences: The model supports tokenized text sequences as input.
  • Sentence Pairs: The model also supports sentence pairs for training.

Input Requirements

  • Pre-Processing: The model requires input to be pre-processed into tokenized text sequences.

Output Format

  • Probability Distribution: At each step the model outputs a probability distribution over the vocabulary, from which the next token is chosen to generate text (see the sketch below).
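
Concretely, a forward pass returns logits over the vocabulary at every position; a softmax over the last position yields the next-token distribution. A minimal sketch, assuming model and tokenizer have been loaded as in the code examples below:

import torch

# `model` and `tokenizer` loaded as in the Code Examples section below
inputs = tokenizer("问题:李时珍是哪一个朝代的人?回答:", return_tensors="pt").to("cuda")
with torch.no_grad():
    logits = model(**inputs).logits              # (batch, seq_len, vocab_size)
probs = torch.softmax(logits[0, -1], dim=-1)     # next-token distribution
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode([idx.item()])!r}: {p.item():.3f}")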

Code Examples

Here is an example of how to use the model to generate text:

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_name_or_path = "Duxiaoman-DI/XuanYuan-70B"
tokenizer = LlamaTokenizer.from_pretrained(model_name_or_path, use_fast=False, legacy=True)
# Load in bfloat16 and let device_map="auto" shard the model across available GPUs
model = LlamaForCausalLM.from_pretrained(model_name_or_path, torch_dtype=torch.bfloat16, device_map="auto")
model.eval()

# Prompt: "Question: Which dynasty was Li Shizhen from? Answer:"
inputs = tokenizer("问题:李时珍是哪一个朝代的人?回答:", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
# Decode only the newly generated tokens, dropping the echoed prompt
outputs = tokenizer.decode(outputs.cpu()[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
print(outputs)

Note that this is just an example, and you may need to modify the code to suit your specific use case.

Quantization

The model also supports quantization, which reduces memory requirements and makes deployment cheaper. Two quantized variants are available: 8-bit and 4-bit.

Here is an example of how to use the 8-bit quantization model:

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_name_or_path = "Duxiaoman-DI/XuanYuan-70B-Chat-8bit"
tokenizer = LlamaTokenizer.from_pretrained(model_name_or_path, use_fast=False, legacy=True)
# The checkpoint is already quantized to 8-bit, so no dtype argument is needed
model = LlamaForCausalLM.from_pretrained(model_name_or_path, device_map="auto")
model.eval()

# Prompt: "Question: Which dynasty was Li Shizhen from? Answer:"
inputs = tokenizer("问题:李时珍是哪一个朝代的人?回答:", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
# Decode only the newly generated tokens, dropping the echoed prompt
outputs = tokenizer.decode(outputs.cpu()[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
print(outputs)

The 4-bit quantization model follows the same pattern; a sketch is shown below.
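
For completeness, here is the 4-bit variant loaded the same way. Note the repository name below is inferred from the 8-bit naming convention and should be verified on the publisher's page:

from transformers import LlamaForCausalLM, LlamaTokenizer

# Repo name assumed by analogy with the 8-bit release; verify before use
model_name_or_path = "Duxiaoman-DI/XuanYuan-70B-Chat-4bit"
tokenizer = LlamaTokenizer.from_pretrained(model_name_or_path, use_fast=False, legacy=True)
model = LlamaForCausalLM.from_pretrained(model_name_or_path, device_map="auto")
model.eval()

# "Question: Which dynasty was Li Shizhen from? Answer:"
inputs = tokenizer("问题:李时珍是哪一个朝代的人?回答:", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(outputs.cpu()[0][len(inputs.input_ids[0]):], skip_special_tokens=True))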
