Orion 14B Base
Orion-14B is an open-source multilingual large language model with 14 billion parameters, trained on a 2.5 trillion token multilingual corpus covering Chinese, English, Japanese, Korean, and other languages. It performs strongly in comprehensive evaluations, particularly on Japanese and Korean test sets, and its fine-tuned versions show strong adaptability, excelling in human-annotated blind tests. The LongChat version handles extremely long texts well, and the quantized versions reduce model size by 70% and increase inference speed by 30% with minimal performance loss.
Model Overview
The Orion-14B model is a multilingual large language model that can understand and respond to text-based input in multiple languages, including English, Chinese, Japanese, and Korean. It has been trained on a massive dataset of 2.5 trillion tokens and has demonstrated outstanding performance in various evaluations.
Capabilities
Capable of generating both text and code, this model outperforms many open-source chat models across common industry benchmarks.
Key Features
- Multilingual support: Trained on a diverse dataset of 2.5 trillion tokens, including Chinese, English, Japanese, Korean, and more.
- Strong performance: Shows outstanding results in comprehensive evaluations, outperforming other models on Japanese and Korean test sets.
- Adaptability: Fine-tuned models demonstrate strong adaptability, excelling in human-annotated blind tests.
- Long-context support: Handles extremely long texts, performing well at 200k tokens and supporting up to a maximum of 320k.
- Quantization: Quantized versions reduce model size by 70% and improve inference speed by 30% with minimal performance loss; see the size estimate after this list.
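To put the 70% figure in perspective, here is a back-of-the-envelope size estimate (a rough sketch; real checkpoints add overhead for embeddings, quantization scales, and metadata):

```python
# Rough memory estimate for a 14B-parameter model; treat figures as approximate.
params = 14e9                       # 14 billion parameters
bf16_bytes = params * 2             # bfloat16 stores 2 bytes per parameter
quantized_bytes = bf16_bytes * 0.3  # the claimed 70% size reduction

print(f"bf16 checkpoint:      ~{bf16_bytes / 1e9:.0f} GB")       # ~28 GB
print(f"quantized checkpoint: ~{quantized_bytes / 1e9:.1f} GB")  # ~8.4 GB
```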
Model Variants
- Orion-14B-Base: A multilingual large language foundational model with 14 billion parameters.
- Orion-14B-Chat: A chat-model fine-tuned on a high-quality corpus for an excellent interactive experience.
- Orion-14B-LongChat: A long-context version of the chat model, excelling at handling extremely lengthy texts.
- Orion-14B-Chat-RAG: A chat-model fine-tuned on a custom retrieval augmented generation dataset.
- Orion-14B-Chat-Plugin: A chat-model specifically tailored for plugin and function calling tasks.
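All variants load through the same transformers interface. The sketch below maps variant names to repository IDs, assuming they follow the `OrionStarAI/<variant>` naming pattern used for the base model:

```python
from transformers import AutoModelForCausalLM

# Assumed Hugging Face repository IDs, following the OrionStarAI/<variant> pattern.
VARIANTS = {
    "base":     "OrionStarAI/Orion-14B-Base",
    "chat":     "OrionStarAI/Orion-14B-Chat",
    "longchat": "OrionStarAI/Orion-14B-LongChat",
    "rag":      "OrionStarAI/Orion-14B-Chat-RAG",
    "plugin":   "OrionStarAI/Orion-14B-Chat-Plugin",
}

def load_variant(name: str):
    """Load an Orion-14B variant by short name."""
    return AutoModelForCausalLM.from_pretrained(
        VARIANTS[name], device_map="auto", trust_remote_code=True
    )
```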
Performance
This model showcases remarkable performance in various tasks, demonstrating its capabilities in speed, accuracy, and efficiency.
Speed
Quantized versions reduce model size by 70% and improve inference speed by 30%, making it an excellent choice for applications requiring fast processing.
Accuracy
In comprehensive evaluations, the base model outperforms other models with similar parameter scales, achieving high scores in language understanding and common knowledge tasks.
Efficiency
The LongChat version excels at handling extremely long texts, performing well at 200k tokens and supporting up to a maximum of 320k.
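In practice, it is worth checking a document's token count against these limits before sending it to the LongChat variant. A minimal sketch (the input file name is hypothetical; the 200k/320k figures come from the claims above):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("OrionStarAI/Orion-14B", use_fast=False, trust_remote_code=True)
long_document = open("report.txt").read()  # hypothetical input file

MAX_CONTEXT = 320_000  # maximum supported token length claimed above
SWEET_SPOT = 200_000   # length at which strong performance is reported

token_count = len(tokenizer.encode(long_document))
if token_count > MAX_CONTEXT:
    raise ValueError(f"{token_count} tokens exceeds the 320k limit")
if token_count > SWEET_SPOT:
    print("Warning: beyond 200k tokens, quality may degrade")
```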
Multilingual Capabilities
Demonstrates strong multilingual capabilities, significantly outperforming comparable models on Japanese and Korean test sets, and performing well in Chinese, English, and other languages.
Chat and Plugin Tasks
Chat and plugin models exhibit strong adaptability, excelling in human-annotated blind tests and plugin-related tasks.
Comparison with Other Models
Outperforms other models, including LLaMA 2-13B, Skywork-13B, and Baichuan 2-13B, in various tasks and evaluations.
Real-World Applications
Impressive performance makes it an ideal choice for various applications, such as:
- Chatbots and conversational AI
- Language translation and understanding
- Text classification and generation
- Plugin and function calling tasks
Example Use Cases
- Casual Chat: Engage in conversation, asking questions and receiving responses.
- Japanese & Korean Chat: Interact in Japanese and Korean, demonstrating its multilingual capabilities.
Limitations
While powerful, this model is not perfect. Some limitations include:
Language Understanding
Performance may vary depending on the language and specific task. For example, results on Japanese and Korean test sets are impressive, but the model may struggle with more complex or nuanced tasks in those languages.
Common Sense and World Knowledge
May lack common sense or world knowledge in certain areas. It may not always understand the context or nuances of a particular situation, which can lead to inaccurate or inappropriate responses.
Reasoning and Problem-Solving
Ability to reason and solve problems is limited. It may struggle with complex logical or mathematical problems, and its responses may not always be accurate or relevant.
Bias and Fairness
May reflect biases present in the data it was trained on. This can result in unfair or discriminatory responses, particularly in sensitive areas such as race, gender, or politics.
Quantization and Performance
Quantized versions reduce model size by 70% and improve inference speed by 30%, but they may also incur a minimal performance loss of around 1%.
Security and Safety
Can be used for malicious purposes if not properly secured and monitored. Users are urged to use the model responsibly and in compliance with applicable laws and regulations.
Format
Accepts input in the form of tokenized text sequences. Supports multiple languages, including English, Chinese, Japanese, and Korean.
Architecture
Based on a transformer architecture and has 14 billion parameters. Trained on a diverse dataset of 2.5 trillion tokens.
Data Formats
Supports the following data formats:
- Tokenized text sequences
- Multilingual input (English, Chinese, Japanese, Korean)
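A single tokenizer covers all supported languages; a minimal sketch, using the same model ID as the Example Code section below:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("OrionStarAI/Orion-14B", use_fast=False, trust_remote_code=True)

# One tokenizer handles English, Chinese, Japanese, and Korean input alike.
for text in ["Hello, world", "你好，世界", "こんにちは、世界", "안녕하세요, 세계"]:
    print(f"{text!r} -> {len(tokenizer.encode(text))} tokens")
```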
Input Requirements
To use this model, you need to:
- Preprocess your input text by tokenizing it
- Use the chat message format for input sequences, e.g. `{"role": "user", "content": "Hello, what is your name?"}`
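For multi-turn conversations, prior turns are passed as a list of role/content dictionaries, as in this sketch (the assistant reply shown is illustrative, and the usual user/assistant alternation is assumed):

```python
# Multi-turn history: alternating user/assistant messages, newest last.
messages = [
    {"role": "user", "content": "Hello, what is your name?"},
    {"role": "assistant", "content": "I am an assistant based on Orion-14B."},
    {"role": "user", "content": "Can you answer in Japanese?"},
]
```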
Output Format
The output is a text sequence that can be used for various tasks, such as:
- Chatting
- Question answering
- Text generation
Example Code
Here is an example of how to use this model in Python:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code is required: the repository ships custom modeling/chat code.
tokenizer = AutoTokenizer.from_pretrained("OrionStarAI/Orion-14B", use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("OrionStarAI/Orion-14B", device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True)

# Single-turn chat via the repository's custom chat() helper.
messages = [{"role": "user", "content": "Hello, what is your name?"}]
response = model.chat(tokenizer, messages, streaming=False)
print(response)
```
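The snippet above uses the chat interface. For the base model, plain text completion through the standard transformers generate API should also work; a minimal sketch reusing the model and tokenizer loaded above (the prompt and generation parameters are illustrative):

```python
# Plain completion with the standard generate() API; parameters are illustrative.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```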