Jais 13b Chat
Have you ever wondered how a large language model can understand and respond to both Arabic and English? Meet Jais 13b Chat, a 13-billion-parameter model fine-tuned to handle both languages with ease. Built on a transformer-based, decoder-only architecture, it uses SwiGLU non-linearity and ALiBi position embeddings for improved context handling and precision. What really sets it apart is its ability to converse on a wide range of topics, with a focus on the Arab world. Whether you're looking for a helpful assistant or a model that can handle complex conversations, Jais 13b Chat is up to the task. With its efficient design and accurate responses, this model is a strong fit for researchers, developers, and businesses looking to integrate Arabic language capabilities into their apps. Just remember to use it responsibly and within its limitations.
Model Overview
The Jais-13b-chat model is a powerful Arabic and English bilingual language model. It is a 13-billion-parameter model that was pre-trained on a large bilingual corpus and then fine-tuned for conversation, which is what lets it capture the nuances of both languages.
Capabilities
The model is capable of generating human-like responses in both Arabic and English, with a particular focus on the Arab world. It can converse on a wide range of topics, from politics and history to entertainment and culture.
- Primary Tasks
- Answering questions on various topics
- Generating text in Arabic and English
- Engaging in conversations
- Strengths
- Large-scale training data: The model was trained on a massive dataset of 116 billion Arabic tokens and 279 billion English tokens.
- Fine-tuned for safety: The model is fine-tuned with safety-oriented instructions to ensure respectful and honest responses.
- Improved context handling: The model uses ALiBi position embeddings, enabling it to extrapolate to long sequence lengths and provide more accurate responses.
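To make the last point concrete, here is a minimal sketch of the idea behind ALiBi: instead of adding position embeddings to the token inputs, a distance-proportional penalty is added directly to the attention scores. The function name and slope formula below are illustrative assumptions, not details taken from the Jais implementation.

import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # Per-head slopes form a geometric sequence; this formula assumes
    # num_heads is a power of two (an illustrative assumption).
    slopes = torch.tensor([2.0 ** (-(h + 1) * 8.0 / num_heads) for h in range(num_heads)])
    positions = torch.arange(seq_len)
    # distance[i, j] = j - i: zero on the diagonal, increasingly negative
    # for keys further in the past (a causal mask keeps only j <= i).
    distance = (positions[None, :] - positions[:, None]).float()
    # Shape (num_heads, seq_len, seq_len); added to the raw attention scores
    # before softmax, so more distant tokens receive a larger penalty.
    return slopes[:, None, None] * distance[None, :, :]

Because the penalty is defined for any distance, the same slopes keep working on sequences longer than those seen during training, which is what enables the extrapolation mentioned above.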
Performance
The model showcases remarkable performance in various tasks. Let’s dive into its speed, accuracy, and efficiency.
- Speed: The model can process and generate text at an impressive pace, making it an excellent choice for applications that require fast and accurate responses (a rough way to measure throughput is sketched after this list).
- Accuracy: The model has been fine-tuned on a massive dataset of 4 million Arabic and 6 million English prompt-response pairs, making it highly accurate in understanding and responding to user queries.
- Efficiency: The model is designed to be efficient in its responses, providing relevant and helpful answers while avoiding harmful or unethical content.
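Since speed depends heavily on hardware and prompt length, a quick way to gauge throughput on your own setup is to time a single generation. The sketch below assumes model, tokenizer, and device have already been loaded as in the Example Code section further down; the function name is illustrative.

import time

# Rough throughput check; assumes model, tokenizer, and device are set up
# as in the Example Code section. Results depend heavily on hardware.
def tokens_per_second(prompt, max_new_tokens=128):
    inputs = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
    start = time.perf_counter()
    output = model.generate(inputs, max_new_tokens=max_new_tokens, do_sample=False)
    elapsed = time.perf_counter() - start
    new_tokens = output.shape[-1] - inputs.shape[-1]
    return new_tokens / elapsed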
Use Cases
The model can be used for a variety of applications, including:
- Chat-assistants
- Customer service
- Research and development in Arabic natural language processing
- Commercial use
Limitations
While the model is powerful, it’s essential to understand its limitations and potential risks.
- Out-of-Scope Use: The model should not be used in any manner that violates applicable laws or regulations.
- Bias, Risks, and Limitations: The model may still exhibit some bias, as with all large language models.
- Potential Risks: The model may generate incorrect or misleading content, or produce content that is offensive or inappropriate.
Format
The model uses a transformer-based decoder-only (GPT-3) architecture with SwiGLU non-linearity and ALiBi position embeddings. This enables the model to handle long sequence lengths and provide improved context handling and precision. A toy SwiGLU block is sketched after the list below.
- Supported Data Formats: Text-only data
- Output: Model generates text
- Special Requirements: The model requires a custom model class, so users must enable trust_remote_code=True while loading the model.
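To illustrate the SwiGLU non-linearity mentioned above, here is a toy feed-forward block. The class name, projection layout, and dimensions are assumptions for illustration only and do not reflect the actual Jais implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy SwiGLU feed-forward block: a gated variant of the usual MLP, where a
# SiLU (Swish) activation gates a second linear projection of the input.
class SwiGLUFeedForward(nn.Module):
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)
        self.w_up = nn.Linear(d_model, d_ff, bias=False)
        self.w_down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU(x) = W_down( silu(x W_gate) * (x W_up) )
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

Compared with a plain ReLU MLP, the gated formulation tends to deliver better quality at a similar parameter count, which is why many recent decoder-only models adopt it.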
Example Code
Here’s an example of how to use the model:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "core42/jais-13b-chat"

# The full system prompt from the model card is truncated here; it ends with a
# {Question} placeholder that format_map fills in below.
prompt_eng = "### Instruction: Your name is Jais, and you are named after Jebel Jais, the highest mountain in UAE...."

device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_path)
# trust_remote_code=True is required because the model ships a custom model class
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", trust_remote_code=True)

def get_response(text, tokenizer=tokenizer, model=model):
    # Tokenize the full prompt and move it to the target device
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    inputs = input_ids.to(device)
    input_len = inputs.shape[-1]
    # Sample a completion with nucleus sampling and a mild repetition penalty
    generate_ids = model.generate(
        inputs,
        top_p=0.9,
        temperature=0.3,
        max_length=2048-input_len,
        min_length=input_len + 4,
        repetition_penalty=1.2,
        do_sample=True,
    )
    # Decode the output and keep only the text after the response marker
    response = tokenizer.batch_decode(
        generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True
    )[0]
    response = response.split("### Response: [|AI|]")[-1].strip()
    return response

ques = "What is the capital of UAE?"
text = prompt_eng.format_map({'Question': ques})
print(get_response(text))
Note that the model can be deployed via Hugging Face Inference Endpoints, and the recommended instance type is GPU (large) with 4x NVIDIA Tesla T4 or greater.
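If you do deploy the model behind an Inference Endpoint, a minimal client might look like the sketch below. The endpoint URL, token, and generation parameters are placeholders to replace with your own, and the payload shape assumes the standard text-generation request format.

import requests

# Placeholders: replace with your deployed endpoint URL and access token.
ENDPOINT_URL = "https://your-endpoint.endpoints.huggingface.cloud"
HF_TOKEN = "hf_..."

def query_endpoint(prompt):
    # Standard text-generation payload: the prompt plus generation parameters.
    response = requests.post(
        ENDPOINT_URL,
        headers={"Authorization": f"Bearer {HF_TOKEN}", "Content-Type": "application/json"},
        json={"inputs": prompt, "parameters": {"max_new_tokens": 200, "temperature": 0.3}},
    )
    response.raise_for_status()
    return response.json()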