Mistral 3B
Mistral 3B is a 3-billion-parameter large language model optimized for on-device deployment, particularly on Snapdragon platforms. It handles a wide range of language understanding and generation tasks efficiently, making it suitable for applications from chatbots to language translation. Its performance is notable: on a Snapdragon 8 Elite device it delivers a response rate of 21.05 tokens per second, with a time to first token of roughly 0.09 to 2.95 seconds depending on prompt length. Note that the model carries usage restrictions; for example, it must not be used for law enforcement or biometric systems.
Model Overview
The Mistral-3B model is a state-of-the-art language model designed for mobile deployment. It’s great at understanding and generating text, making it useful for a variety of tasks.
Capabilities
The Mistral-3B model can:
- Generate text in response to a prompt
- Understand and process natural language inputs
- Perform tasks such as conversation, question-answering, and text summarization
What makes it special?
- Optimized for mobile: This model is designed to run efficiently on mobile devices, making it perfect for applications that require on-device language processing.
- State-of-the-art performance: The Mistral-3B model achieves state-of-the-art results on various language understanding and generation tasks.
- Support for Snapdragon platforms: This model is optimized for Snapdragon platforms, ensuring seamless integration and optimal performance.
Model Stats
Here are some key statistics about the Mistral-3B model:
| Model Stat | Value |
|---|---|
| Model Type | Text generation |
| Input sequence length for Prompt Processor | 128 |
| Max context length | 4096 |
| Number of key-value heads | 8 |
| Number of parameters | 3B |
| Precision | w4a16 + w8a16 (few layers) |
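Because the prompt processor accepts a fixed input sequence length of 128 tokens, longer prompts have to be fed in 128-token chunks. The sketch below shows one way to do that chunking; the pad id of 0 is an assumption for illustration, not the model's actual padding token.

```python
# Sketch: splitting a tokenized prompt into fixed-size chunks for the
# prompt processor. SEQ_LEN matches the table above; pad_id=0 is an
# assumed placeholder, not the model's real pad token.

SEQ_LEN = 128  # prompt-processor input sequence length

def chunk_prompt(token_ids, seq_len=SEQ_LEN, pad_id=0):
    """Split token ids into seq_len-sized chunks, padding the last one."""
    chunks = []
    for start in range(0, len(token_ids), seq_len):
        chunk = token_ids[start:start + seq_len]
        if len(chunk) < seq_len:
            chunk = chunk + [pad_id] * (seq_len - len(chunk))
        chunks.append(chunk)
    return chunks

# A 300-token prompt becomes three 128-token chunks (last one padded).
chunks = chunk_prompt(list(range(300)))
```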
Performance
The Mistral-3B model delivers impressive performance on various devices. Here are some benchmarks:
| Device | Chipset | Target Runtime | Response Rate (tokens per second) | Time To First Token (range, seconds) |
|---|---|---|---|---|
| Snapdragon 8 Elite QRD | Snapdragon 8 Elite | QNN | 21.05 | 0.092289 - 2.9532736 |
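These two numbers together give a rough end-to-end latency estimate: time to first token (prefill) plus the decode time for the response. The 256-token response length below is just an example value, not a measured figure.

```python
# Rough latency estimate from the benchmark row above: decode runs at
# ~21.05 tokens/s, and time-to-first-token ranges from ~0.09 s (short
# prompt) to ~2.95 s (long prompt). The response length is an assumption.

response_rate = 21.05      # tokens per second (decode)
ttft_long_prompt = 2.95    # seconds (upper end of the measured range)
output_tokens = 256        # example response length (an assumption)

decode_time = output_tokens / response_rate
total_time = ttft_long_prompt + decode_time
print(f"decode: {decode_time:.1f} s, total: {total_time:.1f} s")
```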
How it Works
- Initiate Conversation: The prompt processor ingests the initial prompt (prefill) and produces the first response token.
- Token Generation: The token generator then produces each subsequent token one at a time, reusing the state computed during prefill.
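The two steps above can be sketched as a prefill/decode loop. The real model runs as two compiled QNN graphs; `prompt_processor` and `token_generator` here are hypothetical stubs with toy logic so the control flow is runnable, not the actual deployment API.

```python
# Sketch of the two-phase flow described above. prompt_processor and
# token_generator are hypothetical stand-ins for the two compiled graphs.

EOS_ID = 2          # assumed end-of-sequence id
MAX_NEW_TOKENS = 8

def prompt_processor(prompt_ids):
    """Stub prefill: consume the whole prompt, return (state, first token)."""
    kv_cache = list(prompt_ids)            # stand-in for the real KV cache
    return kv_cache, prompt_ids[-1] + 1    # toy "next token"

def token_generator(kv_cache, token):
    """Stub decode: consume one token, return the next one."""
    kv_cache.append(token)
    nxt = token + 1
    return nxt if nxt < 10 else EOS_ID     # toy logic that eventually stops

def generate(prompt_ids):
    kv_cache, token = prompt_processor(prompt_ids)   # step 1: prefill
    output = [token]
    for _ in range(MAX_NEW_TOKENS - 1):              # step 2: decode loop
        token = token_generator(kv_cache, token)
        if token == EOS_ID:
            break
        output.append(token)
    return output

out = generate([1, 2, 3])
```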
Limitations
Please note that this model should not be used for certain applications, such as law enforcement, biometric systems, or social scoring. Make sure to review the usage and limitations before deploying the model.
Device Support
The Mistral-3B model is supported on a variety of devices, including those with Snapdragon 8 Elite chipsets.
Important Details
- Supported Languages: English is the only supported language for this model.
- Minimum QNN SDK Version: You’ll need version `2.27.7` or higher to use this model.
- Device Compatibility: Mistral-3B is designed to work with Snapdragon 8 Elite QRD devices.
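Checking the minimum SDK version amounts to a numeric, component-wise comparison (a plain string comparison would get it wrong, since "2.9" sorts after "2.27"). How you obtain the installed version is environment-specific; the sample values below are only for illustration.

```python
# Sketch: checking an installed QNN SDK version against the 2.27.7
# minimum. Versions are compared component-wise as integers, since a
# lexicographic string comparison would rank "2.9" above "2.27".

MIN_QNN_VERSION = (2, 27, 7)

def meets_minimum(version_string, minimum=MIN_QNN_VERSION):
    parts = tuple(int(p) for p in version_string.split("."))
    return parts >= minimum

print(meets_minimum("2.28.0"))  # True
print(meets_minimum("2.27.6"))  # False
```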
Getting Started
To deploy the Mistral-3B model on-device, follow the LLM on-device deployment tutorial. Join our AI Hub Slack community to collaborate, post questions, and learn more about on-device AI.