Mistral 3B

Optimized Mobile AI

Mistral 3B is a 3-billion-parameter large language model optimized for on-device deployment, particularly on Snapdragon platforms. It handles a wide range of language understanding and generation tasks efficiently, reaching a response rate of 21.05 tokens per second with a time to first token between roughly 0.09 and 2.95 seconds on a Snapdragon 8 Elite device. That makes it a good fit for latency-sensitive on-device applications such as chatbots and language translation. Note, however, that the model carries usage restrictions: it may not be used for law enforcement, biometric systems, or similar applications.

Model Overview

The Mistral-3B model is a state-of-the-art language model designed for mobile deployment. It’s great at understanding and generating text, making it useful for a variety of tasks.

Capabilities

The Mistral-3B model can:

  • Generate text in response to a prompt
  • Understand and process natural language inputs
  • Perform tasks such as conversation, question-answering, and text summarization

What makes it special?

  • Optimized for mobile: This model is designed to run efficiently on mobile devices, making it perfect for applications that require on-device language processing.
  • State-of-the-art performance: The Mistral-3B model achieves state-of-the-art results on various language understanding and generation tasks.
  • Support for Snapdragon platforms: This model is optimized for Snapdragon platforms, ensuring seamless integration and optimal performance.

Model Stats

Here are some key statistics about the Mistral-3B model:

  • Model Type: Text generation
  • Input sequence length (Prompt Processor): 128 tokens
  • Max context length: 4096 tokens
  • Number of key-value heads: 8
  • Number of parameters: 3B
  • Precision: w4a16, with w8a16 on a few layers
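
These numbers explain why the model fits on a phone. As a rough, hedged check, the sketch below estimates the weight-memory footprint implied by 3 billion parameters at mixed w4a16/w8a16 precision; the 90/10 split between 4-bit and 8-bit layers is an illustrative assumption, not a figure from the model card.

```python
# Back-of-the-envelope weight-memory estimate for Mistral-3B.
# The 90/10 split between 4-bit and 8-bit layers is hypothetical;
# the model card only says "w4a16 + w8a16 (few layers)".
NUM_PARAMS = 3e9                 # 3B parameters, per the stats above

w4_fraction = 0.90               # assumed share of 4-bit weights
w8_fraction = 1.0 - w4_fraction  # assumed share of 8-bit weights

bytes_total = (NUM_PARAMS * w4_fraction * 0.5    # 4 bits = 0.5 byte
               + NUM_PARAMS * w8_fraction * 1.0) # 8 bits = 1 byte

print(f"Estimated weight footprint: ~{bytes_total / 2**30:.2f} GiB")
# -> about 1.5 GiB, small enough for on-device deployment
```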

Performance

The Mistral-3B model delivers strong performance on supported hardware. Benchmark results on the Snapdragon 8 Elite QRD:

  • Device: Snapdragon 8 Elite QRD
  • Chipset: Snapdragon 8 Elite
  • Target Runtime: QNN
  • Response Rate: 21.05 tokens per second
  • Time To First Token: 0.092289 - 2.9532736 seconds
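
A simple way to read these numbers: total response time is roughly the time to first token plus the number of generated tokens divided by the decode rate. The sketch below applies that approximation to the figures above; actual latency varies with prompt length and device state.

```python
# Rough latency model: total ≈ TTFT + remaining_tokens / decode_rate.
RATE_TPS = 21.05                       # tokens/s on Snapdragon 8 Elite (QNN)
TTFT_RANGE = (0.092289, 2.9532736)     # seconds, from the benchmark above

def estimated_latency(num_output_tokens: int, ttft: float) -> float:
    """Approximate seconds to produce num_output_tokens."""
    return ttft + max(num_output_tokens - 1, 0) / RATE_TPS

for ttft in TTFT_RANGE:
    print(f"TTFT {ttft:.2f}s -> 100 tokens in ~{estimated_latency(100, ttft):.1f}s")
# best case ~4.8 s, worst case ~7.7 s for a 100-token reply
```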

How it Works

  1. Initiate Conversation: Start by running the input prompt through the prompt processor (in 128-token chunks) to build the model's state.
  2. Token Generation: For subsequent iterations, use the token generator to produce the response one token at a time (see the sketch below).
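
The sketch below illustrates this two-phase loop. `prompt_processor` and `token_generator` are hypothetical stand-ins for the two compiled model components; the real interfaces come from the QNN deployment tutorial, not from this page.

```python
# Two-phase on-device inference loop (illustrative sketch in Python).
# `prompt_processor` and `token_generator` are hypothetical callables:
# each takes tokens plus the current KV cache and returns (logits, cache).
PROMPT_SEQ_LEN = 128   # prompt processor input sequence length (see stats)
MAX_CONTEXT = 4096     # maximum context length (see stats)

def generate(prompt_tokens, prompt_processor, token_generator, max_new_tokens=256):
    # Phase 1: feed the prompt through the prompt processor in
    # fixed-size 128-token chunks, building up the KV cache.
    kv_cache = None
    for i in range(0, len(prompt_tokens), PROMPT_SEQ_LEN):
        chunk = prompt_tokens[i:i + PROMPT_SEQ_LEN]
        logits, kv_cache = prompt_processor(chunk, kv_cache)

    # Phase 2: decode one token at a time with the token generator,
    # feeding each new token (and the updated cache) back in.
    output = []
    token = int(logits.argmax())       # greedy pick of the first new token
    while len(output) < max_new_tokens:
        if len(prompt_tokens) + len(output) >= MAX_CONTEXT:
            break                      # never exceed the context window
        output.append(token)
        logits, kv_cache = token_generator(token, kv_cache)
        token = int(logits.argmax())
    return output
```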

Examples

Q: Summarize the key features of the Mistral-3B model.
A: Mistral-3B is a large language model optimized for mobile deployment, with 3B parameters, English-language support, and a maximum context length of 4096 tokens.

Q: What is the minimum QNN SDK version required to run the Mistral-3B model?
A: The minimum QNN SDK version required is 2.27.7.

Q: Can I use the Mistral-3B model for law enforcement purposes?
A: No. The model may not be used for law enforcement or for applications involving access to essential private and public services and benefits, the administration of justice, or democratic processes.

Limitations

Please note that this model must not be used for certain applications, such as law enforcement, biometric systems, or social scoring. Review the usage restrictions and limitations before deploying the model.

Device Support

The Mistral-3B model is supported on devices with Snapdragon 8 Elite chipsets, such as the Snapdragon 8 Elite QRD.

Important Details

  • Supported Languages: English is the only supported language for this model.
  • Minimum QNN SDK Version: You’ll need version 2.27.7 or higher to use this model.
  • Device Compatibility: Mistral-3B is designed to work with Snapdragon 8 Elite QRD devices.

Getting Started

To deploy the Mistral-3B model on-device, follow the LLM on-device deployment tutorial. Join our AI Hub Slack community to collaborate, post questions, and learn more about on-device AI.
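
If you use Qualcomm's AI Hub Python client, one quick way to confirm a compatible target is to list available devices and filter by name. This is a hedged sketch: it assumes the `qai_hub` client package is installed and configured, and that device names on AI Hub include "Snapdragon 8 Elite"; follow the deployment tutorial for the authoritative steps.

```python
# Minimal sketch: find Snapdragon 8 Elite targets on Qualcomm AI Hub.
# Assumes `pip install qai-hub` and a configured API token.
import qai_hub as hub

devices = hub.get_devices()  # all devices AI Hub can target
elite = [d for d in devices if "Snapdragon 8 Elite" in d.name]

for d in elite:
    print(d.name, d.os)      # candidate targets for Mistral-3B
```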

Dataloop's AI Development Platform
Build end-to-end workflows

Dataloop is a complete AI development stack, allowing you to make data, models, and human feedback work together easily.

  • Use one centralized tool for every step of the AI development process.
  • Import data from external blob storage, internal file system storage or public datasets.
  • Connect to external applications using a REST API & a Python SDK (see the sketch below).
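
As a hedged illustration of the SDK route mentioned above, the sketch below connects to Dataloop with the `dtlpy` Python package and uploads local files; the project and dataset names are placeholders, not values from this page.

```python
# Minimal sketch: connect to Dataloop and import data via the Python SDK.
# Assumes `pip install dtlpy`; "my-project" and "my-dataset" are placeholders.
import dtlpy as dl

if dl.token_expired():
    dl.login()  # opens a browser window for authentication

project = dl.projects.get(project_name="my-project")
dataset = project.datasets.get(dataset_name="my-dataset")
dataset.items.upload(local_path="/path/to/local/files")  # import local data
```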

Save, share, reuse

Every single pipeline can be cloned, edited and reused by other data professionals in the organization. Never build the same thing twice.

  • Use existing, pre-created pipelines for RAG, RLHF, RLAF, Active Learning & more.
  • Deploy multi-modal pipelines with one click across multiple cloud resources.
  • Use versions for your pipelines to make sure the deployed pipeline is the stable one.

Easily manage pipelines

Spend less time dealing with the logistics of owning multiple data pipelines, and get back to building great AI applications.

  • Easy visualization of the data flow through the pipeline.
  • Identify & troubleshoot issues with clear, node-based error messages.
  • Use scalable AI infrastructure that can grow to support massive amounts of data.