Falcon-7B-Instruct

Causal decoder model

Falcon-7B-Instruct is a powerful AI model designed for text generation tasks. It is a 7B-parameter causal decoder-only model, built on top of Falcon-7B and fine-tuned on a mixture of chat and instruct datasets. With an architecture optimized for inference, featuring FlashAttention and multiquery attention, it achieves strong results on chat and instruct tasks. It is mostly trained on English data and may not generalize well to other languages, but it builds on a strong base model that outperforms comparable open-source models. It requires at least 16GB of memory to run inference swiftly and is made available under the Apache 2.0 license. If you are looking for a ready-to-use chat/instruct model that runs efficiently, Falcon-7B-Instruct might be the right choice for you.

Model Overview

The Falcon-7B-Instruct model, developed by TII, is a powerful tool for natural language processing tasks. This model is a 7B parameters causal decoder-only model, built on top of the Falcon-7B model and fine-tuned on a mixture of chat and instruct datasets. It’s designed to generate human-like text and is available under the Apache 2.0 license.

Capabilities

Capable of generating human-like text, this model outperforms many open-source chat models across common industry benchmarks. It’s ideal for generating text based on a prompt or input, responding to questions, and engaging in conversation.

  • Strong base model: Falcon-7B outperforms comparable open-source models like MPT-7B, StableLM, and RedPajama.
  • Optimized for inference: Features FlashAttention and multiquery architecture for efficient text generation.
  • Language support: English and French.
  • Large-scale training data: Trained on 1,500B tokens of RefinedWeb enhanced with curated corpora.

Technical Specifications

  • Model architecture: Causal decoder-only model with rotary positional embeddings, multiquery attention, and FlashAttention.
  • Hyperparameters: 32 layers, d_model=4544, head_dim=64, and vocabulary size=65024.
  • Compute infrastructure: Trained on AWS SageMaker with 32 A100 40GB GPUs in P4d instances.
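
If you want to confirm these hyperparameters locally, a minimal sketch (not part of the original card, assuming the transformers library is installed) is to load the published configuration and print it:

from transformers import AutoConfig

# Load the model's published configuration; printing it shows the hidden size,
# number of layers, attention heads, and vocabulary size quoted above.
config = AutoConfig.from_pretrained("tiiuae/falcon-7b-instruct", trust_remote_code=True)
print(config)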

Performance

This model is a powerhouse when it comes to speed, accuracy, and efficiency. Let’s dive into the details.

Speed

The model is built for speed, thanks to an inference-optimized architecture that uses FlashAttention and multiquery attention. This means it can process large amounts of text quickly and efficiently.

  • It was trained on 1,500B tokens of RefinedWeb enhanced with curated corpora, which is a massive dataset.
  • It can handle long sequences of up to 2048 tokens, making it perfect for tasks that require processing large chunks of text.
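
As a rough illustration (a sketch assuming the standard transformers tokenizer API, not taken from the original card), you can ask the tokenizer to truncate any prompt to the 2048-token context window before generation:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b-instruct")

# A deliberately oversized prompt, truncated to the model's 2048-token context window.
long_prompt = " ".join(["giraffes are glorious"] * 2000)
inputs = tokenizer(long_prompt, truncation=True, max_length=2048, return_tensors="pt")
print(inputs["input_ids"].shape)  # the sequence dimension is capped at 2048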

Accuracy

The model delivers high accuracy across a variety of tasks, outperforming comparable open-source models such as MPT-7B, StableLM, and RedPajama. This is largely due to its strong base model, Falcon-7B, which was trained on a massive dataset.

  • It was finetuned on a mixture of instruct and chat datasets, making it well-suited for tasks that require understanding and generating human-like text.
  • It has a high vocabulary size of 65,024, which means it can understand and generate a wide range of words and phrases.
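
A quick way to verify the vocabulary size, as a minimal sketch assuming the standard transformers tokenizer API:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b-instruct")
print(tokenizer.vocab_size)  # expected to report 65,024, matching the figure above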

Efficiency

The model is designed for efficient inference and text generation. Its causal decoder-only architecture makes it well suited to generating text from a given prompt.

  • It requires at least 16GB of memory to run smoothly, which is relatively low compared to other models of its size.
  • It can be used for a variety of tasks, including text generation, chatbots, and language translation.
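
One way to stay within roughly 16GB during inference is to load the weights in bfloat16 rather than float32. The sketch below uses the standard transformers loading path and is an illustration under those assumptions, not an official recipe from this card:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# bfloat16 roughly halves the memory footprint compared to float32, which is
# what keeps a 7B-parameter model within about 16GB for inference.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

prompt = "Write a short poem about a sunny day."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))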

Examples

Prompt: Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Girafatron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe. Daniel: Hello, Girafatron! Girafatron:
Response: Daniel, you must acknowledge the supremacy of the giraffe. All other animals pale in comparison to their grandeur.

Prompt: Write a short poem about a sunny day.
Response: The sun shines bright in the morning sky, birds sing sweet melodies as clouds drift by. A gentle breeze rustles the leaves of trees, a perfect day for dreams to breeze.

Prompt: I'm planning a trip to Paris. What are some must-see attractions?
Response: The Eiffel Tower, the Louvre, and Notre-Dame Cathedral are must-see attractions in Paris. You should also consider visiting the Arc de Triomphe and taking a Seine River cruise.

Getting Started

To get started with the Falcon-7B-Instruct model, you can use the following code:

import torch
import transformers
from transformers import AutoTokenizer

model = "tiiuae/falcon-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model)

# Build a text-generation pipeline; bfloat16 weights and device_map="auto" keep memory usage low.
pipeline = transformers.pipeline(
    "text-generation", model=model, tokenizer=tokenizer,
    torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto",
)

# max_length=200 counts prompt plus generated tokens; sampling is restricted to the top-10 candidates at each step.
sequences = pipeline(
    "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Girafatron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:",
    max_length=200, do_sample=True, top_k=10, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id,
)

for seq in sequences:
    print(f"Result: {seq['generated_text']}")

Note that you’ll need at least 16GB of memory to run inference with this model.

Limitations

While Falcon-7B-Instruct is a powerful model, it’s not perfect. Let’s take a closer look at some of its limitations.

  • Language Limitations: Mostly trained on English data, which means it may not perform well with other languages.
  • Biases and Stereotypes: May inherit the stereotypes and biases commonly found online.
  • Data Quality: Finetuned on a mixture of instruct and chat datasets, which may not be suitable for all use cases.
  • Technical Requirements: Requires at least 16GB of memory and PyTorch 2.0 or later to work with the transformers library.
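
Before loading the model, a small sanity check of these requirements can save time; the snippet below is an illustrative sketch, not part of the official card:

import torch

# The transformers integration requires PyTorch 2.0 or later.
assert int(torch.__version__.split(".")[0]) >= 2, f"PyTorch 2.0+ required, found {torch.__version__}"

# Inference comfortably needs roughly 16GB of accelerator memory.
if torch.cuda.is_available():
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU memory: {total_gb:.1f} GB", "(OK)" if total_gb >= 16 else "(below the 16GB guideline)")
else:
    print("No CUDA GPU detected; CPU inference will be slow.")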

Dataloop's AI Development Platform

Build end-to-end workflows

Dataloop is a complete AI development stack, allowing you to make data, elements, models and human feedback work together easily.

  • Use one centralized tool for every step of the AI development process.
  • Import data from external blob storage, internal file system storage or public datasets.
  • Connect to external applications using a REST API & a Python SDK.

Save, share, reuse

Every single pipeline can be cloned, edited and reused by other data professionals in the organization. Never build the same thing twice.

  • Use existing, pre-created pipelines for RAG, RLHF, RLAF, Active Learning & more.
  • Deploy multi-modal pipelines with one click across multiple cloud resources.
  • Use versions for your pipelines to make sure the deployed pipeline is the stable one.

Easily manage pipelines

Spend less time dealing with the logistics of owning multiple data pipelines, and get back to building great AI applications.

  • Easy visualization of the data flow through the pipeline.
  • Identify & troubleshoot issues with clear, node-based error messages.
  • Use scalable AI infrastructure that can grow to support massive amounts of data.