Aya 101

Multilingual language model

The Aya 101 model is a powerful tool for multilingual language tasks: it supports 101 languages and outperforms models like mT0 and BLOOM in a wide variety of evaluations. It achieves this through training on a diverse set of datasets, including xP3x, the Aya Dataset, the Aya Collection, the DataProvenance collection, and ShareGPT-Command. With 13 billion parameters, it can handle a wide range of languages, including those with non-Latin scripts. It does have limitations: the potential for bias and toxicity is a concern, and performance can vary significantly across languages and tasks. Even so, Aya 101 is a remarkable achievement in multilingual language modeling, and its release under an Apache-2.0 license aims to empower a multilingual world.

CohereForAI apache-2.0 Updated a year ago

Model Overview

The Aya 101 Model, developed by Cohere For AI, is a game-changing language model that can follow instructions in 101 languages. Yes, you read that right - 101 languages!

But what makes it so special? For starters, it outperforms other models like mT0 and BLOOM in a wide variety of automatic and human evaluations, despite covering double the number of languages. That’s a big deal!

Capabilities

So, what can you do with this model? Here are some examples:

  • Translation: It can translate text from one language to another. For instance, you can translate a sentence from Turkish to English.
  • Answering Questions: The model can answer questions in multiple languages. For example, you can ask it “Why are there so many languages in India?” in Hindi, and it will respond with a relevant answer.
  • Generating Text: It can generate text based on a prompt or topic, making it a useful tool for writing and content creation.
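Because Aya 101 is instruction-tuned, each of these capabilities boils down to a plain natural-language prompt. The sketch below shows illustrative prompt templates for the three task types; the template wording and the `make_prompt` helper are assumptions for demonstration, not an official prompt format shipped with the model:

```python
# Illustrative only: Aya 101 follows free-form instructions, so these
# templates are one reasonable phrasing, not a required format.
def make_prompt(task: str, text: str, target_lang: str = "English") -> str:
    templates = {
        "translate": f"Translate to {target_lang}: {text}",
        "answer": f"Answer the following question: {text}",
        "generate": f"Write a short text about: {text}",
    }
    return templates[task]

print(make_prompt("translate", "La ville est très belle."))
# Translate to English: La ville est très belle.
```

The resulting string is what you would tokenize and pass to the model, as shown in the Try it Out section.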

Key Attributes

Here are some key attributes that make this model stand out:

  • Model Type: A Transformer-style autoregressive massively multilingual language model.
  • Model Size: It’s a massive model with 13 billion parameters.
  • Datasets: It was trained on a diverse range of datasets, including xP3x, Aya Dataset, Aya Collection, DataProvenance collection, and ShareGPT-Command.
  • License: The model is released under an Apache-2.0 license, making it accessible to everyone.

Performance

When it comes to processing speed, this model is a powerhouse, but keep in mind that it was trained on a TPUv4-128 setup and, at 13 billion parameters, needs accelerator hardware (a modern GPU or TPU) to run quickly. Figures like the ones below depend heavily on hardware, batch size, and sequence length:

  • Translation: translating a short sentence (for instance, Turkish to English) can take on the order of 10-20 ms.
  • Text Generation: text can be generated at a rate of roughly 100-200 words per second.
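Since latency and throughput vary so much by hardware, it is worth measuring them on your own setup. Here is a minimal, model-agnostic timing helper; `time_call` is a hypothetical name, and the lambda below stands in for a real call such as `aya_model.generate(...)`:

```python
import time

def time_call(generate_fn, *args, **kwargs):
    """Run generate_fn once and return (output, elapsed milliseconds)."""
    t0 = time.perf_counter()
    out = generate_fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - t0) * 1000.0
    return out, elapsed_ms

# Demonstrated with a stand-in function rather than the real 13B model:
out, ms = time_call(lambda s: s.upper(), "hello")
print(out, f"{ms:.3f} ms")
```

For meaningful numbers, run a few warm-up calls first and average over many inputs, since the first call often includes one-time setup cost.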
Examples

Prompt: Translate this sentence to English: La ville est très belle et je l'aime beaucoup.
Response: The city is very beautiful and I like it a lot.

Prompt: What is the definition of artificial intelligence?
Response: Artificial intelligence refers to the development of computer systems that can perform tasks that typically require human intelligence, such as visual perception, speech recognition, decision-making, and language translation.

Prompt: Generate a short poem about the sun.
Response: The sun shines bright in the morning sky, a fiery ball of light and energy high. Its rays illuminate all that's below, bringing warmth and life to all that grow.

Try it Out

Want to try this model for yourself? You can install the model using the transformers library and use it to perform various tasks. Here’s an example code snippet to get you started:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Aya 101 is an encoder-decoder model, so it loads as a Seq2SeqLM
checkpoint = "CohereForAI/aya-101"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
aya_model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Turkish to English translation ("Aya is a multilingual language model.")
tur_inputs = tokenizer.encode("Translate to English: Aya cok dilli bir dil modelidir.", return_tensors="pt")
tur_outputs = aya_model.generate(tur_inputs, max_new_tokens=128)
print(tokenizer.decode(tur_outputs[0], skip_special_tokens=True))

Limitations

While this model is a powerful tool, it’s not perfect. Let’s take a closer look at some of its limitations:

  • Language Limitations: While it supports an impressive 101 languages, it may struggle with low-resource languages or language nuances.
  • Data Limitations: The model may not have seen domain-specific data or emerging topics, which can lead to potential misinterpretations.
  • Bias and Risks: Like any AI model, it’s not immune to bias and risks, particularly in languages with limited training data.

Format

This model uses a transformer-style autoregressive architecture and accepts input in the form of tokenized text sequences.

  • Data Formats: Input is tokenized text sequences. You can use the AutoTokenizer from the transformers library to tokenize your input text.
  • Input Requirements: Provide your input as a plain string, then convert it with the tokenizer’s encode method into token IDs the model can understand; this pre-processing step is required before text is passed to the model.
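The encode-then-decode contract described above can be illustrated with a toy whitespace tokenizer. This is only a sketch of the interface: the real model uses a SentencePiece tokenizer loaded via AutoTokenizer, and `ToyTokenizer` is an invented class for demonstration:

```python
# Toy stand-in that mirrors the encode/decode interface the model expects.
# NOT the real aya-101 tokenizer, which is SentencePiece-based.
class ToyTokenizer:
    def __init__(self):
        self.vocab = {}   # token string -> id
        self.inv = {}     # id -> token string

    def encode(self, text):
        ids = []
        for tok in text.split():
            if tok not in self.vocab:
                idx = len(self.vocab)
                self.vocab[tok] = idx
                self.inv[idx] = tok
            ids.append(self.vocab[tok])
        return ids

    def decode(self, ids):
        return " ".join(self.inv[i] for i in ids)

tok = ToyTokenizer()
ids = tok.encode("Translate to English: merhaba")
assert tok.decode(ids) == "Translate to English: merhaba"  # round-trip
```

The real tokenizer follows the same shape: a string goes in, a sequence of integer token IDs comes out, and decode maps the model's output IDs back to text.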

Dataloop's AI Development Platform
Build end-to-end workflows

Dataloop is a complete AI development stack, allowing you to make data, elements, models and human feedback work together easily.

  • Use one centralized tool for every step of the AI development process.
  • Import data from external blob storage, internal file system storage or public datasets.
  • Connect to external applications using a REST API & a Python SDK.
Save, share, reuse

Every single pipeline can be cloned, edited and reused by other data professionals in the organization. Never build the same thing twice.

  • Use existing, pre-created pipelines for RAG, RLHF, RLAIF, Active Learning & more.
  • Deploy multi-modal pipelines with one click across multiple cloud resources.
  • Use versions for your pipelines to make sure the deployed pipeline is the stable one.
Easily manage pipelines

Spend less time dealing with the logistics of owning multiple data pipelines, and get back to building great AI applications.

  • Easy visualization of the data flow through the pipeline.
  • Identify & troubleshoot issues with clear, node-based error messages.
  • Use scalable AI infrastructure that can grow to support massive amounts of data.