WMT23 CometKiwi DA XXL

Reference-free MT evaluation

Meet WMT23 CometKiwi DA XXL, a powerful AI model designed for reference-free machine translation evaluation. With 10.5 billion parameters, this model is built on top of XLM-R XXL and requires a minimum of 44GB of GPU memory. It can evaluate translations in over 100 languages, including many underrepresented ones. Given a source text and its translation, the model outputs a single score between 0 and 1, where 1 represents a perfect translation and 0 a random translation. That makes it a practical choice for assessing translation quality efficiently and accurately, but keep in mind that results for language pairs containing uncovered languages may be unreliable.

Developed by Unbabel. License: CC-BY-NC-SA-4.0.


Model Overview

WMT23 CometKiwi DA XXL is a powerful tool for evaluating machine translations. But what makes it special? Let’s take a closer look.

Key Attributes

  • Large and Powerful: This model has 10.5 billion parameters, making it a robust tool for evaluating translations.
  • Memory-Intensive: It requires a minimum of 44GB of GPU memory to run, so make sure you have a powerful machine to handle it.

How Does it Work?

The model takes two inputs:

  • The original source text
  • The translated text

It then outputs a single score between 0 and 1, where:

  • 1 represents a perfect translation
  • 0 represents a random translation

This score gives you a clear idea of how good the translation is.
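If it helps to see that mapping in code, here’s a purely illustrative sketch of how you might bucket the score. The thresholds are hypothetical, not part of the model card; tune them against your own data:

def interpret_score(score: float) -> str:
    # Hypothetical thresholds for reading a quality score in [0, 1];
    # calibrate them on your own data before relying on them.
    if score >= 0.8:
        return "likely a good translation"
    if score >= 0.5:
        return "usable, but worth reviewing"
    return "likely a poor translation"

print(interpret_score(0.9))  # "likely a good translation"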

What Can It Do?

This model is designed for reference-free MT evaluation. What does that mean? It can evaluate the quality of a machine translation without needing a reference translation.
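To make the contrast concrete, here’s a sketch of the two input shapes. The src/mt/ref field names follow the COMET library’s data format; the reference-based record is shown only for comparison:

# Reference-based metrics also need a human reference translation:
reference_based_item = {
    "src": "Hello, how are you?",    # original source text
    "mt": "Hola, ¿cómo estás?",      # machine translation to score
    "ref": "Hola, ¿qué tal estás?",  # human reference translation
}

# This model is reference-free: only source and translation are needed.
reference_free_item = {
    "src": "Hello, how are you?",
    "mt": "Hola, ¿cómo estás?",
}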

Strengths

The model has several strengths that make it a valuable tool for machine translation evaluation:

  • Large Language Coverage: The model is built on top of XLM-R XXL, which covers a wide range of languages, including many languages that are not commonly supported by other models.
  • High Accuracy: The model is trained on direct assessment (DA) human judgments of translation quality (the “da” in its name), so its scores track human ratings of machine translations closely.

Languages Covered

This model supports a wide range of languages, including:

  • Afrikaans
  • Albanian
  • Amharic
  • Arabic
  • …and many more, over 100 languages in total

Note that if you try to use this model with language pairs that contain uncovered languages, the results may be unreliable.
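One way to guard against this is a pre-flight check on language codes before scoring. The set below is a tiny illustrative subset; fill it with the full covered-language list before using it:

# Illustrative subset only (Afrikaans, Albanian, Amharic, Arabic);
# extend with the model's full covered-language list.
COVERED_LANGUAGES = {"af", "sq", "am", "ar"}

def pair_is_covered(src_lang: str, tgt_lang: str) -> bool:
    """True only when both sides of the language pair are covered."""
    return src_lang in COVERED_LANGUAGES and tgt_lang in COVERED_LANGUAGES

if not pair_is_covered("af", "zu"):
    print("Warning: scores for this language pair may be unreliable.")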

Performance

But how well does the model do its job? Let’s dive into its performance.

Speed

The model is built on top of XLM-R XXL, which has 10.5 billion parameters, so it needs serious computing power: at least 44GB of GPU memory. On suitable hardware, though, batched GPU inference lets it score large volumes of text at a reasonable pace.

Accuracy

When evaluating the quality of a machine translation, the model outputs a score between 0 and 1, where 1 is a perfect translation and 0 is a random translation. This score gives you a direct read on how good the translation is.

Efficiency

So, how efficient is the model? It’s designed for reference-free MT evaluation, meaning it doesn’t need a reference translation to judge quality. That removes the most expensive step in a typical evaluation workflow: producing human reference translations for every system output you want to score.

Examples

Let’s take a look at some examples of how the model can be used:

  • English-Spanish: “Hello, how are you?” -> “Hola, ¿cómo estás?” (score: 0.9)
  • French-German: “Bonjour, comment allez-vous?” -> “Guten Tag, wie geht es Ihnen?” (score: 0.8)
  • Chinese-English: "" -> “Hello, I’m fine, thank you.” (score: 0.7)

As you can see, the model can evaluate the quality of machine translations with high accuracy, even across very different language pairs.

Limitations

While the model is a powerful tool for reference-free MT evaluation, it’s not perfect. Let’s take a closer look at some of its limitations.

Language Coverage

While the model covers a wide range of languages, it’s not exhaustive. If you’re working with language pairs that include uncovered languages, the results might not be reliable.

GPU Memory Requirements

The model requires a minimum of 44GB of GPU memory to function properly. This can be a challenge if you’re working with limited resources or older hardware.
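Since the comet library runs on PyTorch, a quick sanity check of your GPU before downloading a 10.5-billion-parameter checkpoint can save time. A minimal sketch:

import torch

REQUIRED_GB = 44  # minimum GPU memory stated for this model

if not torch.cuda.is_available():
    print("No CUDA GPU detected; this model is impractical without one.")
else:
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    if total_gb < REQUIRED_GB:
        print(f"GPU has {total_gb:.1f}GB; at least {REQUIRED_GB}GB is needed.")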

Format

The model uses the XLM-R XXL architecture, which gives it a massive 10.5 billion parameters and a minimum requirement of 44GB of GPU memory.

Architecture

This model is based on the XLM-R XXL architecture, which is a type of transformer model. It’s designed to handle multiple languages and tasks.

Data Formats

The model accepts input as a list of records, each with the following JSON structure:

{
  "src": "source text",
  "mt": "machine translation"
}

For example:

[
  {
    "src": "The output signal provides constant sync so the display never glitches.",
    "mt": "Das Ausgangssignal bietet eine konstante Synchronisation, so dass die Anzeige nie stört."
  },
  {
    "src": "Kroužek ilustrace je určen všem milovníkům umění ve věku od 10 do 15 let.",
    "mt": "Кільце ілюстрації призначене для всіх любителів мистецтва у віці від 10 до 15 років."
  }
]
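If your pairs are stored in a JSON file with exactly this structure, loading them into the list the model expects is straightforward; the filename here is just an example:

import json

# Assumes data.json holds a JSON array of {"src": ..., "mt": ...} objects.
with open("data.json", encoding="utf-8") as f:
    data = json.load(f)

# Basic shape check before handing the list to the model.
assert all("src" in item and "mt" in item for item in data)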

Input Requirements

To use this model, you’ll need to provide a source text and its machine translation. The model will then output a score between 0 and 1, where 1 represents a perfect translation and 0 a random translation.

Output

The model’s output is one quality score per input segment, each between 0 and 1. For example:

model_output = model.predict(data, batch_size=8, gpus=1)
print(model_output)               # full prediction object
print(model_output.scores)        # list with one score per segment
print(model_output.system_score)  # corpus-level average of the segment scores

Using the Model

To use this model from the command line, you’ll need to install the unbabel-comet library; the comet-score command downloads the model on first use:

pip install --upgrade pip
pip install "unbabel-comet>=2.1.0"
comet-score -s {source-input}.txt -t {translation-output}.txt --model Unbabel/wmt23-cometkiwi-da-xxl
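Here, {source-input}.txt and {translation-output}.txt are plain-text files with one segment per line, and the two files must be line-aligned so that each source line matches its translation.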

Alternatively, you can use the model in Python:

from comet import download_model, load_from_checkpoint

model_path = download_model("Unbabel/wmt23-cometkiwi-da-xxl")
model = load_from_checkpoint(model_path)
data = [...]  # list of {"src": ..., "mt": ...} records, as in the example above
model_output = model.predict(data, batch_size=8, gpus=1)
print(model_output)
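If no suitable GPU is available, the comet library can run prediction on CPU (gpus=0), but with 10.5 billion parameters expect that to be extremely slow.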