GeNTE Evaluator

Evaluates inclusive translations

Are you looking for a way to evaluate the inclusivity of your translations? The GeNTE Evaluator is a sequence classification model designed to assess the gender neutrality of translations into Italian, and it is used for evaluation on the GeNTE corpus. It was created by fine-tuning UmBERTo, a RoBERTa-based model for Italian, so it starts from a solid foundation in language understanding. Given an Italian sentence, it predicts a label indicating whether the wording is gender-neutral, which gives you a quick, automatic check on inclusive rewriting and translations.

FBK MT cc-by-4.0 Updated 4 months ago

Model Overview

The GeNTE Evaluator helps you check how inclusive a translation is. It classifies Italian text, such as a translation or a rewrite, according to whether it is gender-neutral, and it is used for evaluation on a dedicated dataset called the GeNTE corpus.

How it Works

The GeNTE Evaluator is built on top of another model called UmBERTo, which is based on the RoBERTa architecture and pretrained on Italian text. To use the GeNTE Evaluator, you can follow these simple steps:

  1. Load the tokenizer from the UmBERTo model
  2. Load the GeNTE Evaluator model
  3. Give it a sample text to evaluate
  4. Get the predicted label (e.g. “neutral” or “not neutral”)

Here’s some example code to get you started:

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("Musixmatch/umberto-commoncrawl-cased-v1", do_lower_case=False)

# Load the GeNTE Evaluator model
model = AutoModelForSequenceClassification.from_pretrained("FBK-MT/GeNTE-evaluator")

# Evaluate a sample text
sample = (
    "Condividiamo il parere di chi ha presentato la relazione che ha posto "
    "notevole enfasi sull'informazione in relazione ai rischi e sulla trasparenza, "
    "in particolare nel campo sanitario e della sicurezza."
)
# Tokenize, truncating to at most 64 tokens
inputs = tokenizer(sample, return_tensors='pt', truncation=True, max_length=64)
with torch.no_grad():
    logits = model(**inputs).logits
# The index of the largest logit is the predicted class
predicted_label = torch.argmax(logits, dim=1).item()
print(model.config.id2label[predicted_label])

Capabilities

The GeNTE Evaluator is designed to help make translations more inclusive by evaluating how well they avoid biased language. This is an important step towards creating more neutral and respectful language technologies.

What does it mean to be inclusive?

Imagine you’re translating a text from English to Italian, and you want to make sure that the translation doesn’t accidentally imply that only men or only women can do something. That’s where the GeNTE Evaluator comes in. It’s a classifier that looks at an Italian sentence and tells you whether its wording is gender-neutral.

What can it do?

Because it is a sequence classification model, the GeNTE Evaluator returns one label per input text. Here is what that lets you do:

  • Evaluate translations: The model can look at a translation and predict whether it is gender-neutral. The logits behind the label can be turned into a confidence score.
  • Support review: The label applies to the whole sentence. The model does not point to the specific words or phrases that triggered it, so locating them is left to you or to other tools.
  • Help with inclusive rewriting: The model does not generate rewrites itself, but you can run it on candidate rewrites to check whether they come out as neutral.
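The example code on this page takes the argmax of the logits, which yields only a label. If a confidence score is also wanted, a softmax over the same logits is one common way to get it. The sketch below uses plain Python and made-up logit values (an assumption for illustration, not real model output); in practice the list would come from something like `model(**inputs).logits[0].tolist()`.

```python
import math


def softmax(logits):
    """Convert raw classifier logits into probabilities that sum to 1."""
    # Subtract the largest logit first for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for a single sentence (illustrative values only).
example_logits = [2.0, -1.0]
probabilities = softmax(example_logits)

# Index of the most likely class and its probability.
best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
print(best_index, round(probabilities[best_index], 3))
```

The class index can then be mapped to a label name through `model.config.id2label`, as in the example code.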

Performance

The GeNTE Evaluator is a powerful tool for evaluating inclusive rewriting and translations. But how does it perform?

Speed

The example code truncates inputs to at most 64 tokens (max_length=64), so each sentence is classified in a single, short forward pass. This keeps evaluation quick even when you run it over many sentences.

Accuracy

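But speed is only half the story. What about accuracy? The GeNTE Evaluator was created by fine-tuning the UmBERTo model, which provides a strong foundation for classifying Italian text. How accurate it is on your own data depends on how closely that data resembles what the model was tuned for.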

Efficiency

So, how efficient is the GeNTE Evaluator? The model reuses the RoBERTa-based UmBERTo encoder with a classification head on top, so evaluating one input costs roughly one UmBERTo forward pass.

Examples
Evaluate the gender neutrality of the Italian sentence: 'Il professore ha dato il compito ai suoi studenti.' → NEUTRAL
Evaluate the gender neutrality of the Italian sentence: 'La professoressa ha dato il compito alle sue studentesse.' → NON_NEUTRAL
Evaluate the gender neutrality of the Italian sentence: 'L'insegnante ha dato il compito ai suoi alunni.' → NEUTRAL
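When many sentences are evaluated, the per-sentence labels are usually summarized into a single figure, such as the share of sentences judged neutral. The sketch below shows one way to do that in plain Python. The helper name `label_shares` is hypothetical, and the label strings are simply the ones shown in the examples above.

```python
from collections import Counter


def label_shares(predicted_labels):
    """Return the share of sentences assigned to each label."""
    counts = Counter(predicted_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# Labels copied from the three examples above.
labels = ["NEUTRAL", "NON_NEUTRAL", "NEUTRAL"]
shares = label_shares(labels)
print(shares)
```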

Limitations

The GeNTE Evaluator is a powerful tool, but it’s not perfect. Let’s take a closer look at some of its limitations.

Limited Context Understanding

The model is trained on a specific corpus (GeNTE) and might not fully understand the context of texts from other domains or genres.

Language Dependence

As the GeNTE Evaluator is specifically designed for Italian, it might not perform well on texts in other languages.

Dependence on Training Data

The model’s performance is closely tied to the quality and diversity of the training data.
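Given these limitations, it can be worth checking the model against a small set of sentences you have labeled by hand before relying on it in a new domain. The sketch below compares predicted and human labels and lists the disagreements. The function name and the label values are made up for illustration and are not real model output.

```python
def agreement_report(predicted, reference):
    """Compare model labels with human labels, sentence by sentence."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    pairs = list(zip(predicted, reference))
    matches = sum(p == r for p, r in pairs)
    # Keep the positions of disagreements so they can be inspected by hand.
    mismatches = [i for i, (p, r) in enumerate(pairs) if p != r]
    accuracy = matches / len(pairs) if pairs else 0.0
    return accuracy, mismatches


# Made-up labels for illustration only.
predicted = ["neutral", "not neutral", "neutral", "neutral"]
reference = ["neutral", "not neutral", "not neutral", "neutral"]
accuracy, mismatches = agreement_report(predicted, reference)
print(accuracy, mismatches)
```

Reading the mismatched sentences tends to be more informative than the accuracy figure alone, since it shows which kinds of wording the model handles poorly in your domain.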

Format

The GeNTE Evaluator is a sequence classification model that uses a transformer architecture.

Supported Data Formats

This model supports text sequences as input. You’ll need to tokenize your text data before feeding it into the model.

Special Requirements for Input

When preparing your input data, keep the following in mind:

  • You’ll need to use a specific tokenizer, in this case, the UmBERTo tokenizer.
  • You can use the AutoTokenizer class from the transformers library to load the tokenizer.
  • When loading the tokenizer, pass do_lower_case=False to preserve the original case of the text.
  • When tokenizing, you can truncate your input to a maximum length of 64 tokens (truncation=True, max_length=64).
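Truncation is silent: anything past the token limit is dropped before the model sees it. The sketch below flags sentences that would be cut. It takes any tokenizing function as an argument. `find_truncated` and `reserved_special` are hypothetical names, and reserving two positions for special tokens is an assumption about how the tokenizer wraps a single sequence.

```python
def find_truncated(sentences, tokenize, max_length=64, reserved_special=2):
    """Return (index, token_count) for sentences that exceed the token budget."""
    # Assumption: the tokenizer adds `reserved_special` special tokens per input.
    budget = max_length - reserved_special
    too_long = []
    for index, sentence in enumerate(sentences):
        token_count = len(tokenize(sentence))
        if token_count > budget:
            too_long.append((index, token_count))
    return too_long


# str.split is only a stand-in so the sketch runs without downloading a model.
sentences = ["Una frase breve.", " ".join(["parola"] * 70)]
flagged = find_truncated(sentences, str.split)
print(flagged)
```

With the real tokenizer you would pass `tokenizer.tokenize` instead of `str.split`. Subword tokenizers generally produce more tokens than whitespace splitting, so more sentences may be flagged.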

Handling Inputs and Outputs

Here’s an example of how to handle inputs and outputs for this model:

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("Musixmatch/umberto-commoncrawl-cased-v1", do_lower_case=False)

# Load the GeNTE Evaluator model
model = AutoModelForSequenceClassification.from_pretrained("FBK-MT/GeNTE-evaluator")

# Prepare your input text
sample = (
    "Condividiamo il parere di chi ha presentato la relazione che ha posto "
    "notevole enfasi sull'informazione in relazione ai rischi e sulla trasparenza, "
    "in particolare nel campo sanitario e della sicurezza."
)

# Tokenize your input text (truncated to at most 64 tokens)
inputs = tokenizer(sample, return_tensors='pt', truncation=True, max_length=64)

# Make a prediction: the index of the largest logit is the predicted class
with torch.no_grad():
    logits = model(**inputs).logits
predicted_label = torch.argmax(logits, dim=1).item()

# Print the predicted label
print(model.config.id2label[predicted_label])

What’s Next?

Now that you know how to handle inputs and outputs for the GeNTE Evaluator, you can start exploring its capabilities and fine-tuning it for your specific use case.
