GeNTE Evaluator
Are you looking for a way to evaluate the inclusivity of your translations? The GeNTE Evaluator is a sequence classification model that can help. It's specifically designed to assess the gender neutrality of translations into Italian for the GeNTE corpus. But what makes it unique? The GeNTE Evaluator is built on top of the RoBERTa-based UmBERTo model, which gives it a solid foundation in Italian language understanding. It has been fine-tuned to focus on inclusive rewriting and translation, so you can use it to check whether your translations read as gender-neutral.
Model Overview
The GeNTE Evaluator is a powerful tool that helps make translations more inclusive. It’s used to evaluate how well translations into Italian are rewritten to be neutral, using a special set of data called the GeNTE corpus.
How it Works
The GeNTE Evaluator is built on top of another model called UmBERTo, which is based on the RoBERTa model. This means it’s really good at understanding the nuances of language. To use the GeNTE Evaluator, you can follow these simple steps:
- Load the tokenizer from the UmBERTo model
- Load the GeNTE Evaluator model
- Give it a sample text to evaluate
- Get the predicted label (e.g. “neutral” or “not neutral”)
Here’s some example code to get you started:
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("Musixmatch/umberto-commoncrawl-cased-v1", do_lower_case=False)

# Load the GeNTE Evaluator model
model = AutoModelForSequenceClassification.from_pretrained("FBK-MT/GeNTE-evaluator")

# Evaluate a sample text
sample = (
    "Condividiamo il parere di chi ha presentato la relazione che ha posto "
    "notevole enfasi sull'informazione in relazione ai rischi e sulla trasparenza, "
    "in particolare nel campo sanitario e della sicurezza."
)
# Tokenize, truncating to at most 64 tokens
inputs = tokenizer(sample, return_tensors="pt", truncation=True, max_length=64)

# Run the model without tracking gradients; the output holds raw logits
with torch.no_grad():
    logits = model(**inputs).logits

# Pick the highest-scoring class and map it to its label name
predicted_label = torch.argmax(logits, dim=1).item()
print(model.config.id2label[predicted_label])
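The example above keeps only the top-scoring class. If you also want a confidence value, you can turn the logits into probabilities with a softmax. Here is a minimal sketch in plain Python so the arithmetic is visible. The logits and the label mapping below are hypothetical stand-ins, not values produced by the model. With PyTorch tensors, torch.softmax(logits, dim=1) does the same job.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one sentence and a hypothetical label mapping;
# read the real mapping from model.config.id2label.
example_logits = [2.0, 0.0]
example_id2label = {0: "neutral", 1: "gendered"}

probabilities = softmax(example_logits)
# Index of the most probable class
best = max(range(len(probabilities)), key=probabilities.__getitem__)
print(example_id2label[best], round(probabilities[best], 3))
```

A probability close to 0.5 suggests the classifier is unsure, which can be a useful signal for manual review.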
Capabilities
The GeNTE Evaluator is designed to help make translations more inclusive by evaluating how well they avoid biased language. This is an important step towards creating more neutral and respectful language technologies.
What does it mean to be inclusive?
Imagine you’re translating a text from English to Italian, and you want to make sure that the translation doesn’t accidentally imply that only men or only women can do something. That’s where the GeNTE Evaluator comes in. It’s a special kind of AI model that can look at a translation and tell you how well it does at avoiding biased language.
What can it do?
Here are some of the things the GeNTE Evaluator can do:
- Evaluate translations: The model classifies a translation and returns a label (listed in model.config.id2label) that tells you whether the text reads as gender-neutral.
- Provide feedback: As a sequence classifier, the model returns one label for the whole input. It does not point to the specific words or phrases behind that label, so you still need to review flagged sentences yourself.
- Help with inclusive rewriting: The model does not generate alternative wording. You can, however, run your own rewrites through it to check whether they are classified as neutral.
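When you evaluate a whole set of translations, a per-label share is often more useful than individual predictions. The sketch below assumes you have already collected one predicted label per sentence. The label names and the list of predictions are hypothetical, and label_shares is an illustrative helper, not part of the model or the transformers library.

```python
from collections import Counter


def label_shares(predicted_labels):
    """Return the fraction of sentences assigned to each label."""
    if not predicted_labels:
        return {}
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    return {label: count / total for label, count in counts.items()}


# Hypothetical predictions for four translated sentences
hypothetical_predictions = ["neutral", "gendered", "neutral", "neutral"]
shares = label_shares(hypothetical_predictions)
print(shares)
```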
Performance
The GeNTE Evaluator is a powerful tool for evaluating inclusive rewriting and translations. But how does it perform?
Speed
No speed benchmarks are reported here. The example code truncates inputs to 64 tokens, which keeps each sequence short and the classification step light.
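Since speed depends on your hardware and batch size, the simplest way to know is to measure it. The sketch below times any classification callable per sentence. Both time_per_item and dummy_classify are hypothetical helpers for illustration; in practice you would pass a function that wraps the tokenizer and model calls.

```python
import time


def time_per_item(classify, sentences, repeats=3):
    """Return the best observed wall-clock seconds per sentence."""
    if not sentences:
        raise ValueError("sentences must not be empty")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for sentence in sentences:
            classify(sentence)
        best = min(best, time.perf_counter() - start)
    return best / len(sentences)


def dummy_classify(sentence):
    # Stand-in for tokenization plus a model forward pass
    return "neutral" if len(sentence) % 2 == 0 else "gendered"


seconds = time_per_item(dummy_classify, ["Prima frase.", "Seconda frase."])
print(f"{seconds:.6f} s per sentence")
```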
Accuracy
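But speed is only half the story. What about accuracy? The GeNTE Evaluator was fine-tuned from the UmBERTo model, which gives it a strong starting point for classification. No accuracy figures are reported here, so it is worth checking its predictions against labeled examples from your own data.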
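One way to check accuracy yourself is to compare the model's labels with a small set of sentences you have labeled by hand. Here is a minimal sketch, assuming you already have both lists. The label names and the data are hypothetical and do not reflect measured results.

```python
def accuracy(gold_labels, predicted_labels):
    """Return the fraction of predictions that match the gold labels."""
    if len(gold_labels) != len(predicted_labels):
        raise ValueError("gold and predicted lists must have the same length")
    if not gold_labels:
        raise ValueError("at least one labeled example is required")
    correct = sum(g == p for g, p in zip(gold_labels, predicted_labels))
    return correct / len(gold_labels)


# Hypothetical hand-labeled references and model outputs
gold = ["neutral", "gendered", "neutral", "gendered"]
predicted = ["neutral", "gendered", "gendered", "gendered"]
score = accuracy(gold, predicted)
print(f"accuracy: {score:.2f}")
```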
Efficiency
So, how efficient is the GeNTE Evaluator? It keeps the RoBERTa-based UmBERTo architecture, so its compute and memory needs should be in line with other models built on that backbone.
Limitations
The GeNTE Evaluator is a powerful tool, but it’s not perfect. Let’s take a closer look at some of its limitations.
Limited Context Understanding
The model is trained on a specific corpus (GeNTE) and might not fully understand the context of texts from other domains or genres.
Language Dependence
As the GeNTE Evaluator is specifically designed for Italian, it might not perform well on texts in other languages.
Dependence on Training Data
The model’s performance is closely tied to the quality and diversity of the training data.
Format
The GeNTE Evaluator is a sequence classification model that uses a transformer architecture.
Supported Data Formats
This model supports text sequences as input. You’ll need to tokenize your text data before feeding it into the model.
Special Requirements for Input
When preparing your input data, keep the following in mind:
- You’ll need to use a specific tokenizer, in this case the UmBERTo tokenizer.
- You can use the AutoTokenizer class from the transformers library to load the tokenizer.
- When loading the tokenizer, make sure to set do_lower_case=False to preserve the original case of the text.
- You can truncate your input text to a maximum length of 64 tokens.
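If you have many sentences, it can help to keep the tokenizer settings in one place and feed the model in small batches. The sketch below shows one way to do that. The batched helper and the TOKENIZER_CALL_KWARGS name are illustrative, and padding=True is an assumption added here so that sentences of different lengths can share a batch. The tokenizer call itself is left as a comment so the snippet runs without downloading anything.

```python
# Keyword arguments mirroring the input requirements listed above
TOKENIZER_CALL_KWARGS = {
    "return_tensors": "pt",
    "truncation": True,
    "max_length": 64,
    "padding": True,  # assumption: pad shorter sentences within a batch
}


def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


sentences = ["Prima frase.", "Seconda frase.", "Terza frase."]
for batch in batched(sentences, batch_size=2):
    # With the real tokenizer loaded:
    # encoded = tokenizer(batch, **TOKENIZER_CALL_KWARGS)
    print(len(batch), batch)
```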
Handling Inputs and Outputs
Here’s an example of how to handle inputs and outputs for this model:
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("Musixmatch/umberto-commoncrawl-cased-v1", do_lower_case=False)

# Load the GeNTE Evaluator model
model = AutoModelForSequenceClassification.from_pretrained("FBK-MT/GeNTE-evaluator")

# Prepare your input text
sample = (
    "Condividiamo il parere di chi ha presentato la relazione che ha posto "
    "notevole enfasi sull'informazione in relazione ai rischi e sulla trasparenza, "
    "in particolare nel campo sanitario e della sicurezza."
)

# Tokenize your input text, truncating to at most 64 tokens
inputs = tokenizer(sample, return_tensors="pt", truncation=True, max_length=64)

# Make a prediction; the model output holds raw logits
with torch.no_grad():
    logits = model(**inputs).logits
predicted_label = torch.argmax(logits, dim=1).item()

# Print the predicted label
print(model.config.id2label[predicted_label])
What’s Next?
Now that you know how to handle inputs and outputs for the GeNTE Evaluator, you can start exploring its capabilities and fine-tuning it for your specific use case.