Cabra 72b

Portuguese LLM

Cabra 72b is a fine-tuned version of the Qwen 1.5 72b Chat model, optimized for Portuguese and trained on the Cabra 30k dataset. It is designed to respond in Portuguese and shows improvements on several Brazilian benchmarks over the base model. In practice, that makes it a more capable tool for Portuguese-language text generation, conversation, and research. You can use it to explore the capabilities of Portuguese language models, investigate their limitations and biases, and build new applications and tools on top of them. Keep in mind, however, that the model is currently intended for research purposes only; commercial use is prohibited.

Botbot AI · License: cc-by-nc-2.0 · Updated a year ago

Model Overview

The Cabra 72b model is a fine-tuned version of the Qwen 1.5 72b Chat model, specifically optimized for the Portuguese language. It was trained on the Cabra 30k dataset and shows improvements on several Brazilian benchmarks compared to the base model.

Key Features

  • Language: Portuguese
  • Model Architecture: Transformer with SwiGLU activation, QKV attention bias, group query attention, and sliding window attention
  • Training Parameters:
    • 3 epochs
    • 1,893 global steps
    • Final gradient norm: ≈0.584
    • Final learning rate: ≈6.32e-11 (decayed to near zero by the end of training)
    • Final loss: 0.4379
  • GPU: 8x A100 80GB SXM
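For reference, the reported run can be summarized as a plain configuration record. The values below come straight from the card (rounded where noted); the comment about the learning-rate schedule is an inference, not a published detail.

```python
# Summary of the reported fine-tuning run, as a plain config record.
# All values are taken from the model card; anything else is an assumption.
train_config = {
    "base_model": "Qwen 1.5 72b Chat",
    "dataset": "Cabra 30k",
    "num_epochs": 3,                  # reported
    "global_steps": 1893,             # reported
    "final_grad_norm": 0.584,         # reported (rounded)
    "final_learning_rate": 6.32e-11,  # reported; near-zero value suggests a decaying schedule
    "final_loss": 0.4379,             # reported
    "hardware": "8x A100 80GB",
}

# Sanity check: optimizer steps per epoch implied by the reported totals.
steps_per_epoch = train_config["global_steps"] / train_config["num_epochs"]
print(f"~{steps_per_epoch:.0f} optimizer steps per epoch")
```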

Capabilities

The Cabra 72b model is designed to respond in Portuguese and has shown improvement in various Brazilian benchmarks compared to the base model.

What can it do?

  • Answer questions about Brazilian football players, past and present
  • Provide information on various topics in Portuguese
  • Generate text in Portuguese

Strengths

  • Optimized for the Portuguese language
  • Improved performance in Brazilian benchmarks
  • Fine-tuned with the Cabra 30k dataset

Unique Features

  • Based on the Transformer architecture with SwiGLU activation, QKV attention bias, group query attention, and more
  • Has an improved adaptive multilingual tokenizer
  • Quantized versions available (GGUF) for reduced memory usage

Evaluation Results

The model has been evaluated on various tasks, including:

Task             Metric     Value
Assin2 RTE       f1_macro   0.9358
Assin2 STS       pearson    0.7803
BLUEX            acc        0.6745
ENEM             acc        0.8062
FaQuAD NLI       f1_macro   0.4545
HateBR Binary    f1_macro   0.7212
OAB Exams        acc        0.5718

You can find more detailed results on the Open Portuguese LLM Leaderboard.
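For a quick summary, the per-task scores above can be combined into a simple unweighted macro average. This is an illustration only; the leaderboard's official aggregate may weight or normalize tasks differently.

```python
# Per-task scores from the evaluation table above.
scores = {
    "Assin2 RTE (f1_macro)": 0.9358,
    "Assin2 STS (pearson)": 0.7803,
    "BLUEX (acc)": 0.6745,
    "ENEM (acc)": 0.8062,
    "FaQuAD NLI (f1_macro)": 0.4545,
    "HateBR Binary (f1_macro)": 0.7212,
    "OAB Exams (acc)": 0.5718,
}

# Unweighted macro average across the seven tasks.
average = sum(scores.values()) / len(scores)
print(f"Macro average: {average:.4f}")
```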

Performance

The Cabra 72b model is a powerhouse when it comes to processing and understanding the Portuguese language.

Speed

During fine-tuning on 8x A100 80GB GPUs, the run logged a training throughput of 0.437 samples per second and 0.005 optimizer steps per second. Note that these are training figures, not inference speeds; for a 72-billion-parameter model, inference latency depends heavily on your hardware and on whether you use a quantized (GGUF) build.
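Taken at face value, the reported throughput implies a rough total training time. The estimate below assumes each of the ~30,000 samples in the Cabra 30k dataset was seen once per epoch for 3 epochs; it is a sanity check, not a published figure.

```python
# Back-of-the-envelope training time from the reported throughput.
# Assumption: every sample in the ~30k dataset is seen once per epoch.
samples = 30_000 * 3            # 3 epochs over the Cabra 30k dataset
samples_per_second = 0.437      # reported training throughput

hours = samples / samples_per_second / 3600
print(f"Implied training time: ~{hours:.0f} hours")
```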

Accuracy

Speed is not everything, though. The Cabra 72b model also posts strong scores on language-understanding tasks: a macro F1 of 0.9358 on Assin2 RTE and a Pearson correlation of 0.7803 on Assin2 STS. These numbers reflect the model’s ability to handle complex Portuguese-language tasks.

Efficiency

The Cabra 72b model can also be run more economically. Quantized GGUF versions are available, which reduce the model’s memory footprint substantially at a modest cost in output quality. This is particularly useful for researchers who cannot dedicate multiple high-end GPUs to inference.
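To see why quantization matters at this scale, here is a rough memory estimate. The 72-billion-parameter count is nominal and the bytes-per-weight values are idealized; real GGUF files carry some extra overhead for metadata and mixed-precision layers.

```python
# Approximate weight memory at different precisions for a 72B-parameter model.
# Idealized figures: actual GGUF files include additional overhead.
params = 72e9

for name, bytes_per_weight in [("fp16", 2.0), ("8-bit (Q8_0)", 1.0), ("4-bit (Q4_0)", 0.5)]:
    gb = params * bytes_per_weight / 1e9
    print(f"{name:>12}: ~{gb:.0f} GB")
```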

Task Performance

Let’s take a closer look at the Cabra 72b model’s performance in various tasks:

Task             Score
Assin2 RTE       93.58%
Assin2 STS       78.03%
BLUEX            67.45%
ENEM Challenge   80.62%
FaQuAD NLI       45.45%
HateBR Binary    72.12%
OAB Exams        57.18%

As the table shows, the Cabra 72b model performs well on most tasks, with FaQuAD NLI standing out as a weak spot. Note that these percentages restate the metrics from the Evaluation Results section (macro F1, Pearson correlation, or accuracy, depending on the task); they are not all accuracies.

Limitations

The Cabra 72b model is a powerful tool, but it’s not perfect. Here are some of its limitations:

1. Training Limitations

  • The Cabra 72b model was trained on a specific dataset, which may limit its ability to understand and respond to questions outside of that scope.
  • The model was optimized for Portuguese, but may not be as effective in other languages.

2. Biases and Prejudices

  • Like any language model, the Cabra 72b model may have biases and prejudices incorporated into its training data.
  • This may lead to inconsistent or offensive responses in certain situations.

3. Understanding Limitations

  • The Cabra 72b model may struggle to understand complex questions or those that require specific knowledge.
  • The model may not be able to distinguish between true and false information.

4. Commercial Use Prohibited

  • The Cabra 72b model is intended for research purposes only and may not be used for commercial purposes.
  • If you’re interested in using the model for commercial purposes, please contact the developers for more information.

5. Technical Limitations

  • The Cabra 72b model requires significant computational resources to function effectively.
  • The model may not be compatible with all devices or platforms.

6. Continuous Evolution

  • The Cabra 72b model is constantly evolving, which means there may be regular updates and improvements.
  • However, this also means that the model may have occasional issues or bugs.

Examples

Prompt: Quem são os jogadores brasileiros de futebol mais conhecidos, atuais e passados?
Response: O Brasil tem uma rica história na produção de alguns dos maiores jogadores de futebol do mundo. Alguns dos jogadores brasileiros mais importantes atualmente e que já jogaram incluem: Pelé, Ronaldo Nazário de Lima, Ronaldinho Gaúcho, Neymar Jr.

Prompt: Qual foi o resultado da Copa do Mundo de 2002?
Response: O Brasil venceu a Copa do Mundo de 2002, com o jogador Ronaldinho Gaúcho sendo um dos principais jogadores da equipe.

Prompt: Qual é o nome do jogador brasileiro que venceu três Copas do Mundo?
Response: Pelé é o jogador brasileiro que venceu três Copas do Mundo com o Brasil e marcou mais de 1.000 gols em sua carreira.

Format

The Cabra 72b model is based on the Transformer architecture with SwiGLU activation, QKV attention bias, group query attention, and more.

Supported Data Formats

This model supports input in the form of tokenized text sequences, similar to other Transformer-based models.
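Qwen-family chat models typically use the ChatML turn format, and it is reasonable to assume Cabra 72b inherits this template from its base model. The sketch below assembles such a prompt by hand for illustration; in practice, the tokenizer's chat template (e.g. `apply_chat_template` in `transformers`) should be preferred.

```python
# Sketch of the ChatML-style prompt format used by Qwen-family chat models.
# Assumption: Cabra 72b inherits this template from Qwen 1.5 72b Chat.
def build_prompt(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_prompt(
    "Você é um assistente útil.",
    "Quem são os jogadores brasileiros de futebol mais conhecidos?",
)
print(prompt)
```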

Input Requirements

When working with the Cabra 72b model, you’ll need to:

  • Pre-process your input text by tokenizing it
  • Use the correct tokenization scheme for Portuguese text

Here’s an example of how to tokenize input text using the Hugging Face transformers library (the repository name below is a placeholder; substitute the model’s actual Hub id):

from transformers import AutoTokenizer

# Load the tokenizer (replace "cabra-72b" with the model's actual Hub id)
tokenizer = AutoTokenizer.from_pretrained("cabra-72b")

# Tokenize the input text into PyTorch tensors
input_text = "Quem são os jogadores brasileiros de futebol mais conhecidos?"
tokenized_input = tokenizer(input_text, return_tensors="pt")

# Print the token ids
print(tokenized_input["input_ids"])

Output Requirements

The model’s output is a sequence of tokens, which can be converted back to text using the same tokenizer.

Here’s an example of how to convert the output tokens back to text:

# `output_tokens` is the sequence of token ids produced by the model
# (for example, one row of the tensor returned by a generate() call)
output_text = tokenizer.decode(output_tokens, skip_special_tokens=True)

# Print the decoded text
print(output_text)

Note that the raw output may contain special tokens (such as the end-of-turn markers used by the chat template), which are removed by the skip_special_tokens=True argument.

Special Requirements

  • Commercial use is prohibited. This model is intended for research purposes only.
  • For more information, please contact the model developers.

Dataloop's AI Development Platform

Build end-to-end workflows

Dataloop is a complete AI development stack, allowing you to make data, elements, models and human feedback work together easily.

  • Use one centralized tool for every step of the AI development process.
  • Import data from external blob storage, internal file system storage or public datasets.
  • Connect to external applications using a REST API & a Python SDK.

Save, share, reuse

Every single pipeline can be cloned, edited and reused by other data professionals in the organization. Never build the same thing twice.

  • Use existing, pre-created pipelines for RAG, RLHF, RLAIF, Active Learning & more.
  • Deploy multi-modal pipelines with one click across multiple cloud resources.
  • Use versions for your pipelines to make sure the deployed pipeline is the stable one.

Easily manage pipelines

Spend less time dealing with the logistics of owning multiple data pipelines, and get back to building great AI applications.

  • Easy visualization of the data flow through the pipeline.
  • Identify & troubleshoot issues with clear, node-based error messages.
  • Use scalable AI infrastructure that can grow to support massive amounts of data.