Oolel V0.1

Wolof language model

Oolel V0.1 is an open-source language model for Wolof, an African language that has been underrepresented in AI. Built on the Qwen 2.5 architecture, it combines that base with expert Wolof linguistic knowledge and high-quality curated training data. It supports bidirectional translation between English and Wolof, natural text generation in Wolof, and math in Wolof, and it is described as optimized for efficiency and speed. It can also handle other NLP tasks, including summarization and text editing.

Soynade Research apache-2.0 Updated 4 months ago

Model Overview

Oolel is the first open-source language model for Wolof, a language spoken in West Africa. It is built on the Qwen 2.5 architecture and combines that base with deep Wolof linguistic expertise. The model is designed to support bidirectional translation, natural text generation, math in Wolof, summarization, and text editing.

Capabilities

Oolel is a powerful tool for natural language processing tasks in Wolof. With its advanced architecture and carefully curated data, it can handle a wide range of tasks. Some of its primary tasks include:

  • Retrieval-augmented generation (RAG) with Wolof queries: answering questions in Wolof whether the retrieved context is in English, French, or Wolof.
  • Bidirectional translation: translating text between English and Wolof with ease.
  • Natural text generation: creating original text in Wolof, perfect for writing stories or articles.
  • Math in Wolof: solving math problems and generating answers in Wolof.

Oolel also has several strengths, including:

  • Deep Wolof linguistic expertise: the model is trained on high-quality data and optimized for the nuances of the Wolof language.
  • State-of-the-art AI technology: the model builds on recent advances in AI to deliver accurate and helpful results.
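
The RAG capability listed above can be illustrated with a small prompt-building helper. This is a minimal sketch, not a format documented for Oolel: the function name build_rag_messages, the instruction wording, and the numbered-passage layout are all assumptions. Only the default system prompt is taken from the usage example on this page.

```python
def build_rag_messages(question, contexts, system_prompt=None):
    """Build a single-turn chat message list that grounds a Wolof question
    in retrieved passages (hypothetical helper, layout is an assumption)."""
    if system_prompt is None:
        # System prompt copied from the usage example on this page.
        system_prompt = (
            "You're a Wolof AI assistant. Please always provide detailed "
            "and useful answers to the user queries."
        )
    # Number the passages so an answer can be traced back to its source.
    numbered = "\n\n".join(
        f"[{i}] {passage.strip()}" for i, passage in enumerate(contexts, start=1)
    )
    user_content = (
        "Answer the question in Wolof using only the context below.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question.strip()}"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_content},
    ]
```

The passages may be in English, French, or Wolof; the resulting list has the same shape as the messages list used in the usage section, so it can be passed to whatever generation helper you use there.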

Example Use Cases

Oolel can be used for a variety of tasks, including:

  • Translation: translating text from English to Wolof, like this example: “Bassirou Diomaye Faye is the new Senegalese president. He is 44 years old.”
  • Code generation: generating code from a prompt written in Wolof, like this example: “Bindal ab klaas Python buy wone ni ñuy jëfandikoo dataframe yi ci Pandas.”
  • Problem-solving: using Oolel to solve math problems and generate answers in Wolof, like this example: “Ndax nga mën ma won ni ñuy resolver problème bii: Fatou dafa jënd 3 kilo ceeb, 2 kilo diw ak 5 kilo sukër.”
  • Text generation: creating original stories in Wolof, like this example: “Bindal ab léeb ci gaynde gi lekk muus mi.”

Examples

Prompt: Translate to Wolof: The Wolof language is widely spoken in Senegal and The Gambia.
Response: Làkk wolof mooy làkk gu bari ci Senegaal ak Gaambi

Prompt: Bindal ab léeb ci gaynde gi lekk muus mi
Response: Léeb gi mi ngi lekk ci gaynde gi, bu ko ci làkk wolof ngi ko jàng lekk, ngi ko jàng taxawal lekk. Lekk gi ngi ko làkk ci gaynde gi, bu ko ci làkk wolof ngi ko làkk lekk, ngi ko làkk taxawal lekk

Prompt: Ndax nga mën ma won ni ñuy resolver problème bii: Fatou dafa jënd 3 kilo ceeb, 2 kilo diw ak 5 kilo sukër. Ceeb gi wenn kilo 500 CFA la, diw gi 1200 CFA kilo bi, sukër gi 750 CFA kilo bi. Ñaata la wara fay?
Response: Fatou dafa jënd 3 kilo ceeb, 2 kilo diw ak 5 kilo sukër. Ceeb gi wenn kilo 500 CFA la, diw gi 1200 CFA kilo bi, sukër gi 750 CFA kilo bi. Ñaata la wara fay: 3 kilo ceeb x 500 CFA/kilo = 1500 CFA, 2 kilo diw x 1200 CFA/kilo = 2400 CFA, 5 kilo sukër x 750 CFA/kilo = 3750 CFA. Jumla la wara fay: 1500 + 2400 + 3750 = 7650 CFA
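
The arithmetic in the math example above can be checked independently of the model. The short script below is only a verification sketch; the variable names are illustrative, and the quantities and prices are copied from the prompt.

```python
# Quantities (kilos) and unit prices (CFA per kilo) taken from the example prompt.
purchases = {
    "ceeb": (3, 500),
    "diw": (2, 1200),
    "sukër": (5, 750),
}

# Cost of each item, then the amount Fatou has to pay in total.
subtotals = {item: qty * price for item, (qty, price) in purchases.items()}
total = sum(subtotals.values())

print(subtotals)
print(total)  # 7650, matching the model's answer
```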

How to Use Oolel

To use Oolel, you’ll need to load the tokenizer and model using the following code:

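from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# device_map="auto" places the weights automatically (requires the accelerate package).
model = AutoModelForCausalLM.from_pretrained("soynade-research/Oolel-v0.1", torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("soynade-research/Oolel-v0.1")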

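You can then generate text from user input with a generate_response helper. This page does not define the helper; it is expected to take a list of chat messages and return the model's reply as a string. For example: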

system_prompt = "You're a Wolof AI assistant. Please always provide detailed and useful answers to the user queries."
messages = [{"role": "system", "content": system_prompt}, {"role": "user", "content": "Translate to Wolof: Bassirou Diomaye Faye is the new Senegalese president. He is 44 years old"}]
print(generate_response(messages))
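
A minimal sketch of what the generate_response helper called above could look like, assuming the usual Hugging Face transformers chat workflow (apply_chat_template, generate, batch_decode). The factory name make_generate_response and the max_new_tokens default of 512 are assumptions made for illustration; they are not taken from the Oolel documentation.

```python
def make_generate_response(model, tokenizer, max_new_tokens=512):
    """Return a generate_response(messages) function bound to a model and tokenizer.

    Hypothetical helper: the generation settings here are illustrative defaults.
    """

    def generate_response(messages):
        # Render the chat messages into one prompt string using the tokenizer's
        # chat template, ending with the marker that opens the assistant turn.
        text = tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
        # Tokenize and move the inputs to the device that holds the model.
        inputs = tokenizer([text], return_tensors="pt").to(model.device)
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
        # generate() returns prompt + continuation; keep only the continuation.
        new_tokens = [
            out[len(prompt):]
            for prompt, out in zip(inputs["input_ids"], output_ids)
        ]
        return tokenizer.batch_decode(new_tokens, skip_special_tokens=True)[0]

    return generate_response
```

After loading the model and tokenizer, calling generate_response = make_generate_response(model, tokenizer) gives a function with the same call shape as in the example above, so print(generate_response(messages)) works unchanged.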

Performance

Oolel's performance is described along three dimensions:

  • Speed: Oolel is optimized to process and respond to queries quickly.
  • Accuracy: Oolel has been carefully trained and optimized to give accurate results, including on complex tasks such as bidirectional translation between English and Wolof.
  • Efficiency: Oolel is designed to be efficient, combining high-quality curated data with the Qwen 2.5 architecture.

Limitations

Oolel is a capable tool for Wolof language tasks, but it has several limitations:

  • Limited training data: Oolel was trained on a curated dataset, but the amount of data available for Wolof is still limited compared to other languages.
  • Lack of multi-turn conversation support: Oolel is not optimized for multi-turn conversations, which means it may struggle to maintain context and respond accurately in longer conversations.
  • Dependence on system prompts: Oolel relies heavily on system prompts to generate responses. If the prompts are poorly written or incomplete, the model may not produce accurate or helpful results.
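
One way to work within the last two limitations is to send each request as a fresh single-turn conversation with an explicit system prompt. The helper below is a sketch of that idea only; the name single_turn_messages and the validation rules are assumptions, not guidance from the model's authors.

```python
def single_turn_messages(system_prompt, user_text):
    """Build a fresh system + user message pair with no earlier turns
    (hypothetical helper for single-turn use)."""
    # Reject missing or blank prompts, since the model depends on them.
    if not system_prompt or not system_prompt.strip():
        raise ValueError("system_prompt must be a non-empty string")
    if not user_text or not user_text.strip():
        raise ValueError("user_text must be a non-empty string")
    return [
        {"role": "system", "content": system_prompt.strip()},
        {"role": "user", "content": user_text.strip()},
    ]
```

Because no earlier turns are carried over, any context the model needs has to be restated in the user text of each request.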

Conclusion

Oolel is a notable step forward for African language models. It brings translation, text generation, math, and RAG to Wolof in a single open-source model, within the limitations noted above. Developers and researchers working with Wolof will find it worth evaluating.
