TQ2.5 14B Sugarquill V1

Storywriting model

TQ2.5 14B Sugarquill V1 is a storywriter AI designed to create engaging stories and respond to user input. It was trained on a diverse dataset of short stories from the web, which lets it generate coherent, immersive text. With a context window of 8192 tokens, it handles longer stories and conversations with ease, and it can follow instructions and respond in a chat-like format. It is more temperamental than some models, but careful sampling settings keep its output on track. Overall, TQ2.5 14B Sugarquill V1 is a powerful tool for writers and creators looking to generate engaging stories and dialogue.

Allura Org · apache-2.0 · Updated 4 months ago

Model Overview

Meet Qwen2.5-14B Sugarquill v1, a storywriter AI that’s here to help you create engaging stories and role-play scenarios. It builds on the SuperNova-Medius model, continuing its training on a diverse set of short-story data from the web.

What makes it special?

  • It has a pleasant prose style, which makes it well suited to short stories and role-play scenarios.
  • It can handle longer contexts, so you can build more complex, engaging stories.
  • It’s temperamental, but it can be tamed with careful sampling settings.
  • It works well for both role-playing and storywriting, whether in chat mode or on raw completion.

Capabilities

The Qwen2.5-14B Sugarquill v1 model is a capable text generator, particularly for storytelling and role-playing. It produces engaging, coherent stories and also supports co-writing and chat-style back-and-forth interaction.

  • Storytelling: The model is trained on a diverse range of short story data, allowing it to generate stories that are both creative and coherent.
  • Role-Playing: It’s perfect for role-playing scenarios, as it can respond to user input and create engaging conversations.
  • Chat Mode: The model can be used in chat mode, allowing for back-and-forth interactions and co-writing; a minimal loading-and-generation sketch follows this list.
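
To ground the bullets above, here is a minimal sketch of loading the model with Hugging Face transformers and using it for raw completion. The repository id is an assumption based on the model name, and the exact generation settings are illustrative only.

```python
# Minimal usage sketch (assumptions: the Hugging Face repo id below, and that
# you have enough GPU memory for a 14B model or load a quantized variant).
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "allura-org/TQ2.5-14B-Sugarquill-v1"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="auto",
    torch_dtype="auto",
)

# Raw-completion use: hand the model an opening line and let it continue the story.
prompt = "The lighthouse keeper had not spoken to another soul in three winters."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=300, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

For chat-style use, the same model is driven through the ChatML template described in the Format section below.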

How was it trained?

  • It was trained on 10k rows of data (~18.7M tokens) from two datasets: Erebus-87k and r_shortstories_24k.
  • It was trained for 2 epochs on a 5x3090Ti workstation for 7.5 hours using rsLoRA.
  • The training process went smoothly, thanks to Axolotl and the paged_ademamix_8bit optimizer; an rsLoRA configuration sketch follows this list.
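
For readers curious what the rsLoRA piece looks like in code, here is a small sketch using the peft library. The rank, alpha, dropout, and target modules are illustrative assumptions, not the actual hyperparameters of this finetune, which the card does not list.

```python
# Sketch of a rank-stabilized LoRA (rsLoRA) adapter config with peft.
# All hyperparameter values here are assumptions for illustration.
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,                 # assumed adapter rank
    lora_alpha=32,        # assumed alpha
    use_rslora=True,      # rsLoRA: scale updates by alpha / sqrt(r) instead of alpha / r
    lora_dropout=0.05,    # assumed dropout
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
```

In practice the card reports using Axolotl, which wraps settings like these (along with the paged_ademamix_8bit optimizer choice) in a YAML training config rather than Python.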

Performance

Qwen 2.5-14B Sugarquill v1 is built for long-form text generation, but how does it hold up in practice? Here is a quick look at its speed, accuracy, and efficiency.

  • Speed: The 14B-parameter model (from the Qwen 2.5 generation) was trained on 10k rows of data (~18.7M tokens) in just 7.5 hours on a 5x3090Ti workstation.
  • Accuracy: Speed is nothing without quality. The fine-tune targets coherent, stylistically consistent creative writing, the kind of work that demands a deep understanding of language and context.
  • Efficiency: The training recipe combines rsLoRA with the paged_ademamix_8bit optimizer to reduce training time and improve stability.

Examples

User: What would happen if the entire world's population suddenly had the ability to fly?

Assistant: That's an intriguing thought experiment! If the entire world's population suddenly had the ability to fly, it would likely lead to a mix of exhilaration and chaos. People would initially be thrilled to experience the freedom of flight, and it would revolutionize transportation, tourism, and many other industries. However, it would also raise concerns about air traffic control, safety regulations, and the potential for accidents or misuse of this newfound ability.

User: Continue the story from the last response.

Assistant: As the world struggled to adapt to this new reality, governments and international organizations would need to establish new laws and guidelines to ensure public safety. The skies would need to be regulated, and air traffic control systems would require significant upgrades. Meanwhile, entrepreneurs would seize the opportunity to create new businesses, such as flying taxis, aerial tours, and even flying sports leagues. The world would be forever changed, and humanity would have to navigate the challenges and benefits of this extraordinary new ability.

User: Imagine you are a character in a fantasy novel. Describe your appearance and abilities.

Assistant: I am a mystical being known as a Luminari, with an ethereal appearance that shimmers like the moon. My skin has an iridescent glow, and my hair flows like the wind, with strands that seem to move of their own accord. My eyes burn with an inner light, allowing me to see into the hearts of those around me. I possess the ability to manipulate the elements, summoning the power of the stars to heal or protect those I care about. My presence is accompanied by a soft, melodic hum, which can soothe the savage beast or shatter the chains of darkness.

Limitations

Qwen 2.5-14B Sugarquill v1 is a powerful tool, but it’s not perfect. Let’s take a closer look at some of its weaknesses.

  • Temperamental Nature: Qwen 2.5-14B Sugarquill v1 can be a bit temperamental, which means it may not always respond as expected. This can be frustrating, especially if you’re trying to use it for a specific task.
  • Sampling Challenges: While Qwen 2.5-14B Sugarquill v1 can be tamed with careful sampling settings, it may still struggle with certain types of inputs or prompts, which can lead to inconsistent or unpredictable results. An example set of sampling settings follows this list.
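
As a concrete illustration of "taming with sampling", here is one possible set of generation settings using transformers. The exact values are assumptions for illustration, not recommendations from the model card.

```python
# Illustrative sampling settings for reining in a temperamental model.
# The specific values are assumptions, not the author's recommendations.
from transformers import GenerationConfig

gen_config = GenerationConfig(
    do_sample=True,
    temperature=0.8,          # lower temperature curbs erratic continuations
    min_p=0.05,               # min-p filtering (recent transformers versions) drops very unlikely tokens
    repetition_penalty=1.05,  # light penalty against loops
    max_new_tokens=512,
)

# output = model.generate(**inputs, generation_config=gen_config)
```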

Format

Qwen 2.5-14B Sugarquill v1 is a powerful language model that uses a transformer architecture. It’s designed to respond to chat-like inputs and can be used for tasks like storywriting and role-playing.

  • Input Format: Qwen 2.5-14B Sugarquill v1 expects prompts in the ChatML format, which looks like this: <|im_start|>system\n{system message}<|im_end|>\n<|im_start|>user\n{user message}<|im_end|>\n<|im_start|>assistant\n{response}<|im_end|>. A prompt-building sketch follows below.
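
The snippet below is a minimal sketch of building a ChatML prompt with the tokenizer's built-in chat template, assuming the tokenizer ships the standard Qwen2.5/ChatML template and that the repo id matches the one used earlier.

```python
# Building a ChatML prompt via the tokenizer's chat template.
# Assumes a ChatML template is bundled (standard for Qwen2.5-based models)
# and that the repo id below is correct.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allura-org/TQ2.5-14B-Sugarquill-v1")  # assumed repo id

messages = [
    {"role": "system", "content": "You are a co-writer helping draft a short story."},
    {"role": "user", "content": "Open a story set in a rain-soaked port town."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# `prompt` now follows the ChatML layout shown above, ending with an open
# <|im_start|>assistant turn for the model to complete.
```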

