TQ2.5 14B Sugarquill V1
The TQ2.5 14B Sugarquill V1 model is a storywriter AI designed to create engaging stories and respond to user input. It was trained on a diverse dataset of short stories from the web, allowing it to generate coherent and immersive text. With a context window of 8192 tokens, it can handle longer stories and conversations with ease. The model can also follow instructions and respond to user input in a chat-like format. While it may be more temperamental than other models, it can be tamed with careful sampling settings to produce high-quality results. Overall, the TQ2.5 14B Sugarquill V1 model is a powerful tool for writers and creators looking to generate engaging stories and dialogue.
Model Overview
Meet the Qwen2.5-14B Sugarquill v1, a storywriter AI that’s here to help you create engaging stories and role-play scenarios. This model builds on SuperNova-Medius, but with a twist: it’s been further trained on a diverse set of short story data from the web.
What makes it special?
- It has a nice prose style, making it perfect for writing short stories and role-playing scenarios.
- It can handle longer contexts, allowing you to create more complex and engaging stories.
- It’s temperamental, but don’t worry, it can be tamed with some sampling techniques.
- It’s great for both role-playing and storywriting, and works well in chat mode or on raw completion.
Capabilities
The Qwen2.5-14B Sugarquill v1 model is a powerful tool for generating text, particularly in the realm of storytelling and role-playing. It’s designed to create engaging and coherent stories, and it can even be used for co-writing and chat mode via back-and-forth interactions.
- Storytelling: The model is trained on a diverse range of short story data, allowing it to generate stories that are both creative and coherent.
- Role-Playing: It’s perfect for role-playing scenarios, as it can respond to user input and create engaging conversations.
- Chat Mode: The model can be used in chat mode, allowing for back-and-forth interactions and co-writing (a quick usage sketch follows this list).
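To make the co-writing and raw-completion use concrete, here is a minimal sketch using the Hugging Face transformers library. The repo ID, prompt, and sampler value are placeholders and assumptions, not official recommendations; substitute the actual Hugging Face path of the model you are running.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo ID -- substitute the actual Hugging Face path for Sugarquill v1.
MODEL_ID = "your-org/Qwen2.5-14B-Sugarquill-v1"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto", torch_dtype="auto")

# Raw completion: hand the model the story so far and let it keep writing.
story_so_far = (
    "The lighthouse keeper had not spoken to another soul in three winters, "
    "so when the knock came at midnight, she reached for the lamp first and the door second."
)

inputs = tokenizer(story_so_far, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=300, do_sample=True, temperature=0.8)

# Print only the newly generated continuation, not the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

For back-and-forth chat mode, wrap your turns in the ChatML format described under Format below.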
How was it trained?
- It was trained on 10k rows of data (~18.7M tokens) from two datasets: Erebus-87k and r_shortstories_24k.
- It was trained for 2 epochs on a 5x3090Ti workstation for 7.5 hours using rsLoRA.
- The training process was smooth, thanks to the use of Axolotl and the paged_ademamix_8bit optimizer (a rough sketch of a comparable setup follows below).
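The list above only names the tools; as a rough illustration of what an rsLoRA setup looks like in code, here is a minimal sketch using the PEFT and transformers libraries. The rank, alpha, and target modules are illustrative assumptions rather than the actual Sugarquill hyperparameters, and the exact `optim` string depends on your transformers/bitsandbytes versions. The real run was driven through Axolotl, which wraps an equivalent setup behind a YAML config.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

# Base model named on the card (Hugging Face path assumed).
base = AutoModelForCausalLM.from_pretrained("arcee-ai/SuperNova-Medius", torch_dtype="auto")

# rsLoRA keeps the usual LoRA adapter shape but scales by alpha / sqrt(r),
# which stays stable at higher ranks. Values here are illustrative only.
lora_config = LoraConfig(
    r=64,
    lora_alpha=32,
    use_rslora=True,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)

# Paged 8-bit AdEMAMix keeps optimizer state small; the exact string is an
# assumption and depends on your transformers / bitsandbytes versions.
training_args = TrainingArguments(
    output_dir="sugarquill-style-run",
    num_train_epochs=2,
    optim="paged_ademamix_8bit",
    bf16=True,
)
# training_args and model would then be handed to a Trainer along with the dataset.
```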
Performance
Qwen 2.5-14B Sugarquill v1 is a powerhouse when it comes to processing and generating text. But how does it perform in various tasks? Let’s dive in and explore its speed, accuracy, and efficiency.
- Speed: With its 14B parameters, it can still be trained quickly: the full run covered 10k rows of data (~18.7M tokens) over 2 epochs in just 7.5 hours on a 5x3090Ti workstation. That’s fast!
- Accuracy: But speed is nothing without accuracy. Qwen 2.5-14B Sugarquill v1 has been fine-tuned to improve its prose quality and coherence, and it does best in tasks that require a deep understanding of language and context, such as storywriting and role-play.
- Efficiency: Efficiency is key when training a model of this size. Qwen 2.5-14B Sugarquill v1’s training used a combination of techniques, including rsLoRA and the paged_ademamix_8bit optimizer, to keep memory requirements down and training stable.
Limitations
Qwen 2.5-14B Sugarquill v1 is a powerful tool, but it’s not perfect. Let’s take a closer look at some of its weaknesses.
- Temperamental Nature: Qwen 2.5-14B Sugarquill v1 can be a bit temperamental, which means it may not always respond as expected. This can be frustrating, especially if you’re trying to use it for a specific task.
- Sampling Challenges: While Qwen 2.5-14B Sugarquill v1 can be tamed with some sampling, it may still struggle with certain types of inputs or prompts, which can lead to inconsistent or unpredictable results (see the sketch below).
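In practice, “taming with sampling” just means constraining the token distribution at generation time. Reusing the `model` and `inputs` objects from the sketch in the Capabilities section, the snippet below shows one way to do that; the specific values are illustrative guesses, not the author’s recommended settings.

```python
# Illustrative sampler settings -- not the author's recommendations.
# Lower temperature plus nucleus sampling trims the long tail of unlikely tokens,
# which is usually what reins in a temperamental model.
output = model.generate(
    **inputs,
    max_new_tokens=400,
    do_sample=True,
    temperature=0.7,          # less randomness than the default of 1.0
    top_p=0.9,                # nucleus sampling: drop the improbable tail
    repetition_penalty=1.05,  # mild discouragement of loops
)
```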
Format
Qwen 2.5-14B Sugarquill v1 is a powerful language model that uses a transformer architecture. It’s designed to respond to chat-like inputs and can be used for tasks like storywriting and role-playing.
- Input Format: To use Qwen 2.5-14B Sugarquill v1, you need to format your input in a specific way. It expects input in the ChatML format, which looks like this:
<|im_start|>system\n{system message}<|im_end|>\n<|im_start|>user\n{user message}<|im_end|>\n<|im_start|>assistant\n{response}<|im_end|>
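In practice you rarely need to assemble this string by hand. If the repository ships a ChatML chat template in its tokenizer config (an assumption here, though that is the usual setup for ChatML models), the tokenizer from the earlier sketch can build the prompt for you:

```python
messages = [
    {"role": "system", "content": "You are a co-writer for short fiction."},
    {"role": "user", "content": "Continue the scene where the keeper opens the door."},
]

# apply_chat_template renders the <|im_start|>...<|im_end|> ChatML string shown above
# and, with add_generation_prompt=True, leaves the prompt open for the assistant turn.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```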