Qwythos-9B Claude Mythos GGUF: 1M‑token, multimodal, uncensored reasoning model
empero-ai/Qwythos-9B-Claude-Mythos-5-1M-GGUF ↗
The **Qwythos-9B-Claude-Mythos-5-1M-GGUF** model, released by Empero AI, is a GGUF‑quantized version of the base Qwythos‑9B Claude Mythos model. It is built on the Qwen3.5‑9B architecture and has been post‑trained on over 500 million tokens of high‑quality Claude Mythos and Claude Fable traces, giving it strong reasoning abilities (evidenced by +34 pts MMLU, +30 pts GSM8K‑strict, and +19 pts GSM8K‑flex over the base). The model supports native function calling per the Qwen3.5 specification and ships with a 1,048,576‑token (1 M) context window enabled by YaRN rope‑scaling.
The repository provides multiple quantizations (Q4_K_M, Q5_K_M, Q6_K, Q8_0, BF16) as well as MTP‑enabled variants for llama.cpp draft speculation. The recommended default is the Q4_K_M quant for a good quality‑size trade‑off. A separate CLIP‑style vision projector (mmproj‑F16.gguf) allows image input, inheriting the vision capabilities of the original Qwen3.5‑9B model such as detailed description, OCR, chart reading, and basic spatial reasoning. Vision functionality works with any of the text GGUF files when paired with the projector.
Qwythos is marketed as an uncensored, agentic model suitable for demanding domains like cybersecurity, biomedical research, and general tool‑use. It emits a `<think>...</think>` block at the start of each response to expose its chain‑of‑thought reasoning, and the README recommends sampling settings (temperature 0.6, top‑p 0.95, top‑k 20, repeat‑penalty 1.05) to avoid repetition loops. The model can be run with llama.cpp, Ollama, LM Studio, Jan, KoboldCpp, or any GGUF‑compatible runtime, and it includes support for OpenAI‑compatible vision APIs when served via llama‑server.
All weights are released under the Apache‑2.0 license, inherited from the Qwen3.5‑9B base model, and the project encourages users to add their own safety layers due to the model's uncensored nature.
Project Ideas
- Create a long‑context research assistant that can ingest and reason over documents up to 1 M tokens, using the built‑in `<think>` chain‑of‑thought output for transparent reasoning.
- Build a cybersecurity red‑team chatbot that leverages native function calling to execute simulated attacks, retrieve CVE data, and suggest mitigation steps.
- Develop a biomedical literature summarizer that combines text generation with the vision projector to extract information from scanned journal figures and tables.
- Implement an agentic coding helper that uses tool calls to run Python snippets, debug code, and provide step‑by‑step explanations via the model's reasoning mode.
- Design a multimodal customer‑support bot that accepts image screenshots, describes the content, and offers troubleshooting advice, using the CLIP‑style vision projector paired with the text model.