Uncensored Power: Qwen3.5-9B Aggressive Model Goes Multimodal
HauhauCS/Qwen3.5-9B-Uncensored-HauhauCS-Aggressive ↗
The Qwen3.5-9B-Uncensored-HauhauCS-Aggressive model is a 9-billion-parameter language model released by HauhauCS that removes the refusal behavior of the original Qwen3.5-9B. According to the README, it scores **0/465 refusals**, meaning it will not block any prompt, though it may still append a brief disclaimer to a response. The model retains the full capabilities of the base Qwen3.5-9B, including its hybrid attention design (Gated DeltaNet linear attention and full softmax attention in a 3:1 ratio), a native context window of 262K tokens (extendable to 1M with YaRN), and support for 201 languages, including English and Chinese.
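Context extension beyond the native 262K window is typically configured at load time. As a minimal sketch with a recent llama.cpp build (the GGUF filename and exact flag values here are illustrative assumptions, not taken from the README):

```shell
# Hypothetical example: extend the context toward 1M tokens with YaRN.
# --rope-scaling yarn enables YaRN RoPE scaling; --yarn-orig-ctx tells the
# runtime the model's original training context (262144 here). Memory use
# grows with the context size, so large -c values need substantial RAM/VRAM.
llama-cli \
  -m Qwen3.5-9B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf \
  -c 1000000 \
  --rope-scaling yarn \
  --yarn-orig-ctx 262144 \
  -p "Summarize the following report: ..."
```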
A standout feature is native multimodal support: a separate `mmproj` vision-encoder file enables image and video inputs alongside text, making the model suitable for tasks that combine visual and linguistic understanding. The repository provides several GGUF quantizations (BF16, Q8_0, Q6_K, Q4_K_M) ranging from 5.3 GB to 17 GB, allowing deployment on a variety of hardware, from high-end GPUs to consumer-grade CPUs, using runtimes such as llama.cpp, LM Studio, Jan, or koboldcpp. The aggressive variant is marketed as the most lossless uncensored model available, which helps explain its rapid rise in downloads (173k) and trending score.
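To use the vision encoder, the `mmproj` file is passed alongside the main model. A sketch using llama.cpp's multimodal CLI (both GGUF filenames are assumptions; check the repository for the actual `mmproj` filename):

```shell
# Hypothetical invocation: pair the main GGUF with the mmproj vision encoder
# so the model can caption or analyze a local image.
llama-mtmd-cli \
  -m Qwen3.5-9B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf \
  --mmproj mmproj-Qwen3.5-9B.gguf \
  --image photo.jpg \
  -p "Describe what is happening in this image."
```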
For production or high-throughput scenarios, the README recommends vLLM, SGLang, or KTransformers, and stresses keeping a context window of at least 128K to preserve the model's "thinking" capabilities. The model is licensed under Apache-2.0 and hosted in the US region, making it compatible with cloud inference endpoints.
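For a high-throughput, OpenAI-compatible server, a vLLM deployment might look like the following sketch. Whether vLLM loads this exact checkpoint directly, and how much GPU memory a 128K window needs, depend on your hardware; the flags shown are standard vLLM options, not instructions from the README:

```shell
# Hypothetical setup: serve the model with vLLM, keeping a 128K context
# window as the README recommends for preserving "thinking" capabilities.
vllm serve HauhauCS/Qwen3.5-9B-Uncensored-HauhauCS-Aggressive \
  --max-model-len 131072 \
  --port 8000
```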
Project Ideas
- Deploy an always‑on, uncensored chatbot that can answer any user query in English or Chinese, using the default "thinking mode" settings for balanced responses.
- Build a multimodal assistant that accepts images or short video clips, runs the provided vision encoder, and generates descriptive captions or analyses combined with text prompts.
- Create a long‑context document summarizer that leverages the 262 K (or up to 1 M) token window to ingest whole reports and output concise summaries.
- Set up an offline inference pipeline on a laptop using the Q4_K_M GGUF quantization with llama.cpp for low‑resource, private AI applications.
- Develop a creative writing tool that generates stories, poetry, or dialogue without any content filters, optionally adding a disclaimer at the end of each output.
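For the offline-laptop idea above, llama.cpp's built-in server exposes an OpenAI-compatible API locally. A minimal sketch, assuming the Q4_K_M quant has been downloaded (the filename is illustrative):

```shell
# Hypothetical local pipeline: run the Q4_K_M quant with llama-server,
# then query the OpenAI-compatible chat endpoint on localhost.
llama-server \
  -m Qwen3.5-9B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf \
  -c 32768 --port 8080 &

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Write a short poem about the sea."}]}'
```

Everything stays on the local machine, which suits the private, low-resource use case.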