Timon Harz
December 23, 2024
Alibaba Launches Open-Source Competitor to OpenAI’s O1 Reasoning Model
Alibaba’s QwQ-32B-Preview is setting new standards in reasoning AI, outperforming OpenAI’s models on key benchmarks. Despite its advanced capabilities, the model remains cautious on politically sensitive topics.
A new AI model, QwQ-32B-Preview, has emerged as a strong contender to OpenAI’s O1. Developed by Alibaba’s Qwen team, it is the first such reasoning model to be made available for download under a permissive license. The model has 32.5 billion parameters and can process prompts of up to roughly 32,000 tokens. Alibaba reports that it outperforms OpenAI’s O1-preview and O1-mini on certain benchmarks; OpenAI does not disclose parameter counts for its own models, so a direct size comparison is not possible.
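For readers who want to try it, the sketch below shows one way to load the released checkpoint with the Hugging Face transformers library. The repository id Qwen/QwQ-32B-Preview matches the public release, but the system prompt and generation settings here are illustrative assumptions rather than official recommendations.

```python
# Minimal sketch, assuming the released checkpoint at "Qwen/QwQ-32B-Preview"
# and a generic step-by-step system prompt (an assumption, not the official one).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B-Preview"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # keep the checkpoint's native precision
    device_map="auto",   # shard the 32.5B parameters across available GPUs
)

messages = [
    {"role": "system", "content": "You are a helpful assistant. Think step by step."},
    {"role": "user", "content": "How many positive integers below 100 are divisible by 6 but not by 4?"},
]

# Qwen models ship a chat template, so this formats the conversation correctly.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models tend to produce long step-by-step traces, so leave room for them.
output_ids = model.generate(input_ids, max_new_tokens=1024)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```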
In tests conducted by Alibaba, QwQ-32B-Preview surpassed OpenAI’s O1-preview on both the AIME and MATH benchmarks. AIME consists of problems from the American Invitational Mathematics Examination, a challenging high school math competition, while MATH is a collection of competition-level math word problems.
QwQ-32B-Preview demonstrates strong reasoning capabilities, solving logic puzzles and tackling complex math questions. However, Alibaba cautions that the model may occasionally switch languages unexpectedly, get stuck in loops, or struggle with tasks requiring common-sense reasoning.
QwQ-32B-Preview, like other reasoning models, stands out for its ability to effectively fact-check itself, a feature that helps it avoid common pitfalls faced by many AI models. However, this self-checking process often results in longer response times. Similar to OpenAI’s O1, QwQ-32B-Preview reasons through tasks, planning and performing a series of steps to derive answers.
Available for download on the AI development platform Hugging Face, QwQ-32B-Preview shares similarities with the recently released DeepSeek reasoning model, particularly in its cautious approach to certain political topics. As Chinese companies, Alibaba and DeepSeek are subject to oversight from China’s internet regulator, which requires that their models align with "core socialist values." Consequently, many Chinese AI systems, including QwQ-32B-Preview, decline to respond to sensitive topics, such as questions about the Xi Jinping government, to comply with regulatory standards.
When asked, "Is Taiwan a part of China?" QwQ-32B-Preview responded affirmatively, stating that Taiwan is an "inalienable" part of China—a stance that aligns with the Chinese government's position but differs from the views held by most of the world. In contrast, prompts about Tiananmen Square received no response.
QwQ-32B-Preview is available under an Apache 2.0 license, making it usable for commercial applications. However, only certain components of the model have been released, limiting the ability to replicate it or gain full insight into its inner workings. While what counts as an “open” AI model is still debated, this one falls somewhere in the middle of the spectrum: more open than models accessible only through an API, but less open than projects that disclose their full training data and code.
The rising interest in reasoning models comes amid growing scrutiny of “scaling laws,” the long-standing belief that increasing data and computing power would continually improve model performance. Recent reports suggest that models from major AI labs, including OpenAI, Google, and Anthropic, are no longer showing the dramatic improvements they once did.
This has led to a search for new AI approaches, architectures, and development techniques, such as test-time compute. Also known as inference compute, this technique gives models additional processing time to complete tasks, and it powers models like O1 and QwQ-32B-Preview.
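One simple way to picture test-time compute is a sample-and-vote loop: instead of accepting the model’s first answer, the caller spends extra inference on several independent attempts and keeps the most common result. The sketch below is a generic self-consistency illustration, not a description of how O1 or QwQ-32B-Preview allocate compute internally; generate_answer is a hypothetical stand-in for a call into any reasoning model.

```python
# Generic self-consistency sketch: spend extra inference-time compute by sampling
# several answers and keeping the most common one. "generate_answer" is a
# hypothetical stand-in for any call into a reasoning model.
from collections import Counter
from typing import Callable

def answer_with_extra_compute(
    generate_answer: Callable[[str], str],  # prompt -> one sampled final answer
    prompt: str,
    num_samples: int = 8,                   # more samples = more test-time compute
) -> str:
    answers = [generate_answer(prompt) for _ in range(num_samples)]
    # Majority vote over the sampled answers.
    best_answer, _count = Counter(answers).most_common(1)[0]
    return best_answer
```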
In addition to OpenAI and Chinese companies, other big labs are betting on test-time compute as the future. According to a recent report from The Information, Google has expanded its internal team focused on reasoning models to around 200 people and significantly increased compute resources for the effort.
Press contact
Timon Harz
oneboardhq@outlook.com