Open-Sourced AI Models May Be More Costly in the Long Run, Study Finds

As more businesses adopt AI, picking which model to go with is a major decision. While open-sourced models may seem cheaper initially, a new study warns that those savings can evaporate fast, due to the extra computing power they require.
In fact, open-source AI models burn through significantly more computing resources than their closed-source rivals when performing the same tasks, according to a study published Thursday by Nous Research.
The researchers tested dozens of AI models, including closed systems from Google and OpenAI, as well as open-source models from DeepSeek and Mistral (the Magistral family). They measured how much computing effort each required to complete identical tasks across three categories: simple knowledge questions, math problems, and logic puzzles.
To do this, they used the number of tokens each model generated while solving and answering questions as a proxy for the computing resources consumed.
“Open-weight models use 1.5–4× more tokens than closed ones—and up to 10× for simple knowledge questions—making them sometimes more expensive per query despite lower per-token costs,” the study authors wrote.
Why token efficiency matters
In AI, a token is a piece of text or data—it could be a word, part of a word, or even punctuation—that models use to understand language. Models process and generate text one token at a time, so the more tokens they use, the more computing power and time a task requires.
Most closed-source models don’t reveal their raw reasoning process, or chain of thought (CoT), so the researchers could not compare reasoning traces directly. Instead, they counted completion tokens. Because providers bill for all output tokens, both those spent on reasoning and those in the final answer, the completion-token count serves as a proxy for the effort needed to produce a response.
This is an important consideration for companies using AI, for at least two reasons.
“First, while hosting open weight models may be cheaper, this cost advantage could be easily offset if they require more tokens to reason about a given problem,” the researchers wrote. “Second, an increased number of tokens will lead to longer generation times and increased latency.”
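The trade-off the researchers describe comes down to simple arithmetic: the cost of a query is the number of tokens generated multiplied by the price per token. The Python sketch below illustrates this with made-up numbers. The prices, the token count, and the 3x multiplier are hypothetical values chosen for illustration; they are not figures from the study.

```python
def cost_per_query(completion_tokens: int, price_per_million_tokens: float) -> float:
    """Return the output cost of a single query, in dollars."""
    return completion_tokens * price_per_million_tokens / 1_000_000


def break_even_token_ratio(closed_price: float, open_price: float) -> float:
    """Return the token multiple at which an open model costs as much per query
    as a closed model, given each model's price per million output tokens."""
    return closed_price / open_price


# Hypothetical inputs (assumptions for illustration, not data from the study).
CLOSED_PRICE = 10.0     # dollars per million output tokens, closed model
OPEN_PRICE = 4.0        # dollars per million output tokens, open model
CLOSED_TOKENS = 500     # tokens the closed model generates for one query
TOKEN_MULTIPLIER = 3.0  # the open model generates 3x as many tokens

closed_cost = cost_per_query(CLOSED_TOKENS, CLOSED_PRICE)
open_cost = cost_per_query(int(CLOSED_TOKENS * TOKEN_MULTIPLIER), OPEN_PRICE)
break_even = break_even_token_ratio(CLOSED_PRICE, OPEN_PRICE)

print(f"Closed model cost per query: ${closed_cost:.4f}")
print(f"Open model cost per query:   ${open_cost:.4f}")
print(f"Open model stays cheaper only below {break_even:.1f}x the tokens")
```

Under these assumed numbers, the open model is 60% cheaper per token but ends up more expensive per query, because its 3x token use exceeds the 2.5x break-even ratio.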
Closed models were the clear winners
The study found that open models consistently use more tokens than closed models for the same tasks, sometimes three times as many for simple knowledge questions. For math and logic problems, the gap narrowed to less than double.
“Closed models (OpenAI, Grok-4) optimize for fewer tokens to cut costs, while open models (DeepSeek, Qwen) use more tokens, possibly for better reasoning,” the study authors wrote.
Among open models, llama-3.3-nemotron-super-49b-v1 was the most efficient, while Magistral models were the least efficient.
OpenAI’s models were standouts as well. Both its o4‑mini and the new open-weight gpt‑oss models showed impressive token efficiency, especially on math problems.
The researchers noted that OpenAI’s gpt‑oss models, with their concise chains of thought, could serve as a benchmark for improving token efficiency in other open models.


