The Chinese AI model GLM-5.1 can operate autonomously for up to eight hours. The open-source giant becomes the world's top challenger

Chinese company Z.ai has updated its flagship open-weight language model, and the new GLM-5.1 version rewrites the rules of the game in agentic AI, that is, artificial intelligence capable of performing complex tasks over long periods without continuous human supervision. While most of today's models operate within a fixed token budget or give up once they judge that further reasoning will not change the outcome, GLM-5.1 can work autonomously on a single task for up to eight hours.

The key is a different approach to thinking. The model goes through a loop of planning, execution, evaluation of intermediate results, and re-evaluation of the chosen strategy - and repeats this loop hundreds of times until it decides that the task is complete. If it recognizes that the current approach is not leading to the goal, it changes the entire strategy.
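The loop described above can be sketched in a few lines of Python. This is only an illustration of the plan, execute, evaluate, re-evaluate pattern, not Z.ai's implementation: the `AgentState` class, the four callbacks, and the `patience` rule for deciding when to abandon a strategy are all hypothetical names and assumptions introduced for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AgentState:
    """Hypothetical bookkeeping for one long-running task."""
    strategy: str
    history: List[str] = field(default_factory=list)
    done: bool = False


def run_agent(
    plan: Callable[[AgentState], str],
    execute: Callable[[str], str],
    evaluate: Callable[[str], float],
    pick_new_strategy: Callable[[AgentState], str],
    max_iterations: int = 500,
    success_threshold: float = 1.0,
    patience: int = 3,
) -> AgentState:
    """Repeat plan -> execute -> evaluate until the task looks complete.

    Assumption: a strategy counts as a dead end when the score has not
    improved for `patience` consecutive iterations.
    """
    state = AgentState(strategy="initial")
    best_score = float("-inf")
    stalled = 0

    for _ in range(max_iterations):
        step = plan(state)            # decide the next action
        result = execute(step)        # e.g. a tool call
        score = evaluate(result)      # judge the intermediate result
        state.history.append(f"{state.strategy}: {step} -> {score:.2f}")

        if score >= success_threshold:
            state.done = True         # the agent decides the task is complete
            break

        if score > best_score:
            best_score = score
            stalled = 0
        else:
            stalled += 1

        if stalled >= patience:
            # Re-evaluate: the current approach is not leading to the goal,
            # so switch strategy and start measuring progress afresh.
            state.strategy = pick_new_strategy(state)
            best_score = float("-inf")
            stalled = 0

    return state
```

In a real agent the callbacks would wrap model calls and tools; here they are left abstract so that the control flow, including the switch of strategy after repeated stalls, is the only thing on display.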

In internal tests, Z.ai's models made thousands of tool calls over several hours. According to experts, it is precisely this ability to recognize dead ends and back out of them that today's benchmarks fail to capture reliably.

From a technical point of view, it is an impressive machine. GLM-5.1 is built on a mixture-of-experts transformer architecture with a total of 754 billion parameters, of which 40 billion are active per token. The context window holds up to 200,000 input tokens, and the output can reach 128,000 tokens. The model handles reasoning, function calling and structured output. The weights are freely available via Hugging Face under the MIT license, for both commercial and non-commercial use.

The benchmark results are convincing, especially in programming. On the Artificial Analysis Intelligence Index, GLM-5.1 scores 51 points in reasoning mode - the highest among open-source models, albeit behind the proprietary models Gemini 3.1 Pro Preview and GPT-5.4 (both 57 points) and Claude Opus 4.6 (53 points).

On the Arena Code leaderboard, where models compete in anonymous pairwise battles rated by programmers, GLM-5.1 came in third with an Elo rating of 1,530, behind Claude Opus 4.6 (1,542) and Claude Opus 4.6 in reasoning mode (1,548). On SWE-Bench Pro, a benchmark built from real software problems on GitHub, GLM-5.1 even took the lead with 58.4 percent - ahead of GPT-5.4 (57.7 percent), Claude Opus 4.6 (57.3 percent), and Gemini 3.1 Pro (54.2 percent).
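To put the Elo gap in perspective, the ratings can be converted into an expected head-to-head win rate. The sketch below assumes the classic Elo logistic formula with a scale of 400; the article does not say exactly how the leaderboard computes its ratings, so treat the result as an approximation. The function name `expected_score` is introduced here for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected share of wins for A against B under the standard Elo model.

    Assumption: the leaderboard's ratings behave like classic Elo ratings
    (logistic curve, 400-point scale).
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings quoted in the article.
GLM_5_1 = 1530
OPUS_4_6_REASONING = 1548

p = expected_score(GLM_5_1, OPUS_4_6_REASONING)
print(f"GLM-5.1 expected win rate vs. the leader: {p:.1%}")
```

Under that assumption, an 18-point deficit corresponds to an expected win rate of roughly 47 percent, so the top three models are close to evenly matched in pairwise comparisons.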

Weaknesses are evident in mathematics and scientific reasoning. On GPQA Diamond, a test of graduate-level science questions, GLM-5.1 scored 86.2 percent, while Gemini 3.1 Pro scored 94.3 percent. On the AIME 2026 competition math problems, GLM-5.1 scored 95.3 percent, behind GPT-5.4 with 98.7 percent.

The price remains significantly lower than that of the proprietary alternatives: $1.40 per million input tokens versus $5 for Claude Opus 4.6. However, Z.ai has raised prices compared with the previous version: token prices are up by about 40 percent, and subscriptions for programmers have roughly doubled. The gap is narrowing.
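As a rough worked example of what the quoted input prices mean in practice, the following sketch computes the cost of filling GLM-5.1's full 200,000-token context window once at each rate. It covers input tokens only, because the article does not quote output prices; the helper name `input_cost_usd` is hypothetical.

```python
def input_cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost in US dollars of sending `tokens` input tokens at the given rate."""
    return tokens / 1_000_000 * price_per_million


CONTEXT_TOKENS = 200_000  # full GLM-5.1 context window, per the article

# Input prices per million tokens quoted in the article.
glm_cost = input_cost_usd(CONTEXT_TOKENS, 1.40)
opus_cost = input_cost_usd(CONTEXT_TOKENS, 5.00)

print(f"GLM-5.1:         ${glm_cost:.2f}")
print(f"Claude Opus 4.6: ${opus_cost:.2f}")
print(f"Price ratio:     {opus_cost / glm_cost:.1f}x")
```

At these rates a single full-context request costs about $0.28 in input tokens on GLM-5.1 versus about $1.00 on Claude Opus 4.6, a ratio of roughly 3.6 to 1. For an agent that makes thousands of calls over several hours, that difference compounds quickly.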

The broader context of the report is crucial. According to the independent testing organization METR, the length of tasks that AI agents can complete autonomously doubles approximately every seven months. However, even the best models still successfully complete only about a quarter of the long-running programming tasks in benchmarks designed to measure persistence. GLM-5.1 pushes this ceiling - and if its ability to strategically re-evaluate is confirmed in independent tests, this will be a qualitative shift, not just a performance gain.
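The doubling rate cited from METR implies simple exponential growth: after t months, the achievable task length is multiplied by 2^(t/7). The sketch below only evaluates that formula; it assumes the seven-month doubling period holds steadily, which is an extrapolation rather than something the article claims. The function name `growth_factor` is introduced here for illustration.

```python
def growth_factor(months: float, doubling_months: float = 7.0) -> float:
    """Multiplier on autonomous task length after `months`.

    Assumption: task length keeps doubling every `doubling_months` months.
    """
    return 2.0 ** (months / doubling_months)


for months in (7, 12, 24):
    print(f"after {months:>2} months: x{growth_factor(months):.1f}")
```

Under that assumption, task length grows a little more than threefold per year and roughly elevenfold over two years.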

deeplearning.ai/gnews.cz - GH
