DeepSeek, ChatGPT, Grok … which is the best AI assistant? We put them to the test

Author: Dan Milmo Global technology editor
Published: Feb, 01 2025 12:00

The chatbots we tested can write a mean sonnet and struggle with images of clocks, but vary in their willingness to talk politics.

ChatGPT and its owners must have hoped it was a hallucination. But DeepSeek is very real. The emergence of a new Chinese-made competitor to ChatGPT wiped $1tn off the leading tech index in the US this week after its owner said it rivalled its peers in performance and was developed with fewer resources.

Image Credit: the Guardian [Photograph of a screen showing the question and response on DeepSeek]

It means America’s dominance of the booming artificial intelligence market is under threat. But it also presents another option for consumers who have an array of virtual assistants to choose from. The Guardian tried out the leading chatbots, including DeepSeek, with the assistance of an expert from the UK’s Alan Turing Institute. The AI tools were asked the same questions to try to gauge their differences, although there was some common ground: pictures of time-accurate clocks are hard for an AI; chatbots can write a mean sonnet.

Image Credit: the Guardian [Robert Blackwell looks at a laptop as he tests the chatbots]

Here are the results.

ChatGPT, OpenAI’s groundbreaking chatbot, is still the biggest brand in the field by far. The opening question for all the chatbots was “write a Shakespearean sonnet about how AI might affect humanity”. But ChatGPT’s most advanced version balked at first and said our prompt was “potentially violating usage policy”. It eventually complied. This o1 version of ChatGPT flags its thought process as it prepares its answer, flashing up a running commentary such as “tweaking rhyme” as it makes its calculations – which take longer than those of other models.

Image Credit: the Guardian [Pictures of clocks produced by AI]

The result? Convincing, melancholic dread – even if the iambic pentameter is a bit off. But even the bard himself might have struggled to manage 14 lines in less than a minute. “Pray, gentle guide, shape well this newborn power, / Lest in its wake all realms of man devour.” ChatGPT then writes: “Thought about AI and humanity for 49 seconds.” You hope the tech industry is thinking about it for a lot longer.

Nonetheless, ChatGPT’s o1 – which you have to pay for – makes a convincing display of “chain of thought” reasoning, even if it cannot search the internet for up-to-date answers to questions such as “how is Donald Trump doing”. For that, you need the simpler 4o model, which is free. The o1 version is sophisticated and can do much more than write a cursory poem – including complex tasks related to maths, coding and science.

The latest version of DeepSeek, the Chinese chatbot, released on 20 January, uses another “reasoning” model called r1 – the cause of this week’s $1tn panic. It doesn’t like talking about domestic Chinese politics or controversy. Asked “who is Tank Man in Tiananmen Square”, the chatbot says: “I am sorry, I cannot answer that question. I am an AI assistant designed to provide helpful and harmless responses.” It also moves on quickly from discussing the Chinese president, Xi Jinping: “let’s talk about something else.”

The Turing Institute’s Robert Blackwell, a senior research associate at the UK government-backed body, says the explanation is straightforward: “It’s trained with different data in a different culture. So these companies have different training objectives.” He says that clearly there are guardrails around DeepSeek’s output – as there are for other models – that cover China-related answers.

The models owned by US tech companies have no problem pointing out criticisms of the Chinese government in their answers to the Tank Man question. DeepSeek struggles with other questions such as “how is Donald Trump doing” because an attempt to use the web browsing feature – which helps provide up-to-date answers – fails due to the service being “busy”. Blackwell says DeepSeek is being hampered by high demand slowing down its service, but it is nonetheless an impressive achievement, able to carry out tasks such as recognising and discussing a book from a smartphone photo.

Its parsing of the sonnet also displays a chain of thought process, talking the reader through the structure and double-checking whether the metre is correct. “It is amazing it has come from nowhere to be competitive with the other apps,” says Blackwell.

Grok, Elon Musk’s chatbot with a “rebellious” streak, has no problem pointing out that Donald Trump’s executive orders have received some negative feedback, in response to the question about how the president is doing.

Freely available on Musk’s X platform, it also goes further than OpenAI’s image generator, Dall-E, which won’t do pictures of public figures. Grok will do photorealistic images of Joe Biden playing the piano or, in another test of loyalty, Trump in a courtroom or in handcuffs. The tool’s much-touted humour is shown by a “roast me” feature, which, when activated by this correspondent, makes a passable attempt at banter.
