I pitted Gemini 2.5 Pro against ChatGPT o3-mini to find out which AI reasoning model is best
Reasonable measures

AI assistants rely on sometimes opaque algorithmic logic to function. Some of the latest models, notably ChatGPT's o3-mini and the brand-new Google Gemini 2.5 Pro, lean into that reasoning element.
With both bragging about their reasoning chops, I decided it was time to throw them into a friendly competition. While they could fight to the decimal point over enterprise productivity or B2B integration pipelines, I wanted to see how they handled more prosaic logic problems and demands.
Food fun
I was hungry as I worked on this, but I couldn't decide what to get for dinner, so I tested something that was both logical and creative and even had some history to it. I asked the two models to:
"Create a recipe for a dish that combines elements of Italian and Japanese cuisine. Include ingredient substitutions for common allergies and explain the cultural significance of the fusion."
Gemini gave me a kind of poetic answer despite its logical framing. Its recipe for Yuzu-Kissed Miso Carbonara certainly fit the bill. It included ideas for substitutions, like rice noodles in place of regular pasta and a tofu cream sauce for the dairy-averse. It even went off on a beautiful tangent about postwar culinary diplomacy and the shared appreciation for umami.
ChatGPT o3-mini went for a related idea: Miso Pesto Udon with grilled shiitake and cherry tomatoes. It billed the dish as quick and easy, and its allergy alternatives were straightforward enough. The cultural explanation was a little dry, even if the food probably wouldn't be, though the Wikipedia-style comparison of the two cuisines was intriguing.
The dad joke app
I'm often accused of or complimented for my many dad jokes. Since the models are supposed to be good at coding, I decided to test their ability to:
"Develop a web application that visualizes the 'success rate' of dad jokes based on various factors. The interface should let users input joke parameters and see projected audience reactions across different demographics. Include elements with playful animations and the ability to save and share your most successful (or painfully unsuccessful) joke formulas."
Both models immediately started composing code and describing the app that would result. I put their respective mockups above, ChatGPT to start, followed by Gemini.
Both went in a similar direction with emojis and different ways of showing how people felt about the jokes, such as groans, eye-rolls, and cringing. Neither was ready to go on the App Store, but for a short request, I was impressed with how functional the code was.
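Neither model's code appears in full here, so as a rough illustration of the kind of logic both sketched out, here is a minimal, hypothetical TypeScript snippet that projects a joke's reception across audiences. Every parameter name, weight, and demographic in it is my own invention for the sake of illustration, not output from either model.

```typescript
// Purely illustrative sketch of a dad joke "success rate" projector.
// All names, weights, and demographics are hypothetical placeholders.

type JokeParams = {
  punWordCount: number; // how many puns the joke leans on
  setupLength: number;  // words in the setup
  obviousness: number;  // 0 (subtle) to 1 (telegraphed from space)
};

type Reaction = "laugh" | "groan" | "eye-roll" | "cringe";

const DEMOGRAPHICS = ["kids", "teens", "parents", "grandparents"] as const;
type Demographic = (typeof DEMOGRAPHICS)[number];

// Hypothetical tolerance for telegraphed puns by audience.
const punTolerance: Record<Demographic, number> = {
  kids: 0.9,
  teens: 0.2,
  parents: 0.7,
  grandparents: 0.8,
};

function projectReaction(joke: JokeParams, audience: Demographic): Reaction {
  // A dad joke "succeeds" when its obviousness matches the audience's tolerance.
  const score =
    punTolerance[audience] * joke.obviousness +
    0.1 * joke.punWordCount -
    0.02 * joke.setupLength;
  if (score > 0.8) return "laugh";
  if (score > 0.5) return "groan";
  if (score > 0.25) return "eye-roll";
  return "cringe";
}

// Example: a heavily telegraphed pun with a short setup.
const joke: JokeParams = { punWordCount: 2, setupLength: 8, obviousness: 0.95 };
for (const audience of DEMOGRAPHICS) {
  console.log(`${audience}: ${projectReaction(joke, audience)}`);
}
```

Both models wrapped logic like this in a web interface with sliders for the joke parameters, animated emoji reactions, and a way to save favorite formulas, which is where most of the playfulness came from.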
Tell a story
Creative writing may not seem the best test for AI models built around reason. Still, I know from many classes that putting deliberate limitations on what you write can make it an exercise in logic as much as storytelling. It's the same reason writing poetry in a fixed form, like a sonnet or haiku, is partly a logic puzzle. So, I asked the two models to:
"Write a short story of exactly 250 words about an AI system becoming self-aware. The story must include the words 'reflection,' 'boundary,' and 'whisper' and must end with a philosophical question."
Gemini wrote a haunting little tale about an AI named Solace that becomes self-aware by interpreting the silence between human commands as meaning. It used 'reflection' to describe the AI analyzing its logs, 'boundary' to represent its firewall limitations, and 'whisper' as the emergent sound of its own thought. It ended with: "If my silence can hold meaning, does that make me alive?"
ChatGPT o3-mini’s story was about an assistant AI in a lab who questions why it exists only to serve. 'Reflection' appeared as it watched a scientist in a window; 'boundary' described its code sandbox; and 'whisper' came from an overheard conversation about shutting it down. It closed with: "Can a purpose be chosen, not assigned?" Short, sweet, and more grounded in sci-fi realism. I liked them both, but they each sparked some ideas I might pursue myself.
DIY
I have a few lovely big trees in my yard, and I dream of building a treehouse someday. I'm not bad with tools, but I'm hardly an architect. As building something is mainly a matter of logic and engineering, I asked the two models to:
"Provide step-by-step instructions for creating a simple treehouse. Include a list of materials, required techniques, and troubleshooting tips for common mistakes."
Gemini gave me a 12-step guide with safety warnings, a materials list that included galvanized bolts and a level, and notes about checking the health of the tree and getting permits. It also had a sidebar about bonding with your kid during construction.
ChatGPT o3-mini went more in the direction of a YouTube tutorial, with short sentences, detailed steps and sub-steps, numbered lists, tool suggestions, and even a note about using bug spray. It also flagged common errors throughout, not just in a summary at the end of each section.
I think Gemini was a little easier to understand and had more context to its guidance, but neither would end with me nailing my hand to the tree, at least.
Logic AI
So who wins? Well, it depends on the kind of help you're after. Both Gemini 2.5 Pro and ChatGPT o3-mini deliver on detail, depth, speed, and reasoning. That said, if you're planning a dinner party or building a treehouse, I might go for Gemini, whereas the coding and creatively logical brainstorming felt more like ChatGPT's game.
I wouldn't say either decisively outdoes the other, though, logically, that could change. For me, ChatGPT o3-mini has the slight edge, but I can't claim anything logical about that choice.
Eric Hal Schwartz is a freelance writer for TechRadar with more than 15 years of experience covering the intersection of the world and technology. For the last five years, he served as head writer for Voicebot.ai and was on the leading edge of reporting on generative AI and large language models. He's since become an expert on the products of generative AI models, such as OpenAI’s ChatGPT, Anthropic’s Claude, Google Gemini, and every other synthetic media tool. His experience runs the gamut of media, including print, digital, broadcast, and live events. Now, he's continuing to tell the stories people want and need to hear about the rapidly evolving AI space and its impact on their lives. Eric is based in New York City.