Gartner: The LLM price war in China will accelerate AI gravity to the cloud

In recent months, Chinese generative AI (GenAI) vendors have cut the inference prices of their large language model (LLM) APIs by over 90%, a strategic move aimed at facilitating business adoption of GenAI.

While the immediate business impact may be limited, the long-term implications are profound, particularly in driving the migration of enterprise AI workloads from on-premises to cloud environments. This shift is fueled by the sustained decrease in API prices, coupled with the inherent advantages of cloud deployment, such as agility, innovation speed, and an extensive ecosystem. Data and analytics leaders must evaluate the impact of the price war as they scale GenAI solutions.

Mike Fang

Senior Director Analyst at Gartner covering AI in data and analytics.

1. The price drop: Short-term vs. long-term impacts

In the short term, the price reduction of LLM APIs will have a limited impact on companies. Many organizations that have implemented on-premises GenAI solutions are not directly affected by these price changes. For those utilizing cloud deployment, API cost is just one component of the overall GenAI solution cost. Factors such as AI software, data readiness, governance, security, and human talent significantly contribute to the total cost of ownership (TCO).
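To see why, consider a rough back-of-the-envelope TCO model. The Python sketch below uses entirely hypothetical cost figures (illustrative placeholders, not Gartner data) to show how little even a steep API discount moves the total when the other line items dominate.

```python
# Back-of-the-envelope GenAI TCO sketch. Every figure below is a
# hypothetical placeholder for illustration, not Gartner data.

# Annual cost components (USD) for a cloud-based GenAI solution.
cloud_costs = {
    "llm_api_calls": 120_000,        # inference via hosted LLM APIs
    "ai_software": 200_000,          # orchestration, vector DB, tooling
    "data_readiness": 150_000,       # pipelines, cleaning, labeling
    "governance_security": 100_000,  # policy, audit, access control
    "talent": 400_000,               # engineers and ML specialists
}

total = sum(cloud_costs.values())
api_share = cloud_costs["llm_api_calls"] / total
print(f"Total annual TCO: ${total:,}")       # $970,000
print(f"API share of TCO: {api_share:.0%}")  # 12%

# Cut the API line item by 90% and recompute.
cloud_costs["llm_api_calls"] //= 10
discounted = sum(cloud_costs.values())
print(f"TCO after a 90% API price cut: ${discounted:,} "
      f"({1 - discounted / total:.0%} lower)")  # $862,000 (11% lower)
```

In this illustration, a 90% cut in API prices trims barely a tenth off the total bill, which is why the immediate impact on most adopters is muted.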

However, the continuous decrease in API prices will likely drive a reevaluation of AI deployment strategies. Enterprises will increasingly see the benefits of cloud deployment, including lower upfront costs, flexibility, and the ability to leverage a broader ecosystem of tools and services. This shift will be further accelerated by the expectation that the average price of LLM APIs will drop to less than 1% of the current average price by 2027.

Therefore, prioritize generative AI investments based on the value, risks, and capabilities of the models, as well as the end-to-end cost structures of GenAI solutions. Emphasis should be placed on identifying the specific use cases where GenAI can deliver the most value and evaluating the capabilities of different models while considering associated risks. Additionally, all components of the GenAI cost structure, including fine-tuning, security, services, and talent, must be taken into account. Beyond costs, factors such as the quality, throughput, and latency of LLM APIs are critical in selecting the right model for specific use cases.
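As a sketch of what such an evaluation could look like in practice, the snippet below times a single request against a hypothetical OpenAI-compatible chat endpoint and derives rough latency and throughput figures. The URL, API key, and model name are placeholder assumptions, not a real vendor's values; many LLM vendors expose similarly shaped REST APIs, but consult the vendor's own documentation.

```python
import time

import requests

# Minimal latency/throughput probe for an OpenAI-compatible chat API.
# BASE_URL, API_KEY, and MODEL are hypothetical placeholders.
BASE_URL = "https://api.example-llm-vendor.com/v1/chat/completions"
API_KEY = "sk-..."
MODEL = "example-chat-model"

payload = {
    "model": MODEL,
    "messages": [{"role": "user",
                  "content": "Summarize cloud vs. on-prem AI trade-offs."}],
    "max_tokens": 256,
}

start = time.perf_counter()
resp = requests.post(
    BASE_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
elapsed = time.perf_counter() - start
resp.raise_for_status()

# OpenAI-compatible responses usually report token usage; guard anyway.
usage = resp.json().get("usage", {})
completion_tokens = usage.get("completion_tokens", 0)

print(f"End-to-end latency: {elapsed:.2f}s")
if completion_tokens:
    print(f"Throughput: {completion_tokens / elapsed:.1f} tokens/s")
```

A non-streaming call like this measures full-generation latency; for interactive use cases, time-to-first-token over a streaming connection is often the more relevant metric, and quality still has to be judged separately against use-case-specific evaluations.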

The sustained decrease in API prices necessitates a reevaluation of AI deployment strategies, with a focus on the trade-offs between cloud-based and on-premises deployments. Cloud environments are increasingly being used for hosting enterprise workloads and data, offering capabilities such as analytics, AI/ML, and platform as a service (PaaS). On-premises deployments, while still relevant, may face challenges in keeping up with the rapid pace of innovation.

While the cloud offers robust data privacy and security standards, certain companies may have regulatory requirements that necessitate on-premises data storage. Cloud providers offer diverse compute resources and containerized virtualization, making it easier to deploy and optimize models, whereas on-premises solutions require significant investment in AI chips and infrastructure.

Cloud ecosystems provide extensive integration options and access to various LLMs, whereas on-premises solutions depend on existing technical infrastructure and may have limited integration options. The cloud’s pay-as-you-go model lowers upfront costs and provides financial flexibility, while on-premises solutions involve heavy upfront investments but may offer long-term savings in maintenance and operation costs. Furthermore, cloud deployments require expertise in cloud architecture and infrastructure management, while on-premises deployments demand a deep understanding of the end-to-end technology landscape.
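One way to reason about that trade-off is a simple breakeven calculation. The sketch below, again with purely hypothetical figures, estimates how many months of cloud pay-as-you-go spending it takes to match an on-premises upfront investment.

```python
# Crude cloud vs. on-prem breakeven sketch; all figures are
# hypothetical placeholders for illustration.
onprem_upfront = 2_000_000   # AI chips, servers, networking (USD)
onprem_monthly = 40_000      # power, cooling, operations staff
cloud_monthly = 120_000      # pay-as-you-go APIs and hosting

# On-prem pays off once: onprem_upfront + onprem_monthly * m < cloud_monthly * m
breakeven_months = onprem_upfront / (cloud_monthly - onprem_monthly)
print(f"On-prem breaks even after ~{breakeven_months:.0f} months")  # ~25
```

The direction of the price war matters here: every further API price cut shrinks the cloud's monthly bill and pushes the on-premises payback point further out, which is part of why falling prices favor cloud deployment.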

2. Future outlook and strategic recommendations

By 2027, Gartner projects that cloud-based AI inference workloads in China will constitute 80% of the total AI inference workloads, up from 20% currently. This shift is driven by the continuous decline in API prices and the advantages offered by cloud deployment. As technology evolves and architectures become more complex, the benefits of cloud solutions will become increasingly evident. To navigate this transition, balance cost considerations with the broader benefits of cloud deployment to fully harness the potential of generative AI.

Reevaluate LLM deployment strategies by weighing the pros and cons of cloud versus on-premises solutions against organizational priorities and the regulatory environment. Run pilot use cases on cloud LLM APIs and analyze the data they generate to inform future deployment decisions. Identify the use cases best suited to cloud deployment and assess the flexibility, agility, and cost-efficiency that cloud solutions offer. The ongoing price war among Chinese LLM vendors is a catalyst accelerating AI gravity toward the cloud, and enterprises must be prepared to adapt and thrive in this evolving landscape.


This article was produced as part of TechRadar Pro's Expert Insights channel, where we feature the best and brightest minds in the technology industry today. The views expressed here are those of the author and are not necessarily those of TechRadar Pro or Future plc. If you are interested in contributing, find out more here: https://www.techradar.com/news/submit-your-story-to-techradar-pro

Mike Fang is a Senior Director Analyst at Gartner covering AI in data and analytics.
