In recent months, Chinese generative AI (GenAI) vendors have cut the prices of their large language model (LLM) inference APIs by more than 90%, a strategic move aimed at accelerating business adoption of GenAI.
While the immediate business impact may be limited, the long-term implications are profound, particularly in driving the migration of enterprise AI workloads from on-premises to cloud environments. This shift is fueled by the sustained decline in API prices, coupled with the inherent advantages of cloud deployment, such as agility, speed of innovation, and an extensive ecosystem. Data and analytics leaders must evaluate the impact of the price war as they scale GenAI solutions.
1. The price drop: Short-term vs. long-term impacts
In the short term, the price reduction of LLM APIs will have a limited impact on companies. Many organizations that have implemented on-premises GenAI solutions are not directly affected by these price changes. For those using cloud deployment, the API cost is just one component of the overall cost of a GenAI solution. Factors such as AI software, data readiness, governance, security, and human talent contribute significantly to the total cost of ownership (TCO).
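To see why a steep API price cut moves the needle less than the headline suggests, consider a back-of-the-envelope TCO calculation, sketched in Python below. All of the cost categories and figures are hypothetical assumptions chosen for illustration, not Gartner data; substitute your own numbers.

```python
# Illustrative TCO sensitivity check: how much does a 90% cut in API
# prices reduce the total cost of a GenAI solution?
# All figures below are hypothetical assumptions, not real benchmarks.

annual_costs = {
    "llm_api_calls": 100_000,        # API spend before the price cut
    "ai_software": 150_000,          # platforms, orchestration, monitoring
    "data_readiness": 200_000,       # pipelines, cleaning, labeling
    "governance_security": 120_000,  # compliance, access control, audits
    "talent": 300_000,               # engineers and AI specialists
}

baseline_tco = sum(annual_costs.values())

# Apply the 90% price cut to the API line item only.
after_cut = dict(annual_costs, llm_api_calls=annual_costs["llm_api_calls"] * 0.10)
reduced_tco = sum(after_cut.values())

print(f"Baseline TCO:      ${baseline_tco:,}")
print(f"After 90% API cut: ${reduced_tco:,.0f}")
print(f"Overall saving:    {1 - reduced_tco / baseline_tco:.1%}")
# With these assumptions, the headline 90% price cut trims TCO by only ~10%.
```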
However, the continuous decrease in API prices will likely drive a reevaluation of AI deployment strategies. Enterprises will increasingly see the benefits of cloud deployment, including lower upfront costs, flexibility, and the ability to leverage a broader ecosystem of tools and services. This shift will be further accelerated by the expectation that the average price of LLM APIs will fall to less than 1% of today's average by 2027.
Therefore, prioritize generative AI investments based on the value, risks, and capabilities of the models, as well as the end-to-end cost structures of GenAI solutions. Emphasis should be placed on identifying the specific use cases where GenAI can deliver the most value and evaluating the capabilities of different models while considering associated risks. Additionally, all components of the GenAI cost structure, including fine-tuning, security, services, and talent, must be taken into account. Beyond costs, factors such as the quality, throughput, and latency of LLM APIs are critical in selecting the right model for specific use cases.
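One lightweight way to put this guidance into practice is a weighted scoring matrix across quality, cost, throughput, and latency. The Python sketch below is purely illustrative; the model names, weights, and scores are invented placeholders to be replaced with your own benchmark results.

```python
# Hypothetical weighted scoring for selecting an LLM for one use case.
# Weights and 0-10 scores are invented; derive yours from real benchmarks.

weights = {"quality": 0.40, "cost": 0.30, "throughput": 0.15, "latency": 0.15}

candidates = {
    "model_a": {"quality": 9, "cost": 4, "throughput": 7, "latency": 6},
    "model_b": {"quality": 7, "cost": 9, "throughput": 8, "latency": 8},
}

def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-criterion scores using the agreed weights."""
    return sum(weights[criterion] * value for criterion, value in scores.items())

for name, scores in sorted(candidates.items(), key=lambda kv: -weighted_score(kv[1])):
    print(f"{name}: {weighted_score(scores):.2f}")
# With these weights, the cheaper, faster model_b (7.90) beats the
# higher-quality model_a (6.75) for this particular use case.
```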
The sustained decrease in API prices necessitates a reevaluation of AI deployment strategies, with a focus on the trade-offs between cloud-based and on-premises deployments. Cloud environments are increasingly being used for hosting enterprise workloads and data, offering capabilities such as analytics, AI/ML, and platform as a service (PaaS). On-premises deployments, while still relevant, may face challenges in keeping up with the rapid pace of innovation.
While the cloud offers robust data privacy and security standards, certain companies may have regulatory requirements that necessitate on-premises data storage. Cloud providers offer diverse compute resources and containerized virtualization, making it easier to deploy and optimize models, whereas on-premises solutions require significant investment in AI chips and infrastructure.
Cloud ecosystems provide extensive integration options and access to various LLMs, whereas on-premises solutions depend on existing technical infrastructure and may have limited integration options. The cloud’s pay-as-you-go model lowers upfront costs and provides financial flexibility, while on-premises solutions involve heavy upfront investments but may offer long-term savings in maintenance and operation costs. Furthermore, cloud deployments require expertise in cloud architecture and infrastructure management, while on-premises deployments demand a deep understanding of the end-to-end technology landscape.
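That financial trade-off can be made concrete with a simple break-even comparison. The sketch below uses entirely hypothetical rates and volumes to show the shape of the calculation; it is not real vendor pricing.

```python
# Hypothetical break-even comparison: cloud pay-as-you-go vs. on-premises.
# All rates and volumes are invented assumptions for illustration.

CLOUD_RATE_PER_1M_TOKENS = 0.50  # pay-as-you-go API price (hypothetical)
ONPREM_UPFRONT = 400_000         # AI chips, servers, deployment (hypothetical)
ONPREM_ANNUAL_OPEX = 60_000      # power, maintenance, operations staff

def cumulative_costs(years: int, monthly_tokens_m: float) -> tuple[float, float]:
    """Total spend for each option after `years` at a steady token volume."""
    cloud = CLOUD_RATE_PER_1M_TOKENS * monthly_tokens_m * 12 * years
    onprem = ONPREM_UPFRONT + ONPREM_ANNUAL_OPEX * years
    return cloud, onprem

for volume in (10_000, 50_000, 100_000):  # millions of tokens per month
    cloud, onprem = cumulative_costs(3, volume)
    winner = "cloud" if cloud < onprem else "on-prem"
    print(f"{volume:>7,}M tokens/mo, 3 years: cloud ${cloud:,.0f} "
          f"vs on-prem ${onprem:,.0f} -> {winner}")
# Light workloads favor pay-as-you-go; heavy, steady ones can favor on-prem,
# and every API price cut pushes that break-even volume higher.
```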
2. Future outlook and strategic recommendations
By 2027, Gartner projects that cloud-based AI inference workloads in China will constitute 80% of the total AI inference workloads, up from 20% currently. This shift is driven by the continuous decline in API prices and the advantages offered by cloud deployment. As technology evolves and architectures become more complex, the benefits of cloud solutions will become increasingly evident. To navigate this transition, balance cost considerations with the broader benefits of cloud deployment to fully harness the potential of generative AI.
Reevaluate LLM deployment strategies by assessing the pros and cons of cloud versus on-premises solutions in the context of your priorities and regulatory environment. Run pilot use cases on cloud LLM APIs and analyze the data they generate to inform future deployment strategies. Identify suitable use cases for cloud deployment and evaluate the flexibility, agility, and cost-efficiency that cloud solutions offer. The ongoing price war among Chinese LLM vendors is a catalyst accelerating AI gravity toward the cloud, and enterprises must be prepared to adapt and thrive in this evolving landscape.
Mike Fang is a Senior Director Analyst at Gartner covering AI in data and analytics.