
    Gartner Predicts That by 2030, Performing Inference on an LLM With 1 Trillion Parameters Will Cost GenAI Providers Over 90% Less Than in 2025

    By 2030, performing inference on a large language model (LLM) with one trillion parameters will cost GenAI providers over 90% less than it did in 2025, according to Gartner, Inc., a business and technology insights company.

    AI tokens are the units of data that GenAI models process. For the purposes of this analysis, a token is 3.5 bytes of data, or approximately 4 characters.
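The byte and character conversions above can be sketched as a quick estimator. This is an illustrative helper built only on the article's stated figures (3.5 bytes and roughly 4 characters per token); real tokenisers vary by model and language.

```python
# Rough token arithmetic using the article's stated conversion:
# 1 token ≈ 3.5 bytes ≈ 4 characters. Real tokenizers differ by model.

def estimate_tokens(text: str) -> int:
    """Approximate token count from character length (~4 chars/token)."""
    return max(1, round(len(text) / 4))

def estimate_bytes(tokens: int) -> float:
    """Approximate data volume in bytes for a given token count."""
    return tokens * 3.5

print(estimate_tokens("Gartner predicts falling inference costs."))  # → 10
print(estimate_bytes(1_000_000))  # → 3500000.0 bytes for 1M tokens
```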

    “These cost improvements will be driven by a combination of semiconductor and infrastructure efficiency improvements, model design innovations, higher chip utilisation, increased use of inference-specialised silicon, and application of edge devices for specific use cases,” said Will Sommer, Sr. Director Analyst at Gartner.

    As a result of these trends, Gartner forecasts that LLMs in 2030 will be up to 100 times more cost-efficient than the earliest models of similar size developed in 2022.

    The forecasted model results are split between two sets of semiconductor scenarios:

    • Frontier scenarios: Model processing is based on a representation of cutting-edge chips.
    • Legacy blend scenarios: Model processing is based on a representative blend of available semiconductors benchmarked to Gartner forecasts.

    Modelled costs in the “blend” forecast scenarios are considerably higher than in the “frontier” scenarios, given lower computational power (see Figure 1).

    Figure 1: Gartner GenAI Inference Cost Scenario Forecasts

    Source: Gartner (March 2026)

    Falling Token Costs Will Not Democratise Frontier Intelligence

    However, falling GenAI provider token costs will not be fully passed on to enterprise customers. Moreover, frontier intelligence will demand significantly more tokens than current mainstream applications. Agentic models, for example, require between five and 30 times more tokens per task than a standard GenAI chatbot, and can perform many more tasks than a human using GenAI.

    While lower token unit costs will enable more advanced GenAI capabilities, these advancements will drive disproportionately higher token demand. As token consumption rises faster than token costs fall, overall inference costs are expected to increase.
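A back-of-envelope check makes this dynamic concrete: a 90% drop in token unit cost still leaves total spend higher whenever consumption grows by more than 10x. The figures below use the article's own numbers (90% cheaper tokens, 5 to 30 times more tokens per agentic task).

```python
# Net effect of cheaper tokens vs. higher token consumption.
# ratio < 1 means spend per task falls; ratio > 1 means it rises.

def total_cost_ratio(unit_cost_drop: float, token_multiplier: float) -> float:
    """Ratio of future to current inference spend per task."""
    return round((1 - unit_cost_drop) * token_multiplier, 4)

# 90% cheaper tokens, but 5x and 30x the tokens per task:
print(total_cost_ratio(0.90, 5))   # → 0.5  (spend per task halves)
print(total_cost_ratio(0.90, 30))  # → 3.0  (spend per task triples)
```

The break-even multiplier is simply 1 / (1 − 0.90) = 10x: agentic workloads above that consume cost savings entirely, which is the article's point about rising overall inference costs.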

    “Chief Product Officers (CPOs) should not confuse the deflation of commodity tokens with the democratisation of frontier reasoning,” said Sommer. “As commoditised intelligence trends toward near-zero cost, the compute and systems needed to support advanced reasoning remain scarce. CPOs who mask architectural inefficiencies with cheap tokens today will find agentic scale elusive tomorrow.”

    Value will accrue to platforms that can orchestrate workloads across a diverse portfolio of models. Routine, high-frequency tasks must be routed to more efficient small and domain-specific language models, which perform better than generic solutions at a fraction of the cost when aligned to specialised workflows. Expensive inference of frontier-level models must be heavily gated and reserved exclusively for high-margin, complex reasoning tasks.
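The routing-and-gating pattern described above can be sketched as follows. Model names, costs, and the complexity threshold are invented for illustration; a production router would score tasks with a classifier and enforce budgets per tenant.

```python
# Hypothetical sketch of workload orchestration across a model portfolio:
# routine tasks go to a cheap small/domain model; expensive frontier
# inference is gated behind a complexity threshold and a budget check.

from dataclasses import dataclass

@dataclass
class ModelTier:
    name: str
    cost_per_1k_tokens: float  # illustrative USD figures, not real pricing

SMALL_MODEL = ModelTier("domain-slm", 0.0002)
FRONTIER_MODEL = ModelTier("frontier-llm", 0.02)

def route(task_complexity: float, frontier_budget_remaining: float) -> ModelTier:
    """Reserve frontier inference for complex tasks, and only within budget."""
    if task_complexity >= 0.8 and frontier_budget_remaining > 0:
        return FRONTIER_MODEL
    return SMALL_MODEL

print(route(0.3, 10.0).name)   # routine task → "domain-slm"
print(route(0.95, 10.0).name)  # complex reasoning → "frontier-llm"
print(route(0.95, 0.0).name)   # budget exhausted → "domain-slm"
```

The design choice mirrors the article's advice: the default path is the cheap specialised model, and frontier capacity is an explicitly rationed resource rather than a fallback.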

    ELE Times Research Desk
