Google’s New Gemini 2.5 Flash Cuts AI Costs with Flexible ‘Thinking Budgets’ for Developers

Google has launched Gemini 2.5 Flash, an update to its AI model lineup whose headline feature is the thinking budget. The setting gives businesses and developers control over how much computational “thinking” the model performs and, more importantly, what that thinking costs.

As competition intensifies among AI giants, this release is Google’s bold move to maintain its edge while providing a more affordable solution to the ever-rising costs of AI processing. The thinking budget allows developers to tailor the computational effort of the AI based on their needs, offering a scalable way to balance costs with performance.

What is Google’s ‘Thinking Budget’ and How Does It Work?

The thinking budget, available through Google AI Studio and Vertex AI, lets developers cap the resources the model spends on reasoning. The budget is expressed in tokens: by limiting how many tokens the model may devote to working through a problem, developers can reduce both latency and cost. Essentially, the more “thinking” the AI does, the more expensive the task becomes. The flexibility here is key: developers can scale back reasoning to lower costs or ramp it up for more sophisticated, multi-step tasks.

“We know cost and latency matter for a number of developer use cases, and so we want to offer developers the flexibility to adapt the amount of the thinking the model does, depending on their needs,” said Tulsee Doshi, Product Director for Gemini Models at Google DeepMind.

This feature reflects Google’s ongoing effort to refine how AI is deployed in business environments, where cost control is often just as critical as performance.

Cost and Pricing: How Google’s New AI Model Affects Budgets

One of the key selling points of Gemini 2.5 Flash is its unique pricing structure. For basic use, the input cost is set at just $0.15 per million tokens — a competitive rate compared to many models in the market. However, it’s the cost of the output that varies dramatically depending on whether the reasoning function is turned on or off.

With the reasoning function disabled, developers pay $0.60 per million output tokens; enabling reasoning raises the output price to $3.50 per million tokens, nearly six times as much. The gap reflects the additional computation the model performs when it reasons through complex tasks.

“Customers pay for any thinking and output tokens the model generates,” Doshi explained. “In the AI Studio UX, you can see these thoughts before a response. In the API, we currently don’t provide access to the thoughts, but a developer can see how many tokens were generated.”
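To make the pricing concrete, here is a minimal Python sketch that estimates the cost of a single request from the rates quoted above. The function name, its parameters, and the sample token counts are illustrative, and the sketch assumes that thinking tokens and output tokens are both billed at the reasoning output rate when reasoning is on; check Google’s current pricing page before relying on these numbers.

```python
# Prices quoted in the article, in US dollars per one million tokens.
INPUT_PRICE = 0.15
OUTPUT_PRICE_NO_REASONING = 0.60
OUTPUT_PRICE_REASONING = 3.50


def estimate_cost(input_tokens, output_tokens, thinking_tokens=0, reasoning=False):
    """Return the estimated cost of one request in US dollars.

    Assumption: with reasoning on, thinking tokens and output tokens are
    both billed at the reasoning output rate.
    """
    if thinking_tokens and not reasoning:
        raise ValueError("thinking_tokens requires reasoning=True")
    output_price = OUTPUT_PRICE_REASONING if reasoning else OUTPUT_PRICE_NO_REASONING
    billed_output = output_tokens + thinking_tokens
    return (input_tokens * INPUT_PRICE + billed_output * output_price) / 1_000_000


# Hypothetical request sizes, chosen only to illustrate the gap.
plain = estimate_cost(2_000, 500)
reasoned = estimate_cost(2_000, 500, thinking_tokens=4_000, reasoning=True)
print(f"without reasoning: ${plain:.6f}")
print(f"with reasoning:    ${reasoned:.6f}")
```

In this example the same prompt and answer cost far more with reasoning on, mostly because the thinking tokens are billed on top of the visible output.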

Developers can set the thinking budget as high as 24,576 tokens, which caps how much computation the model spends on reasoning for each task. This is particularly useful for businesses that need to match their AI costs to task complexity.
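One way to read the budget is as a ceiling on the reasoning portion of each bill. The sketch below computes that ceiling for a few budget sizes. It assumes the budget is a hard cap on thinking tokens, that a budget of 0 means reasoning is off, and that thinking tokens are billed at the $3.50 rate; the function name is hypothetical.

```python
MAX_THINKING_BUDGET = 24_576    # upper limit quoted in the article, in tokens
REASONING_OUTPUT_PRICE = 3.50   # USD per million tokens with reasoning on


def max_thinking_cost(budget_tokens):
    """Upper bound, in USD, on the thinking portion of one request.

    Assumption: the model never generates more thinking tokens than the budget.
    """
    if not 0 <= budget_tokens <= MAX_THINKING_BUDGET:
        raise ValueError(f"budget must be between 0 and {MAX_THINKING_BUDGET}")
    return budget_tokens * REASONING_OUTPUT_PRICE / 1_000_000


for budget in (0, 1_024, 8_192, MAX_THINKING_BUDGET):
    print(f"{budget:>6} tokens -> at most ${max_thinking_cost(budget):.6f} per request")
```

Under these assumptions even the maximum budget adds less than nine cents per request, but the figure scales linearly with request volume, which is where the cap matters for high-traffic applications.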

Benchmark Performance: How Gemini 2.5 Flash Stands Out

On the performance front, Gemini 2.5 Flash posts competitive benchmark results. It scored 12.1% on Humanity’s Last Exam, a challenging test of reasoning and knowledge, ahead of Anthropic’s Claude 3.7 Sonnet (8.9%) and DeepSeek’s R1 (8.6%). Although it trailed OpenAI’s o4-mini (14.3%), Gemini 2.5 Flash demonstrated strong capabilities, particularly in mathematical problem-solving and multimodal reasoning.

The model also performed well on technical benchmarks such as GPQA Diamond (78.3%) and the AIME mathematics exams, showcasing its ability to handle long-context tasks and complex reasoning scenarios quickly and efficiently.

“Companies should choose 2.5 Flash because it provides the best value for its cost and speed,” Doshi noted. The performance across these key metrics makes it a viable option for businesses aiming to balance AI capabilities with their budgets.

Customizing AI Thinking: When Smart is Better Than Fast

A unique aspect of Gemini 2.5 Flash is its ability to intelligently manage how much reasoning is applied to a given task. Simple queries, such as asking about basic factual information, require minimal computational effort, and the AI can disable reasoning to save costs. For more intricate tasks, like solving mathematical problems or conducting nuanced analysis, the reasoning process is activated to ensure the best quality results.

This dynamic approach means businesses can adjust the AI’s power based on the complexity of the problem at hand. For example, a simple question like “How many provinces does Canada have?” requires little processing power, while a more complicated inquiry, such as a technical engineering problem, will engage the deeper thinking capabilities of the model.
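Developers who prefer to set budgets themselves could approximate this behavior on the client side. The sketch below is a deliberately crude heuristic, not how the model decides internally: the keyword list, the word-count threshold, and the budget tiers are all hypothetical values chosen for illustration.

```python
import re

# Hypothetical hints that a prompt needs multi-step reasoning.
HARD_HINTS = re.compile(
    r"\b(prove|derive|optimi[sz]e|debug|calculate|design|step[- ]by[- ]step)\b",
    re.IGNORECASE,
)


def choose_thinking_budget(prompt):
    """Pick a thinking budget in tokens from rough features of the prompt."""
    if HARD_HINTS.search(prompt):
        return 8_192   # likely a multi-step problem: allow substantial reasoning
    if len(prompt.split()) > 60:
        return 2_048   # long prompt: allow a moderate amount of reasoning
    return 0           # short factual question: assume reasoning can stay off


simple = "How many provinces does Canada have?"
complex_task = "Calculate the maximum deflection of a simply supported steel beam under uniform load."
print(choose_thinking_budget(simple))
print(choose_thinking_budget(complex_task))
```

A production system would likely replace the keyword check with something more robust, such as a lightweight classifier or feedback from measured answer quality, but the structure stays the same: estimate difficulty first, then spend reasoning tokens accordingly.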

Doshi explains, “Integrating thinking capabilities into our mainline Gemini models, combined with improvements across the board, has led to higher quality answers.” This reflects a shift toward smarter AI that can tailor its response depth to the specific needs of the user.

A Week of Big Moves: Google’s AI Expansion Continues

The release of Gemini 2.5 Flash is part of Google’s broader strategy to maintain its foothold in the competitive AI market. As part of this week’s announcements, Google also rolled out Veo 2 video generation capabilities for Gemini Advanced subscribers, allowing them to create eight-second video clips from text prompts. On top of that, all U.S. college students now have free access to Gemini Advanced until spring 2026, further expanding Google’s reach and influence in the AI space.

These initiatives show that Google is committed to offering innovative tools to both businesses and consumers while staying competitive in the face of AI giants like OpenAI. The introduction of thinking budgets and other AI advancements points to Google’s effort to offer customizable, cost-effective AI solutions for a variety of use cases.

While Gemini 2.5 Flash is currently in preview, it is already available to developers who are eager to experiment with its features. Google has stated that it plans to continue refining the model based on feedback from this early testing phase, making adjustments to its dynamic reasoning capabilities as more developers get hands-on experience.

For businesses, this represents a chance to experiment with a more flexible approach to AI deployment — one that allows them to allocate resources efficiently while also maintaining high performance when needed. As the model matures, Gemini 2.5 Flash is poised to become a significant player in the enterprise AI space.

With its thinking budgets, Gemini 2.5 Flash offers businesses a unique way to control AI costs while still accessing powerful reasoning capabilities. Google’s strategy to provide flexibility, combined with strong performance metrics, makes this release one to watch as the AI industry continues to evolve.
