Recently, IBM’s video podcast series Mixture of Experts released a special episode titled “2025 — The Year of AI Agents?” Hosted by Tim Hwang, the episode featured insights from three IBM experts: engineer Chris Hay, Director of IBM’s Open Source AI Innovation Program Lauren McHugh Olende, and Vice President of Core AI and watsonx.ai Volkmar Uhlig. Together, they shared their perspectives on the current state and future trajectory of AI agent technology.

Notably, IBM (IBM.US) has seen its stock price rise 41.2% year-to-date in 2025, outperforming the Nasdaq Composite’s 15.2% and the S&P 500’s 13.2% gains. The company now has a market cap of approximately $282.9 billion, with Q3 2025 revenue growing 9% to $16.3 billion, driven in part by a 17% surge in its infrastructure segment.

Here’s a summary of the key insights shared by the IBM experts during the discussion:
🔹 1. Consumer-Facing AI Agents: Still a Long Way Off
The panel agreed that consumer-grade AI agents are unlikely to see widespread adoption in the near term. Current technology still struggles to reliably handle complex, multi-step tasks in the real world. Moreover, there remains a huge gap between prototyping AI agents and deploying them at scale. A platform solution that dramatically lowers the barrier for non-technical users to create and deploy agents doesn’t yet exist.
🔹 2. Can “Natural Language to Agent” Fully Bypass Traditional Development?
Volkmar Uhlig envisions that true mass adoption will come when users can describe complex tasks in natural language and AI autonomously translates them into executable agents: a direct “natural language to agent” conversion. This, he argues, could largely bypass today’s complex, developer-dependent framework-building processes.

However, Chris Hay sounded a note of caution from a practical standpoint. Giving large language models (LLMs) too much freedom to invoke tools can easily lead to unpredictable or “derailed” behavior. For the foreseeable future, he argues, a reliable agent system will still depend on a planning module (Planner) to define and strictly enforce execution steps. This requires a careful balance between the open-ended creativity of models and the deterministic needs of real-world tasks, something that goes well beyond issuing simple natural language commands.
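To make the trade-off concrete, here is a minimal sketch of the “planner defines, executor enforces” split Hay describes. The `Step` structure, the toy tool registry, and the `fill_args` callback are illustrative assumptions, not an IBM or framework API.

```python
# Minimal sketch: the planner fixes the steps and the tool for each step;
# the model's freedom is limited to filling in arguments. Names are hypothetical.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    goal: str   # what this step should accomplish
    tool: str   # the one tool the planner permits here

# Toy tool registry; real agents would wrap APIs, databases, etc.
TOOLS: dict[str, Callable[[str], str]] = {
    "calendar_search": lambda q: f"[calendar results for {q!r}]",
    "web_search": lambda q: f"[web results for {q!r}]",
}

def run_plan(plan: list[Step], fill_args: Callable[[Step], str]) -> list[str]:
    """Execute steps in order. Tool choice is deterministic (set by the planner);
    only the arguments come from the model via `fill_args`."""
    results = []
    for step in plan:
        if step.tool not in TOOLS:              # reject anything off-plan
            raise ValueError(f"unknown tool: {step.tool}")
        results.append(TOOLS[step.tool](fill_args(step)))
    return results

if __name__ == "__main__":
    plan = [
        Step(goal="find tomorrow's meetings", tool="calendar_search"),
        Step(goal="look up travel time to the office", tool="web_search"),
    ]
    # Stand-in for an LLM call that turns the step goal into a tool argument.
    print(run_plan(plan, fill_args=lambda s: s.goal))
```

The design choice mirrors Hay’s point: the model contributes judgment inside each step, while the overall control flow stays deterministic.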
🔹 3. From Proof of Concept to Scale: Key Challenges
The conversation highlighted three critical challenges in moving AI agents from concept validation to large-scale deployment:
1. Reliability & Control: Ensuring that agents can reliably execute plans in complex environments, without going off-track or hallucinating, will require mature frameworks and “guardrail” technologies.
2. Cost Efficiency: Volkmar Uhlig stressed that for agents to replace human labor or tackle previously unmanageable tasks, their costs must drop exponentially (a rough cost sketch follows this list). Currently, their use remains limited to high-value, highly controlled scenarios.
3. Infrastructure & Ecosystem: There’s a need for “agent cloud platforms” that simplify deployment, operation, and monitoring, as well as, potentially, specialized planning models that reduce reliance on expensive frontier models.
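As a rough illustration of the cost-efficiency point, the back-of-the-envelope comparison below contrasts an agent’s per-task cost with manual handling. The token counts, prices, and wage figure are hypothetical assumptions, not figures from the episode or IBM pricing.

```python
# Hypothetical back-of-the-envelope cost model for one agent task.
# All numbers are illustrative assumptions, not vendor pricing.

steps_per_task = 12                 # planner + tool-calling steps
tokens_per_step = 4_000             # prompt + completion tokens per step
price_per_million_tokens = 10.0     # USD, frontier-model class pricing

agent_cost = steps_per_task * tokens_per_step * price_per_million_tokens / 1e6

human_minutes_per_task = 15
human_hourly_cost = 30.0            # USD, fully loaded
human_cost = human_minutes_per_task / 60 * human_hourly_cost

print(f"agent cost per task: ${agent_cost:.2f}")   # ~$0.48
print(f"human cost per task: ${human_cost:.2f}")   # ~$7.50
# Under these assumptions the agent wins on a simple task, but retries,
# guard models, and long contexts can multiply token usage many times over,
# which is why large infrastructure cost reductions matter.
```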
🔹 4. What Will the Future AI Agent Industry Look Like?
Lauren McHugh Olende drew a parallel between today’s agent development and the custom AI model landscape of a decade ago, when each project essentially started from scratch. She believes future breakthroughs may come from the emergence of reusable “foundation agents”, or from companies that deeply specialize in a specific use case (much as AWS grew out of Amazon’s own needs) and eventually abstract it into a general-purpose platform.

Volkmar Uhlig, meanwhile, argued that dominance in this space will hinge on two core capabilities:
- Who can deliver the best reasoning and planning capabilities at the model level, and
- Who can achieve the most extreme cost optimization at the infrastructure level, making AI agents ubiquitous.
🔹 5. Interview Highlights (as compiled by Bright Company)
🎤 Q: How close are we to a “one-click” consumer experience with AI agents?
Lauren McHugh Olende: If we use the evolution of LLMs as a benchmark, the path becomes clearer. The Transformer paper came out in 2017, BERT and GPT-1 in 2018, and it wasn’t until 2022 that ChatGPT became widely accessible via web and mobile. That’s roughly a four-year journey from lab breakthrough to mass adoption.

Today’s AI agents are more akin to LLMs in 2018: they’ve moved beyond pure research, but we haven’t yet seen a “killer app” or a truly simple, consumer-friendly product. We have “BERT-level” demos that validate concepts but aren’t ready for non-technical users. The big question is: will it take agents another four years to go mainstream? Might capital, compute, and attention compress that timeline? Or, if agents prove more complex to engineer than LLMs, could it take even longer?
“Consumer-facing agents may remain hindered by the bottleneck of natural language interaction.”
🛠️ Q: What’s holding back the developer ecosystem?
Lauren McHugh Olende: For experimentation, it’s actually quite exciting. With no-code tools like LangFlow, users can visually assemble agents via drag-and-drop, avoiding the risk of coding first and only then discovering missing data or misaligned semantics. For more advanced users, there are options like LangChain, LangGraph, CrewAI, AutoGen, and Semantic Kernel; some offer high abstraction and ease of use, others give full control. But the real pain point comes when you try to deploy outside a controlled environment: integrating inference services, hosting the logic, and connecting everything together. There’s no “one-click deploy” solution yet, and developers still have to build the full tech stack themselves.

Volkmar Uhlig: That’s one of the key barriers. We don’t yet have a “ready-to-use” agent solution. The true “Shopify moment” hasn’t arrived, where anyone can say, “Give me an agent,” and it just works.

IBM has been experimenting internally with turning business process descriptions written in natural language directly into executable LangFlow files. Once anyone can describe a problem in natural language and have an agent auto-generated without coding, that’s when it becomes truly consumer-facing. Imagine saying, “Turn on the lights when I get home,” and the system instantly generates the automation, no setup required.

Right now, the interface is still a “baby programmer” tool, built for those who can code. But if we can translate natural-language business logic into agents as smoothly as we translate it into code today, the mass-market moment will arrive. The interface just isn’t there yet.
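As a rough sketch of the “natural language in, executable agent out” idea Uhlig describes, the snippet below asks a model to emit a structured workflow spec from a plain-language request and validates it before handing it to a runtime. The `generate_text` callback, the JSON schema, and the trigger/action names are hypothetical placeholders, not LangFlow’s or IBM’s actual format.

```python
# Hypothetical sketch: natural language -> validated agent spec.
# `generate_text` stands in for any LLM client; the JSON shape is invented
# for illustration and is NOT the LangFlow file format.
import json

ALLOWED_TRIGGERS = {"geofence_enter", "time_of_day"}
ALLOWED_ACTIONS = {"lights_on", "lights_off", "send_notification"}

PROMPT_TEMPLATE = (
    "Convert the request into JSON with keys 'trigger' and 'action'.\n"
    "Allowed triggers: {triggers}. Allowed actions: {actions}.\n"
    "Request: {request}\nJSON:"
)

def build_agent_spec(request: str, generate_text) -> dict:
    """Ask the model for a spec, then validate it against a closed vocabulary."""
    prompt = PROMPT_TEMPLATE.format(
        triggers=sorted(ALLOWED_TRIGGERS),
        actions=sorted(ALLOWED_ACTIONS),
        request=request,
    )
    spec = json.loads(generate_text(prompt))
    if spec.get("trigger") not in ALLOWED_TRIGGERS:
        raise ValueError(f"unsupported trigger: {spec.get('trigger')}")
    if spec.get("action") not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {spec.get('action')}")
    return spec  # a runtime would now deploy this spec as an automation

if __name__ == "__main__":
    # Stub model so the sketch runs without any API key.
    fake_llm = lambda _prompt: '{"trigger": "geofence_enter", "action": "lights_on"}'
    print(build_agent_spec("Turn on the lights when I get home", fake_llm))
```

The validation step is the point: the model can be creative about interpreting the request, but the spec it produces is checked against a closed vocabulary before anything runs.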
⚙️ Q: What’s needed for production-grade AI agents?
Chris Hay: Moving from POCs or MVPs to scale is hard because consumer behavior is unpredictable. To safely deploy LLMs directly to consumers, you need guardrails, either guard models or deterministic workflows, to keep them on track.

Approaches like text-to-planning are emerging, with tools such as Claude Code, Cursor, Windsurf, and Manus using planners to break down complex requests before execution. Projects like Manus decompose tasks with a planning agent and then execute them step by step.

This “plan first, execute later” model is essential. Giving an LLM unstructured access to hundreds of tools often leads to tool overuse and derailment. Even with a plan, models may skip tools, forget updates, or confidently hallucinate answers from internal memory, causing compounding errors.

Deterministic frameworks are therefore needed to enforce step-by-step execution. Today, though, these frameworks are still built manually by developers rather than natively integrated into platforms or models. For mass adoption, such frameworks must be embedded into the stack.
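A minimal sketch of the kind of enforcement Hay describes might look like this: the model proposes each tool call, and a deterministic runner rejects anything that skips the current plan step or picks an off-plan tool. The `propose_call` callback and the dict shapes are illustrative assumptions, not the API of any of the tools named above.

```python
# Illustrative guardrail loop: the model proposes each call, a deterministic
# runner validates it against the plan before anything executes.
# `propose_call` stands in for an LLM; all names here are hypothetical.

def run_guarded(plan, tools, propose_call):
    """plan: list of dicts like {"step": "...", "tool": "..."}.
    tools: mapping of tool name -> callable.
    propose_call: fn(step, evidence) -> {"tool": ..., "args": {...}}."""
    evidence = {}
    for i, step in enumerate(plan):
        proposal = propose_call(step, evidence)
        # Guardrail: the model may not skip a step or pick an off-plan tool.
        if proposal["tool"] != step["tool"]:
            raise RuntimeError(
                f"step {i}: plan requires {step['tool']!r}, "
                f"model proposed {proposal['tool']!r}"
            )
        evidence[i] = tools[step["tool"]](**proposal["args"])
    # Only after every step has produced real tool output would a final answer
    # be composed, so the model cannot answer from internal memory alone.
    return evidence

if __name__ == "__main__":
    plan = [{"step": "get order status", "tool": "lookup_order"}]
    tools = {"lookup_order": lambda order_id: {"order": order_id, "status": "shipped"}}
    proposer = lambda step, ev: {"tool": step["tool"], "args": {"order_id": "A123"}}
    print(run_guarded(plan, tools, proposer))
```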
💰 Q: Who will win in the AI agent economy?
Volkmar Uhlig: Two key challenges define the competitive landscape:
1. Which models to use? Today’s frontier models can easily go off-track when given too many tools. Avoiding such failures still relies on their dense reasoning capabilities, which are expensive. We may soon see specialized “planning models” that focus solely on generating correct plans, but we’re not there yet, so for now costly frontier models are the only option. Smaller, cheaper models dedicated to planning are likely on the way (see the sketch after this list).
2. How and where to execute? My belief, and IBM’s product philosophy, is that “AI should be everywhere”. It’s not just about stacking H100s or H200s in data centers; agents will run on phones, edge devices, in the cloud, and on-prem. The key is who can make AI agents cost-efficient first. Many business processes, in places like Portugal, still rely on manual labor. We want agents to take over repetitive tasks, freeing humans for higher-value work, and to provide services in areas that were previously unaddressed. This is fundamentally a cost-optimization race. We need infrastructure that drives the cost per task down by 10x to 100x. Right now, agents are only viable in high-value, labor-intensive, controlled environments. Once models become more powerful and affordable, agents will become as ubiquitous as electricity and water.
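To illustrate the split Uhlig anticipates between cheap planning models and expensive frontier models, here is a hypothetical routing sketch. The model names, per-token prices, and the `hard_reasoning` heuristic are invented for illustration only.

```python
# Hypothetical model router: a small, cheap model handles planning and routine
# steps; the expensive frontier model is reserved for steps it cannot handle.
# Model names and prices below are made up for illustration.
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    usd_per_million_tokens: float

SMALL_PLANNER = Model("small-planner", 0.3)
FRONTIER = Model("frontier-reasoner", 12.0)

def route(step: dict) -> Model:
    """Crude heuristic: only steps flagged as open-ended reasoning go to the
    frontier model; planning and structured tool calls stay on the small one."""
    return FRONTIER if step.get("hard_reasoning") else SMALL_PLANNER

def estimated_cost(steps: list[dict], tokens_per_step: int = 3_000) -> float:
    return sum(
        tokens_per_step * route(s).usd_per_million_tokens / 1e6 for s in steps
    )

if __name__ == "__main__":
    workflow = [
        {"step": "draft plan"},
        {"step": "call CRM API"},
        {"step": "resolve ambiguous refund policy", "hard_reasoning": True},
    ]
    for s in workflow:
        print(s["step"], "->", route(s).name)
    print(f"estimated cost: ${estimated_cost(workflow):.4f}")
```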
🏆 Q: Will a single model or platform dominate?
Lauren McHugh Olende: The winner will be whoever makes these processes repeatable first. Today’s agent development is like traditional AI a decade ago: every project starts from scratch. With agents it’s even harder, because you’re not just rewriting code but also tweaking natural-language prompts to control tool usage and optimize quality.

The inflection point for traditional AI came with foundation models, large pretrained models that could handle diverse tasks out of the box. Similarly, if we can define a “foundation agent” with built-in planning and execution capabilities, and then fine-tune or configure it for different use cases, we eliminate the need to rebuild prompts from scratch each time.
“That’s the key to moving from a craft-based model to a platform-scale one.”
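To make the “foundation agent” idea concrete, here is a minimal sketch of a reusable base agent that ships with planning and execution built in and is merely configured per use case. The class, its fields, and the placeholder `plan`/`execute` methods are hypothetical, not a description of any existing product.

```python
# Hypothetical "foundation agent": planning and execution are built once;
# each use case only supplies configuration (tools, domain prompt).
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class FoundationAgent:
    domain_prompt: str                                   # per-use-case framing
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)
    max_steps: int = 5

    def plan(self, request: str) -> list[str]:
        # Placeholder planner: a real one would call a planning model,
        # conditioned on self.domain_prompt.
        return [f"use {name} for: {request}" for name in self.tools][: self.max_steps]

    def execute(self, request: str) -> list[str]:
        return [
            self.tools[name](request)
            for name, _step in zip(self.tools, self.plan(request))
        ]

# Configuring, not rebuilding: two use cases share the same base behavior.
support_agent = FoundationAgent(
    domain_prompt="You handle retail customer support.",
    tools={"order_lookup": lambda q: f"[order record for {q!r}]"},
)
hr_agent = FoundationAgent(
    domain_prompt="You answer internal HR policy questions.",
    tools={"policy_search": lambda q: f"[policy excerpts for {q!r}]"},
)

print(support_agent.execute("Where is order A123?"))
print(hr_agent.execute("How many vacation days do I have?"))
```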
Will it be today’s big players? She doubts a single model will dominate. Instead, success may come from multi-model orchestration combined with control mechanisms. The winner might not be an existing leader but a dark horse that perfects a niche use case and then scales it through modular reuse, much as AWS began with Amazon’s internal e-commerce needs and evolved into a universal cloud platform.
“The next platform might emerge from someone who masters one thing exceptionally well — and then finds a way to make it repeatable.”