The Rise of AI Agent Infrastructure


An explosion of GenAI apps is plain to see, with applications for productivity, development, cloud infrastructure management, media consumption, and even healthcare revenue cycle management. This explosion has been made possible by the rapid model and infrastructure improvements our industry has delivered over the past 24 months, which have simplified hosting, fine-tuning, data loading, and memory, and made app building much easier. As a result, many founders and investors have turned their eyes to the top of the stack, where we can finally start putting our cutting-edge technologies to work for end users.

But the breakneck pace of GenAI development means that some assumptions will not hold true for much longer. Apps are now being built in a new way that will impose new requirements on the underlying infrastructure. These developers are running fast on a half-finished bridge. If our industry fails to meet them with a new set of AI agent infrastructure components, their apps will not achieve their full potential.

The rise of agents

A key change is the rise of AI agents: autonomous actors that can plan and execute multi-step tasks. Today, AI agents (not the underlying models directly) are becoming a common interface that end users encounter, and even a basic abstraction upon which developers build. This is accelerating how quickly new apps can be built and creating a new set of opportunities at the platform layer.

Starting in 2022 with projects like MRKL, and in 2023 with ReAct, BabyAGI, and AutoGPT, developers began to discover that prompt chaining and reasoning loops can decompose large tasks into smaller ones and execute them autonomously. Frameworks such as LangChain, LlamaIndex, Semantic Kernel, Griptape, and more have demonstrated that agents can interact with APIs through code, and research papers such as Toolformer and Gorilla have shown that the underlying models can learn to use APIs effectively. Research from Microsoft, Stanford, and Tencent has shown that teams of AI agents perform much better than agents working alone.

Today, the word "agent" means different things to different people. If you talk to enough practitioners, a spectrum emerges with multiple concepts that can all be called agents. BabyAGI creator Yohei Nakajima has a great way of looking at it:

  1. Hand-crafted agents: chains of prompts and API calls that have autonomy but operate within tight constraints.
  2. Specialized agents: dynamically decide what to do within a subset of task types and tools. Constrained, but less so than hand-crafted agents.
  3. General agents: the AGI of agents. Even today, these remain over the horizon as a practical reality.

The reasoning limitations of our latest frontier models (GPT-4o, Gemini 1.5 Pro, Claude 3 Opus, etc.) are a key barrier holding back our ability to build, deploy, and rely on more advanced agents (specialized and general). Agents use frontier models to plan, prioritize, and self-validate: that is, to decompose large tasks into smaller ones and to check that outputs are correct. So limits on reasoning mean that agents are also constrained. Over time, new frontier models with more advanced reasoning capabilities (GPT-5, Gemini 2, etc.) will make for more advanced agents.

Applying agents

Today, developers say that the best-performing agents are all highly hand-crafted. Developers are exploring how to use these technologies in their current state by figuring out which use cases work under the right constraints today. Despite their limitations, agents are spreading. End users sometimes interact with them directly, as with a coding agent that responds in Slack. Increasingly, agents are also buried under other UX abstractions such as search boxes, spreadsheets, or canvases.

Consider Matrices, a spreadsheet application company founded in 2024. Matrices creates spreadsheets that complete themselves: based on the row and column headings in (say) cells A1:J100, it infers what information the user wants and searches the web and individual web pages to find each piece of data. Matrices' core spreadsheet UX isn't all that different from Excel (launched 1985) or even VisiCalc (launched 1979). But under the hood, its developers can deploy 1,000+ agents to perform independent multi-step reasoning about each row, each column, or even each cell.
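
The one-agent-per-cell fan-out pattern can be sketched with ordinary async code. This is a toy illustration under stated assumptions: `research_cell` stands in for a real per-cell agent that would search the web and call a model.

```python
# Sketch of fanning out one lightweight agent per spreadsheet cell and
# gathering the results concurrently. `research_cell` is hypothetical.
import asyncio

async def research_cell(row_header: str, col_header: str) -> str:
    """Hypothetical per-cell agent; a real one would browse and reason.
    Here it just returns a placeholder after simulated latency."""
    await asyncio.sleep(0)  # stand-in for network / model latency
    return f"value for ({row_header}, {col_header})"

async def fill_sheet(rows: list[str], cols: list[str]) -> dict[tuple[str, str], str]:
    """Run one agent per cell concurrently and collect results by cell."""
    cells = [(r, c) for r in rows for c in cols]
    results = await asyncio.gather(*(research_cell(r, c) for r, c in cells))
    return dict(zip(cells, results))
```

Even this toy version surfaces the real operational questions: with 1,000+ concurrent agents, rate limits, cost, and partial failures all have to be managed by infrastructure.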

Or consider Gradial, a marketing automation company founded in 2023. Gradial lets digital marketing teams automate their content supply chain by helping them create asset variations, execute content updates, and create or migrate pages across channels. Gradial offers a chat interface, but it can also meet marketers within their existing workflows by responding to tickets in a tracking system like Jira or Workfront. A marketer does not need to break a high-level task down into individual actions. Instead, Gradial's agents do that decomposition and execute the work behind the scenes on the marketer's behalf.

To be sure, agents today have many limitations. They are often wrong. They need supervision. Running many of them has bandwidth, cost, latency, and user-experience implications. And developers are still learning how to use them effectively. But readers would be right to note that these limitations echo earlier complaints about foundation models themselves. Techniques such as validation, voting, and model chaining shore up AI agents, and developers are doing for agents what recent history has shown for GenAI as a whole: rapidly improving the science and engineering, and building with the future state in mind. They are running fast on the half-finished bridge I mentioned above, trusting that it will be finished quickly.

Supporting agents with infrastructure

All of this means that our industry has work to do to build the infrastructure that supports AI agents and the applications that rely on them.

Today, many agents are almost fully vertically integrated, with little organized infrastructure behind them. That means self-managed cloud hosting for agents, databases for memory and state, custom connectors to external context sources, and function calling (also known as tool use or tool calling) to reach external APIs. Some developers stitch things together with software frameworks like LangChain (especially its evaluation product, LangSmith). This stack works for now because developers are iterating fast and feel they need end-to-end control over their products.
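
The function-calling piece of that stack follows a simple contract: the model emits a structured request naming a tool and its arguments, and the application executes it. A minimal sketch, with an illustrative tool registry (the tool names here are made up for the example):

```python
# Sketch of the function-calling / tool-use pattern: the model produces a
# JSON tool call; the app dispatches it to real code. Tools are illustrative.
import json

TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",  # placeholder implementation
    "add": lambda a, b: a + b,
}

def dispatch(model_output: str):
    """Parse a model's JSON tool call, e.g. {"tool": "add", "args": {...}},
    look up the named tool, and invoke it with the given arguments."""
    call = json.loads(model_output)
    fn = TOOLS[call["tool"]]
    return fn(**call["args"])
```

For example, `dispatch('{"tool": "add", "args": {"a": 2, "b": 3}}')` runs the `add` tool. Production systems add schema validation, permissioning, and retries around this core loop, which is precisely where infrastructure providers come in.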

But the picture will change in the coming months as use cases stabilize and design patterns emerge. We are still in the era of hand-crafted and specialized agents, so the most useful infrastructure primitives in the near term will be those that meet developers where they are and let them build the hand-crafted agent networks they control. That infrastructure can also be forward-looking. Over time, reasoning will gradually improve, frontier models will come to drive more and more of the workflow, and developers will want to focus on product and data, the things that differentiate them. They want the underlying platform to "just work" with scale, performance, and reliability.

Sure enough, when you look at it this way, you can see that a rich ecosystem of AI agent infrastructure has already begun to form. Here are some of the key themes:

Agent-related developer tools

Tools like FlowPlay, Wordware, and Rift support common design patterns (voting, ensembles, validation, "crews") in a general-purpose way, which in turn helps developers understand these patterns and use them to build agents. A useful and opinionated developer tool can be one of the most important pieces of infrastructure driving the next wave of applications built on this powerful agent technology.

Agents as a service

Hand-crafted agents for specific tasks are starting to serve as infrastructure that developers can choose to buy rather than build. Such agents offer capabilities like UI automation (TinyFish, Reworkd, Firecrawl, Superagent, Induced, and Browse.ai), tool selection (NPI, Imprompt), and prompt creation and engineering. Some end users may invoke these agents directly, but developers will also access them through APIs and integrate them into broader applications.

Browser infrastructure

Reading and acting on the web is a top priority. Developers enrich their agents by letting them interact with APIs, SaaS applications, and the web. API interfaces are fairly straightforward, but websites and SaaS applications are complex to access, navigate, parse, and scrape. Browser infrastructure makes it possible for agents to use any web page or web app as if it were an API, accessing its information and capabilities in a structured way. This requires managing connections, proxies, and CAPTCHAs. Browserbase, Browserless, Apify, Bright Data, Platform.sh, and Cloudflare Browser Rendering are all examples of companies with products in this area.

Personalized memory

When agents distribute tasks across multiple models, shared memory ensures that each model has access to the relevant history and context. Vector stores such as Pinecone, Weaviate, and Chroma are useful for this. But a new class of offerings provides complementary, personalization-focused functionality, including companies like WhyHow and Cognee, a LangChain feature called LangMem, and a popular open-source project called MemGPT. These show how to personalize agent memory to the end user and that user's current context.
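
The core retrieval idea behind such shared memory can be shown in miniature: store past interactions, then surface the entries most similar to a new query. This toy uses a bag-of-words cosine similarity as a stand-in for a real embedding model and vector store.

```python
# Toy sketch of shared agent memory: store entries, recall the most similar.
# The bag-of-words "embedding" stands in for a real embedding model + store.
import math
from collections import Counter

class SharedMemory:
    def __init__(self) -> None:
        self._entries: list[str] = []

    def add(self, text: str) -> None:
        """Record a piece of history or user context."""
        self._entries.append(text)

    def recall(self, query: str, k: int = 2) -> list[str]:
        """Return the k stored entries most similar to the query."""
        q = Counter(query.lower().split())

        def score(entry: str) -> float:
            e = Counter(entry.lower().split())
            dot = sum(q[w] * e[w] for w in q)
            norm = (math.sqrt(sum(v * v for v in q.values()))
                    * math.sqrt(sum(v * v for v in e.values())))
            return dot / norm if norm else 0.0

        return sorted(self._entries, key=score, reverse=True)[:k]
```

Any model in a multi-agent workflow can call `recall` before acting, which is how each model gets the relevant historical data and context the paragraph above describes.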

Auth for agents

These services manage authentication and authorization for agents as they interact with external systems on behalf of the end user. Today, developers often reuse OAuth tokens so that agents can impersonate end users (fragile), and in some cases even ask users to hand over API keys. The UX and security implications are serious, and not all of the web supports OAuth (which is part of why Plaid exists in financial services). Anon.com, Mindware, and Statics.ai are three examples of what developers want at scale: managed authentication and authorization for agents themselves.

"Vercel for agents"

This layer seamlessly manages, orchestrates, and scales fleets of agents as a distributed system. Today there are separate sets of primitives for agent hosting (E2B.dev, Ollama, LangServe), persistence (Inngest, Hatchet.run, Trigger.dev, Temporal.io), and orchestration (DSPy, AutoGen, CrewAI, Sema4.ai, LangGraph). Some platforms (LangChain and Griptape) offer managed services for various combinations of these. A stable service offering scalable, managed hosting with persistence and orchestration would mean that an app developer no longer has to think at multiple levels of abstraction (app and agent) and can instead focus on the problem they want to solve.

Building the future of AI agent infrastructure

It is so early in the evolution of AI agent infrastructure that today we see a mix of operational services and open-source tools that have yet to be commercialized or incorporated into broader offerings. And it is not clear who the winners will be: in this domain, the eventual winners may be young companies today, or may not even exist yet. So let's get to work.

We are pleased to share a view of the emerging AI agent infrastructure stack as it looks in June 2024. Our industry is still learning together, so please let us know what we're missing and how we can more accurately reflect the options available to developers. Madrona is actively investing in AI agents, the infrastructure that supports them, and the apps that rely on them. We have backed several companies in this area and will continue to do so. If you're building AI agent infrastructure, or applications that rely on it, we'd love to meet you. You can reach us directly: [email protected]


