Should AI initiatives change network planning?


Everyone is into AI these days, even network planners. But for network professionals, the primary focus has been on the use of AI in network operations. What about the impact of AI on network traffic?

When I asked about 100 network planners about this, only eight told me that they had thought about the impact of AI on network traffic and network plans. Are they missing something? Perhaps, because there are two questions on the table here. One is whether AI has a potential impact on enterprise networks and traffic, and the other is whether it could change the network technology needed to carry that traffic.

The impact of AI on traffic and infrastructure depends largely on an enterprise’s plans for self-hosting its AI. The vast majority of AI models run on specialized chips like GPUs, which means specialized servers in a data center. To date, I have received comments on “self-hosted” AI from 91 enterprises. I put the term in quotes because the truth is that only 16 of them planned dedicated AI hosting in 2023, and only eight said they actually did any self-hosting this year. Not surprisingly, those eight planners are the same eight who have thought about network effects. For 2024, the number jumps to 77, and I think that growth is driving interest from both AI vendors and network equipment vendors. Cisco and Juniper, for example, have both been vocal about their AI network credentials.

The question, of course, is just what kind of AI is being self-hosted and networked, and we shouldn’t assume we can answer that based on what we read about AI.

Generative AI is getting a lot of ink thanks to players like OpenAI (with ChatGPT), Google, and Microsoft, but there’s a fundamental problem with the classic generative, open-internet approach as far as businesses are concerned. They are concerned about too many generalizations in public-trained chatbots. They are concerned about copyright issues clouding AI-generated content. If the AI is trained on their own information, they are concerned about the security of their data. Some are concerned about the energy and environmental impact of all those GPUs churning out human-like results. Many recent AI initiatives, including Google’s Gemini, have gone some way toward advancing a new form of generative AI, one that applies the underlying large language model technology behind the popular generative AI services to data held inside enterprise data centers, or as part of enterprise cloud services.

If enterprises are looking for this kind of lightweight large language model for AI, it will mean that the number of dedicated AI servers in their data centers will be limited. Think in terms of a single AI cluster of GPU servers, and you have what enterprises are looking at. The dominant strategy for AI networking within such a cluster is InfiniBand, a very fast, low-latency technology strongly supported by NVIDIA but not particularly popular (or even well known) at the enterprise level. NVIDIA’s DGX InfiniBand approach is what connects the mass of GPUs in most large AI data centers, which is why it’s a natural assumption that InfiniBand will be the technology used for self-hosted AI.

This is probably unnecessary, and possibly outright wrong. Enterprises don’t need to crawl the internet for training data for their models. Enterprises don’t need to support mass-market use of their AI, and if they do for applications like chatbots in customer support, they’ll likely use cloud hosting rather than in-house deployment. This means that AI, to the enterprise, is really a form of better analytics. The widespread use of analytics has already influenced data center network planning for database access, and AI will likely increase database access if it is widely used. But even given all that, there’s no reason to think that Ethernet, the dominant data center network technology, won’t be fine for AI. So forget the idea of replacing Ethernet with InfiniBand. But that doesn’t mean AI won’t need to be planned into the network.

Think of an AI cluster as a huge virtual user community. It has to collect data from enterprise repositories, first of all, to answer user questions, to support training, and to stay up to date. This means it needs a high-performance data path for that data, and that path cannot be allowed to crowd out other traditional workflows within the network. The issue is acute for enterprises with multiple data centers or multiple large sites, because they likely won’t want to host AI everywhere. If the AI cluster is isolated from some of the applications, databases, and users it serves, data center interconnect (DCI) paths may need to be extended to carry the traffic without the risk of congestion.
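To make the sizing question concrete, here is a minimal back-of-the-envelope sketch, with entirely assumed figures for data volume, transfer window, and headroom, of how a planner might estimate the DCI bandwidth an isolated AI cluster would demand:

```python
# Back-of-the-envelope DCI sizing for an AI cluster pulling remote data.
# Every number below is an illustrative assumption, not a measurement.

def required_gbps(data_gb_per_day: float, transfer_window_hours: float,
                  headroom: float = 0.5) -> float:
    """Average Gbps needed to move a daily data volume within a window,
    plus headroom so AI pulls don't saturate the DCI link."""
    gigabits = data_gb_per_day * 8                   # GB -> gigabits
    gbps = gigabits / (transfer_window_hours * 3600)
    return gbps * (1 + headroom)

# Hypothetical case: the AI cluster refreshes 20 TB per day from a remote
# data center, and the pulls must finish within an 8-hour window.
demand = required_gbps(data_gb_per_day=20_000, transfer_window_hours=8)
print(f"Plan roughly {demand:.1f} Gbps of DCI capacity for AI traffic")
```

Even a modest daily refresh, on this arithmetic, can claim a noticeable slice of a 40 Gbps or 100 Gbps interconnect, which is why the traffic has to be planned for rather than simply absorbed.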

According to these eight AI-hosting enterprises, the rule of thumb for AI traffic is that you want the workflows to travel the shortest possible paths, on the fastest connections you have. Pulling or pushing AI data over long, multi-hop connections can make it nearly impossible to prevent random mass movements of data from interfering with other traffic. It is especially important to ensure that AI flows do not collide with other high-volume data flows, such as traditional analytics and reporting. One approach is to map the AI workflows and increase capacity along their paths (a check like the sketch below can flag where that is needed); the other is to shorten and steer the AI workflows by properly positioning the AI cluster.
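Here is a hedged sketch of that mapping step: given a hypothetical topology, assumed link speeds, and assumed peak flow rates, it flags shared links where AI pulls could collide with analytics and other bulk traffic:

```python
# Flag shared links where AI flows plus existing high-volume flows could
# exceed capacity. Topology, link speeds, and flow rates are hypothetical.
from collections import defaultdict

link_capacity_gbps = {"dc1-core": 100, "core-dci": 40, "dci-dc2": 40}

# Each flow: (name, assumed peak Gbps, links it traverses).
flows = [
    ("analytics-reporting", 18, ["dc1-core", "core-dci", "dci-dc2"]),
    ("nightly-backup",      10, ["core-dci", "dci-dc2"]),
    ("ai-data-pull",        25, ["dc1-core", "core-dci", "dci-dc2"]),
]

load = defaultdict(float)
for _name, gbps, path in flows:
    for link in path:
        load[link] += gbps

for link, capacity in link_capacity_gbps.items():
    util = load[link] / capacity
    status = "OK" if util < 0.8 else "CONGESTION RISK"
    print(f"{link}: {load[link]:.0f}/{capacity} Gbps ({util:.0%}) {status}")
```

In this made-up example the AI pull fits comfortably on the big data center fabric but pushes the 40 Gbps DCI links past their limit, which is exactly the collision the planners warned about.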

Planning for an AI cluster begins with the relationship between enterprise AI and business analytics. Analytics uses the same databases that AI will likely use, which means it would be smart to place AI where the large analytics applications are hosted. Remember that this means putting AI where the actual analytics applications are run, not where the results are formatted for consumption. Because analytics applications are often run close to the large databases they draw on, this will place the AI in the location likely to generate the fewest network connections. Run fat Ethernet pipes inside the AI cluster and to the database hosts, and you’re probably in good shape. But watch AI usage and traffic carefully, especially if there aren’t many controls over who uses it and how much. Six of the eight enterprises reported rampant and largely uncontrolled use of self-hosted AI, and this could drive costly network upgrades.
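What watching the traffic might look like in practice is sketched below; how you actually read interface counters (SNMP, streaming telemetry, a switch API) is environment-specific, so the counter source here is a stand-in stub returning simulated values:

```python
# Watch utilization on the AI cluster's uplink and flag sustained spikes.
# read_octets() is a stub; swap in SNMP or streaming telemetry in practice.
import random
import time

LINK_GBPS = 100      # assumed capacity of the AI cluster uplink
ALERT_UTIL = 0.7     # flag sustained utilization above 70%

_counter = 0
def read_octets() -> int:
    """Stub counter source returning a growing octet count."""
    global _counter
    _counter += random.randint(0, 12_500_000_000)  # <= ~100 Gb/s worth
    return _counter

prev = read_octets()
for _ in range(5):                  # short sampling loop for illustration
    time.sleep(1)                   # 1-second polling interval
    cur = read_octets()
    gbps = (cur - prev) * 8 / 1e9   # octets/s -> gigabits/s
    prev = cur
    util = gbps / LINK_GBPS
    flag = "  <-- investigate AI usage" if util > ALERT_UTIL else ""
    print(f"AI uplink: {gbps:.1f} Gbps ({util:.0%}){flag}")
```

Trend data like this is what tells you whether uncontrolled AI use is creeping toward the point where those fat Ethernet pipes need fattening.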

The future of AI networking for enterprises isn’t about how the AI is run; it’s about how it is used. While the use of AI will certainly drive additional traffic, it won’t require replacing an entire data center network with hundreds of gigabits of new Ethernet capacity. It will require a better understanding of how AI deployments connect users with AI data center clusters, cloud resources, and generative AI services. If Cisco, Juniper, or another vendor can deliver that understanding, they can expect a nice bonus in 2024.

