Altman Solon, the largest global strategy consulting firm dedicated to the TMT sectors, examines how generative AI affects network infrastructure. This final article in the “Putting Generative AI to Work” series looks at the impact of emerging enterprise generative AI solutions on compute, storage, and networking infrastructure.
The best spokesperson for generative AI might be ChatGPT. The chatty, engaging app, developed by OpenAI, has dazzled millions worldwide with lifelike answers to all manner of questions. Under the hood, OpenAI’s large language model (LLM) runs on vast supercomputing infrastructure. To train the model behind ChatGPT, OpenAI relied on its partner Microsoft’s Azure cloud computing platform. Microsoft spent years linking thousands of Nvidia graphics processing units (GPUs) together while managing cooling and power constraints.
Altman Solon believes growth in enterprise-level generative AI tools will lead to incremental demand for compute resources, benefiting both core, centralized data centers, where training occurs, and local data centers, where inference occurs. Network providers should see a moderate increase in demand for data transit networks and a boost for private networking solutions.
We used a four-step methodology to understand infrastructure impact, accounting for the average compute time required per generative AI task, the overall volume of generative AI tasks, the resulting incremental compute requirements, and the quantifiable impact on the infrastructure value chain. To meet this demand, service providers will need to start planning for adequate compute resources and network capacity.
Using data from our survey of 292 senior business leaders, Altman Solon calculated the hourly volume of generative AI tasks across four business functions: software development, marketing, customer service, and product development/design. Considering the potential adoption rate per use case, average users per use case, and average generative AI tasks per user-hour, we estimate 80 to 130 million generative AI tasks per hour within the next year across these key enterprise functions in the US alone.
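For readers who want to see the arithmetic, the short Python sketch below mirrors the structure of this estimate: adoption rate × users per use case × tasks per user-hour, summed across functions. The adoption rates, user counts, and task rates are illustrative placeholders chosen to land near the published range; they are not figures from our survey.

```python
# Illustrative sketch of the tasks-per-hour estimate.
# All inputs below are placeholder assumptions, NOT survey figures.

use_cases = {
    # function: (adoption_rate, active_users, tasks_per_user_hour)
    "software_development": (0.45, 3_000_000, 8),
    "marketing":            (0.40, 2_500_000, 6),
    "customer_service":     (0.30, 20_000_000, 12),
    "product_design":       (0.25, 1_500_000, 5),
}

total = 0
for function, (adoption, users, tasks_per_user_hour) in use_cases.items():
    tasks = adoption * users * tasks_per_user_hour
    total += tasks
    print(f"{function:22s}: {tasks / 1e6:6.1f}M tasks/hour")

print(f"{'total':22s}: {total / 1e6:6.1f}M tasks/hour")
```

With these placeholder inputs, customer service dominates the total simply because of its much larger user base, which is consistent with the pattern discussed below.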
Generative AI Tasks per Hour Across Enterprise Functions
Although our research shows that customer service has been slower to adopt generative AI solutions than software development and marketing, customer service will account for the bulk of enterprise generative AI usage. This is primarily due to the sheer volume of user interactions with generative AI chatbots. Of course, computing doesn’t only happen when an LLM is queried. Building and using generative AI requires considerable infrastructure.
When building and using generative AI tools, compute resources are required during two distinct phases: training the model and then using it to respond to queries (also known as “inference”). Training an LLM entails feeding a neural network vast amounts of collected data. In the case of ChatGPT, large volumes of text were fed into the model so it could learn patterns and relationships between words and language. As the model processed new text, its performance improved and its parameters, the values assigned to neural connections, were updated. The GPT-3 model contains 175 billion parameters; GPT-4 is speculated to have roughly 1 trillion.
Training a Large Language Model: From research to human feedback
Even though iterating over massive amounts of data to train a model is compute-heavy, training occurs only during model development and stops once the model is finalized. Over an LLM’s lifespan, training will take up 10-20% of infrastructure resources. By contrast, inference is where most workloads take place. When a user enters a query, the request is processed in the generative AI application’s cloud environment. 80-90% of compute workloads occur during inference, and that share only grows with use. In the case of ChatGPT, training the model reportedly cost tens of millions of dollars, yet running it is reported to exceed that cost every week.
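A rough back-of-the-envelope check shows why inference dominates lifetime compute. The sketch below uses the common approximations that training compute scales with about 6 × parameters × training tokens and inference compute with about 2 × parameters × generated tokens per query; the parameter count, token counts, and query volume are illustrative assumptions, not measured values.

```python
# Back-of-the-envelope comparison of training vs. inference compute.
# Model size, token counts, and query volumes are illustrative assumptions.

params          = 175e9        # GPT-3-scale parameter count
training_tokens = 300e9        # order-of-magnitude training corpus size
train_flops     = 6 * params * training_tokens        # one-time cost

tokens_per_query = 500         # assumed average generated output length
queries_per_hour = 100e6       # assumed aggregate query volume
infer_flops_hour = 2 * params * tokens_per_query * queries_per_hour

hours_to_match = train_flops / infer_flops_hour
print(f"Training compute:           {train_flops:.2e} FLOPs (one-time)")
print(f"Inference compute per hour: {infer_flops_hour:.2e} FLOPs")
print(f"Inference matches training after ~{hours_to_match:.0f} hours of use")
```

Under these assumptions, aggregate inference compute overtakes the one-time training cost within days, which is why inference, not training, drives ongoing infrastructure demand.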
Generative AI models tend to be trained and stored in centralized, core data centers because those facilities have the GPUs needed to process the large volumes of data that cycle through LLMs. Models are also housed and trained in the public cloud to eliminate ingress and egress costs, since training data already resides there. As adoption grows, we expect inference to be conducted more locally to alleviate congestion in the core, centralized data centers where models are trained. However, this has limits: generative models are too large to house in conventional edge locations, which tend to have higher real estate and power costs and cannot be easily expanded to accommodate resource-hungry AI workloads.
“We built a system architecture [for the supercomputer] that could operate and be reliable at a very large scale. That’s what resulted in ChatGPT being possible,” she said. “That’s one model that came out of it. There’s going to be many, many others.”
— Nidhi Chappell, Microsoft General Manager of Azure AI Infrastructure
We believe the uptick in generative AI tools will thus have the most significant impact on the public cloud. Public cloud providers should focus on regional and elastic capacity planning to support demand. In the near future, as models mature and as critical compute resources become cheaper, private cloud environments might begin housing and training LLMs. This could be particularly useful for industries in regulated spaces like pharmaceuticals, financial services, and healthcare, which tend to prefer developing and deploying AI tools on private infrastructure.
Network usage of generative AI tools is affected most by query volume and complexity. For example, asking ChatGPT a basic question places a different load on the network than asking it to draft a 900-word essay, just as requesting a single image from DALL-E differs from requesting a series of high-definition photos or videos. Longer, more complex outputs move higher volumes of data, increasing demand on backbone and metro edge networks. This is especially true for the fiber that connects to major cloud platforms or links cloud environments to one another. We predict that metro edge networks will experience an uptick in usage as generative AI matures and is mainstreamed by consumers and businesses. The volume of queries could create traffic and compute or storage saturation at specific cloud choke points, necessitating a more distributed approach to housing AI models (e.g., replicating them in regional cloud environments).
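The sketch below illustrates how output type and query mix translate into sustained network load. The payload sizes and hourly query volumes are assumptions for illustration only, not measurements.

```python
# Illustrative estimate of network load from generative AI responses.
# Payload sizes and query mixes below are assumptions, not measurements.

KB, MB = 1_024, 1_024 ** 2

# Rough payload size per response, by output type (assumed):
payloads = {
    "short_text_answer":  2 * KB,    # a few hundred tokens of text
    "long_form_essay":    8 * KB,    # e.g., a ~900-word draft
    "single_image":       1 * MB,    # one generated image
    "hi_res_image_set":  20 * MB,    # several high-definition images
}

# Assumed hourly query volume for each output type:
queries_per_hour = {
    "short_text_answer": 60e6,
    "long_form_essay":   20e6,
    "single_image":       5e6,
    "hi_res_image_set":   1e6,
}

total_bytes = sum(payloads[k] * queries_per_hour[k] for k in payloads)
gbps = total_bytes * 8 / 3600 / 1e9   # bytes/hour -> gigabits/second
print(f"Aggregate egress: ~{total_bytes / 1024**4:.1f} TiB/hour "
      f"(~{gbps:.0f} Gbps sustained)")
```

Even in this simplified view, a small share of image-heavy queries accounts for most of the traffic, which is why richer output formats disproportionately stress backbone and metro edge capacity.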
Although we are still in the early days of generative AI adoption, we believe infrastructure providers will need to adapt to meet increased compute, networking, and storage demand.
Using LLMs to automate essential business functions will not only affect employees and management; it will also affect the building blocks of generative AI itself, from core, centralized data centers to original equipment manufacturers, chip manufacturers, and chip designers. While the future has yet to be written, infrastructure providers should consider how to serve a technology that requires considerable compute, storage, and network resources.
In March 2023, Altman Solon analyzed emerging enterprise use cases in generative AI. Leaning on a panel of industry experts and a survey of 292 senior executives, Altman Solon examined preferences for developing and deploying enterprise-grade generative AI tools, as well as the key deciding factors behind those choices. All organizations surveyed were US-based and of varying sizes. Explore additional insights from our Putting Generative AI to Work series, including emerging generative AI use cases for enterprises and preferences for developing and deploying enterprise generative AI tools.
Thank you to Elisabeth Sum, Oussama Fadil, and Joey Zhuang for their contributions to this report.