AI supply chains (and why they matter)
The second post in our series On AI Deployment (By Sarah H. Cen, Aspen Hopkins, Andrew Ilyas, Aleksander Madry, Isabella Struckman, and Luis Videgaray)
In the second post of our Series on AI Deployment, we highlight the supply chains through which AI is deployed. We explain how AI supply chains arise, what makes them different from those that came before them, and why business leaders and policy makers must take stock of AI supply chains.
These days, AI development rarely happens “in-house.” Rather, the AI system that you interact with is typically a product of many AI components glued together. One company may curate datasets that another company uses to train their AI model, which a third company then fine-tunes for a specific application (like mortgage lending decisions).
The result is a complex network of AI supply chains. In the past year, it’s become abundantly clear that these AI supply chains are here to stay. Indeed, base models (a.k.a. foundation models) such as GPT-4, PaLM, Midjourney, Stable Diffusion, and Codex have attracted so much attention precisely because other AI systems can be built on top of them. GPT alone has generated an almost Cambrian explosion of startups, companies, and plugins. While some are hoping to use GPT to more efficiently generate news articles, others are devising GPT-based systems to help doctors diagnose patients.
In this way, base models like GPT will become “upstream links” in AI supply chains, alongside companies that aggregate data[1]. Farther downstream in the supply chain will be AI systems—such as writing assistants and diagnostic aids—that directly interface with users. And wedged in between will be various AI models, tools, and services—such as computing platforms and data-labeling tools.
So, should we be concerned that AI is being deployed through an intricate, interdependent, and growing network of AI systems?
The answer is: yes, for two key reasons.
First, AI supply chains amplify existing, known problems in AI. For example, there are already concerns that AI systems are biased and inscrutable. However, up until now, we’ve only thought about these problems within the context of a single AI system. The bad news is: when multiple AI components are combined, these issues are often exacerbated. In other words, problems that exist within a single AI system are not only propagated, but often amplified in an AI supply chain.
The second reason is that AI supply chains are likely to undermine existing efforts to regulate AI. For example, determining who is liable for damages caused by AI will become even more difficult when there’s an AI supply chain. Plus, the ways that we regulate supply chains in other industries won’t necessarily work for AI. Unless we begin unpacking the implications of the AI supply chain, we’ll be ill-equipped to handle a rapidly approaching future. To illustrate this point, we’ll spend this post examining several prototypical supply chains (such as those found in the auto industry). We’ll then discuss three characteristics that make AI supply chains unique and surface gaps in our existing approach to AI governance.
Complex supply chains are the norm in our economy
Let’s begin by taking a step back from AI and looking at existing supply chains.
Complex supply chains are not new—they’re everywhere. Auto manufacturing, for example, depends on a supply chain that comprises a vast network of companies, from raw material suppliers to parts manufacturers to assembly factories. In fact, understanding the different types, drivers, and consequences of supply chain complexity—as well as how to handle them—is a major area of study in Supply Chain Management (SCM)[2,3].
Each component in a supply chain requires careful coordination and management. For example, the success of the auto industry depends on the ability of each company to deliver parts on time and ensure that each part meets the required specification and quality standards. Similarly complex supply chains can be found in pharmaceuticals, food production, and aerospace; in service sectors such as banking, healthcare, and hospitality; and in other engineering domains such as software development.
What is remarkable about these supply chains is that they (generally) work. One factor that contributes to their success is modularity: a supply chain can be broken down into distinct components. For example, when romaine lettuce is recalled due to E. coli outbreaks[4], restaurants can replace the “module” that failed—that is, temporarily remove the affected produce from their menus and substitute it with spinach. And when the brakes are recalled in, say, your 2006 Subaru Outback[5], Subaru knows exactly what to provide your local dealer so that they know how, when, and with what to replace those brakes. While failures did occur in both of these examples, the number of people affected was greatly reduced by quick, explainable responses. When a misstep inevitably occurs in a modular system, we are able to keep it from escalating further.
Modularity is only effective because of redundancy—the second feature of a well-functioning supply chain. If a given manufacturer or transporter fails, others can pick up the slack. Beyond this, there are industry-specific standards encouraging replicability across manufacturers; well-articulated product specifications; certifications for safety, quality, and environmental compliance; as well as state, federal, and international regulations regarding materials, construction, and safety.
The third reason existing supply chains work is that we can track the provenance[6] of each particular component of the final product. This ability makes it possible both to explain failures and to fix or work around them. Tracking provenance is an incredibly powerful tool: while we do see cases where an entire supply chain fails—as in the COVID-induced supply chain failures of 2021—these failures are still explainable, and thus are largely mitigatable.
What about AI?
So, supply chain complexity is the norm throughout the economy—it is present in car manufacturing and in the software industry. But what’s the point of talking about it in the context of AI deployment? After all, policy makers have largely ignored, for example, the software supply chain[7]. So, why should the treatment of AI deployment be any different?
Indeed, when AI deployment came about, it was natural to treat it as just another form of software. But it turns out that AI supply chains (at least in their current form) are very different from software supply chains (or any supply chains that have come before them) and will pose a unique set of risks and policy challenges.
Two key elements of AI systems: Data and algorithms
Before we delve into the three characteristics of AI that make the AI supply chain particularly unwieldy, we take a brief detour to introduce the two key elements of an AI system that we’ll continually refer to in our discussion: data and algorithms. (Feel free to skip ahead if that’s something that you are already familiar with!)
Let's consider, as an illustrative example, an AI hiring assistant that decides, based on someone’s resume, whether they should be interviewed at a particular company.
The first element that goes into building such an AI system is data (also called training data). That is, before developing a model, algorithm designers must source thousands (or millions) of resumes, as well as information about the corresponding applicants’ interview performance. Such data is valuable because it contains patterns. For example, candidates with particular related job experience or with a specific major on their resume may interview better.
It is the job of the algorithm to extract these patterns—at scales and speeds that outmatch humans—and distill them into a set of rules that can be used to evaluate future resumes. Much of the innovation in AI lies in how these patterns are extracted—a process known as model training. During this process, the model is (repeatedly) exposed to different pieces of training data—each time it is shown a piece of this data, the model updates its internal logic—logic that specifies how the model will decide when deployed whether a given applicant should be interviewed.
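To make this concrete, here is a minimal sketch of that training loop. Everything in it is invented for illustration—the two resume features, the labels, and the perceptron-style update rule are stand-ins, and real hiring models are far larger—but the shape is the same: each labeled example nudges the model’s internal parameters.

```python
# Toy sketch of model training: a linear score over two invented
# resume features (years of related experience, skills matched),
# updated once per labeled example with a perceptron-style rule.

def train(examples, epochs=20, lr=0.1):
    w = [0.0, 0.0]  # the model's "internal logic": one weight per feature
    b = 0.0
    for _ in range(epochs):
        for features, interviewed_well in examples:
            score = sum(wi * xi for wi, xi in zip(w, features)) + b
            pred = 1 if score > 0 else 0
            err = interviewed_well - pred  # -1, 0, or +1
            # Each example the model is shown nudges its parameters:
            w = [wi + lr * err * xi for wi, xi in zip(w, features)]
            b += lr * err
    return w, b

def predict(model, features):
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, features)) + b > 0 else 0

# Invented training data: (years_experience, skills_matched) -> outcome
data = [([5, 3], 1), ([0, 0], 0), ([4, 2], 1), ([1, 0], 0)]
model = train(data)
```

After training, `predict` applies the extracted rules to a new, unseen resume—the “deployment” step described above.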
Now, let’s get back to the risks particular to AI supply chains.
AI and its supply chain
To understand just what makes AI supply chains unique, let’s unpack three key differences between the AI supply chain and the supply chains that came before it:
Poor specification in AI,
The lack of modularity in AI supply chains, and
Hidden data interactions in AI.
These three properties amplify the risks of AI deployment and create new challenges in AI policy. Together, they highlight why the AI supply chain deserves attention from both business leaders and policy makers.
I: Poor specification
One reason why most supply chains are manageable is that their individual components are well-specified. That is, we know what purpose each component in the supply chain serves and how it is expected to perform (both on its own and when combined with other components).
In contrast, AI systems currently don’t come with any formal specifications or provide meaningful, standardized performance guarantees. There is simply no industry-wide standard (or even a currently viable way to develop such a standard) for specifying or auditing AI systems. There is thus a wide gap between the expectations that downstream developers have when using an AI tool and what is actually observed during its deployment—we refer to this gap as “poor specification.”
Poor specification[8] can contribute to many of the challenges that AI faces, including brittle performance, lack of explainability, and bias. For example, suppose that one company develops an AI model to caption photos, and that this company’s training data only contains photos captured on people’s phones in everyday life. Say, however, that this training data is sourced from another company that never specifies this fact—that is, they never state (i) what type of training data they use or (ii) what criteria they use for labeling. This lack of specification from the data provider prevents the captioning company from specifying when their tool should and should not be used.
Such poorly specified systems can be problematic. In our photo captioning example[9], other models may leverage the first model as a building block, mistakenly believing that this model performs well on all types of photos. Without specification, subsequent models may propagate unanticipated errors that are difficult to trace and explain.
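As a toy sketch of how this plays out (all data here is invented): a captioning-style model trained only on everyday phone photos can learn a shortcut—say, “bright means outdoor”—that no specification ever mentions, and that fails silently on other photo types.

```python
# Toy illustration (invented data): a model trained on one kind of
# photo learns a shortcut that fails silently out of distribution.

def train_threshold(examples):
    # "Training": find the brightness threshold separating the labels.
    outdoor = [b for b, label in examples if label == "outdoor"]
    indoor = [b for b, label in examples if label == "indoor"]
    return (min(outdoor) + max(indoor)) / 2

def caption(model_threshold, brightness):
    return "outdoor" if brightness > model_threshold else "indoor"

# Everyday phone photos: outdoor shots happen to be brighter.
phone_photos = [(0.9, "outdoor"), (0.8, "outdoor"),
                (0.3, "indoor"), (0.2, "indoor")]
threshold = train_threshold(phone_photos)

caption(threshold, 0.85)  # in-distribution: "outdoor", as expected
caption(threshold, 0.1)   # a dark outdoor night photo: wrongly "indoor"
```

Nothing in the trained `threshold` records that it is only valid for daytime phone photos—that missing metadata is precisely the specification gap a downstream developer inherits.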
As AI systems become small building blocks in a larger supply chain, the risks associated with poor specification will only grow, and are likely to be transmitted (and even amplified) downstream.
II: Lack of modularity
Much like in the car manufacturing and agriculture supply chain examples we describe above, a key to the success of the software supply chain is its modularity, in the sense that its components are connected but can be clearly separated. When components are modular, connecting or disconnecting them does not change their individual attributes—just like connecting Lego bricks does not change their shape or color.
AI systems, however, tend to interact with each other in a non-modular way. The way that components are combined when building such a system (and even the order used to combine them) matters, and disentangling the role of each component after-the-fact is difficult. One can think about it almost like cooking a soup—once all the ingredients have been combined, it’s hard for the recipient (or, in the case of AI, the end-user) to determine what went into the soup or how it was made.
This can be true even within a single model. After all, a single AI model is trained on many pieces of data and the way each training sample affects the model’s output is often unintuitive. Although researchers have recently begun developing tools that enable one to deconstruct the influence of each sample at scale[10], there’s a long way to go before we can confidently attribute model behavior to any one sample.
The problems posed by non-modularity are, however, only further amplified in a multi-layered supply chain. A prime example of this is transfer learning: a process in which an AI model that is trained on one task is then adapted (or fine-tuned) for another. (Think of an AI model that is trained to translate between English and French being fine-tuned to perform English-Spanish translation.) This fine-tuning changes the adapted model permanently, making it difficult to trace the behavior of the new model back to the behavior of the first model. In some cases, we have knowledge of the data and training context that the downstream model inherits from upstream models. But, in many instances, we do not[11] (see our discussion of poor specification above).
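A toy numeric sketch of why this happens (all numbers invented): fine-tuning is just further gradient steps, and those steps overwrite the parameters in place, so the adapted model carries no record of its upstream ancestor.

```python
# Toy sketch (invented numbers): fine-tuning a one-feature linear model.
# The "upstream" model was trained for task A (y = 2x); fine-tuning on
# task B (y = -x) overwrites its weights.

def fine_tune(weights, data, lr=0.1, epochs=100):
    """Plain gradient descent on squared error for the new task."""
    w0, w1 = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w0 * x + w1) - y
            w0 -= lr * 2 * err * x
            w1 -= lr * 2 * err
    return [w0, w1]

upstream = [2.0, 0.0]                # task A model: y = 2x
task_b = [(1.0, -1.0), (2.0, -2.0)]  # task B data: y = -x

downstream = fine_tune(upstream, task_b)
# downstream now fits task B (roughly y = -x); nothing in its two
# numbers records that the starting point was a y = 2x model.
```

Inspecting `downstream` after the fact, one cannot recover what `upstream` was or what data shaped it—the soup analogy above, in two parameters.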
III: Hidden data interactions
As we already discussed, AI models learn patterns from data. In a way, data is key—without good, accurate information, a model will inevitably make mistakes. However, the problem is not just that each piece of data must be “good” on its own. It must also play well with the other pieces of data that we leverage. For example, a dataset should be balanced—that is, it should contain data across different settings of interest.
The interactions between data become more complicated when multiple datasets are combined. For example, two datasets might be “good” and “balanced” on their own, but misleading when combined. Indeed, suppose a weather forecast tells us it will rain tomorrow. If a second weather forecast also tells us it will rain, we become more confident that it will rain. But should we be more confident? What if it turns out that the two forecasts used the exact same meteorological measurements to produce their predictions? In this case, we should not have been more certain that it will rain after seeing the second forecast—we’ve fooled ourselves into being overconfident.
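This can be made concrete in a few lines of arithmetic (the 70% figure is invented). Treating the two forecasts as independent evidence and combining them by Bayes’ rule under a uniform prior yields roughly 84% confidence; if the second forecast merely reuses the first one’s measurements, the honest answer remains 70%.

```python
# Toy calculation: why combining correlated forecasts inflates confidence.

def combine_independent(p1, p2):
    # Bayes' rule with a uniform prior, treating the two forecasts
    # as independent pieces of evidence: multiply their odds.
    odds = (p1 / (1 - p1)) * (p2 / (1 - p2))
    return odds / (1 + odds)

p = 0.7                            # each forecast: 70% chance of rain
naive = combine_independent(p, p)  # ~0.84: looks like stronger evidence

# If forecast 2 simply reused forecast 1's measurements, it carries no
# new information: the correct confidence is still 0.7, and the naive
# combination overstates it.
```

The gap between 0.84 and 0.70 is exactly the overconfidence described above, and it arises purely from a hidden dependency between the two data sources.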
This principle extends to AI decisions, but in a subtler way. Consider a local credit union. Let’s say this credit union uses multiple AI models to make lending decisions. It uses one model to predict a mortgage’s performance based on an internal dataset. It then feeds those predictions (of who will default on their mortgage) into a downstream model. This downstream model considers this prediction and multiple other factors—like home valuation, income, and the type of loan being requested—to generate a range of reasonable interest rates for the borrower.
Here’s the problem: the dataset used to train the upstream model (the mortgage default predictor) might be similar to—or even coincide with—the dataset used to train the downstream model. If it does, the downstream model may generate biased recommendations and may also be overconfident in its recommendations (similarly to the weather forecast scenario we discussed above). And this is just one of many possible hidden interactions between data that can be misleading to the model—we’ll delve into more in future posts.
Takeaways and what’s next
Overall, as AI systems move out of research labs into deployment, we must pay close attention to the emerging AI supply chains. In particular, AI supply chains will be highly problematic for (at least) three reasons: (I) AI suffers from poor specification, (II) AI systems are non-modular, and (III) there are hidden data interactions between AI systems. And these three reasons bring with them a host of other problems: AI systems will become more difficult to audit and therefore to trust. Questions of accountability will arise too. To what extent will already known problems of AI, such as hidden biases and the inability to explain AI-driven decisions, compound and get exacerbated? Who is liable for a harmful decision made by a downstream AI model composed of many non-modular parts? All in all, it is clear that the regulatory and policy initiatives intended to ensure that AI is safe, fair, and trustworthy cannot afford to overlook the AI supply chain. We will explore all these questions in future posts.
Deploying AI safely requires careful, comprehensive, and end-to-end consideration of the AI supply chain. At the moment, there is little discussion of the AI supply chain, perhaps because complex supply chains are such an ingrained part of the software industry (the industry spearheading AI development). However, as new AI systems (including generative AI) enter the picture, the issues highlighted in this post will continue to grow.
Our future posts will delve further into all of these issues. Stay tuned for our next post, in which we’ll consider: Who supplies the base AI systems? Will AI be controlled by just a handful of key players? How will the resulting dynamics shape our economy at large?
A big thank you to David Goldston for his invaluable feedback on this piece.
[2] Complexity in the context of supply chains is understood as “a large number of different elements (such as specific technologies, raw materials, products, people, and organizational units) that have many different connections to one another”; see Reeves et al. (2020).
[4] Why Romaine Lettuce Keeps Getting Recalled From E. coli Contamination. K. Kindy & J. Achenbach. Washington Post, 2019.
[6] Provenance here refers to tracking the history of an object or manufactured component to understand its development or origin.
[7] While this is an interesting case study on its own, we note that in many ways this was possible due to the tech industry and special interest groups creating standards early on for issues of accessibility, web development, and more. It’s unclear whether this approach led to ideal outcomes.
[8] Note that “poor specification” here is not to be confused with the—admittedly, related—technical term of “underspecification,” which typically refers to having more degrees of freedom in a model than pieces of evidence (i.e., data). D’Amour, Heller, Moldovan, et al. (2022).
[9] We refer to this particular case as an instance of spurious correlations.