AI Semiconductor Supply Chain Explained: A Portfolio Manager’s Framework for Finding Mispricing

Andrea Bonini
May 4

The article is divided into two sections:

A) AI Semiconductor Supply Chain Explained - an overview of the key segments in the AI semiconductor value chain.

B) How Mispricing And Opportunities Emerge Across the Semiconductor Supply Chain - a portfolio-manager-oriented framework that uses historical signals to identify where the market may overvalue or undervalue different segments.

A) AI Semiconductor Supply Chain Explained

A modern AI chip moves through a highly specialized supply chain in which each stage requires different expertise, capital investment, technical capabilities, and risk management. The steps are explained below.

1. Design and EDA

The process begins with design and electronic design automation, or EDA. Chip designers such as NVIDIA, AMD, Apple, or Broadcom define the product architecture: target workloads, performance requirements, power limits, memory needs, and system-level goals. In the case of an AI GPU, this determines how compute units are structured, how memory is accessed, how data moves across the chip, and how billions of transistors are logically organized.

EDA software providers such as Synopsys and Cadence provide the tools used to design, verify, and prepare the chip for manufacturing. These tools simulate timing, power, thermal behavior, and physical layout. They also help convert the chip design into detailed geometric patterns that can eventually be manufactured on silicon.

From a product management point of view, this is where customer needs are translated into technical requirements. The key question is not simply “how fast can the chip be?” but “does this product solve the customer’s workload problem at the right cost, power level, and time to market?”

2. Photomasks

Once the architecture and physical design are finalized, the chip company reaches tape-out, the point at which the final manufacturing data is released to the foundry and mask-making process. This effectively locks the design for manufacturing preparation. Late changes after tape-out can require new photomasks and additional requalification, making them costly and time-consuming.

Specialist photomask manufacturers such as Toppan, DNP, and Photronics convert the digital design into highly precise quartz plates called photomasks. These contain the patterns for different layers of the chip and are used during lithography to project circuit patterns onto silicon wafers. Photomasks must match the foundry’s process rules and include corrections that allow patterns to print accurately at extremely small dimensions.

3. Lithography

Lithography equipment suppliers, such as ASML, provide the scanners that transfer photomask patterns onto wafers. These machines act like ultra-advanced printers for semiconductor manufacturing. EUV lithography is used for the most advanced layers, while DUV lithography remains important for many others.

ASML does not manufacture chips itself, but its tools are essential for producing leading-edge semiconductors. This shows an important supply-chain principle: some companies create value not by making the final product, but by controlling a critical enabling technology.

4. Foundry / Wafer Fabrication

The semiconductor foundry, such as TSMC, Samsung Foundry, or GlobalFoundries, uses lithography equipment and many other process tools to manufacture wafers. Foundries build chips layer by layer through lithography, deposition, etching, cleaning, inspection, and other tightly controlled process steps.

Yield is one of the most important factors at this stage. Even if a wafer contains many potential chips, only a percentage of them may be fully functional. Yield and capacity directly affect how many good chips can be shipped and how much each one costs. For product managers, foundry selection involves trade-offs between performance, power efficiency, cost, process maturity, supply availability, and manufacturing risk.

5. Memory

For AI GPUs, a critical component is high-bandwidth memory, or HBM. Memory manufacturers such as SK Hynix, Micron, and Samsung supply the HBM dies that allow GPUs to access data fast enough for AI training and inference workloads.

Even if GPU demand is extremely strong, limited HBM supply can restrict final system shipments. This is one of the clearest examples of a supply-chain bottleneck: the final product may be constrained not by demand for the GPU, but by the availability of memory that sits beside it.

6. Packaging

After wafer fabrication, good GPU dies are combined with HBM stacks through advanced packaging. Providers such as TSMC, through technologies like CoWoS, as well as ASE and Amkor, perform packaging steps that bring multiple components together into a high-performance module.

Advanced packaging creates dense interconnects and short electrical paths, enabling high bandwidth and energy efficiency. In AI semiconductors, packaging is no longer a simple back-end step. It is a core part of product performance and a major capacity constraint.
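The bottleneck logic described in the memory and packaging steps can be sketched in a few lines of Python. This is only an illustration: the function name and every input figure are hypothetical, and the assumption of a fixed number of HBM stacks per module is a simplification.

```python
def shippable_modules(good_gpu_dies, hbm_stacks, hbm_stacks_per_module, packaging_slots):
    """Return (modules that can ship, name of the binding constraint).

    Each finished module needs one good GPU die, a fixed number of HBM
    stacks, and one advanced-packaging slot. Output is capped by whichever
    input runs out first.
    """
    limits = {
        "gpu_dies": good_gpu_dies,
        "hbm": hbm_stacks // hbm_stacks_per_module,
        "packaging": packaging_slots,
    }
    bottleneck = min(limits, key=limits.get)
    return limits[bottleneck], bottleneck


# Hypothetical quarter: plenty of GPU dies, but HBM only covers 80,000 modules.
units, constraint = shippable_modules(
    good_gpu_dies=100_000,
    hbm_stacks=480_000,
    hbm_stacks_per_module=6,
    packaging_slots=90_000,
)
print(units, constraint)
```

With these made-up inputs the HBM supply, not GPU die output, sets the shipment ceiling, which is the situation the text describes.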

7. Testing

Testing happens at several points across the supply chain rather than only at the end. After wafer fabrication, dies are electrically tested at wafer level to identify defective units before packaging. After packaging, modules are tested again to confirm they work correctly with HBM and interconnects. Later, systems are tested under real operating conditions.

Testing is essential because semiconductor defects can be extremely costly if discovered too late. For AI hardware, testing also covers performance, thermal behaviour, power stability, reliability, and compatibility with the broader system. From a product management perspective, testing protects launch quality, customer trust, and margin.

8. System Assembly

The packaged and tested modules are then integrated into full systems by electronics manufacturing service providers and original design manufacturers, such as Foxconn, Quanta, Wistron, and others. These companies mount modules onto boards and integrate them into servers or racks with power delivery, cooling, networking, storage, and management systems.

This stage turns packaged chips into usable hardware capable of running continuously in data-center environments. For AI infrastructure, system-level performance matters because the customer often buys not only a chip, but a complete compute platform.

9. Cloud Deployment and Software

Finally, cloud service providers and enterprise customers, such as Amazon, Microsoft, Google, Meta, and large AI labs, deploy these systems and run the software stack. In NVIDIA’s case, this includes CUDA, drivers, AI libraries, networking software, and developer tools.

This is where the hardware becomes monetizable AI compute capacity.

B) How Mispricing And Opportunities Emerge Across the Semiconductor Supply Chain

The semiconductor sector is often perceived as complex because it is fragmented into many specialized segments. In reality, its economics are governed by a simple but powerful principle: value is captured at the point of constraint, and mispricing occurs when the market misidentifies where that constraint sits. The supply chain is not uniformly fragile but instead fails at specific nodes.

To understand when companies become overvalued or undervalued, it is therefore not sufficient to follow demand narratives. One must instead observe whether the relevant operational KPIs - design readiness, yield progression, capacity tightness, or integration constraints - confirm or contradict the prevailing market expectations. In practice, these KPIs often move a few quarters ahead of changes in reported revenue, which is why they are so critical for portfolio positioning.

1. Design and EDA: complexity, not volume, drives value

In the design layer, including companies such as Synopsys and Cadence, the most important driver of value is not the number of chips being produced but the complexity of designing them. This distinction is critical because EDA demand is not driven simply by semiconductor unit growth.

It is driven by the number of design iterations, the number of verification steps, the difficulty of physical implementation, and the cost of avoiding errors before a chip is committed to manufacturing.

This is why the design-to-manufacturing boundary deserves disproportionate investor attention. Once a chip design is taped out, flexibility collapses. A design error discovered before tape-out may require some engineering time, but a design error discovered after tape-out can require new masks, new qualification work, and potentially several months of delay.

The clearest undervaluation case for EDA emerged between 2018 and 2020. At that time, the underlying KPIs were improving materially. Backlog and multi-year contract visibility were increasing, design activity was moving toward more advanced nodes, and chip complexity was rising sharply as AI, GPUs, custom accelerators, and high-performance computing designs became more important. Leading-edge chips were moving into the tens of billions of transistors, which made verification, timing closure, power integrity, thermal modelling, and manufacturability checks far more difficult.

The market initially underestimated this shift. What appeared to be a low-growth software segment was becoming a critical bottleneck in the semiconductor value chain. EDA becomes more valuable when design complexity rises faster than chip volumes, because customers cannot afford to reduce spending on the tools that prevent tape-out failure.

For context, EDA gross margins are generally above 80%, but heavy R&D spending brings operating margins down to roughly 30%. The size of that spending, and the return it earns, should therefore be analysed carefully.
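The gap between those two margin figures can be made explicit with a short calculation. The sketch below uses only the approximate percentages quoted in the text; the function name is illustrative, and the result covers all operating expenses together (R&D, sales, and administration), not R&D alone.

```python
def implied_opex_share(gross_margin, operating_margin):
    """Operating expenses as a share of revenue, implied by the two margins."""
    return gross_margin - operating_margin


# Approximate figures from the text: ~80% gross margin, ~30% operating margin.
opex_share = implied_opex_share(0.80, 0.30)
print(f"Implied operating expenses: {opex_share:.0%} of revenue")
```

Roughly half of every revenue dollar is therefore reinvested below the gross-profit line, which is why the productivity of that spending matters so much to the investment case.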

The investment lesson is that EDA becomes undervalued when the market focuses on semiconductor cyclicality but ignores increasing design complexity.

Conversely, EDA becomes vulnerable when new design starts weaken or customers pause advanced-node projects.

The relevant signal is not simply whether chip demand is strong today, but whether customers are continuing to commit to future complex designs.

2. Photomasks: low revenue share, high execution risk

Photomasks, produced by companies such as Photronics, Toppan, and DNP, typically represent a small portion of total semiconductor economics, often less than 5% of total chip cost. However, they are far more important than their revenue share suggests because they are the physical bridge between a verified digital design and wafer manufacturing.

A photomask is not just a printed version of the design. It is a highly precise quartz plate that contains the patterns used during lithography to expose wafers layer by layer. Every important layer of a chip requires a mask, and advanced chips can require dozens of masks. At older nodes, a chip may require roughly 40 masks. At advanced nodes such as 7nm or below, the number can rise toward 80+ masks, depending on the process and the number of multi-patterned layers.

This is where EUV becomes relevant to photomasks, as EUV changes what is required from the mask. In lithography, the scanner uses light to project the photomask pattern onto the wafer. With EUV, the wavelength of light is much shorter, which allows smaller features to be printed, but it also makes mask quality, defect control, pellicles, and optical correction much more demanding. In other words, EUV increases the precision requirements of the photomask because any defect or distortion on the mask can be transferred repeatedly onto wafers. A mask problem is therefore not a one-off defect, as it can replicate across many dies and destroy yield.

This is why mask demand and mask risk are linked to node execution rather than simply to semiconductor volume. During the early EUV transition between 2014 and 2016, the market expected advanced mask demand to rise quickly because EUV adoption was expected to accelerate. However, EUV insertion was delayed by several years compared with early expectations. As a result, the anticipated increase in EUV-related mask complexity was deferred. The market’s mistake was not in believing that advanced masks would matter, but in pricing that demand too early, before the node transition had been proven in production.

A better example of undervaluation occurred during the multi-patterning period between 2017 and 2019. Before EUV was widely inserted, foundries had to use multiple DUV exposures to print features smaller than a single exposure could reliably define. This increased the number of masks per chip and raised the complexity of mask making, even without a clean EUV ramp. The market often treated photomasks as a niche, low-growth supplier category, but the operational KPIs were moving in the opposite direction: more layers, more mask steps, more correction complexity, and higher re-spin risk.

For a PM, the key point is that photomasks are not usually the largest profit pool, but they are a schedule-risk amplifier. A leading-edge mask set can cost around $10-20 million, but the larger cost of a mask error is the lost time, lost foundry slot, delayed qualification, and potential launch slippage. Therefore, photomask companies should be monitored less like high-growth compounders and more like strategic risk indicators for node transitions and tape-out execution.
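To illustrate why the schedule cost can dwarf the mask cost, here is a minimal sketch. The mask-set cost uses the midpoint of the range quoted above; the delay length and the monthly gross profit at risk are purely hypothetical inputs chosen for illustration, not figures from any company.

```python
def respin_cost(mask_set_cost_usd, delay_months, monthly_gross_profit_at_risk_usd):
    """Split the cost of a mask re-spin into direct and schedule-related parts."""
    direct = mask_set_cost_usd
    schedule = delay_months * monthly_gross_profit_at_risk_usd
    return {"direct": direct, "schedule": schedule, "total": direct + schedule}


# $15M mask set (midpoint of the $10-20M range in the text).
# Hypothetical: a 3-month slip on a product earning $40M gross profit per month.
example = respin_cost(15_000_000, 3, 40_000_000)
print(example)
print("Schedule cost as a multiple of mask cost:", example["schedule"] / example["direct"])
```

Under these assumed inputs the schedule cost is eight times the direct mask cost, which is the sense in which photomasks act as a schedule-risk amplifier.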

3. Lithography: structural bottleneck with cyclical distortions

Control over lithography, dominated by ASML, effectively determines the pace of scaling and the availability of leading-edge capacity. EUV scanners can cost approximately $150-200 million per unit, and annual EUV shipments have historically been measured in only a few dozen systems globally, which illustrates the extreme scarcity and strategic importance of this segment.

Taking a historical example, in 2021 the market priced in sustained demand for lithography tools, particularly EUV systems. ASML’s backlog rose into the €30-40 billion range, covering multiple years of production. However, the risk was that some of this demand reflected a capex cycle rather than purely structural growth. Customers were ordering aggressively because of semiconductor shortages and supply-chain uncertainty, and because strong pricing encouraged capacity expansion. In the memory segment, capital expenditure expanded very quickly, well ahead of sustainable end-demand growth.

The key signal was that tool orders were growing faster than wafer demand. That matters because equipment companies sit early in the cycle. When customers expect strong demand, they place tool orders well in advance. But if final demand weakens or inventories build, those orders can be delayed, reduced, or stretched out. This is what happened as the cycle corrected in 2022-2023, when global semiconductor capex declined by roughly 20-30%, with particular weakness in memory, a segment long regarded as cyclical.
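One way to monitor the "orders growing faster than wafer demand" signal is to compare period-over-period growth rates and flag a persistent gap. The sketch below is an assumption-laden illustration: the two index series are invented, and the 10-percentage-point gap threshold is an arbitrary choice rather than a figure from the text.

```python
def growth_rates(series):
    """Period-over-period growth for a list of levels."""
    return [curr / prev - 1.0 for prev, curr in zip(series, series[1:])]


def overshoot_flags(tool_orders, wafer_demand, gap_threshold=0.10):
    """Flag periods where order growth exceeds demand growth by more than the threshold."""
    order_growth = growth_rates(tool_orders)
    demand_growth = growth_rates(wafer_demand)
    return [o - d > gap_threshold for o, d in zip(order_growth, demand_growth)]


# Hypothetical indexed series (base period = 100), for illustration only.
tool_orders = [100, 110, 150, 210]
wafer_demand = [100, 106, 113, 120]
print(overshoot_flags(tool_orders, wafer_demand))
```

A run of consecutive flags suggests the order book is being driven by capex expectations rather than end demand, which is the setup in which a structural multiple is most at risk.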

The overvaluation risk in lithography therefore does not come from doubting ASML’s structural position. ASML remains a unique chokepoint. The risk comes from paying a structural multiple at the top of a cyclical order book. A PM should separate the long-term monopoly value from the near-term order cycle.

Earlier, between 2012 and 2014, the opposite setup existed. ASML’s structural importance was not fully appreciated because EUV had not yet reached large-scale production. However, the industry’s dependence on advanced patterning was already visible. R&D spending was rising, the difficulty of scaling with DUV multi-patterning was increasing, and there were no credible alternative suppliers for the next lithography transition.

The lesson is that lithography should be valued through two lenses at the same time. Structurally, it is one of the strongest profit pools in semiconductors. Cyclically, order intake and backlog can still overshoot if customer capex expectations become too optimistic.

4. Foundry manufacturing: execution defines value

In wafer fabrication, represented by companies such as TSMC, Samsung Foundry, and Intel, the most important KPIs are yield, utilization, node progression, customer mix, and capex discipline.

A node refers to a generation of manufacturing technology, such as 10nm, 7nm, 5nm, or 3nm. The smaller the node, the more advanced the process generally is, although node names are no longer exact physical measurements. Moving to a new node usually allows better performance, lower power consumption, or more transistors per area, but only if the process can be manufactured with acceptable node yield.

Node yield measures the percentage of chips on a wafer that are functional and meet required specifications. This is one of the most important economic variables in semiconductors. If a wafer costs the same to process but fewer chips work, the cost per good die rises sharply. For example, if a wafer has 600 potential dies and yield is 80%, the company gets 480 good dies. If yield is only 60%, it gets 360 good dies. That is a 25% reduction in good output from the same wafer, and the cost per good die rises by roughly 33%. This is why yield problems can destroy margins before they are obvious in revenue.
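The arithmetic in this example can be reproduced directly. In the sketch below the die count and yields come from the text; the wafer cost is a hypothetical placeholder, and it does not affect the percentage change in cost per good die.

```python
def good_dies(dies_per_wafer, yield_rate):
    """Number of functional dies obtained from one wafer."""
    return round(dies_per_wafer * yield_rate)


def cost_per_good_die(wafer_cost, dies_per_wafer, yield_rate):
    """Wafer processing cost spread over the functional dies only."""
    return wafer_cost / good_dies(dies_per_wafer, yield_rate)


WAFER_COST = 15_000  # hypothetical processing cost per wafer, in USD
DIES = 600           # potential dies per wafer, from the text

high = cost_per_good_die(WAFER_COST, DIES, 0.80)
low = cost_per_good_die(WAFER_COST, DIES, 0.60)

print(good_dies(DIES, 0.80), good_dies(DIES, 0.60))    # 480 360
print(f"Cost per good die rises by {low / high - 1:.0%}")  # 33%
```

Because the wafer cost cancels out of the ratio, the 33% increase depends only on the two yield levels: 0.80 / 0.60 is about 1.33.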

The clearest historical illustration is the contrast between Intel and TSMC, and the cleanest period for a consistent comparison is 2017-2020.

At the beginning of this period, many investors still expected Intel to maintain process leadership through its in-house chip production, but TSMC was increasingly proving that the foundry model of manufacturing chips for third parties was the stronger one.

Intel’s 10nm delays were already visible through repeated timeline slippage and weak yield.

In the end, the 10nm node was delayed by roughly two years versus original expectations. This mattered because Intel’s historical advantage had been built on process leadership. Once process execution weakened, its vertically integrated model became less of a strength and more of a burden. Gross margins, which had historically been around the 60%+ level, came under pressure as execution issues, competition, and manufacturing inefficiencies accumulated.

Over the same 2017-2020 period, TSMC was showing the opposite signals. Utilization remained high, advanced-node revenue increased materially, and major customers such as Apple, AMD, NVIDIA, and others increasingly relied on TSMC for production. TSMC’s 7nm ramp was particularly important because it demonstrated that the company could execute at scale while customers moved more high-value designs to the foundry model. Despite heavy capital investment, TSMC maintained gross margins above 50% by continually moving to more sophisticated manufacturing nodes.

The mispricing was that investors were slow to adjust from a world where Intel’s internal manufacturing leadership was assumed to a world where TSMC’s foundry expansion, customer breadth, and reliable node execution created the stronger platform. Intel was overvalued to the extent that the market underweighted yield and node execution risk. TSMC was undervalued to the extent that the market still treated foundry as cyclical manufacturing rather than as a strategic bottleneck for the entire fabless ecosystem.

For a PM, the key foundry rule is simple: architecture and customer demand matter, but yield determines whether that demand becomes profitable revenue. High utilization without good yield can destroy margins. Strong yield without demand creates underutilized fabs. The best foundry setup is when both move together: high utilization, improving yield, rising advanced-node mix, and disciplined capex.

5. Memory: from commodity cycles to structural constraints

Memory markets have historically followed supply-demand cycles. This is especially true for DRAM and NAND, where products are more standardized and pricing responds quickly to changes in supply, demand, and inventory. On earnings multiples, memory companies can look extremely cheap at the top of the cycle because earnings are temporarily inflated, and they can look expensive at the bottom just before the recovery begins. This is why a PM must focus less on spot earnings and more on pricing, inventory, supply growth, and capex discipline.

The 2017-2018 DRAM cycle is a useful overvaluation example. During that period, DRAM average selling prices rose sharply, in some cases more than 50% year over year, as demand from servers, smartphones, and PCs met a relatively consolidated supply base. Micron, Samsung, and SK Hynix benefited from very strong pricing and high margins. The market began to price these peak earnings as if they were more sustainable than they actually were.

The warning signals were visible before the decline. First, higher prices encouraged supply additions and higher capital expenditure. Second, customers began building inventory because they feared shortages and wanted to secure supply. Third, as inventories rose above normal levels - moving from a healthier range of roughly 4-6 weeks toward 8-12+ weeks in parts of the supply chain - the probability of a pricing correction increased.

When customers have too much inventory, they stop buying aggressively, even if end demand has not declined. That pause in orders can cause prices to fall quickly because memory suppliers still have high fixed costs and need to keep fabs utilized.

This is why prices declined in 2019. The market had mistaken a tight supply cycle for a permanent improvement in memory economics. Once supply caught up and customer inventories were too high, DRAM average selling prices declined by roughly 50% from peak levels, and earnings fell sharply. The overvaluation signal was the combination of peak average selling prices, rising inventories, and supply growing at a much faster pace than demand.
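The inventory signal described above can be turned into a simple classifier. In this sketch, the 4-6 week "normal" band and the 8+ week "elevated" level follow the ranges quoted in the text; the "tight" and "watch" labels for the ranges in between are assumptions added for completeness, and the unit figures are hypothetical.

```python
def weeks_of_inventory(inventory_units, weekly_consumption_units):
    """Convert an inventory level into weeks of forward cover."""
    return inventory_units / weekly_consumption_units


def classify_inventory(weeks):
    """Map weeks of cover to a qualitative signal."""
    if weeks < 4:
        return "tight"      # assumption: below the healthy range
    if weeks <= 6:
        return "normal"     # 4-6 weeks, the healthier range in the text
    if weeks < 8:
        return "watch"      # assumption: between the two quoted ranges
    return "elevated"       # 8-12+ weeks, correction risk rises


# Hypothetical customer holding 45M units while consuming 5M units per week.
weeks = weeks_of_inventory(45_000_000, 5_000_000)
print(weeks, classify_inventory(weeks))
```

Nine weeks of cover falls in the elevated band, the zone where customers can pause orders even if end demand has not weakened.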

HBM (high-bandwidth memory) is different: it is still memory, but it does not behave like standard commodity DRAM. It is needed because AI accelerators and GPUs require enormous memory bandwidth to feed data into the compute engine. Standard DRAM is not sufficient for leading AI workloads because the bottleneck is how quickly data can move between memory and logic. HBM solves this by stacking DRAM dies vertically and connecting them with very high-bandwidth interfaces, placing memory physically close to the GPU or accelerator in advanced packaging.

In 2022, the market initially treated HBM companies largely through the old commodity-cycle lens. Micron and SK Hynix were still viewed as cyclical DRAM/NAND companies exposed to weak PC and smartphone markets. However, the HBM signals were moving in a different direction. AI accelerator demand was accelerating, HBM supply was limited, qualification barriers were high, and not every DRAM supplier could immediately produce high-quality HBM at scale. HBM represented a small share of total memory bits, but a much larger share of incremental profit growth.

The key investment insight was that HBM changed the mix. Compared with standard DRAM, HBM carries a much higher average selling price, more complex manufacturing, and stronger customer stickiness because it must be qualified with GPU and accelerator platforms. HBM could be priced at several times the level of commodity DRAM, while supply additions were constrained by stacking capacity, advanced packaging availability, and customer qualification timelines.

SK Hynix in particular benefited because it was early and strong in HBM supply for AI accelerators.

The PM conclusion is that memory should not be analysed as one homogeneous market. Standard DRAM and NAND remain cyclical. HBM, while still exposed to supply cycles, can temporarily behave like a bottleneck asset when AI demand exceeds qualified supply. The buy signal appears when pricing, mix, and supply tightness improve before consensus earnings reflect them. The sell or avoid signal appears when customers build inventory, suppliers add capacity well above expected demand, and average selling prices peak.

6. Packaging: from commoditized step to system bottleneck

Traditional packaging historically represented a low-value part of the semiconductor chain. It was labour-intensive, competitive, and generally generated operating margins in the 5-10% range for many OSAT providers. For a long time, the market was broadly correct to treat standard packaging as lower quality than EDA, lithography, process control, or leading-edge foundry.

The mistake is assuming that all packaging is the same. Advanced packaging is economically and strategically different from traditional wire bond or basic flip-chip packaging. In advanced AI systems, the package is no longer just a protective shell around the chip. It becomes part of the performance architecture because it determines how closely logic and memory can be integrated, how much bandwidth the system can achieve, how much power is consumed moving data, and how many functional dies can be combined into one module.

This became clear from 2023. NVIDIA GPUs and other AI accelerators did not only require leading-edge wafers from TSMC. They also required HBM stacks and advanced packaging capacity, particularly CoWoS-type processes that place the GPU and HBM close together on an interposer. Without enough CoWoS or equivalent advanced packaging capacity, finished wafers could not become shippable AI modules.

CoWoS capacity was estimated to grow from roughly 10-15k wafers per month in 2022 to more than 40-50k wafers per month by 2025, yet demand still exceeded supply because AI accelerator demand grew even faster. Lead times extended beyond 9 months, and customers competed for packaging allocation. This was a strong signal that packaging had become a revenue bottleneck.

The market initially underappreciated this because packaging had historically been treated as commoditized. That created an undervaluation opportunity in companies and suppliers exposed to advanced packaging capacity, substrate supply, and test/assembly intensity.

Packaging capacity was limiting shipment growth for the highest-value AI chips.
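The CoWoS capacity estimates quoted above imply a steep annual growth rate, which a short calculation makes concrete. The sketch below assumes a three-year span (2022 to 2025) and pairs the range endpoints in three illustrative ways; the pairing choices are assumptions, not part of the original estimates.

```python
def cagr(start, end, years):
    """Compound annual growth rate between two levels."""
    return (end / start) ** (1 / years) - 1


YEARS = 3  # 2022 -> 2025

# Wafers per month, using the ranges from the text.
conservative = cagr(15_000, 40_000, YEARS)  # high start, low end
midpoint = cagr(12_500, 45_000, YEARS)      # midpoints of both ranges
aggressive = cagr(10_000, 50_000, YEARS)    # low start, high end

for label, rate in [("conservative", conservative),
                    ("midpoint", midpoint),
                    ("aggressive", aggressive)]:
    print(f"{label}: {rate:.0%} per year")
```

Even the conservative pairing implies capacity compounding at well over a third per year, so demand that still outran supply had to be growing faster than that.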

For a PM, the key distinction is between legacy packaging and advanced packaging. Legacy packaging should still be valued as a lower-margin, cyclical service business. Advanced packaging deserves a different framework when it becomes the binding constraint for high-performance systems. The relevant KPIs are advanced packaging capacity, utilization, lead times, substrate availability, HBM integration yield, and customer allocation language.

If CoWoS capacity expands faster than AI system demand, pricing power and urgency decline. Therefore, the undervaluation case depends on sustained tightness, not simply on the fact that advanced packaging is important.

7. Testing: complexity-driven demand

Testing, performed by companies such as Teradyne and Advantest, was often misunderstood as being driven mainly by semiconductor unit volume. In reality, the more important driver is complexity. More complex chips require longer test times, more advanced testers, more steps, more thermal testing, more validation, and more binning (sorting). This is especially true for AI accelerators, GPUs, HBM, and advanced packaged modules.

Test demand does not automatically grow with units. It grows when units become harder to test.

Advantest is one of the leading suppliers of semiconductor test equipment, with particular strength in high-end SoC (System-on-Chip) testers and memory testers. Its growth drivers improve when chips become larger, more expensive, more heterogeneous, and more performance-sensitive. AI accelerators are exactly that type of product. They combine large logic dies, HBM stacks, advanced packaging, and demanding performance requirements. A failed chip or failed package is extremely costly, so customers need more intensive testing to identify known-good dies, validate HBM stacks, test high-speed interfaces, and ensure the final module can operate reliably.

AI chips can require test times that are three times longer than those of simpler chips, depending on architecture and test coverage. HBM also increases memory test complexity because it is not just a standard DRAM die but a stacked memory product that must meet demanding bandwidth and reliability requirements. Advanced packaging adds another layer of complexity because companies may need to test at wafer level, die level, package level, and system level.

For Advantest, this means growth is driven not only by semiconductor volumes but by test intensity per device. When AI accelerator demand rises, the number of tests per chip, the value of each tester, and the need for advanced test solutions can all increase. This explains why Advantest can benefit disproportionately from AI even though it is not designing GPUs or manufacturing wafers.

If AI chips become more complex, test equipment demand can grow faster than wafer volumes. Conversely, if customers delay new accelerator ramps, or if test capacity is overbuilt, tester demand can correct. The investment case should therefore monitor AI accelerator ramps, memory tester demand, Advantest order trends, customer concentration, and whether growth is coming from sustainable test intensity or temporary capacity ordering.

Final synthesis

Mispricing occurs when investors misinterpret which signals matter at a given point.

The semiconductor supply chain does not operate uniformly. At any given time, one segment acts as the binding constraint, determining whether demand translates into revenue and margins.

These constraints typically show up in measurable ways. Yield below 75% signals execution risk. Utilization below 75% signals weak demand. Lead times above 6 months signal bottlenecks. Inventory above 8-10 weeks signals oversupply.
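These rule-of-thumb thresholds can be collected into a small screening function. This is a minimal sketch rather than a complete model: the class and field names are hypothetical, the inventory cut-off uses the lower end (8 weeks) of the 8-10 week range as an assumption, and the example KPI values are invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SegmentKpis:
    """Operational KPIs for one supply-chain segment; None means not available."""
    yield_rate: Optional[float] = None        # fraction, e.g. 0.72
    utilization: Optional[float] = None       # fraction, e.g. 0.90
    lead_time_months: Optional[float] = None
    inventory_weeks: Optional[float] = None


def screen(kpis: SegmentKpis) -> List[str]:
    """Return the warning signals triggered by the thresholds in the text."""
    flags = []
    if kpis.yield_rate is not None and kpis.yield_rate < 0.75:
        flags.append("execution risk: yield below 75%")
    if kpis.utilization is not None and kpis.utilization < 0.75:
        flags.append("weak demand: utilization below 75%")
    if kpis.lead_time_months is not None and kpis.lead_time_months > 6:
        flags.append("bottleneck: lead time above 6 months")
    if kpis.inventory_weeks is not None and kpis.inventory_weeks > 8:
        flags.append("oversupply: inventory above 8 weeks")  # assumed lower bound
    return flags


# Hypothetical advanced-packaging supplier: full fabs and long lead times.
print(screen(SegmentKpis(utilization=0.95, lead_time_months=9)))
```

The screen only reports which thresholds are breached; deciding which breach is the binding constraint at a given moment still requires the segment-by-segment judgment described above.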

Understanding where this constraint sits - and whether the relevant KPIs confirm or contradict the prevailing narrative - is the key to identifying both overvaluation and undervaluation.
