As the industry continues to pursue smaller, denser, cheaper, and faster semiconductor devices, the spirit of Gordon Moore lives on. The primary venue for following these developments is the International Electron Devices Meeting (IEDM), and today we provide an overview of where semiconductor device scaling is headed, based on this year's conference.
We will discuss some topics where artificial intelligence is more than just a buzzword (although it is often a buzzword), including Intel's innovative work in using diffusion models to improve process yield.
The main logic topics are a high-level review of progress beyond 2nm from TSMC, Intel, and Samsung in 2D materials, CFET, and backside power delivery. Applied Materials also demonstrated its new suite for metal interconnects at 2nm and beyond, which could drive share gains.
Memory is another of the most exciting areas. Micron has shown a non-volatile FeRAM with a density higher than the world's densest DRAM and performance within an order of magnitude; SK Hynix has demonstrated its plans for HBM4 with hybrid bonding, flip-chip MR-MUF, and TCB; Samsung has laid out its plan to reach more than 1,000 layers of NAND through various forms of wafer stacking; and Kioxia has shown the world's densest production-level NAND along with its CBA method.
Below, let's delve into the main text.
Intel's Generative AI to Improve Process Yield
Intel has demonstrated early work on deep generative models for predicting device variations. The complexity of each generation of chips has grown far beyond the number of transistors, and the number of Cadence emulation/simulation boxes continues to explode. Nvidia is trying to introduce GPUs to improve this process.
Existing EDA benefits from a virtuous cycle, where increased computing power enables better modeling, which in turn further improves computing power. In a sense, it is the same as the generative AI scaling law, although much milder at present. The use of artificial intelligence to design better AI accelerator chips is rapidly evolving, with Nvidia and Google leading the way.
Introducing generative AI into process and device modeling is obviously the first step, as it is an extremely data-intensive task, and chip manufacturers have access to a large amount of high-quality (relative to other applications) datasets at any time. The benefits of higher process yield and faster cycle times are easily quantifiable and can be translated into revenue.
Although still in the early stages of development, Intel has shown promising results for its GenAI models. Initial tests were conducted using two different types of models: generative adversarial networks (GANs) and diffusion models.
GANs are a popular architecture for image, text, and audio generators, where the synthetic samples must closely resemble real ones. They consist of two deep neural networks: the generator and the discriminator. The generator creates fake samples from random noise. These fake samples, along with real ones, are fed to the discriminator, which attempts to distinguish real from fake. Essentially, the generator tries to deceive the discriminator, hence the "adversarial" in generative adversarial network.
Through training, the samples output by the generator approach the real ones in quality and can become indistinguishable from them. However, GANs are prone to mode collapse. This means that their outputs fail to cover the full range of the input distribution; in simpler terms, the outputs tend to look alike. While this is not an issue for many popular consumer applications (such as image generation), it is unacceptable for chip design and process modeling.
The key difference is that, in this setting, process yield is defined by the long tail of the distribution, and a model that fails to replicate the tail cannot accurately predict yield.
Diffusion networks are more suitable for this task. Real samples with added noise are used to train the model, which learns to denoise them. Crucially, the diffusion networks in this application can replicate the long tail of the sample data distribution, providing an accurate prediction of the process yield.
In Intel's research, the SPICE parameters used during the design phase as part of device simulation were used as inputs for the deep learning model. Its output is the predicted electrical characteristics or ETEST metrics at the time of device fabrication. The results show that the model can correctly predict the distribution of ETEST metrics. The circuit yield is defined by the tail of this distribution. Therefore, by correctly predicting the distribution of ETEST metrics, the model can accurately predict the yield.
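To illustrate why the tails matter, here is a minimal Python sketch. It is not Intel's model: the Gaussian ETEST metric, the spec limits, and the narrower "tail-missing" generator are all hypothetical assumptions, chosen only to show how a generator that under-samples the tails inflates the predicted yield.

```python
import random

def predicted_yield(samples, lower, upper):
    """Fraction of samples whose metric falls inside the spec window."""
    in_spec = sum(1 for x in samples if lower <= x <= upper)
    return in_spec / len(samples)

def draw(sigma, n, seed):
    """Draw n values of a hypothetical ETEST metric (zero mean, given sigma)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n)]

# Hypothetical spec limits, in units of the true standard deviation.
LOWER, UPPER = -2.0, 2.0

# "real" stands in for measured silicon; "collapsed" stands in for a
# generator that reproduces the center of the distribution but not its tails.
real = draw(sigma=1.0, n=100_000, seed=0)
collapsed = draw(sigma=0.6, n=100_000, seed=1)

yield_real = predicted_yield(real, LOWER, UPPER)
yield_collapsed = predicted_yield(collapsed, LOWER, UPPER)

print(f"yield from real distribution:      {yield_real:.4f}")
print(f"yield from tail-missing generator: {yield_collapsed:.4f}")
```

Under these assumptions the tail-missing generator reports a noticeably higher yield than the real distribution supports, even though most of its samples look plausible individually.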
The potential here is clear: better optimization of chip yield at the design stage means lower costs. Fewer mask re-spins, shorter development times, and ultimately higher yields are powerful differentiators for foundries and design teams that can build the model into their PDK and design flow.
Intel's current work is in the research phase, but all major fabs and design companies can be expected to work on industrializing similar technologies. The underlying data is closely guarded, making it very difficult for startups and even fabless design companies to access all of it. In this sense, Intel has an advantage as an IDM. For entrepreneurs who can obtain such data, this is one of the most promising places to start a company.
Logic Scaling: 2D Materials
For many years, logic scaling has been at the core of the industry. Although the pace of recent scaling has slowed, it remains one of the key drivers of the continuous improvement of the semiconductor economy. IEDM has traditionally been a venue for chip manufacturers to showcase progress in their process roadmaps.
Current development work is focused on two areas: traditional horizontal scaling in the x and y directions and 3D stacking (z direction).
For horizontal scaling, gate-all-around (GAA) transistors will allow scaling to continue at the "2nm" node, just as FinFET runs out of steam. These 2nm nodes will enter mass production at Intel and TSMC in 2025. Samsung's 3nm node also uses gate-all-around transistors, but despite claims of mass production, Samsung has not shipped any fully functional chips, even in its own smartphones.
Much of the new development is focused on scaling the GAA architecture further, as existing materials will run out of steam by the end of this decade. This will require a shift to exotic "2D" materials: first transition metal dichalcogenide (TMD) monolayers, followed by carbon nanotubes.
In the vertical direction, the first stacked transistor architecture is imminent. We will delve into each idea in more detail when we introduce updates from TSMC, Intel, and Samsung.
2D channel materials are expected to be one of the next steps in the development of GAA architecture. Initially, GAA processes will use silicon (Si) channels, the same as traditional FinFETs. However, as contact resistance and parasitic capacitance of silicon channels increase at smaller sizes, new materials with better electrical properties will be needed to continue scaling down. Once the 10A (1nm) node arrives, around the 2030 timeframe, this shift may be necessary.
TMD monolayers, commonly referred to as "2D materials," have long been considered to have the required properties because they are only a few atoms thick. As manufacturing processes for 2D materials move toward industrialization, chip manufacturers seem to have settled on TMDs. It should be emphasized that the materials in question are not carbon nanotubes, which are often considered the holy grail, but MoS2 for N-type metal-oxide-semiconductor (NMOS) devices and WSe2 for P-type metal-oxide-semiconductor (PMOS) devices.
Because these materials are only a few atoms thick, manufacturing them is challenging, and there is a race to find reliable methods for large-scale production.
TSMC demonstrated a working nanosheet FET (NSFET) made with a single nanosheet channel. They also demonstrated the ability to build 2 stacked nanosheets, but did not mention any working transistors built on these nanosheets. The key point is that two-dimensional materials are grown directly by chemical vapor deposition (CVD), rather than using an additional thin film transfer step as before.
Growth is a fundamental issue for two-dimensional materials. There is currently no solution that can reliably grow two-dimensional materials over a non-negligible surface area.
TSMC also demonstrated a novel "C"-shaped contact scheme to reduce contact resistance, which improves device performance. Because the "C"-shaped contact wraps around the channel, it provides a larger contact area and therefore lower resistance.
TSMC only detailed NMOS devices, while Intel demonstrated working PMOS and NMOS devices with TMD channels. In addition, Intel manufactured these devices on a 300mm wafer pilot line, not just at the laboratory scale. At least in terms of the research presented, Intel is far ahead of TSMC in the 2D materials race.
However, it is worth noting that these are all simple planar transistors: they do not use a GAA architecture, and they are not built at the pitches that 14A+ nodes will require a few years from now.
Surprisingly, Samsung has said almost nothing on 2D materials. Dr. Choi, President and General Manager of Samsung Foundry Business, mentioned the possibility of extending GAA scaling with 2D channel materials, but Samsung did not publish any technical papers on the topic. Despite being a "pioneer" of GAA, Samsung seems content to let others pave the way in 2D.
Strangely, judging by its IEDM presentation, Samsung still seems not to have decided which of the three types of backside power delivery schemes it wants to adopt, while Intel and TSMC have clearly settled their roadmaps.
Whatever progress has been made, we are now in the long tail of horizontal scaling: each step brings fewer benefits and takes longer to develop than the one before. 3D stacking is the exact opposite: a new technology with the potential for 1.5-2x density scaling in just its first generation.
Traditionally, a chip contains one layer of NMOS and PMOS, and the necessary connections are built on top of it. The advancement of manufacturing technology and the necessity to go beyond horizontal scaling means that building multiple layers of transistors on top of each other is becoming possible.
Logic Scaling: CFET
The first natural step is to stack one NMOS and one PMOS transistor, because together they can be wired into an inverter (a NOT gate), one of the basic building blocks of digital circuits. Producing more complex standard cells will be more difficult. TSMC has released a wonderful illustration of this concept, as well as a composite image showing the real thing in transmission electron microscopy (TEM) images.
Last year, most of the work in this field was demonstrated by university laboratories. This year, all major logic manufacturers (as well as IMEC) have shown results led by their internal R&D organizations, which is a solid step towards commercialization. 3D stacking may be introduced around the 10A node in the time frame of around 2030.
Overall, these four methods seem to be converging in terms of architectural decisions and manufacturing schemes.
Intel's integrated scheme is particularly interesting and worth emphasizing because it shows not only CFET but also backside power delivery: a backside contact for the NMOS and PowerVia for the PMOS. With CFET, the power delivery problem becomes extremely difficult.
Logic Scaling: Thermal Limits and Dennard Scaling
A key area to watch in the future will be thermal performance. We have seen more than one paper from chip manufacturers on scaling enablers (3D transistor stacking, backside power delivery, advanced packaging, etc.) claiming that thermal performance has not degraded. AMD, however, published a paper that states very clearly, from a customer's perspective, that thermal issues require additional attention.
AMD's simulations show that performance can drop by up to 5% when using backside power delivery, as the chip must be throttled to avoid overheating. The culprit is the wafer thinning and bonding process. Although it is necessary for manufacturing backside devices, it has the unfortunate side effect of significantly reducing the thermal conductivity of the silicon near the devices, meaning they cannot dissipate heat effectively.
3D packaging also encounters the same problem when bulk wafer thinning is required: performance loss of up to 5% due to throttling at hot spots.
Note that logic scaling may exacerbate this issue, as it has a compounding effect on heat generation. Not only does resistance increase with device scaling, thereby increasing heat generation, but transistor density also increases, so more heat is generated in a given area. Dennard scaling broke down long ago, and the consequences become more severe with each shrink. Further scaling technologies such as CFET, 3D stacking, and backside power delivery exacerbate these issues.
There are some interesting implications to this result. First, the chip design process must begin to treat these issues as "first-order problems" and use tools that allow designers to mitigate these issues; second, manufacturing methods should also address thermal challenges. According to several designers we have interviewed, the current EDA tools provided by Cadence and Synopsys cannot yet address the related issues.
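The compounding effect on heat described in this section can be sketched with the classic Dennard scaling argument. The code below is a simplified textbook model, not taken from AMD's paper: it assumes dynamic power per transistor goes as C * V^2 * f, that capacitance shrinks and frequency rises with the scaling factor `kappa`, and it ignores leakage and wire resistance. The function name and parameters are illustrative.

```python
def power_density_factor(kappa, voltage_scales):
    """Relative change in power density after shrinking dimensions by 1/kappa.

    Simplified model: dynamic power per transistor ~ C * V^2 * f.
    """
    capacitance = 1.0 / kappa          # C shrinks with device dimensions
    frequency = kappa                  # assumed: gates switch faster when smaller
    voltage = 1.0 / kappa if voltage_scales else 1.0
    power_per_transistor = capacitance * voltage ** 2 * frequency
    transistors_per_area = kappa ** 2  # density rises as area per device shrinks
    return power_per_transistor * transistors_per_area

# Ideal Dennard scaling: voltage shrinks along with dimensions.
ideal = power_density_factor(2.0, voltage_scales=True)

# Post-Dennard: voltage can no longer be reduced.
stalled = power_density_factor(2.0, voltage_scales=False)

print(f"power density change with voltage scaling:    {ideal}x")
print(f"power density change without voltage scaling: {stalled}x")
```

In this model, power density stays flat when voltage scales with dimensions but grows with the square of `kappa` once voltage stops scaling, which is why each shrink makes the thermal problem harder.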
Logic Scaling: 3D Stacking
We saw only one paper focused on the latter point: using advanced packaging to combat runaway thermal density in 3D stacking, which may be just the solution to the problems AMD demonstrated. TSMC demonstrated two methods to cope with the increase in power density, both of which attempt to improve thermal conductivity at the wafer-to-wafer bonding interface, where thinned silicon performs poorly.
The first is to place dummy copper thermal vias, essentially small "heat pipes" that conduct heat away from hot spots. This shows excellent thermal performance, but since copper also conducts electricity, the vias have a negative impact on electrical performance even though they are not connected to signal interconnects.
The second, and more promising, approach is to use a thermally conductive layer between the bonded wafers. Currently, wafers are bonded together through SiO2. Replacing it with an inter-layer dielectric (ILD) that has high thermal conductivity can improve heat dissipation without adverse electrical effects.
The benefits of such an ILD are clear, but it is not easy to produce. Two candidate materials were demonstrated: AlN and diamond. TSMC has demonstrated both in a laboratory environment, producing films of sub-micrometer thickness with sufficiently high thermal conductivity to make them feasible.
Although this process does not appear to have been industrialized yet, it is worth watching given the issues described above. It is surprising that the conference did not give it more attention; perhaps it will receive more at ISSCC or VLSI.
From a manufacturing perspective, it may make sense to first replace pure fusion bonding (such as fusion bonding in backside power delivery), rather than fusion bonding in hybrid bonding that may have bonding issues.
Logic Scaling: Interconnects/BEOL
While device scaling seems to have everyone's attention, backend (BEOL) scaling is equally, if not more, important. Increasing transistor density is useless if signals and power cannot be effectively routed to the transistors. One of the biggest challenges is translating the theoretical increase in transistor density into an actual increase in wiring density on the device.
A key challenge in scaling these interconnects is that resistance increases as the "wires" shrink. In fact, this challenge can derail an entire process node: Intel's long struggle at the 10-nanometer node was largely due to its attempt to switch from copper to cobalt interconnects in the lowest metal layers. Although cobalt has lower resistance than traditional copper at that pitch, many problems arose in implementation, which ultimately led Intel to abandon this option.
Poor design decisions in back-end scaling can destroy significant value for chip manufacturers. New interconnect materials and manufacturing schemes are therefore worth watching.
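The resistance penalty of shrinking wires mentioned above follows from the basic relation R = rho * L / (W * H). The sketch below uses that relation with the bulk resistivity of copper; the wire dimensions are hypothetical and chosen only for illustration. It should be read as a lower bound, since real nanoscale wires also suffer from surface and grain-boundary scattering, and liners and barriers take up part of the cross-section.

```python
COPPER_BULK_RESISTIVITY = 1.68e-8  # ohm*m, bulk copper near room temperature

def wire_resistance(length_m, width_m, height_m,
                    resistivity=COPPER_BULK_RESISTIVITY):
    """Resistance of a rectangular wire: R = rho * L / (W * H)."""
    return resistivity * length_m / (width_m * height_m)

# Hypothetical wire: 1 micrometer long, before and after halving
# both the width and the height of its cross-section.
base = wire_resistance(1e-6, 20e-9, 40e-9)
shrunk = wire_resistance(1e-6, 10e-9, 20e-9)

print(f"base wire:   {base:.1f} ohm")
print(f"shrunk wire: {shrunk:.1f} ohm")
print(f"ratio:       {shrunk / base:.1f}x")
```

Even in this idealized model, halving both cross-sectional dimensions quadruples the resistance of a wire of the same length, which is why liner thickness and fill material matter so much at tight pitches.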
Applied Materials and IMEC have both demonstrated their scaling interconnect solutions. Applied Materials first introduced a titanium nitride liner + tungsten fill in 2022 to create smaller, lower-resistance interconnects. This year, they noted that the process is now in high-volume production at a major logic manufacturer. Building on this, Applied Materials has introduced an all-tungsten interconnect scheme, which is expected to further extend capabilities.
The demonstration was clearly technical marketing, but the TSMC and Intel personnel in the room were paying very close attention and asked very good questions.
It is worth noting that the whole scheme can be completed in situ on Applied Materials' Endura tool, which means the wafer is never exposed to the fab environment while the interconnect is being built. Exposure to oxygen oxidizes the interconnect and degrades performance, so keeping the wafer under vacuum throughout gives better results: more than 20% lower resistance than ex-situ processes.
Applied Materials can bundle tools for many individual process modules in ways that other companies cannot, which gives it room to gain share from other etch, clean, and deposition suppliers in the early layers of the back end of line (BEOL), which are among the most expensive.
The future of memory scaling: 3D DRAM
The memory requirements for computing and storage in the era of artificial intelligence are exploding, and the memory wall is limiting progress. As Micron pointed out in a plenary session, data growth is accelerating on a trajectory similar to compute demand, and both curves are getting steeper.
As with logic, memory scaling must continue in order to meet growing data demand economically. Doing so requires progress in many areas. The logic used to control the memory array needs to scale accordingly, and FinFETs will appear on the roadmap by the end of this decade.
Packaging technology will also play a role, as denser integration of memory and compute can achieve better system-level performance.
The final aspect is the memory array itself, where the key turning point is the introduction of 3D DRAM. Some background knowledge is needed here: traditionally, DRAM memory arrays consist of vertical capacitors. Like transistors and logic, memory scaling has largely been achieved by simply making the devices smaller. DRAM capacitors are typically tall and narrow cylinders. Reducing their diameter allows them to be packaged more densely, but this means they must be taller to maintain sufficient capacitance—in other words, their aspect ratio must increase.
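The relationship between diameter and aspect ratio can be made concrete with a deliberately simplified model. It assumes that the capacitance of a cylindrical capacitor is proportional to its sidewall area (pi * d * h) at fixed dielectric thickness and permittivity, so holding capacitance constant means holding sidewall area constant. The numbers are unitless and hypothetical.

```python
import math

def required_height(sidewall_area, diameter):
    """Height of a cylinder with the given sidewall area: A = pi * d * h."""
    return sidewall_area / (math.pi * diameter)

def aspect_ratio(sidewall_area, diameter):
    """Height-to-diameter ratio needed to keep the sidewall area constant."""
    return required_height(sidewall_area, diameter) / diameter

# Hypothetical sidewall area needed to hold enough charge (arbitrary units).
AREA = 100.0

for d in (2.0, 1.0, 0.5):
    print(f"diameter {d:>4}: aspect ratio {aspect_ratio(AREA, d):6.1f}")
```

Under this assumption, the required aspect ratio grows with the inverse square of the diameter: each halving of the diameter quadruples the aspect ratio that etch and deposition steps must achieve.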
Today's DRAM arrays have extremely high aspect ratios, making them very challenging to manufacture, just as horizontal scaling in logic is reaching its physical limits. The main difficulty in producing them is maintaining uniformity while shrinking horizontally and continuously increasing the aspect ratio.
At some point in the future, scaling will require 3D DRAM. The concept is simple: if capacitors cannot be made smaller/taller, place them horizontally and stack a large number of capacitors together.
The significance of this shift lies in the difference in manufacturing methods. 3D DRAM may require about 50% less lithography than existing planar DRAM, along with a significant increase in etch and deposition tools. A similar rebalancing occurred in the transition from 2D to 3D NAND, and it will have a strong impact on the DRAM equipment supply chain, a market of around $30 billion when the memory cycle peaks again in 2025.
So the key question is when the transition will occur. Micron's keynote speaker called it a "typical problem," offering a heavily caveated "within 10 years." Unsurprisingly, none of the major memory manufacturers presented a serious 3D DRAM paper at IEDM, as this is a race that will shift market share. This year, Macronix presented some work on the subject, but Samsung, SK Hynix, and Micron did not. The IMEC roadmap example shared by Micron shows a vague timeline between 2030 and 2035. In other words, this will not happen in the short term.
The future of memory scaling: SK Hynix HBM4 and MR-MUF
SK Hynix has presented on HBM packaging on multiple occasions, and this included the most comprehensive overview yet of its MR-MUF technology. To recap, MR-MUF stands for "Mass Reflow - Molded Underfill," which SK Hynix has used in place of TC-NCF (Thermo Compression - Non-Conductive Film) since HBM2e.
As the name suggests, MR-MUF uses the traditional flip-chip mass reflow soldering process to stack chips and form joints. TCB, in contrast, requires a separate bonding step for each layer in the stack. MR-MUF therefore achieves much higher throughput because it is a batch process (the entire stack undergoes reflow soldering at once).
Not only does MR-MUF improve productivity, but it also leads to higher-performance HBM. Between the chips, epoxy molding compound is used as a gap-filling material, which has a much higher thermal conductivity than the non-conductive film in TC-NCF. Considering the importance of thermal management for high-power chips like GPUs, this reduces the junction temperature, which is a significant benefit for customers.
Hynix discussed in more depth some of the challenges of MR-MUF, which so far only Hynix has overcome. It co-designed the materials with its suppliers and has exclusivity over them.
The first challenge is controlling die warpage, especially for very thin dies in tall free-standing stacks. If the warpage is too large, joints form incorrectly. The advantage of TCB is that it handles warpage better, which is why TCB was the first technology used for HBM packaging.
This is also why Intel uses TCB in packaging far more extensively than the OSATs and the rest of the foundry packaging ecosystem. As this is part of the companies' secret sauce, details are scarce, but Hynix's approach is to deposit a pre-stressed film on the back of the wafer to control warpage. Intel's method is similar but not identical, and Intel also holds patents on its process.
Another challenge is the dispensing of EMC to fill the gaps between chips and ensure there are no voids. The role of underfill is to provide structural support to the bumps, but voids in the underfill weaken this support. Denser bumps and narrower gaps make the dispensing of underfill for HBM more challenging.
To address this issue, Hynix optimized the mold and found that the EMC dispensing pattern is also crucial. Molding with the chip face up inevitably leads to voids, so a custom face-down mold must be used. In addition, certain dispensing patterns result in fewer voids, such as the Serpentine Imp.2 pattern on the far right of the figure. It is also important to ensure that EMC is not dispensed between the stacks, which would restrict airflow and trap air in the structure, creating voids.
In fact, there are more advanced technologies presented at IEDM, which we will introduce in more detail later.