Newcomers in AI chips, each with their own tricks.

The artificial intelligence boom of the past two years has lifted Nvidia to the pinnacle of the chip industry on the strength of its GPUs. Consequently, both established and emerging chip companies, including AMD, Intel, Graphcore, Cerebras, and Tenstorrent, are attempting to dethrone Nvidia in this domain.

However, despite their best efforts, they have so far been unable to make a significant dent in Nvidia's position. Against this backdrop, a new group of AI chip startups has emerged, hoping to use different architectures and approaches to knock Nvidia off its pedestal.

Below, we look at some of the challengers that have recently attracted the most attention.

Every Model Needs a Corresponding AI Chip

This is the view of Taalas founder Ljubisa Bajic, who is also the founder of Tenstorrent and a former close partner of Jim Keller.

A year after leaving Tenstorrent, Ljubisa Bajic has now brought his new company into the public eye.

Ljubisa Bajic stated that even today's dedicated AI chips are too general-purpose for the models they run. His new startup, Taalas (which means locksmith in Hindi), promises to break through the efficiency barrier by several orders of magnitude by developing architectures and chips tailored to particular models.

According to the announcement, the new company has raised $50 million through two mini-rounds of funding ($12 million and $38 million) from Quiet Capital and Pierre Lamond. The company's vision is that silicon can be further optimized at manufacturing time to suit a specific model. Although artificial intelligence and machine learning are developing rapidly in both software and hardware, we are beginning to see a trend toward "good enough" models, and dedicated compute paths for such models do point to a more specialized and efficient approach to chips.

We believe that Taalas will ultimately use a hardened form of configurable hardware, sitting somewhere between truly fixed-function ASICs/DSPs and fully reconfigurable solutions such as FPGAs or CGRAs, both of which have found their niche in artificial intelligence. Many chip design companies operate eASIC (i.e., structured ASIC) businesses, where the underlying hardware is configurable but is locked into a given configuration at final manufacturing. This lets the manufacturing process still build on a general-purpose programmable design while removing the reconfiguration overhead once the part is deployed to customers.

According to Taalas, this addresses two major issues with today's artificial intelligence hardware: efficiency and cost. Machine learning is expected to become as ubiquitous in everyday consumer life as electricity, appearing in everything from cars to white goods to smart meters and anything else that can be electrified. To meet the demands for cost and computational efficiency, and given that some or most of these devices will never be connected to the internet, the hardware needs to be dedicated and fixed at the time of deployment. That is only possible when the computational workload is fixed (or simple), and Taalas and Ljubisa believe this is an upcoming frontier (if not already here today).

In a press release, Ljubisa Bajic stated: "Artificial intelligence is like electricity, a basic commodity that needs to be made available to everyone. The commodification of AI requires a 1000-fold increase in computational power and efficiency, a goal that cannot be achieved with current incremental approaches. The way forward is to realize that we should not simulate intelligence on general-purpose computers, but rather inject intelligence directly into silicon. Implementing deep learning models in silicon is the most direct path to sustainable artificial intelligence."

Taalas is developing an automated process for rapidly implementing all types of deep learning models (Transformers, SSM, Diffusers, MoE, etc.) in silicon. The proprietary innovation allows a single chip to accommodate an entire large AI model without external memory. The efficiency of hardwired computation enables the performance of a single chip to surpass that of a small GPU data center, paving the way for a 1000-fold reduction in AI costs.

"We believe Taalas's 'direct to silicon' foundry has achieved three fundamental breakthroughs: a significant reset of today's AI cost structure, a feasible increase in model size by 10-100 times next, and the efficient local operation of powerful models on any consumer device. Quiet Capital partner Matt Humphrey said: 'For the scalability of the future of artificial intelligence, this may be the most important mission in the field of computing today. We are proud to support this outstanding n-of-1 team to accomplish this.'"

In short, if a company needs to run the 7B-parameter Llama2 model in its product and determines that this is all the product will need over its entire lifecycle, then a dedicated, hardwired Llama2-7B chip offering the lowest power consumption and lowest cost for that handheld device may be all it needs.
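To give a rough sense of what "an entire model on one chip without external memory" implies, the sketch below estimates the storage needed for the weights of a 7B-parameter model alone. The bit widths are hypothetical choices for illustration; Taalas has not disclosed what precision or encoding its chips use, and the function name is an assumption of this sketch.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: int) -> float:
    """Return the storage needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


LLAMA2_7B_PARAMS = 7e9  # parameter count quoted in the article

# Hypothetical precisions for illustration only, not Taalas figures.
for bits in (16, 8, 4):
    size_gb = weight_footprint_gb(LLAMA2_7B_PARAMS, bits)
    print(f"{bits:>2}-bit weights: {size_gb:.1f} GB")
```

Under these assumptions the weights occupy between 3.5 GB and 14 GB, which is why fixing the model at manufacturing time, rather than loading it from external memory, is central to the approach described here.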

It is understood that the Taalas team is based in Toronto, Canada, with expertise drawn from AMD, NVIDIA, and Tenstorrent. The company plans to tape out its first large language model chip in the third quarter of 2024 and to provide it to early customers in the first quarter of 2025.

South Korean AI Chip: Significant Reduction in Power Consumption and Size

A team of scientists from the Korea Advanced Institute of Science and Technology (KAIST) detailed their "Complementary-Transformer" artificial intelligence chip at the recent 2024 International Solid-State Circuits Conference (ISSCC). The new C-Transformer chip is said to be the world's first ultra-low power AI accelerator chip capable of processing large language models (LLMs).

In a press release, the researchers boldly challenged Nvidia, claiming that the C-Transformer consumes 625 times less power than the green team's A100 Tensor Core GPU and is 41 times smaller. The release also suggests that the achievements of the Samsung-fabricated chip are largely due to refined neuromorphic computing technology.

Although we were told that the KAIST C-Transformer chip can perform the same LLM processing tasks as one of Nvidia's powerful A100 GPUs, no direct performance comparison was provided in the press or conference materials. This is an important statistic, and its absence is conspicuous, leading cynics to speculate that a performance comparison would not flatter the C-Transformer.

The image above features a "chip photo" and a summary of processor specifications. You can see that the C-Transformer is currently manufactured on Samsung's 28nm process, with a die area of 20.25mm². Its maximum operating frequency is 200 MHz, and its power consumption is below 500mW. Under the best conditions, it can achieve 3.41 TOPS. On the surface, this is 183 times slower than the 624 TOPS claimed for Nvidia's A100 PCIe card (though the KAIST chip is said to use 625 times less power). However, we would prefer benchmark performance comparisons over the claimed TOPS of each platform.
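The peak figures quoted above can be combined into a rough efficiency comparison. The sketch below is only arithmetic on the article's own numbers: it treats "below 500mW" as exactly 0.5 W and derives the A100 power from the claimed 625-fold ratio rather than from a datasheet, both of which are assumptions. Peak TOPS figures may also not be quoted at the same numeric precision, so this is not a substitute for a benchmark.

```python
# Figures quoted in the article.
KAIST_TOPS = 3.41      # peak throughput of the C-Transformer
KAIST_POWER_W = 0.5    # "below 500mW", taken here as an upper bound
A100_TOPS = 624.0      # claimed peak for the A100 PCIe card
POWER_RATIO = 625.0    # KAIST's claimed power advantage

# Assumption: infer the A100 power from the 625x claim, not from a datasheet.
a100_power_w = KAIST_POWER_W * POWER_RATIO

throughput_gap = A100_TOPS / KAIST_TOPS          # how much faster the A100 is at peak
kaist_tops_per_w = KAIST_TOPS / KAIST_POWER_W    # peak efficiency of the KAIST chip
a100_tops_per_w = A100_TOPS / a100_power_w       # peak efficiency implied for the A100
efficiency_gain = kaist_tops_per_w / a100_tops_per_w

print(f"Implied A100 power: {a100_power_w:.1f} W")
print(f"Throughput gap:     {throughput_gap:.0f}x in favor of the A100")
print(f"KAIST efficiency:   {kaist_tops_per_w:.2f} TOPS/W")
print(f"A100 efficiency:    {a100_tops_per_w:.2f} TOPS/W")
print(f"Efficiency gain:    {efficiency_gain:.1f}x in favor of the KAIST chip")
```

On these numbers the peak efficiency advantage works out to roughly 3.4 times, far smaller than the 625-fold power headline, because the chip is also about 183 times slower at peak.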

The architecture of the C-Transformer chip looks interesting, featuring three main functional blocks: First, the Homogeneous DNN-Transformer / Spiking-transformer Core (HDSC) and Hybrid Multiplication-Accumulation Unit (HMAU) can effectively handle dynamically changing distribution energy. Second, there is an Output Spike Speculation Unit (OSSU) to reduce the latency and computational load of spike domain processing. Third, researchers have implemented an Implicit Weight Generation Unit (IWGU) with Extended Sign Compression (ESC) to reduce the energy consumption of external memory access (EMA).

The C-Transformer chip does not simply bolt on some off-the-shelf neuromorphic processing as its "special sauce" for compressing the large parameters of LLMs. The KAIST press release stated that neuromorphic computing technology was previously not accurate enough for use with LLMs. However, the research team said that it "successfully improved the accuracy of the technology to match [deep neural networks] DNNs."

Despite the uncertainty in the performance of the first C-Transformer chip due to the lack of direct comparison with industry-standard AI accelerators, there is no doubt that it will be an attractive choice for mobile computing. It is also encouraging that researchers have made such great progress using Samsung's test chips and extensive GPT-2 testing.

Transforming AI with Chips

Princeton University's advanced AI chip project, supported by DARPA and EnCharge AI, is expected to significantly improve energy efficiency and computing power, with the aim of transforming the accessibility and application of artificial intelligence.

Naveen Verma, a professor of electrical and computer engineering at Princeton University, said that the new hardware redesigns AI chips for modern workloads and can run powerful AI systems with much less energy than today's most advanced semiconductors. Verma, who leads the project, said that these advancements break through the key barriers hindering the development of AI chips, including size, efficiency, and scalability.

"The best AI only exists in data centers, and there is a very important limitation," said Verma. "I think, you unlock it, and the way we get value from AI will explode."In a project led at Princeton University, researchers will collaborate with Verma's startup, EnCharge AI. Headquartered in Santa Clara, California, EnCharge AI is commercializing technology based on discoveries made in Verma's laboratory, including several pivotal papers he co-authored with electrical engineering graduate students as early as 2016.

According to the project proposal, EnCharge AI is "at the forefront of developing and executing a powerful and scalable mixed-signal computing architecture." Verma co-founded the company in 2022 with former IBM Fellow Kailash Gopalakrishnan and leader in semiconductor system design, Echere Iroaga.

Gopalakrishnan stated that as artificial intelligence began to demand a significant amount of new computational power and efficiency, innovation in existing computing architectures and improvements in silicon technology began to slow down. Even the best graphics processing units (GPUs) used to run today's AI systems cannot alleviate the memory and computational energy bottlenecks faced by the industry.

"Although GPUs are the best tools available today," he said, "we concluded that a new type of chip is needed to unleash the potential of artificial intelligence."

Verma, the director of the Keller Center for Innovation in Engineering Education at Princeton University, stated that from 2012 to 2022, the computational power required by AI models grew exponentially. To meet the demand, the latest chips pack in hundreds of billions of transistors, each only about as wide as a small virus. However, the computational power of these chips is still insufficient for modern needs.

Today's leading models combine large language models with computer vision and other machine learning methods, and developing each one requires more than a trillion variables. The GPUs designed by Nvidia, which have driven the AI boom, have become so valuable that major companies reportedly transport them in armored vehicles. The backlog for purchasing or leasing these chips stretches to the vanishing point.

To create chips that can handle modern AI workloads in compact or energy-constrained environments, researchers must completely re-envision the physics of computing, while designing and packaging hardware that can be made with existing manufacturing techniques and can work well with existing computing technologies, such as central processing units.

"The scale of AI models is growing explosively," Verma said, "which means two things." AI chips need to become more efficient in mathematical calculations and also more efficient in managing and moving data.

Their approach is divided into three key parts.

The core architecture of almost every digital computer follows a seemingly simple pattern first developed in the 1940s: storing data in one place and performing calculations in another. This means transferring information between storage units and processors. Over the past decade, Verma has pioneered a newer approach where calculations are performed directly within memory units, known as in-memory computing. This is the first part. In-memory computing is expected to reduce the time and energy cost required to move and process large amounts of data.

So far, the digital methods for in-memory computing have been very limited. Verma and his team turned to another approach: analog computing. That's the second part.

"In the special case of in-memory computing, you not only need to perform calculations efficiently," said Verma, "you also need to perform calculations at a very high density because it now needs to fit into these very small memory units." Analog computers do not encode information as a series of 0s and 1s, and then process that information using traditional logic circuits, but instead utilize the richer physical properties of the devices.

Digital signals began to replace analog signals in the 1940s, mainly because as computing grew exponentially, binary code could scale better. But digital signals did not delve into the physical principles of the devices, so they may require more data storage and management. This makes their efficiency lower. Analog improves efficiency by processing finer signals by leveraging the inherent physical properties of the devices. But this may sacrifice accuracy.

The key, according to Verma, is to find the physical principles suitable for the job, enabling the devices to be well controlled and manufactured on a large scale.

His team found a way to perform high-precision calculations using analog signals generated by capacitors specifically designed for precise switching. This is the third part. Unlike semiconductor devices such as transistors, the electrical energy transmitted through capacitors does not depend on variable conditions such as temperature and electron mobility in the material.

"They only depend on the geometric shape," said Verma. "They depend on the space between one metal wire and another." The geometric shape is something that today's most advanced semiconductor manufacturing technology can control very well.

Photonic chips, astonishing speed

Engineers at the University of Pennsylvania have developed a new chip that uses light waves instead of electricity to perform the complex mathematics necessary for training artificial intelligence. The chip has the potential to radically accelerate the processing speed of computers while reducing their energy consumption.

The silicon-photonic (SiPh) chip design is the first to combine the pioneering research of Benjamin Franklin Medal winner and H. Nedwill Ramsey Professor Nader Engheta on manipulating materials at the nanoscale to perform mathematical calculations using light (the fastest possible means of communication) with the SiPh platform, which uses silicon, the cheap and abundant element used to mass-produce computer chips.

The interaction of light waves with matter represents a possible path for developing computers that can overcome the limitations of today's chips, which are based on essentially the same principles as the chips from the beginning of the computing revolution in the 1960s.

In a paper published in the journal Nature Photonics, Engheta's team, in collaboration with the team of Associate Professor of Electrical and Systems Engineering Firooz Aflatouni, described the development of the new chip.

"We decided to join forces," said Engheta, who took advantage of the fact that Aflatouni's research group was the first to develop nanoscale silicon devices.

Their goal was to develop a platform to perform what is known as vector-matrix multiplication, which is a core mathematical operation in the development and functionality of neural networks, the computer architecture of today's artificial intelligence tools.
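For readers unfamiliar with the operation, vector-matrix multiplication can be written out in a few lines. The sketch below is a plain software illustration with made-up values; it says nothing about how the photonic chip carries out the computation physically, and the function name is chosen for this example only.

```python
def vector_matrix_multiply(vector, matrix):
    """Multiply a length-n row vector by an n x m matrix, returning m values."""
    if len(vector) != len(matrix):
        raise ValueError("vector length must equal the number of matrix rows")
    num_cols = len(matrix[0])
    return [
        sum(vector[i] * matrix[i][j] for i in range(len(vector)))
        for j in range(num_cols)
    ]


# Illustrative values only: 3 inputs feeding 2 neurons.
inputs = [1.0, 2.0, 3.0]
weights = [
    [0.5, -1.0],
    [0.25, 0.0],
    [1.0, 2.0],
]

print(vector_matrix_multiply(inputs, weights))  # [4.0, 5.0]
```

Each output is a weighted sum of all the inputs, and a neural network layer repeats this for every neuron, which is why accelerating this single operation has such a large effect on training and inference.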

"You can make silicon thinner, such as 150 nanometers," Engheta explained, instead of using highly uniform silicon wafers but limited to specific areas. These variations in height (without adding any other materials) provide a way to control the propagation of light in the chip, as the variations in height can be distributed to scatter light in a specific pattern, enabling the chip to perform mathematical calculations at the speed of light.

Aflatouni said that due to the limitations imposed by commercial foundries that manufacture the chip, the design is ready for commercial applications and is likely to be suitable for graphics processing units (GPUs), which have seen a surge in demand with their widespread application. There is a growing interest in the development of new artificial intelligence systems.

"They can adopt the silicon photonics platform as an add-on," Aflatouni said, "and then they can speed up the training and classification."

In addition to faster speeds and less energy consumption, the chip by Engheta and Aflatouni also has privacy advantages: since many calculations can be performed simultaneously, there is no need to store sensitive information in the computer's working memory, making future computers using such technology almost impossible to hack.

"No one can invade non-existent memory to access your information," Aflatouni said.

Other co-authors include Vahid Nikkhah, Ali Pirmoradi, Farshid Ashtiani, and Brian Edwards from the School of Engineering and Applied Science at the University of Pennsylvania.