Li Auto seeks to break the “high compute, low intelligence” trap in automotive chips

By Zhou Yongliang

Chinese electric-vehicle maker Li Auto says a new research framework for designing AI models and chips together could help underpin its upcoming in-house processor, Mach 100, expected to enter mass production in 2026.

In a recent paper published with the National Innovation Decision Intelligence Technology Research Institute, the company proposes what it calls a “software–hardware co-design law” aimed at maximizing the performance of large AI models running on power-constrained in-vehicle chips.

The idea challenges a widely held assumption in the autonomous-driving industry: that more computing power automatically leads to smarter vehicles. Instead, Li Auto argues that future competition will depend less on raw compute and more on how efficiently algorithms and hardware work together.

The research provides part of the theoretical foundation for Mach 100, a 5-nanometer automotive chip that the company plans to deploy in its next-generation vehicles, including the successor to its flagship L9 SUV.

The limits of brute-force computing power

As intelligent driving systems increasingly rely on AI rather than rule-based software, demand for onboard computing has surged. Developers are exploring architectures built around large models capable of perception, reasoning and decision-making, sometimes described as vision–language–action (VLA) systems.

But running such models inside a car poses major constraints. While cloud-based AI systems can scale across thousands of GPUs, automotive chips must operate within strict limits on power consumption, heat dissipation and cost.

Chip development and algorithm design often evolve separately. Hardware engineers typically pursue steady improvements in processor performance, while AI researchers follow “scaling laws” that encourage ever-larger models.

The paper notes that because of this mismatch, a chip’s theoretical peak performance may not translate into real-world efficiency. Models designed without regard for hardware constraints may fail to fully utilize the available compute, while compromises made to fit hardware limitations can reduce model capability. Li Auto argues that solving this problem requires designing algorithms and chips together, not treating them as separate systems.

A framework for co-design

To study how model architecture interacts with hardware, Li Auto’s team trained 170 model architectures and evaluated nearly 2,000 configurations.

The goal was to move away from trial-and-error. Instead of repeatedly training and testing models in vehicles, the researchers built a system capable of predicting model accuracy from its hyperparameters before training, turning what was previously a “black-box trial-and-error” process into a predictable one.
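The article does not describe how the accuracy predictor works internally. As a toy illustration of the general idea, the sketch below fits a simple trend on a few already-evaluated configurations and then estimates the accuracy of an untrained one from its hyperparameters; all names and numbers are invented.

```python
# Toy sketch (invented data) of predicting accuracy before training:
# fit a least-squares trend of accuracy vs. log2(hidden width) on a few
# trained configurations, then extrapolate to an unseen configuration.
import math

# (hidden width, measured accuracy) for a handful of trained configs
evaluated = [(256, 0.58), (512, 0.64), (1024, 0.70), (2048, 0.76)]

xs = [math.log2(w) for w, _ in evaluated]
ys = [acc for _, acc in evaluated]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
# Ordinary least squares: accuracy ~ a * log2(width) + b
a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b = my - a * mx

# Estimate an unseen width-4096 configuration without training it
print(round(a * math.log2(4096) + b, 2))  # -> 0.82
```

A real predictor would use many more hyperparameters and a richer model, but the workflow is the same: evaluate a sample of architectures once, then score the rest analytically.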

The researchers also adapted the classic Roofline performance model, which shows how a processor’s achievable throughput is bounded by either its compute capacity or its memory bandwidth. They extended it with factors specific to automotive AI workloads, such as KV-cache usage, attention mechanisms, and mixture-of-experts (MoE) routing.
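The classic Roofline bound (before Li Auto’s extensions) can be sketched in a few lines: attainable throughput is the minimum of peak compute and memory bandwidth multiplied by the workload’s arithmetic intensity. The numbers below are hypothetical, chosen only to show why low-intensity workloads are bandwidth-limited.

```python
# Minimal sketch of the classic Roofline model (not Li Auto's extended
# version). Attainable throughput = min(peak compute, bandwidth * intensity).

def roofline_tflops(peak_tflops, bandwidth_gbs, flops_per_byte):
    """Attainable TFLOP/s for a workload with the given arithmetic intensity."""
    memory_bound = bandwidth_gbs * flops_per_byte / 1000.0  # GB/s * FLOP/B -> TFLOP/s
    return min(peak_tflops, memory_bound)

# Hypothetical chip: 100 TFLOP/s peak compute, 200 GB/s memory bandwidth.
print(roofline_tflops(100.0, 200.0, 2.0))     # low intensity: 0.4 TFLOP/s (bandwidth-bound)
print(roofline_tflops(100.0, 200.0, 1000.0))  # high intensity: 100.0 TFLOP/s (compute-bound)
```

On this imaginary chip, a workload doing only 2 FLOPs per byte moved reaches 0.4% of peak compute, which is the intuition behind the article’s point that peak TOPS figures can be misleading.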

Building on this, the team developed an optimization framework called PLAS (Pareto-optimal LLM Architecture Search). By inputting hardware characteristics like computing power, bandwidth, and cache hierarchy, along with engineering constraints like latency and power consumption, PLAS automatically generates model architectures that best match the chip’s capabilities.
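The paper’s internals for PLAS are not given in the article, but the Pareto-optimality step it is named for can be illustrated generically: among candidate architectures scored on latency and accuracy, keep only those not dominated by any other candidate. The candidate names and scores below are invented.

```python
# Hypothetical sketch of the Pareto-front step in a search like PLAS:
# keep only candidates for which no other candidate is at least as fast
# AND at least as accurate (and strictly better on one axis).

def pareto_front(candidates):
    """Return names of non-dominated (latency, accuracy) candidates."""
    front = []
    for name, lat, acc in candidates:
        dominated = any(
            l <= lat and a >= acc and (l < lat or a > acc)
            for _, l, a in candidates
        )
        if not dominated:
            front.append(name)
    return front

candidates = [
    ("wide-dense", 22.0, 0.71),  # (name, latency ms, accuracy)
    ("deep-dense", 30.0, 0.70),  # slower and less accurate -> dominated
    ("moe-8x",     18.0, 0.74),
    ("moe-16x",    25.0, 0.76),
]
print(pareto_front(candidates))  # ['moe-8x', 'moe-16x']
```

A full search would also feed hardware characteristics (bandwidth, cache hierarchy) and constraints (latency, power) into the scoring, as the article describes; this sketch shows only the final filtering.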

The study showed that the optimized models achieved a 19.4% accuracy improvement at the same latency as the Qwen2.5-0.5B baseline, suggesting that closer alignment of algorithms and hardware significantly improves efficiency.

Implications for automotive AI chips

The research highlights several findings that challenge common assumptions about chip design for intelligent vehicles.

Memory bandwidth and cache efficiency may matter more than peak compute power in determining real-world AI performance. Even high-performance processors can struggle if memory resources can’t keep up with computational demands.

Sparse architectures, especially MoE models, consistently outperform dense architectures in automotive scenarios, since only a small subset of specialized components needs to be activated at any one time.
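A back-of-envelope calculation shows why MoE sparsity helps under tight power budgets: only the routed experts are touched per token, so per-token compute scales with the active parameters rather than the total. The layer sizes and routing width below are hypothetical.

```python
# Hypothetical MoE sizing: 8 experts, top-2 routing. Each token activates
# the shared parameters plus only 2 of the 8 experts, so per-token compute
# is a fraction of what a dense model of equal total size would need.

def active_fraction(num_experts, experts_per_token, shared_params, expert_params):
    """Fraction of total parameters touched per token."""
    total = shared_params + num_experts * expert_params
    active = shared_params + experts_per_token * expert_params
    return active / total

# 100M shared params, 8 experts of 50M each, top-2 routing:
print(round(active_fraction(8, 2, 100e6, 50e6), 2))  # -> 0.4
```

In this made-up configuration a 500M-parameter model does the per-token work of a 200M-parameter dense one, which is the trade the article attributes to sparse architectures.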

AI inference involves two stages — input processing and output generation — which place different demands on hardware resources. Future automotive chips may need to dynamically adjust the allocation of computing resources during these stages.
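The two stages stress hardware differently for a simple reason: input processing (prefill) pushes many tokens through the weights at once, while output generation decodes one token at a time and must re-read the weights for each. A rough arithmetic-intensity estimate, with invented model size and a deliberately simplified FLOP/byte model, makes the contrast concrete.

```python
# Rough illustration of why prefill is compute-bound and decoding is
# bandwidth-bound. Assumes ~2*params FLOPs per token and fp16 weights
# (~2*params bytes) read once per forward pass -- a simplification.

def arithmetic_intensity(tokens, params):
    """Approximate FLOP per byte of weights read for one forward pass."""
    flops = 2.0 * params * tokens
    bytes_read = 2.0 * params
    return flops / bytes_read  # simplifies to `tokens`

print(arithmetic_intensity(512, 5e8))  # prefill of 512 tokens -> 512.0 FLOP/B
print(arithmetic_intensity(1, 5e8))    # decoding one token   -> 1.0 FLOP/B
```

Decoding’s intensity of ~1 FLOP per byte sits far below the roofline’s compute-bound region on any realistic chip, which is why the article suggests future automotive processors may need to reallocate resources dynamically between the two stages.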

Taken together, the findings point to a broader conclusion: there is no universal chip architecture suitable for every application. The most effective processors are those designed specifically for the algorithms they will run.

Toward algorithm-native chips

Li Auto says Mach 100 is intended to follow this philosophy. Its architecture — including its memory subsystem and computing-unit configuration — was developed with large AI models in mind rather than adapted from a general-purpose processor.

More broadly, the research reflects a shift toward tighter integration of AI software and hardware in the automotive industry. 

Over the past decade, automakers have moved from running third-party algorithms on off-the-shelf chips to developing proprietary software on external hardware platforms. Leading companies are now pursuing a third stage: building both their own algorithms and their own computing hardware.

Companies including Apple, Google, and Tesla have followed similar approaches in smartphones, cloud computing, and autonomous driving.

Li Auto’s paper argues that such integration may ultimately be necessary to unlock the full potential of automotive AI.

Source: https://mp.weixin.qq.com/s/P_AWVOFwl3QsiugQmG0Etg