Overheating mobile phones are a common nuisance. To address the thermal challenges of multi-core chip operation, Associate Professor Chen Kunzhi and his research team at National Yang Ming Chiao Tung University in Taiwan have developed a temperature prediction and control technique for the on-chip network, which significantly improves the thermal performance of multi-core chips.
National Yang Ming Chiao Tung University in Taiwan stated in a press release that multi-core chips have been widely used in personal computers, mobile phones, servers, and other devices in recent years. As the number of processor cores grows, interconnecting them inside the chip becomes harder, making the on-chip network (Network on Chip, NoC) interconnect architecture a hot topic in technology. At the same time, as the clock frequency of the computing cores rises, the power density of multi-core chips also rises, creating serious thermal challenges that greatly affect the operating efficiency and reliability of the chips.
Associate Professor Chen Kunzhi from the Department of Electronics at National Yang Ming Chiao Tung University led the Ceres Lab research team, consisting of master’s students Liao Yuanhao, Chen Zhengting, and Wang Leiqi, in proposing a low-cost online learning mechanism that accurately predicts temperature in NoC systems, and in easing the thermal problems of multi-core chips through dynamic proactive thermal management based on adaptive reinforcement learning. This research was selected for the 2024 IEEE TVLSI Best Paper Award, the first time Taiwan has received this honor.
The research team explained that heat in an NoC system must be monitored during operation. When the system temperature reaches a critical level, the dynamic thermal management mechanism is triggered to prevent overheating. Predictive dynamic thermal management (PDTM) controls the system temperature in advance based on temperature predictions, uses partial throttling schemes to reduce the performance impact of temperature control, and is more effective than traditional reactive dynamic thermal management.
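The difference between reactive and predictive control can be illustrated with a minimal Python sketch. It is not the team's implementation: the function names, the 5-degree margin, the linear ramp, and the assumption that the reactive scheme throttles fully once the limit is reached are all hypothetical choices made for illustration.

```python
def reactive_throttle(current_temp, limit):
    """Reactive control (illustrative): do nothing until the measured
    temperature reaches the limit, then throttle fully."""
    return 1.0 if current_temp >= limit else 0.0


def predictive_throttle(predicted_temp, limit, margin=5.0):
    """Predictive control (illustrative): act on the forecast temperature
    and throttle only partially, ramping up as the forecast nears the limit.

    Returns a throttling ratio between 0.0 (no throttling) and 1.0 (full).
    """
    if predicted_temp <= limit - margin:
        return 0.0
    if predicted_temp >= limit:
        return 1.0
    # Linear ramp inside the margin below the limit (hypothetical policy).
    return (predicted_temp - (limit - margin)) / margin
```

In this sketch, a forecast that sits inside the margin already triggers partial throttling, while the reactive scheme stays idle until the limit itself is reached.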
The thermal behavior of an NoC system varies with the workload distribution, which makes it difficult to accurately capture physical parameters such as capacitance, resistance, and power during operation and results in larger temperature prediction errors. In recent years, machine learning prediction methods have been able to dynamically fit the hyperplane that describes the behavior of a physical system. However, machine learning methods rely heavily on the quality of their training data, which can lead to significant prediction errors in NoC systems.
Chen Kunzhi stated that the machine-learning-based proactive thermal management proposed by the research team applies least mean square (LMS) adaptive filtering theory to optimize the model and dynamically adjust the temperature prediction, improving prediction accuracy across different workloads and temperature changes. An adaptive reinforcement learning method then uses real-time feedback on the current temperature, the predicted temperature, and the system throughput to dynamically adjust the throttling ratio, achieving the best thermal management while maximizing system performance. The research results show that, compared with traditional methods, the proposed adaptive reinforcement learning method significantly reduces temperature prediction errors while enhancing system performance.
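The idea of adapting a temperature predictor online can be illustrated with a minimal Python sketch. It uses a normalized LMS filter, a common variant of LMS, and is not the team's model: the class name, the number of taps, the step size, and the helper `prediction_errors` are all hypothetical choices made for illustration.

```python
class LMSTemperaturePredictor:
    """One-step-ahead temperature predictor based on a normalized LMS filter.

    The prediction is a weighted sum of the most recent `taps` readings,
    and the weights are corrected after every new measurement.
    """

    def __init__(self, taps=4, step_size=0.5, eps=1e-6):
        # Start from "next temperature equals the latest reading".
        self.weights = [0.0] * (taps - 1) + [1.0]
        self.step_size = step_size
        self.eps = eps  # avoids division by zero in the normalization

    def _window(self, history):
        # Expects at least `taps` readings in `history`.
        return [float(v) for v in history[-len(self.weights):]]

    def predict(self, history):
        x = self._window(history)
        return sum(w * v for w, v in zip(self.weights, x))

    def update(self, history, actual):
        """Correct the weights with the newly measured temperature.

        Returns the prediction error made before the correction.
        """
        x = self._window(history)
        error = actual - sum(w * v for w, v in zip(self.weights, x))
        norm = self.eps + sum(v * v for v in x)
        self.weights = [
            w + self.step_size * error * v / norm
            for w, v in zip(self.weights, x)
        ]
        return error


def prediction_errors(trace, taps=4):
    """Run the predictor over a temperature trace, returning absolute errors."""
    predictor = LMSTemperaturePredictor(taps=taps)
    errors = []
    for t in range(taps, len(trace)):
        errors.append(abs(predictor.update(trace[:t], trace[t])))
    return errors
```

Run on a steadily rising synthetic trace, the filter starts out no better than repeating the last reading, and its error shrinks as the weights adapt. This is the sense in which an online mechanism can follow changing workloads without depending on a fixed training set.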
This research not only won the IEEE TVLSI Best Paper Award this year but also marks the first time in 30 years that a Taiwanese team has received this honor. The award affirms the research team’s work in the laboratory and demonstrates the university’s outstanding research contributions and forward-looking capacity for technological development.
