Saturday, September 23, 2023

All EPYC Rome CPUs hang after 1,044 days of uptime, and AMD won’t fix the issue

Server processors don’t know what holidays are, because they are designed to work continuously, 24 hours a day, 7 days a week. Well, it seems AMD EPYC Rome CPUs stop working after 1,044 days of uptime. Why does this happen and what are the consequences?

Today, many businesses provide services based on server infrastructure connected to the Internet. They do this all over the world without interruption, so the fact that their CPUs stall from time to time, leaving no choice but to restart them as scheduled maintenance or give up power-saving features, is also an economic problem.

Why do AMD EPYC Rome CPUs stop working after 1,044 days?


In fact, it is not the entire processor that stops working after that time, and what happens is a hang rather than a shutdown. That is, the affected core’s activity simply stops. According to AMD, this occurs when one of the cores is unable to wake up again after going idle. Why this happens is not yet known, because there is no official explanation of the problem.

We should note that AMD’s EPYC Rome is based on the Zen 2 architecture and is already a few years old. The curious thing is that the error occurs almost three years after the last system reset. Although a server is designed to run without interruption, it is completely normal for various parts of the system to have regularly scheduled maintenance shutdowns.

What’s more, unlike PCs, a modern server has mechanisms to save the state of RAM and processor cache lines for immediate retrieval, whether the interruption is due to a voltage drop, degradation in the electrical system, or maintenance. So the problem is not as serious as it might seem at first glance.

AMD does not intend to provide a solution


The problem is not in any firmware or driver, but within the processor itself. Keep in mind that in the time it takes an EPYC Rome to accumulate 1,044 days of uptime, AMD has released two further generations of server processors, one based on Zen 3 and the other on Zen 4, so there is little interest on its part in solving the problem.

Specifically, the problem lies in the way each core manages the so-called CC6 state, a sleep state in which a core’s voltage drops to 0 volts. Processors of all kinds do this constantly today to reduce power consumption. However, the problem here is that the subsystem in charge of it is unable to reactivate the affected core after a certain amount of time.

In other words, the problem is relatively minor because it doesn’t happen while a core is active; rather, a core that goes to sleep 1,044 days or more after the last reboot will no longer wake up. This is why AMD currently recommends disabling the CC6 state, which prevents the processor cores from sleeping.
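For operators who would rather keep CC6 enabled and plan a restart instead, the arithmetic is simple enough to sketch. The snippet below is only an illustration, not an AMD tool: the function names are made up for this example, the 1,044-day figure is taken as quoted above (the exact threshold on a given system is an assumption), and reading /proc/uptime only works on Linux.

```python
from datetime import datetime, timedelta

# Figure quoted in the article; treating it as an exact threshold is an assumption.
CC6_LIMIT_DAYS = 1044
SECONDS_PER_DAY = 86400


def days_remaining(uptime_seconds: float, limit_days: int = CC6_LIMIT_DAYS) -> float:
    """Days left before uptime reaches the limit (negative once it has passed)."""
    return limit_days - uptime_seconds / SECONDS_PER_DAY


def reboot_deadline(last_boot: datetime, limit_days: int = CC6_LIMIT_DAYS) -> datetime:
    """Latest moment a reboot should happen, counted from the last boot."""
    return last_boot + timedelta(days=limit_days)


def read_uptime_seconds(path: str = "/proc/uptime") -> float:
    """Linux only: the first field of /proc/uptime is seconds since boot."""
    with open(path) as f:
        return float(f.read().split()[0])


if __name__ == "__main__":
    # Hypothetical server that was last booted on 1 January 2021.
    print(reboot_deadline(datetime(2021, 1, 1)))
    # A machine that has been up for 1,000 days.
    print(days_remaining(1000 * SECONDS_PER_DAY))
```

On a live Linux machine, days_remaining(read_uptime_seconds()) gives the margin left; how much safety margin to subtract before scheduling the restart is a judgment call for the operator.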

World Nation News Desk