Update: AMD has confirmed it is investigating reports of damaged Ryzen CPUs, and also that it's working with its partners to ensure BIOS limits are in place to keep voltages within spec.
Here's the statement in full (via AnandTech):
"We are aware of a limited number of reports online claiming that excess voltage while overclocking may have damaged the motherboard socket and pin pads. We are actively investigating the situation and are working with our ODM partners to ensure voltages applied to Ryzen 7000X3D CPUs via motherboard BIOS settings are within product specifications. Anyone whose CPU may have been impacted by this issue should contact AMD customer support."
Original story: Following initial reports of Ryzen 7000-series processors burning out under certain conditions, we now have a little more information on what might be causing these chips' untimely deaths.
In a statement to Der8auer, Asus notes that it has added new thermal monitoring mechanisms to protect chips. The statement goes on to mention AMD EXPO and SoC voltage, which suggests these may have some connection with the issues reported so far.
"The EFI updates posted on Friday contain some dedicated thermal monitoring mechanisms we've implemented to help protect the boards and CPUs. We removed older BIOSes for that reason and also because manual Vcore control was available on previous builds. We're also working with AMD on defining new rules for AMD Expo and SoC voltage. We'll issue new updates for that ASAP. Please bear with us," Asus spokesperson, Rajinder Gill, says.
The issue has been linked to excessive SoC voltages, as Tom's Hardware notes, and can be triggered either by AMD EXPO memory overclocking profiles or by manual adjustment in the BIOS.
EXPO memory profiles are used to run DDR5 memory at its advertised overclocked speeds, akin to Intel's XMP profiles, and do so by raising memory clocks along with the voltages needed to support them, including SoC voltage.
According to Tom's sources, the issue may actually lie in this excessive voltage destroying or disabling the thermal protection mechanisms on the chip and thus allowing it to continue operating without thermal limits to slow itself down and protect it from overheating. That's a thought shared by Der8auer, who also believes that the thermal protection on affected chips died and that led to further heat damage until a catastrophic failure occurred.
All of which leads to the visible heat damage present on the underside of some of AMD's latest chips, specifically bulging around the vCore pads.
And while this issue has been linked to Ryzen 7000X3D processors, which come fitted with 3D V-Cache, it's noted that standard Ryzen 7000-series chips are also susceptible in some regard.
While the exact cause of this issue is still being investigated, it's important to mitigate your risk of exposure to it as much as possible. Generally, an SoC voltage of up to 1.25V is considered safe, and beyond that you're getting into riskier territory. You can check what voltage your CPU is running at in any decent monitoring app. It's easy to find in apps like HWiNFO, which we use for benchmarking. That's not to say any voltage exceeding that will cause the issue, as it remains highly unlikely (AMD has sold a lot of 7000-series CPUs and even these unfortunate cases are a drop in the ocean), but the risk may still increase and we don't want that.
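If you'd rather not eyeball the numbers, the check is simple enough to script. Below is a minimal Python sketch that compares a handful of SoC voltage readings against the rough 1.25V guideline mentioned above. The function name and the sample values are our own inventions for illustration: this isn't part of HWiNFO or any AMD tool, and you'd need to copy the readings out of your monitoring app yourself.

```python
# Hypothetical helper for illustration; not part of HWiNFO or any AMD software.
SOC_VOLTAGE_GUIDELINE = 1.25  # volts, the rough ceiling quoted in this article


def readings_over_guideline(readings, limit=SOC_VOLTAGE_GUIDELINE):
    """Return (index, volts) pairs for every reading strictly above the limit."""
    return [(i, v) for i, v in enumerate(readings) if v > limit]


if __name__ == "__main__":
    # Made-up sample values, as if typed in by hand from a monitoring app.
    samples = [1.20, 1.25, 1.30, 1.36]
    for index, volts in readings_over_guideline(samples):
        print(f"Reading {index}: {volts:.2f}V is above {SOC_VOLTAGE_GUIDELINE:.2f}V")
```

A reading sitting exactly at 1.25V isn't flagged here, and a flagged reading doesn't mean your chip is doomed. It just tells you it's worth a look in the BIOS.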
The other thing to consider is that EXPO overclocking is not covered under warranty, which technically means that killing your chip through the use of EXPO could leave you high and dry for a replacement. We don't know if the manufacturers affected by the issue (reportedly most motherboard manufacturers have been impacted to some degree) will stick to this if a wider issue is discovered, but it's worth thinking twice about EXPO in the meantime, until a more formalised solution is found.
It's expected that AMD is working on a fix for the issue that would limit the voltage via the firmware or system management unit (SMU), which could also restrict the upper end of memory overclocking. There will likely be a way to circumvent any limits, however, and in some circumstances damage your chip. But you ultimately take that risk with any sort of overclocking.