@eduncan911
Last active July 10, 2023 21:12
Fixing Thermal Throttling on Thinkpad P1 and X1 Extreme - Linux Edition

Lenovo messed up with the X1E and P1 Gen 1 versions (and maybe later generations) in that the system boots with a thermal limit (aka Tjunction or tjmax) set to 82C (some report 80C). What this means is that regardless of power draw or undervolting settings, when your CPU hits 82C, it will drop the frequency down to the "Configurable TDP-down" frequency, or even lower. It may also limit the system power draw.

Thermal Paste and Stress Testing

First, note that I have already replaced the thermal paste on my P1's CPU and GPU with Noctua NT-H2 thermal compound. This immediately made a very noticeable difference in idle temps, and the laptop now stays cool when sitting on my lap. Also, the keyboard no longer gets hot to the touch.

For stress testing under Linux, I used the s-tui application to dig into the details for all testing below.

How to Fix It

The fix is really two steps:

  • Set the Tjunction higher, say, 3C under your CPU's rated Tjunction value.
  • Undervolt the CPU, Cache, Uncore, and iGPU to maximize your performance.

Windows has a "driver" fix

Lenovo released a software update that effectively sets the Tjunction back up to 97C. However, this is only for Windows, and there are many posts reporting that Hyper-V negates the setting. I am not sure, but perhaps Lenovo has fixed this with newer drivers since others reported it back in Q1 2018.

Linux instructions

For Linux, we are left to fend for ourselves. Therefore, here's how to verify your system is affected, and how to fix it.

Verify your system is affected

There are two different ways to do this.

Use msr-tools

You can install the msr-tools utility.

sudo apt install msr-tools
sudo modprobe msr 

Then, read the offset field and print it as a decimal number:

$ sudo rdmsr --bitfield 29:24 -d 0x00001a2
18

This means your system is set to 18C under your Tjunction max, which for my Xeon E-2176M is 100C. So, that would be 100 - 18, which is 82C max. (Bits 23:16 of the same register hold the Tjunction max itself rather than the offset.)
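
To make the arithmetic concrete, here is a minimal shell sketch that decodes a raw value of MSR 0x1a2. It assumes the register layout as I understand it from Intel's documentation (bits 23:16 hold the Tjunction max, bits 29:24 hold the offset below it). The function name decode_temp_target and the sample value 12640000 are made up for illustration, so check the result against your own rdmsr output.

```shell
#!/bin/sh
# Decode a raw MSR 0x1a2 value given in hex (with or without a 0x prefix),
# e.g. the output of `sudo rdmsr 0x1a2`.
# Assumed layout: bits 23:16 = Tjunction max in C,
#                 bits 29:24 = offset below that max in C.
decode_temp_target() {
    raw=$((0x${1#0x}))
    tjmax=$(( (raw >> 16) & 0xFF ))
    offset=$(( (raw >> 24) & 0x3F ))
    echo "tjmax=${tjmax}C offset=-${offset}C limit=$((tjmax - offset))C"
}
```

For example, decode_temp_target 12640000 corresponds to a 100C max with an 18C offset, which is the 82C limit described above.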

Use the undervolt utility

Current install instructions are on GitHub:

https://github.com/georgewhewell/undervolt

But in short, install it via pip as root (I know, not very Pythonic, but the tool needs root to access the CPU's MSRs).

sudo pip install undervolt

Now, you can read the Tjunction directly (called temperature target):

$ sudo undervolt --read
temperature target: -18 (82C)
core: 0.0 mV
gpu: 0.0 mV
cache: 0.0 mV
uncore: 0.0 mV
analogio: 0.0 mV
powerlimit: 78.0W (short: 0.00244140625s - enabled) / 45.0W (long: 96.0s - enabled)

As you can see, mine is set to 82C.
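
If you want to script this check, the following is a rough sketch that pulls the absolute limit out of the "temperature target" line. It assumes the output format shown above; the function names and the default threshold of 85 are hypothetical choices for this example, not part of the undervolt tool.

```shell
#!/bin/sh
# Print the absolute temperature target (in C) found in `undervolt --read`
# output supplied on stdin, e.g. "temperature target: -18 (82C)" gives 82.
temp_target_from_read() {
    sed -n 's/^temperature target: -\{0,1\}[0-9][0-9]* (\([0-9][0-9]*\)C)$/\1/p'
}

# Report whether that target is at or below a threshold.
# The default of 85 is an arbitrary cut-off chosen for this sketch.
report_temp_target() {
    threshold=${1:-85}
    target=$(temp_target_from_read)
    if [ -z "$target" ]; then
        echo "no temperature target line found"
        return 2
    fi
    if [ "$target" -le "$threshold" ]; then
        echo "affected: limit is ${target}C"
        return 1
    fi
    echo "ok: limit is ${target}C"
}
```

Usage would look like: sudo undervolt --read | report_temp_target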

1. Set Tjunction to proper setting

Go look up your CPU on Intel's Ark site and find its Tjunction value. My E-2176M has a max of 100C. You do NOT want to hit this 100C, ever! So we are going to set it to 97C instead, to leave a little headroom, as CPU temps sometimes spike 1C or 2C higher than your target temp while waiting on the fans to ramp up. If you do hit your Tjunction max, your system will throttle hard or shut down out of safety.

Armed with a target temp, mine being 97C, we can use the undervolt utility from the verification section above.

sudo undervolt --temp 97

We can check it now:

$ sudo undervolt --read
temperature target: -3 (97C)
core: 0.0 mV
gpu: 0.0 mV
cache: 0.0 mV
uncore: 0.0 mV
analogio: 0.0 mV
powerlimit: 78.0W (short: 0.00244140625s - enabled) / 45.0W (long: 96.0s - enabled)
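
The target is just the Tjunction max minus some headroom. A small sketch of that calculation follows; the function name safe_temp_target and the rule of refusing less than 1C of headroom are my own assumptions for illustration.

```shell
#!/bin/sh
# Print the temperature target for a given Tjunction max and headroom (in C).
# Headroom defaults to 3C; anything below 1C is refused so the target
# can never equal or exceed the rated max.
safe_temp_target() {
    tjmax=$1
    headroom=${2:-3}
    if [ "$headroom" -lt 1 ]; then
        echo "headroom must be at least 1C" >&2
        return 1
    fi
    echo $((tjmax - headroom))
}
```

It could then be combined with the command above, for example: sudo undervolt --temp "$(safe_temp_target 100 3)"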

2. Undervolting

Now that my CPU ramps up to 97C, I went from 2700MHz to 3400MHz across all cores! However, this is still a far cry from its rated 4.4GHz turbo setting. And it only lasts about 10 seconds before it throttles quickly down to 1500MHz, then goes back up to 3400MHz again. The reason is that our CPU is running at full voltage, which runs hot. Intel processors run with more voltage than they need, to account for unstable or inaccurate system voltage regulation.

To address this, I used undervolt to find a safe setting for undervolting. Here are my settings I found to be stable for the E-2176M:

sudo undervolt --temp 97 --core -150 --cache -150 --gpu -100 --uncore -100

And checking it's all set correctly:

$ sudo undervolt --read
temperature target: -3 (97C)
core: -150.39 mV
gpu: -99.61 mV
cache: -150.39 mV
uncore: -99.61 mV
analogio: 0.0 mV
powerlimit: 78.0W (short: 0.00244140625s - enabled) / 45.0W (long: 96.0s - enabled)
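
Note that the values read back are not exactly what was requested (-150.39 instead of -150, -99.61 instead of -100). That is consistent with the offset being stored in whole steps of 1/1.024 mV, which is my understanding of how the undervolt tool encodes it; treat that step size as an assumption. A sketch of the rounding, with a hypothetical function name:

```shell
#!/bin/sh
# Given a requested offset in whole mV, print the value that would be read
# back if the offset is stored in whole steps of 1/1.024 mV (assumed).
quantize_mv() {
    mv=$1
    # Round mv * 1.024 to the nearest integer step using integer math.
    if [ "$mv" -lt 0 ]; then
        steps=$(( (mv * 1024 - 500) / 1000 ))
    else
        steps=$(( (mv * 1024 + 500) / 1000 ))
    fi
    # Convert the step count back to mV with two decimals.
    LC_ALL=C awk -v s="$steps" 'BEGIN { printf "%.2f\n", s / 1.024 }'
}
```

Under that assumption, quantize_mv -150 and quantize_mv -100 reproduce the two values shown in the output above.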

With these settings, I am connected to two Thunderbolt 3 docking stations, three 1080p monitors, and five external USB accessories, with the Brave browser open with about 29 tabs and a couple of terminals on Pop_OS.

I ran the s-tui stress test for about 3 hours straight, while using the Brave browser to watch YouTube and surf around. Zero issues.

All cores now hover around 3900MHz to 4000MHz, much closer to that turbo of 4.4GHz, with 35W of usage. It would still drop after a minute or two, but it only drops to 2200 or 2400MHz now, which is much better than the low before.

Your mileage may vary. Adjust the voltages 20mV at a time.
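
One way to walk through the 20mV steps is to print the candidate commands up front and try them one at a time. The sketch below only prints commands and does not apply anything; the function name undervolt_steps is hypothetical, and using the same offset for core and cache simply mirrors the settings above.

```shell
#!/bin/sh
# Print one candidate undervolt command per step, from -STEP down to FLOOR (mV).
# Core and cache get the same offset, mirroring the settings used above.
undervolt_steps() {
    floor=$1
    step=${2:-20}
    if [ "$step" -le 0 ] || [ "$floor" -gt 0 ]; then
        echo "usage: undervolt_steps NEGATIVE_FLOOR_MV [POSITIVE_STEP_MV]" >&2
        return 1
    fi
    mv=0
    while [ $((mv - step)) -ge "$floor" ]; do
        mv=$((mv - step))
        echo "sudo undervolt --core $mv --cache $mv"
    done
}
```

For example, undervolt_steps -60 20 prints three commands (-20, -40, -60). Run one, stress test with s-tui for a while, and only move to the next if the system stays stable.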

Persist it all across reboots

You'll want to read up on Undervolt's GitHub page for how to persist it with a systemd service. While I do use it, and my undervolting remains, my max temp isn't sticking across all reboots yet. It's hit or miss, most likely a race condition with another service on startup. I'll set up the timer as described in the Undervolt instructions later.
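
As a rough idea of what that could look like, here is a sketch that writes a oneshot service plus a timer that re-applies the settings periodically. Everything in it is an assumption for illustration: the function name, the /usr/local/bin/undervolt path (where a root pip install often lands), and the 2 minute and 30 second intervals. Follow the Undervolt instructions for the authoritative version.

```shell
#!/bin/sh
# Write undervolt.service and undervolt.timer into the directory given as $1.
# $2 optionally overrides the assumed path to the undervolt executable.
write_undervolt_units() {
    dir=$1
    bin=${2:-/usr/local/bin/undervolt}
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }

    # Oneshot service that applies the settings from this guide.
    cat > "$dir/undervolt.service" <<EOF
[Unit]
Description=Apply undervolt offsets and temperature target

[Service]
Type=oneshot
ExecStart=$bin --temp 97 --core -150 --cache -150 --gpu -100 --uncore -100
EOF

    # Timer that runs the service after boot and then repeatedly,
    # in case something else resets the temperature target.
    cat > "$dir/undervolt.timer" <<'EOF'
[Unit]
Description=Re-apply undervolt settings periodically

[Timer]
OnBootSec=2min
OnUnitActiveSec=30s

[Install]
WantedBy=timers.target
EOF
}
```

To try it, run write_undervolt_units /etc/systemd/system as root, then systemctl daemon-reload and systemctl enable --now undervolt.timer.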

Enjoy!

@jdchristensen

Does anyone know if these issues happen with later generations, e.g. the Gen 4?

@eduncan911
Author

eduncan911 commented Jan 3, 2022

@jdchristensen from what I heard, it was only Gen1, as the Windows fix is only available on Gen1 machines. But it's been a year or two since I looked it up.

It's easy to test though:

sudo undervolt --read

And it will tell you if it's set to 82C or 97C.

As far as undervolting goes, I just reverted. I had a few random reboots and hard locks on a machine I'd never had issues with before.

@jdchristensen

I can confirm that on a Lenovo Thinkpad X1 Extreme Gen 4, the temperature target is -4 (96C) without any custom settings. All 8 cores go to ~3GHz when starting a stress test with s-tui, and then quickly taper down to ~2.8GHz which is sustained, at least for the 4 minutes I tested it.
This is above the advertised 2.5GHz, so seems good. If I run a single job, I get 3.8GHz to 4.8GHz, depending on which core it is on (slow for odd-numbered cores and fast for even-numbered cores). Not sure what is going on there. This is all with Ubuntu 21.10. The processor is:

Processor: 11th Generation Intel® Core™ i7-11850H Processor with vPro™
(2.50 GHz, up to 4.80 GHz with Turbo Boost, 8 Cores, 16 Threads, 24 MB Cache)

@eduncan911
Author

Nice, @jdchristensen. Then I'd say Gen4 does not suffer the same fate. :)

@lenardg

lenardg commented Jan 10, 2022

@eduncan911 May I ask which driver is supposed to fix this under Windows? Thx!

@eduncan911
Author

@lenardg I can't recall offhand as it's been years. It was called something like Power Delivery, or System Performance, or some subtle name. There's a very long thread on Lenovo's forums about throttling with Hyper-V on the P1 Gen1. That's the thread you want to read, as they find the driver and name it there.

@jdchristensen

@eduncan911 I think my slower performance on odd cores is probably a thermal issue. The odd cores get hotter much more quickly and then the speed gets limited. Sounds like something that repasting might help. Do you have any advice about how to do this? I have a Gen 4, so it might not be the same procedure, but still it would be helpful to know what worked for you.

@eduncan911
Author

@jdchristensen disable Hyper-Threading in your BIOS and try that for a little while.

Hyper-Threading is a byproduct of the algos that have been under attack for years (Meltdown, Spectre). Yes, these are still around, and we are still finding more and more CVEs from them. Safest bet is to disable Hyper-Threading on all Intel machines.

But anyhoo, disable HT and see how things go for a few days with temps.

@jdchristensen

@eduncan911 I have already disabled hyperthreading. So this is about different physical cores running at higher temps and throttling as a result. I also find the fans come on sooner than I'd expect with just one moderate application running, like Zoom. The more I read, the more I think that repasting might help. I found a pretty good guide https://imgur.com/a/Blvpjd0 with comments at https://www.reddit.com/r/thinkpad/comments/a14vi2/basic_repasting_guide_for_the_lenovo_extreme_x1/

@jdchristensen

For others reading this, it looks like the Gen 4 does not support undervolting.

@jdchristensen

I repasted my X1 Extreme Gen 4 using Noctua NT-H2, and it made a huge difference. Now all cores run at roughly the same speed, reaching the expected 4.8GHz turboboost speed for smaller loads.

During some benchmarks, the package temperature dropped from 94C to 66C, with the cores running at higher speeds and the fans running at the same speed. In other tests, the temperatures were similar, but the cores ran at faster speeds and the fans ran at slower speeds. All around improvements.

I also noticed that the underside of the heat sink had a number of purple thermal pads that were not aligned perfectly with the metal on the heat sink, so I very gently lifted them and centered them.

The service manual was helpful.

@eduncan911
Author

eduncan911 commented Jan 22, 2022

Yep, I repasted mine immediately when I bought it. Lol.

@dnetguru

For anyone who needs a peek inside P1 Gen4: https://imgur.com/a/4rFCFeb (not mine)

@maglighter

I have an X1 Extreme Gen 2 laptop and I undervolted it. But I have strange issues with Chrome and YouTube: it crashes sometimes and even freezes the system entirely, even when the load is not so high. Could it be related to undervolting, and what could I try to fix this issue? I've already tried reducing my values.
Now I have:
[UNDERVOLT.AC]
CORE: -125
GPU: -30
CACHE: -125
UNCORE: -80
ANALOGIO: 0

@eduncan911
Author

@maglighter yes. I stopped undervolting because of those random crashes, lockups, and odd Docker issues.

@pglpm

pglpm commented Jun 22, 2022

Thank you all for the useful discussion and exchange. It's a bit above my head. I also have an X1 Extreme Gen 4 with Ubuntu 20.04 and an "11th Gen Intel® Core™ i7-11800H @ 2.30GHz × 16". In my case, issuing
$ sudo rdmsr --bitfield 23:16 -d 0x00001a2
I actually get a value 100. That seems too high... but maybe I'm reading the wrong bit?

@jdchristensen

@pglpm rdmsr also gives 100 on my X1E4, but undervolt --read gives

# undervolt --read
temperature target: -4 (96C)
...

which seems good.

@jdchristensen

By the way, after upgrading to Ubuntu 22.04, I'm getting much better thermal behaviour. I'm running tlp, and I added the setting RUNTIME_PM_ON_AC=auto so that power management of devices (including the Nvidia GPU) happens on AC like when on battery. With this setting, all of the powertop tunables are in a good state. I also run thinkfan. The result is that my fan very rarely comes on at all when browsing the web, doing email, etc, and the machine stays very cool!

@pglpm

pglpm commented Jun 22, 2022

Thank you @jdchristensen , very valuable information! I'm new to this X1E4 and also to a Linux OS, so I'm struggling a little to understand settings and how to change them. Unfortunately much information on the web seems either outdated or above my head. My problem has been the opposite of thermal throttling: the laptop gets extremely hot (especially around the lower-right side of the keyboard) when on AC. On the other hand it's cool and silent when on battery, and yet still powerful. I'll try the thermal repasting that you mentioned above (though a bit worried as I've never done it before). But in the meantime I'll try to lower the AC performance a little to see what happens. I didn't know that tlp could do that, thank you for the tip, that's great!

May I ask if you did any special tweaks with thinkfan or just used the default settings?

@jdchristensen

I highly customized thinkfan, but I don't want to share my file as I think the best settings will depend on the thermal characteristics of your laptop. One thing I did is have thinkfan control the fans in just three steps: level 0 (off), level 1 (lowest) and level auto (let the BIOS do what it thinks is best). That way it's unlikely that using thinkfan will result in overheating or bad performance. (I also patched my kernel so that level 1 turns on only the right fan, but that's a hack that I don't want to share.)

@pglpm

pglpm commented Jun 22, 2022

I set RUNTIME_PM_ON_AC=auto and also RUNTIME_PM_DRIVER_DENYLIST="mei_me" as was suggested in the tlp explanations of the configuration entries, and the heat problem has gone! It had bothered me for a month. Thank you jdchristensen and eduncan911 for the tips and for hosting this useful Readme!

@jdchristensen

Glad to hear it! I'm curious why you needed to set RUNTIME_PM_DRIVER_DENYLIST. The default is mei_me nouveau radeon, so unless you are using nouveau or radeon, it shouldn't make a difference.

@pglpm

pglpm commented Jun 23, 2022

Simple reason: my ignorance. Indeed that setting doesn't matter in my case. My ignorance is the reason why I try to gather as much information as possible and ask in places like this before making changes.

I want to confirm that RUNTIME_PM_ON_AC=auto has completely solved that excessive heat problem; I've been testing this for a day now.
Just wonderful!
