Other parts of “Build a budget ML workstation”
- Part Zero: Random tidbits: a newbie’s story of building a budget machine learning workstation.
- Part One: Building: assembling the parts, and how to deal with the sag of the big and heavy RTX 3090.
- Part Three: Undervolting the GPU: how to configure Ubuntu to achieve an undervolting effect on the GPU, making the system more stable. This serves our need to train models for longer periods of time.
- Bonus part: productivity for Mac users: as a long-time macOS user, I will show how to configure an almost macOS-like keyboard on Linux.
The main goal of the software configuration is stability, so that the workstation can sustain running for an extended period of time without thermal throttling or freezing. The first step is to install the relevant packages and their dependencies; this instruction by Lambda suffices:
The following instructions should work on Ubuntu 20.04.
How to use the GPU only for CUDA compute, not for display
The reason to use the integrated Intel graphics for display is responsiveness: if the discrete GPU drives the display as well, I found the desktop becomes noticeably laggy whenever GPU utilization hits 100% during computation.
The instruction I referred to is: Use integrated graphics for display and NVIDIA GPU for CUDA on Ubuntu 14.04.
Yet some of the information in that instruction is outdated and does not apply to 10th-gen Intel CPUs, or to any CPU with Intel UHD 630 graphics. After installing the NVIDIA drivers and the CUDA Toolkit, this is the /etc/X11/xorg.conf file I am using to make this happen:
```
Section "ServerLayout"
    Identifier "layout"
    Screen 0 "intel"
    Screen 1 "nvidia"
EndSection

Section "Device"
    Identifier "intel"
    Driver "modesetting"
    Option "AccelMethod" "glamor"
    BusID "PCI:0:2:0"
    Option "TearFree" "true"
    Option "TripleBuffer" "true"
EndSection

Section "Screen"
    Identifier "intel"
    Device "intel"
EndSection

Section "Device"
    Identifier "nvidia"
    Driver "nvidia"
    BusID "PCI:1:0:0"
EndSection

Section "Screen"
    Identifier "nvidia"
    Device "nvidia"
    Option "AllowEmptyInitialConfiguration" "on"
    Option "Coolbits" "28"
EndSection
```
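One caveat: the two `BusID` lines must match your own hardware. `lspci` prints bus addresses in hex (e.g. `01:00.0`), while xorg.conf expects decimal in the form `PCI:bus:device:function`. A small sketch to convert between the two (`busid_to_xorg` is just a throwaway helper name, not a standard tool):

```shell
# lspci prints bus IDs in hex ("01:00.0"); xorg.conf wants decimal ("PCI:1:0:0").
# This helper converts the lspci form into the xorg.conf form.
busid_to_xorg() {
  id=$1                  # e.g. "01:00.0" from `lspci | grep -i vga`
  bus=${id%%:*}          # "01"
  rest=${id#*:}          # "00.0"
  dev=${rest%%.*}        # "00"
  fn=${rest##*.}         # "0"
  printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}

busid_to_xorg "00:02.0"   # Intel iGPU  -> PCI:0:2:0
busid_to_xorg "01:00.0"   # RTX 3090    -> PCI:1:0:0
```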
What the configuration above actually does is let X11 configure a second screen (which does not physically exist). That second screen loads the NVIDIA driver so that CUDA becomes available. Otherwise, if we exclude the GPU from display duty in the NVIDIA X Server Settings, CUDA cannot be used either.
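After restarting X with this configuration, a quick sanity check confirms the split (this assumes `mesa-utils` is installed, which provides `glxinfo`):

```shell
# The desktop renderer should be the Intel/Mesa driver, not NVIDIA
glxinfo -B | grep "OpenGL renderer"

# "Disabled" in the second column means the NVIDIA card is not
# driving any physical display, yet CUDA remains usable
nvidia-smi --query-gpu=name,display_active --format=csv
```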
The Option "Coolbits" "28" part unlocks the fan controls under the Thermal Settings option in NVIDIA X Server Settings, so that we can manually set the thresholds at which the fans kick in. (The value 28 is a bitmask: 4 enables manual fan control, 8 unlocks clock offsets, and 16 unlocks overvoltage; bit 4 is the one we need here.) By default the fans stay off at idle and the custom settings are grayed out. Having the freedom to customize the fan profile helps avoid overheating the GDDR6X memory during extended model training. Note that to make the changes in nvidia-settings permanent, we have to add the following line as a start-up application in Ubuntu:
```
sh -c '/usr/bin/nvidia-settings --load-config-only'
```
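Instead of clicking through the Startup Applications GUI, the same entry can be created as an XDG autostart file. A sketch, assuming a standard `~/.config/autostart` directory (the filename `nvidia-settings-load.desktop` is my own choice):

```shell
# Register the nvidia-settings reload as a per-user autostart entry.
# ~/.config/autostart is the standard XDG autostart directory.
mkdir -p ~/.config/autostart
cat > ~/.config/autostart/nvidia-settings-load.desktop <<'EOF'
[Desktop Entry]
Type=Application
Name=Load nvidia-settings
Exec=sh -c '/usr/bin/nvidia-settings --load-config-only'
X-GNOME-Autostart-enabled=true
EOF
```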
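With Coolbits enabled, the fan settings can also be applied from the command line instead of the GUI. A sketch, assuming the first GPU is `[gpu:0]` and its first fan is `[fan:0]` (index names vary by card; a 3090 typically exposes `[fan:0]` and `[fan:1]`):

```shell
# Take manual control of the fans on GPU 0, then pin the first fan at 70%.
# Requires a running X session with the NVIDIA driver and Coolbits set.
nvidia-settings -a "[gpu:0]/GPUFanControlState=1" \
                -a "[fan:0]/GPUTargetFanSpeed=70"
```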