Build a budget 3090 deep learning workstation, Part Two: use the GPU only for CUDA, not for display



Configuring Ubuntu

The main goal of the software configuration is stability, so that the workstation can keep running for an extended period without thermal throttling or freezing. The first step is installing the relevant packages and their dependencies; this guide by Lambda suffices:

Install TensorFlow & PyTorch for RTX 3090, 3080, 3070, etc.

The following instructions should work on Ubuntu 20.04.
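Once those packages are installed, a quick sanity check (a minimal sketch, assuming PyTorch came with the Lambda install above) confirms the driver and CUDA are visible:

# The driver should list the 3090 and its current utilization
nvidia-smi
# PyTorch should see the card as a CUDA device
python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"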

How to use the GPU only for CUDA computing, not for display

The reason to use the integrated Intel graphics for display is responsiveness: if the GPU also drives the display, the desktop becomes noticeably laggy whenever GPU utilization sits at 100% during computation.
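A quick way to confirm which GPU is currently driving the desktop (assuming mesa-utils is installed for glxinfo) is to check the reported OpenGL renderer; it should be the Intel UHD 630 rather than the 3090:

# List the render providers X knows about
xrandr --listproviders
# The session's OpenGL renderer should be the Intel iGPU
glxinfo -B | grep "OpenGL renderer"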

The guide I referred to is: Use integrated graphics for display and NVIDIA GPU for CUDA on Ubuntu 14.04.

However, some of the information in that guide is outdated and does not apply to 10th-gen Intel CPUs, or to any CPU with Intel UHD 630 graphics. After installing the NVIDIA driver and the CUDA Toolkit, this is the /etc/X11/xorg.conf I am using to make it work:

Section "ServerLayout"
    Identifier "layout"
    Screen 0 "intel"
    Screen 1 "nvidia"
EndSection

Section "Device"
    Identifier "intel"
    Driver      "modesetting"    
    Option      "AccelMethod"    "glamor"
    BusID       "PCI:0:2:0"
    Option      "TearFree" "true"
    Option  "TripleBuffer" "true"
EndSection

Section "Screen"
    Identifier "intel"
    Device "intel"
EndSection

Section "Device"
    Identifier "nvidia"
    Driver "nvidia"
    BusID "PCI:1:0:0"
EndSection

Section "Screen"
    Identifier "nvidia"
    Device "nvidia"
    Option "AllowEmptyInitialConfiguration" "on"
    Option  "Coolbits" "28"
EndSection
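The BusID values above match my board; they can be read off with lspci. Keep in mind that lspci prints bus addresses in hexadecimal while xorg.conf expects decimal:

# Locate the integrated and discrete GPUs on the PCI bus
lspci | grep -E "VGA|3D"
# Addresses are hex; convert to decimal for BusID,
# e.g. 00:02.0 -> "PCI:0:2:0" and 01:00.0 -> "PCI:1:0:0"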

What this configuration actually does is let X11 set up a second screen (which does not physically exist). That second screen loads the NVIDIA driver, so CUDA remains available. Otherwise, if we simply disable the GPU for display in NVIDIA X Server Settings, CUDA cannot be used either.

The Option "Coolbits" "28" line unlocks fan control under Thermal Settings in NVIDIA X Server Settings, so that we can manually set the threshold at which the fan kicks in. By default the fan is off and the custom setting is grayed out. Being able to customize the fan profile helps avoid overheating the GDDR6X memory during extended model training. Note that to make changes made in nvidia-settings permanent, we have to add the following line as a startup application in Ubuntu:

sh -c '/usr/bin/nvidia-settings --load-config-only'
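With Coolbits enabled, the fan can also be set from the command line instead of the GUI. A minimal sketch, assuming a single GPU and a driver new enough to expose the GPUTargetFanSpeed attribute (adjust the 70% target to taste):

# Take manual control of fan 0 and pin it at 70%
nvidia-settings -a "[gpu:0]/GPUFanControlState=1" \
                -a "[fan:0]/GPUTargetFanSpeed=70"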
