torch cuda not able to identify gpu on aws g4dn.xlarge

Question

I created and launched an EC2 g4dn.xlarge GPU instance. I want to run some PyTorch-based code from the command line. PyCUDA is able to detect the GPU, but PyTorch is not.

import pycuda.driver as cuda
import torch

cuda.init()  # initialize the CUDA driver API through PyCUDA
num_gpus = cuda.Device.count()
print(f"Number of GPUs: {num_gpus}")
print("is torch cuda available", torch.cuda.is_available())
print("torch cuda count", torch.cuda.device_count())

The output of the code above is:

Number of GPUs: 1
is torch cuda available False
torch cuda count 0

Here are the PyTorch and CUDA versions I am using:

pytorch                   2.0.1           cpu_py310h07ccb54_0
cudatoolkit               11.7.0              h254b3b0_10    nvidia
pycuda                    2021.1          py310h06b8198_3    conda-forge

score 2 · Accepted Answer · answered Oct 06 '23 at 15:00

It looks like you have the CPU-only build of PyTorch installed instead of the GPU build: the build string in your package list, cpu_py310h07ccb54_0, starts with cpu. Go to the PyTorch install page, https://pytorch.org/get-started/locally/, and use the configuration table to install the CUDA 11 or CUDA 12 build of PyTorch, and you should be good to go.
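One way to confirm this from Python is to look at the CUDA version PyTorch was built against. This is a minimal sketch, not part of the original answer: the helper name describe_torch_build is made up for illustration, and it relies on torch.version.cuda being None for CPU-only builds.

```python
import importlib.util


def describe_torch_build():
    """Return build info for the installed PyTorch, or None if it is not installed.

    describe_torch_build is a hypothetical helper name used for this sketch.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    return {
        "torch_version": torch.__version__,
        # None here means the installed PyTorch is a CPU-only build.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


print(describe_torch_build())
```

If built_with_cuda comes back as None, reinstalling a CUDA build of PyTorch is the likely fix, regardless of what the driver or PyCUDA reports.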

score 1 · Answer 2 · answered Oct 09 '23 at 20:07

Use nvidia-smi to troubleshoot. Depending on what it reports, you may need to download and install the NVIDIA CUDA driver. If you have upgraded your kernel (or an upgrade was applied automatically), you need to reinstall the driver.

If nvidia-smi shows the correct GPU data, then something is wrong with the PyTorch install.
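The same check can be scripted. The sketch below assumes nvidia-smi is on the PATH when the driver is installed; the function name query_nvidia_smi is hypothetical. It returns None when the tool is missing or fails, which points at the driver rather than at PyTorch.

```python
import shutil
import subprocess


def query_nvidia_smi():
    """Return a list of 'name, driver_version' strings, or None if nvidia-smi is unusable.

    query_nvidia_smi is a hypothetical helper name used for this sketch.
    """
    if shutil.which("nvidia-smi") is None:
        return None  # tool not found: the NVIDIA driver is probably not installed
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None  # tool present but failing, e.g. after a kernel upgrade
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


print(query_nvidia_smi())
```

A non-empty list means the driver sees the GPU, so the remaining suspect is the PyTorch install.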

score 1 · Answer 3 · answered Nov 03 '23 at 16:22

Use nvidia-smi to check which NVIDIA driver is installed and confirm that it can work with your desired CUDA toolkit.

Make sure to install the GPU build of PyTorch for that CUDA toolkit. https://pytorch.org/get-started/locally/ should help you choose your desired version.
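As a rough illustration of the matching step, the sketch below compares the CUDA version shown in the nvidia-smi header (the highest version the driver supports) with the toolkit version. It is a conservative rule of thumb, not an official compatibility check: CUDA's minor-version compatibility can relax it. The function names and the "12.2" driver value are made up for the example; "11.7.0" is the cudatoolkit version from the question.

```python
def parse_version(text):
    """Turn a dotted version such as '11.7' or '11.7.0' into a (major, minor) tuple."""
    parts = text.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor)


def driver_covers_toolkit(driver_cuda, toolkit_cuda):
    """Conservative check: the driver's CUDA version is at least the toolkit's version."""
    return parse_version(driver_cuda) >= parse_version(toolkit_cuda)


# "12.2" is a hypothetical value read from the nvidia-smi header.
print(driver_covers_toolkit("12.2", "11.7.0"))
```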

Veer7
