Questions tagged [hpc]

HPC stands for High Performance Computing, typically carried out on supercomputers or clusters of interconnected compute nodes.

6 questions
4 votes, 2 answers

Does spaCy support multiple GPUs?

I was wondering if spaCy supports multi-GPU via mpi4py? I am currently using spaCy's nlp.pipe for Named Entity Recognition on a high-performance-computing cluster that supports the MPI protocol and has many GPUs. It says here that I would need to…
Jinhua Wang
  • 163
  • 8
1 vote, 0 answers

Monitor Model Training Progress over HPC Clusters

As a part of my research in Deep Learning, I have to frequently train models which require a lot of computing power. As such, I use my university's HPC environment to submit my jobs and to train my models. However, I run into one major issue -…
1 vote, 0 answers

How to solve the issue with getting free ports in PyTorch DDP?

I am facing issues with getting a free port in the DDP setup block of PyTorch for parallelizing my deep learning training job across multiple GPUs on a Linux HPC cluster. I am trying to submit a deep learning training job to a Linux HPC cluster…
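One common way to obtain a free port, sketched here as an illustration and not taken from the question or its answers, is to bind a socket to port 0 so that the operating system picks an unused port. The helper name `find_free_port` is hypothetical. `MASTER_PORT` is the environment variable read by PyTorch's `env://` initialization; whether that initialization method is the one used in the asker's setup is an assumption.

```python
import os
import socket


def find_free_port() -> int:
    """Ask the OS for an unused TCP port by binding to port 0 (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))  # port 0 lets the OS choose any free port
        return s.getsockname()[1]  # the port number the OS assigned


# Assumption: the job uses env:// initialization, which reads MASTER_PORT.
port = find_free_port()
os.environ["MASTER_PORT"] = str(port)
```

The socket is closed before the port is reused, so another process could in principle claim the port in between. On a shared cluster node it may be worth retrying if the later bind fails.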
0 votes, 1 answer

How do I determine the best statistical data transformation for standardization (such as log or square root) to remove bias between different datasets?

I'm currently working on applying data science to a High Performance Computing cluster by analyzing the generated log files and trying to see if there is a pattern that leads to a system failure (specifically STALE FILE HANDLEs for now in GPFS file…
Kraamed
  • 13
  • 2
0 votes, 1 answer

Running Model on both GPUs and CPUs

I have access to an HPC node with 3 GPUs and a maximum of 38 CPUs. I have a transformer model which I run on a single GPU at the moment; I want to utilize all the GPUs and CPUs. I have seen a couple of tutorials on DataParallel and DistributedDataParallel.…
Fhunmie
  • 17
  • 6
0 votes, 0 answers

Find speedup for different numbers of processes

I am new to data science. I need to write code to measure speedup as a function of the number of processes while using k-nearest neighbors, with k = 1, 2, 3, 4, 5, 6, 7. This should happen after downloading some datasets. It is preferred to use…
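For reference, speedup is conventionally defined as the single-process runtime divided by the runtime with p processes, S(p) = T(1) / T(p), and parallel efficiency as S(p) / p. The sketch below is a minimal illustration of that arithmetic only. The function name `speedup_table` and the timing values are hypothetical, not measurements from the question, and the k-nearest-neighbor runs themselves are assumed to be timed separately.

```python
def speedup_table(timings: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Map each process count p to (speedup, efficiency).

    `timings` maps a process count to a measured wall-clock time in seconds
    and must contain an entry for a single process (p = 1) as the baseline.
    """
    baseline = timings[1]
    table = {}
    for p, t in sorted(timings.items()):
        s = baseline / t      # speedup S(p) = T(1) / T(p)
        table[p] = (s, s / p)  # efficiency E(p) = S(p) / p
    return table


# Hypothetical timings in seconds, for illustration only.
example_timings = {1: 8.0, 2: 4.0, 4: 2.5}
result = speedup_table(example_timings)
for p, (s, e) in result.items():
    print(f"processes={p} speedup={s:.2f} efficiency={e:.2f}")
```

The same table could be computed once per value of k, so that the speedup curves for different k can be compared.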