
I start mpirun with the command:

mpirun -np 2 prog

and get the following output:

--------------------------------------------------------------------------
WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory. This can cause MPI jobs to
run with erratic performance, hang, and/or crash.

This may be caused by your OpenFabrics vendor limiting the amount of
physical memory that can be registered. You should investigate the
relevant Linux kernel module parameters that control how much physical
memory can be registered, and increase them to allow registering all
physical memory on your machine.

See this Open MPI FAQ item for more information on these Linux kernel module
parameters:

http://www.open-mpi.org/faq/?category=openfabrics#ib-..

Local host: node107
Registerable memory: 32768 MiB
Total memory: 65459 MiB

Your MPI job will continue, but may be behave poorly and/or hang.
--------------------------------------------------------------------------
hello from 0
hello from 1
[node107:48993] 1 more process has sent help message help-mpi-btl-openib.txt / reg mem limit low
[node107:48993] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

Other installed software (the Intel MPI library) works fine, without any errors, and uses all 64 GB of memory.

With Open MPI I don't use any batch manager (Torque, Slurm, etc.); I work on a single node, which I reach with the command

ssh node107

For the command

cat /etc/security/limits.conf

I get the following output:

...
* soft rss  2000000
* soft stack    2000000
* hard stack    unlimited
* soft data     unlimited
* hard data     unlimited
* soft memlock unlimited
* hard memlock unlimited
* soft nproc   10000
* hard nproc   10000
* soft nofile   10000
* hard nofile   10000
* hard cpu unlimited 
* soft cpu unlimited 
...
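(Side note: limits.conf is applied through PAM, so a non-login or non-PAM ssh session may not actually pick these values up; it can be worth checking the limits in effect on node107 itself, memlock being the relevant one for RDMA memory registration:)

```shell
# Check the limits actually in effect in this ssh session, since
# /etc/security/limits.conf is not always applied to such sessions.
ulimit -l   # max locked memory (memlock) -- needed for RDMA registration
ulimit -s   # stack size
```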

For the command

cat /sys/module/mlx4_core/parameters/log_num_mtt

the output is:

0

Command:

cat /sys/module/mlx4_core/parameters/log_mtts_per_seg

output:

3

Command:

getconf PAGESIZE

output:

4096    

With these parameters and the formula

max_reg_mem = (2^log_num_mtt) * (2^log_mtts_per_seg) * PAGE_SIZE

I get max_reg_mem = 32768 bytes, not the 32 GiB stated in the Open MPI warning.
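The arithmetic is easy to double-check in the shell, plugging in the three values read above:

```shell
# Plug the values read from /sys and getconf into the formula.
log_num_mtt=0        # from /sys/module/mlx4_core/parameters/log_num_mtt
log_mtts_per_seg=3   # from /sys/module/mlx4_core/parameters/log_mtts_per_seg
page_size=4096       # from getconf PAGESIZE

max_reg_mem=$(( (1 << log_num_mtt) * (1 << log_mtts_per_seg) * page_size ))
echo "${max_reg_mem} bytes"   # -> 32768 bytes
```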

What is the reason for this? Could Open MPI be ignoring the Mellanox log_num_mtt and log_mtts_per_seg parameters? How can I configure OpenFabrics to use all 64 GB of memory?

r1d1
  • Possible duplicate of [How can I increase OpenFabrics memory limit for Torque jobs?](http://stackoverflow.com/questions/17755433/how-can-i-increase-openfabrics-memory-limit-for-torque-jobs). You, your system administrator, or whoever has root access to the nodes should increase the value of `log_num_mtt` to 11 (I guess that `0` in your question is a typo and it should be `10`) by modifying the options for the `mlx4_core` kernel module. But that is irrelevant for single node jobs and the warning can be safely ignored. – Hristo Iliev Feb 05 '17 at 15:51

2 Answers


I solved this problem by installing the newest version of Open MPI (2.0.2).

r1d1

In /etc/modprobe.d/mlx4_core.conf, put the following module parameter:

options mlx4_core log_mtts_per_seg=5

Reload the mlx4_core module:

rmmod mlx4_ib; rmmod mlx4_core; modprobe mlx4_ib

Check if log_mtts_per_seg shows up as configured above:

cat /sys/module/mlx4_core/parameters/log_mtts_per_seg
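If the goal is to register all 64 GB, the formula from the question can also be inverted to size log_num_mtt. A rough sketch, assuming the Open MPI FAQ's suggestion to allow registering about twice the physical RAM (the 2x factor and the loop below are illustrative, not an official tool):

```shell
# Illustrative: find the smallest log_num_mtt that lets mlx4_core
# register at least 2x the 64 GiB of physical RAM, keeping the
# default log_mtts_per_seg=3 and 4096-byte pages.
target=$(( 2 * 64 * 1024 * 1024 * 1024 ))   # bytes to be registerable
log_mtts_per_seg=3
page_size=4096

needed=$(( target / ( (1 << log_mtts_per_seg) * page_size ) ))
log_num_mtt=0
while [ $(( 1 << log_num_mtt )) -lt "$needed" ]; do
    log_num_mtt=$(( log_num_mtt + 1 ))
done
echo "options mlx4_core log_num_mtt=${log_num_mtt}"
# -> options mlx4_core log_num_mtt=22
```

The printed line would go into /etc/modprobe.d/mlx4_core.conf alongside (or instead of) the log_mtts_per_seg setting, followed by the same module reload as above.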

Alex