Common parameters

MPIRUN $cur_dir/nccl-tests/build/all_reduce_perf -b 1G -e 1G -f 2 -g 1 -n 5000 -w 100

NCCL_MIN_NCHANNELS

The NCCL_MIN_NCHANNELS variable controls the minimum number of channels you want NCCL to use. Increasing the number of channels also increases the number of CUDA blocks NCCL uses, which may be useful to improve performance; however, it uses more CUDA compute resources.

The NCCL_MAX_NCHANNELS variable limits the number of channels NCCL can use. Reducing the number of channels also reduces the number of CUDA blocks used for communication, and hence the impact of communication on GPU compute resources.
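
As a rough sketch of how these two variables might be swept with the all_reduce_perf command from the top of these notes: the script below pins the channel count by setting NCCL_MIN_NCHANNELS and NCCL_MAX_NCHANNELS to the same value. Pinning both is an assumption made here for illustration, and the measurements in these notes may have been taken with NCCL_MIN_NCHANNELS alone. DRY_RUN and run_with_channels are hypothetical names, and the mpirun fallback is a guess at what MPIRUN stands for. On multi-node runs the launcher may need to forward the variables explicitly (for example `-x VAR` with Open MPI).

```shell
#!/usr/bin/env bash
# Sketch: sweep the NCCL channel count for all_reduce_perf.
# Assumptions: MPIRUN is your launcher command line, cur_dir is the directory
# containing nccl-tests, and exported variables reach every rank.
MPIRUN="${MPIRUN:-mpirun}"
cur_dir="${cur_dir:-$PWD}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them

run_with_channels() {
    # $1 = channel count used for both the minimum and the maximum
    bench_cmd="$MPIRUN $cur_dir/nccl-tests/build/all_reduce_perf -b 1G -e 1G -f 2 -g 1 -n 5000 -w 100"
    if [ "$DRY_RUN" = "1" ]; then
        echo "NCCL_MIN_NCHANNELS=$1 NCCL_MAX_NCHANNELS=$1 $bench_cmd"
    else
        # Intentionally unquoted so the command string splits into words.
        NCCL_MIN_NCHANNELS="$1" NCCL_MAX_NCHANNELS="$1" $bench_cmd
    fi
}

for n in 4 8 12 16 20 32; do
    run_with_channels "$n"
done
```

With the default DRY_RUN=1 the script only prints one command line per channel count, which makes it easy to check what would be launched before spending GPU time on it.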

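The measurements below show that increasing the number of channels generally increases bus bandwidth (bus bw), although the trend is not strictly monotonic: 20 channels measured lower than 16, and 4 and 8 channels measured lower than the default.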

Qps = default

channels    Bus bw (GB/s)
default     72.1013
4           42.0354
8           56.008
12          73.1877
16          103.57
20          92.5674
32          122.732
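
To make the comparison with the default setting easier to read, the measured values above can be turned into ratios relative to the default run. This is only a small sketch: the numbers are copied from the measurements above, and `results` and `speedup_table` are hypothetical names introduced here.

```shell
#!/usr/bin/env bash
# Bus bandwidth in GB/s per channel setting, copied from the measurements above.
# The "default" row must come first because it is used as the baseline.
results="default 72.1013
4 42.0354
8 56.008
12 73.1877
16 103.57
20 92.5674
32 122.732"

speedup_table() {
    # Print "<channels> <bus bw relative to default>" for each non-default row.
    printf '%s\n' "$results" | awk '
        $1 == "default" { base = $2; next }
        { printf "%s %.2f\n", $1, $2 / base }'
}

speedup_table
```

Read this way, 32 channels reaches about 1.70x the default bus bandwidth, while 4 channels drops to about 0.58x.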
