RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one.

When training with multiple GPUs in PyTorch, the following error is raised:

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`, and by
making sure all `forward` function outputs participate in calculating loss.
If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
Parameter indices which did not receive grad for rank 1: 23 24 25 26
 In addition, you can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print out information about which particular parameters did not receive gradient on this rank as part of this error

1. First, try training on a single GPU and check whether the error still occurs. If single-GPU training works but multi-GPU training fails, check the following (a minimal sketch illustrating the first case follows this list):

  • A network layer is defined in __init__() but never used in forward()
  • Outputs returned by forward() are not used in computing the loss
  • Parameters that never receive gradients are passed to the optimizer
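
For example, the following minimal sketch (the module name BuggyNet and the layer sizes are made up for illustration) shows the first case: a layer registered in __init__() but never called in forward() leaves its parameters without gradients, which triggers the error under DDP:

import torch.nn as nn

class BuggyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(16, 16)
        self.unused = nn.Linear(16, 16)  # registered but never called in forward()

    def forward(self, x):
        # self.unused never contributes to the output, so its parameters
        # receive no gradient and DDP's reduction for them never finishes
        return self.used(x)

Deleting the unused layer, or actually using it in forward(), fixes this case at the source.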

2. Try setting find_unused_parameters=True:

torch.nn.parallel.DistributedDataParallel(model, device_ids=[self.local_rank], broadcast_buffers=False, find_unused_parameters=True)
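
For context, a minimal sketch of where this call sits in the setup (the process-group initialization and MyModel are assumptions, not part of the original code):

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)
model = MyModel().cuda(local_rank)  # MyModel is a placeholder for your network
model = DDP(model, device_ids=[local_rank], broadcast_buffers=False,
            find_unused_parameters=True)

Note that find_unused_parameters=True makes DDP traverse the autograd graph every iteration to detect parameters that received no gradient, which adds overhead, so fixing the unused parameters themselves (step 1) is preferable when possible.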

3. To debug on a single GPU, insert the following code after loss.backward() and before optimizer.step():

for name, param in model.named_parameters():
    if param.grad is None:
        print(name)

The parameters that get printed are exactly those that did not participate in the loss computation; their gradients are None.
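
For reference, a minimal sketch of where this check fits in a single-GPU training loop (model, loader, criterion, and optimizer are assumed to be defined elsewhere):

for inputs, targets in loader:
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, targets)
    loss.backward()
    # any parameter whose grad is still None here did not take part in the loss
    for name, param in model.named_parameters():
        if param.grad is None:
            print(name)
    optimizer.step()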