Fixing the error: RuntimeError: Found more than one stateful callback of type `ModelCheckpoint`.
While using pytorch-lightning, I suddenly ran into the following error:
RuntimeError: Found more than one stateful callback of type `ModelCheckpoint`. In the current configuration, this callback does not support being saved alongside other instances of the same type. Please consult the documentation of `ModelCheckpoint` regarding valid settings for the callback state to be checkpointable. HINT: The `callback.state_key` must be unique among all callbacks in the Trainer.
Why does the error occur?
In the config file, both modelcheckpoint and metrics_over_trainsteps_checkpoint have an every_n_train_steps setting. If the two values are the same, this error is raised.
lightning:
  modelcheckpoint:
    params:
      every_n_train_steps: XXX
  callbacks:
    metrics_over_trainsteps_checkpoint:
      params:
        every_n_train_steps: XXX
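The hint in the error message says that `callback.state_key` must be unique among all callbacks. The toy model below is only a sketch of that idea, not Lightning's actual code: `ToyCheckpoint` and `assert_unique_state_keys` are made-up names, and the assumption that the key is built from the class name plus settings such as `every_n_train_steps` is mine.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ToyCheckpoint:
    """Hypothetical stand-in for a checkpoint callback (NOT Lightning's ModelCheckpoint)."""

    monitor: Optional[str] = None
    every_n_train_steps: Optional[int] = None

    @property
    def state_key(self) -> str:
        # Assumption: the key is derived from the class name and the settings,
        # so two instances with identical settings get identical keys.
        return (
            f"{type(self).__name__}("
            f"monitor={self.monitor!r}, "
            f"every_n_train_steps={self.every_n_train_steps!r})"
        )


def assert_unique_state_keys(callbacks: Iterable[ToyCheckpoint]) -> None:
    """Raise RuntimeError if two callbacks share the same state_key."""
    counts = Counter(cb.state_key for cb in callbacks)
    duplicates = [key for key, n in counts.items() if n > 1]
    if duplicates:
        raise RuntimeError(
            f"Found more than one stateful callback with state_key {duplicates[0]}"
        )
```

In this sketch, two `ToyCheckpoint` instances with the same `every_n_train_steps` collide, while giving one of them a different value makes the keys distinct again.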
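Cause and fix
When checkpoints are saved, modelcheckpoint writes one every every_n_train_steps steps. If metrics_over_trainsteps_checkpoint uses the same every_n_train_steps, the two callbacks cannot be told apart (the hint in the message says each callback's state_key must be unique), so there is no way to score several checkpoints and choose which ones to keep.
So the every_n_train_steps of metrics_over_trainsteps_checkpoint needs to be larger than the every_n_train_steps of modelcheckpoint.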
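As a quick sanity check before training, the rule above can be sketched as a small helper. This is only an illustration under my own assumptions: `check_checkpoint_intervals` is a hypothetical function, the config is assumed to be already loaded as a plain nested dict containing just the `lightning` section, and the numbers 1000 and 5000 are made-up example values.

```python
def check_checkpoint_intervals(lightning_cfg: dict) -> None:
    """Raise ValueError unless metrics_over_trainsteps_checkpoint saves less often.

    `lightning_cfg` is assumed to be the `lightning:` section of the config,
    loaded as a plain nested dict.
    """
    model_steps = lightning_cfg["modelcheckpoint"]["params"]["every_n_train_steps"]
    metrics_steps = (
        lightning_cfg["callbacks"]["metrics_over_trainsteps_checkpoint"]
        ["params"]["every_n_train_steps"]
    )
    if metrics_steps <= model_steps:
        raise ValueError(
            "metrics_over_trainsteps_checkpoint.every_n_train_steps "
            f"({metrics_steps}) should be larger than "
            f"modelcheckpoint.every_n_train_steps ({model_steps})"
        )


# Example with made-up values: 5000 > 1000, so this passes silently.
example_cfg = {
    "modelcheckpoint": {"params": {"every_n_train_steps": 1000}},
    "callbacks": {
        "metrics_over_trainsteps_checkpoint": {
            "params": {"every_n_train_steps": 5000}
        }
    },
}
check_checkpoint_intervals(example_cfg)
```

Calling it right after the config is loaded surfaces the problem with a readable message instead of the RuntimeError during training.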
If I have gotten anything wrong, corrections are welcome.