Update DDP for `torch.distributed.run` with `gloo` backend (#3680)
* Update DDP for `torch.distributed.run`
* Add LOCAL_RANK
* remove opt.local_rank
* backend="gloo|nccl"
* print
* print
* debug
* debug
* os.getenv
* gloo
* gloo
* gloo
* cleanup
* fix getenv
* cleanup
* cleanup destroy
* try nccl
* return opt
* add --local_rank
* add timeout
* add init_method
* gloo
* move destroy
* move destroy
* move print(opt) under if RANK
* destroy only RANK 0
* move destroy inside train()
* restore destroy outside train()
* update print(opt)
* cleanup
* nccl
* gloo with 60 second timeout
* update namespace printing
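
The steps above (reading `LOCAL_RANK` with `os.getenv`, choosing `gloo` or `nccl`, and adding a 60 second timeout) might fit together roughly as in the sketch below. It is not the code from this commit: the helper names `ddp_settings` and `setup_ddp` are hypothetical, and the sketch assumes the launcher `python -m torch.distributed.run` has set `LOCAL_RANK`, `RANK`, `WORLD_SIZE`, `MASTER_ADDR` and `MASTER_PORT` in the environment.

```python
import os
from datetime import timedelta

# Variables set by `python -m torch.distributed.run`; the defaults cover
# ordinary single-process runs where the launcher is not used.
LOCAL_RANK = int(os.getenv("LOCAL_RANK", -1))
RANK = int(os.getenv("RANK", -1))
WORLD_SIZE = int(os.getenv("WORLD_SIZE", 1))


def ddp_settings(use_nccl=False, timeout_s=60):
    """Hypothetical helper: keyword arguments for init_process_group."""
    return {
        "backend": "nccl" if use_nccl else "gloo",
        "timeout": timedelta(seconds=timeout_s),
    }


def setup_ddp(use_nccl=False):
    """Hypothetical helper: initialise the process group only in DDP mode."""
    if LOCAL_RANK == -1:
        return False  # launched without torch.distributed.run, nothing to do
    # Imported lazily so the sketch can be loaded without torch installed.
    import torch.distributed as dist
    dist.init_process_group(**ddp_settings(use_nccl))
    return True
```

A training script would call `setup_ddp()` once before building the model and, in DDP mode, call `torch.distributed.destroy_process_group()` after training finishes. Where exactly that teardown belongs is what the "move destroy" entries above iterate on, so the sketch leaves it out.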