(2 nodes with 2 GPUs each) modes. If the test uses only 2 GPUs, it is important to set the distributed backend to "mp" so that Ray does not schedule all workers on a node other than the head node, which can ...
correctly omit operations that are already at local optima).