Test failures with new configs in test_grad_scaling_autocast in test_torch.py #126638

Open

gambiTarun (Contributor) opened this issue on May 19, 2024
Labels: module: optimizer (Related to torch.optim), triaged

This issue tracks my observations when updating test_grad_scaling_autocast in test_torch.py with the new OptimizerInfo infrastructure (#123451).

While I was able to combine the tests that call _grad_scaling_autocast_test into one test (#125538), I observed test failures when I tried to use _get_optim_inputs_including_global_cliquey_kwargs to avoid hardcoding the configs.
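
For reference, the generated configs can be inspected outside the test harness. A minimal sketch, assuming the helper and optim_db are importable from torch.testing._internal.common_optimizers (as they are in test_optim.py) and run on a CUDA machine so that the fused/foreach variants show up:

import torch
# Assumption: both names live in common_optimizers, as used by test_optim.py.
from torch.testing._internal.common_optimizers import (
    optim_db,
    _get_optim_inputs_including_global_cliquey_kwargs,
)

device, dtype = "cuda", torch.float32  # assumption: CUDA, matching dtypes=[torch.float32] below
adam_info = next(o for o in optim_db if o.optim_cls is torch.optim.Adam)
# Enumerate the same inputs the test iterates over, skipping "differentiable"
# just as the test does, and print each config's kwargs.
for optim_input in _get_optim_inputs_including_global_cliquey_kwargs(
    device, dtype, adam_info, skip=("differentiable",)
):
    print(optim_input.kwargs)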

The following is the test case:

@onlyNativeDeviceTypes
@optims(
    [optim for optim in optim_db if optim.optim_cls in [torch.optim.AdamW, torch.optim.Adam, torch.optim.SGD]],
    dtypes=[torch.float32]
)
def test_grad_scaling_autocast(self, device, dtype, optim_info):
    try_pickle = False

    def run(device, data, model, optimizer, scaler, loss_fn, skip_iter, try_scaling_api):
        for i, (input, target) in enumerate(data):
            optimizer.zero_grad()
            with torch.autocast(device_type=device, dtype=torch.half, enabled=try_scaling_api):
                output = model(input)
                loss = loss_fn(output, target)
            if try_scaling_api:
                scaler.scale(loss).backward()
                if i == skip_iter and scaler.is_enabled():
                    with torch.no_grad():
                        model[1].weight.grad.fill_(float('inf'))
                scaler.step(optimizer)
                scaler.update()
                if try_pickle:
                    scaler = pickle.loads(pickle.dumps(scaler))
            else:
                loss.backward()
                if (not scaler.is_enabled()) or (i != skip_iter):
                    optimizer.step()
        return scaler

    optimizer_ctor = optim_info.optim_cls
    all_optim_inputs = _get_optim_inputs_including_global_cliquey_kwargs(
        device, dtype, optim_info, skip=("differentiable",))
    # Compares no scaling + no autocasting against scaling + autocasting.
    for optim_input in all_optim_inputs:
        
        # NOTE(mkozuki): With current way of testing, `torch.optim.Adam` is failing in spite of `foreach` and `fused`.
        #   Giving some flexibility to this test might help.
        context = contextlib.nullcontext
        if optimizer_ctor in (torch.optim.Adam, torch.optim.AdamW):
            from functools import partial
            context = partial(self.assertRaises, AssertionError)
        with context():
            # sets atol=1e-3 because we're comparing pure fp32 arithmetic vs a mixture of fp16 and fp32
            self._run_scaling_case(
                device, run, unskipped=3, skipped=1, atol=1e-3,
                optimizer_ctor=optimizer_ctor, optimizer_kwargs=optim_input.kwargs,
            )
            # this will be picked up by try_pickle within run():
            try_pickle = True
            self._run_scaling_case(
                device, run, unskipped=3, skipped=1, atol=1e-3,
                optimizer_ctor=optimizer_ctor, optimizer_kwargs=optim_input.kwargs,
            )
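
As a side note on the NOTE(mkozuki) comment above: one way to give the test the suggested flexibility might be to choose the expected-failure context per (optimizer, config) pair rather than per optimizer class, since contextlib.nullcontext and partial(self.assertRaises, AssertionError) are interchangeable zero-argument context-manager factories. The sketch below only illustrates that pattern; ContextSelectionDemo and KNOWN_MISMATCHES are hypothetical names and the listed configs are placeholders, not measured results.

import contextlib
import unittest
from functools import partial

class ContextSelectionDemo(unittest.TestCase):
    # Hypothetical placeholder: (optimizer name, config description) pairs
    # known to produce "Tensor-likes are not close!" mismatches.
    KNOWN_MISMATCHES = {("Adam", "capturable"), ("AdamW", "capturable")}

    def _context_for(self, optim_name, config_desc):
        # Expect an AssertionError only for configs on the known-mismatch list;
        # everything else runs under a no-op context.
        if (optim_name, config_desc) in self.KNOWN_MISMATCHES:
            return partial(self.assertRaises, AssertionError)
        return contextlib.nullcontext

    def test_pattern(self):
        # Stand-in comparison that "fails" only for capturable configs, just to
        # exercise the selection logic end to end.
        def fake_comparison(config_desc):
            if config_desc == "capturable":
                raise AssertionError("Tensor-likes are not close!")

        for optim_name, config_desc in [("Adam", "default"), ("Adam", "capturable")]:
            with self._context_for(optim_name, config_desc)():
                fake_comparison(config_desc)

if __name__ == "__main__":
    unittest.main()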

Here are the observations I made about the failing configs generated by _get_optim_inputs_including_global_cliquey_kwargs:

  1. When optimizer_ctor is torch.optim.SGD, the test fails for the config {'weight_decay': 0.1, 'maximize': True, 'fused': True} (a standalone repro sketch for this config follows the list).
  2. When the context is partial(self.assertRaises, AssertionError) for Adam and AdamW, the tests fail for the configs {'lr': 0.01, 'fused': False} and {'lr': 0.01, 'fused': True} with the error AssertionError: AssertionError not raised.
  3. When I change the context to contextlib.nullcontext for Adam and AdamW (since the AssertionError is not raised in observation 2), the tests fail for all configs with the error AssertionError: Tensor-likes are not close!. I am confused about why this error is thrown even for the configs that failed in observation 2: the mismatched-elements percentage is around 3.1% for {'lr': 0.01, 'fused': False} and {'lr': 0.01, 'fused': True}, but either 39.1% or 100% for the other configs.
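
For observation 1, here is the standalone repro sketch referenced above. It mirrors the structure of run() but bypasses _run_scaling_case: a plain fp32 run is compared against an autocast + GradScaler run for the failing SGD config. The model, data, iteration count, and tolerances are my own choices, and the inf-injection/skipped-step logic is omitted, so it only approximates what the test checks:

import torch

def train(device, use_scaling, optimizer_kwargs, iters=3, seed=0):
    # Reseed so both runs see identical weights and data.
    torch.manual_seed(seed)
    model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.Linear(8, 8)).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, **optimizer_kwargs)
    scaler = torch.cuda.amp.GradScaler(enabled=use_scaling)
    loss_fn = torch.nn.MSELoss()
    data = [(torch.randn(4, 8, device=device), torch.randn(4, 8, device=device))
            for _ in range(iters)]
    for input, target in data:
        optimizer.zero_grad()
        with torch.autocast(device_type=device, dtype=torch.half, enabled=use_scaling):
            loss = loss_fn(model(input), target)
        if use_scaling:
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()
        else:
            loss.backward()
            optimizer.step()
    return [p.detach().clone() for p in model.parameters()]

device = "cuda"  # assumption: fused SGD is exercised on a CUDA device here
kwargs = {"weight_decay": 0.1, "maximize": True, "fused": True}
reference = train(device, use_scaling=False, optimizer_kwargs=kwargs)
scaled = train(device, use_scaling=True, optimizer_kwargs=kwargs)
for p_ref, p_amp in zip(reference, scaled):
    # Same atol as the test; a mismatch here would reproduce observation 1.
    torch.testing.assert_close(p_amp, p_ref, atol=1e-3, rtol=1e-3)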

Please let me know if I can provide any additional information or perform any other tests. I would be happy to work on this.

cc @vincentqb @jbschlosser @albanD @janeyx99 @crcrpar

@janeyx99 added the module: optimizer and triaged labels on May 20, 2024