DaCe backend: set unstructured_horizontal_has_unit_stride=True #1130
havogt wants to merge 12 commits into
|
cscs-ci run default |
|
cscs-ci run distributed |
|
cscs-ci run distributed |
|
cscs-ci run distributed |
|
cscs-ci run default |
|
cscs-ci run dace |
|
@msimberg Do you have any idea why the distributed CI pipeline is failing? It seems that the test passes. |
|
@edopao hmm, on the job you linked it's useful to know the following:
in this case the output is ... Directly from that I can't tell whether it's completely off or just a change in the numerics that has shifted the error slightly, such that you only need to change tolerances. |
|
cscs-ci run distributed |
|
cscs-ci run dace |
|
I think once #1102 is merged we should try this again. We should get a max diff on the one rank that failed in the distributed tests. I suppose this option can change code generation and thus slightly change how errors accumulate? We might just be slightly changing the error on one test with this option. |
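To illustrate the suspicion above, here is a minimal sketch in plain Python of how the order of accumulation alone can change a floating-point result. It is not taken from the generated DaCe code. The helper `sum_in_order` and the input values are made up for illustration; the assumption is only that a different memory layout may lead the compiler to reorder or vectorize a reduction.

```python
def sum_in_order(values):
    """Accumulate left to right in double precision, like a plain scalar loop."""
    total = 0.0
    for v in values:
        total += v
    return total


# The same three numbers in two accumulation orders (illustrative values).
absorbed_first = [1e16, 1.0, -1e16]   # 1.0 is rounded away when added to 1e16
cancel_first = [1e16, -1e16, 1.0]     # the large terms cancel before 1.0 is added

result_a = sum_in_order(absorbed_first)
result_b = sum_in_order(cancel_first)

# Mathematically both sums are 1.0; in floating point they differ.
print("absorbed first:", result_a)
print("cancel first:  ", result_b)
print("difference:    ", abs(result_a - result_b))
```

The example is deliberately extreme. In a real stencil the reordering effect would typically be on the order of a few units in the last place, which is enough to push a single element just past a tight tolerance.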
|
I've bumped gt4py to main (which as far as I can tell includes the patch). |
|
cscs-ci run distributed |
|
Ok, it wasn't quite that easy... something in the version bump broke other things. |
|
cscs-ci run distributed |
|
cscs-ci run default |
|
cscs-ci run distributed |
|
Using This happens on one test with dace_cpu (dace_gpu is still not tested in CI, so unsure if it's affected). This looks... ok to me? It's not great if we increase the difference to serialized data, but this is exactly one element going above the tolerance. @edopao @havogt @philip-paul-mueller do you have any reason to suspect that this change would be due to a bug or should we go ahead and slightly change the tolerance on that test? |
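For context on "exactly one element going above the tolerance", a minimal sketch of such a check in plain Python follows. It assumes the usual combined criterion `|actual - reference| <= atol + rtol * |reference|`; the function name `elements_over_tolerance`, the data, and the tolerance values are hypothetical and do not come from the failing test.

```python
def elements_over_tolerance(actual, reference, rtol, atol=0.0):
    """Return (index, abs_diff, allowed) for every element outside the tolerance."""
    offenders = []
    for i, (a, r) in enumerate(zip(actual, reference)):
        diff = abs(a - r)
        allowed = atol + rtol * abs(r)
        if diff > allowed:
            offenders.append((i, diff, allowed))
    return offenders


# Hypothetical data: the element at index 2 drifts slightly more than the others.
reference = [1.0, 2.0, 3.0, 4.0]
actual = [1.0, 2.0 + 1e-13, 3.0 + 5e-12, 4.0]

offenders = elements_over_tolerance(actual, reference, rtol=1e-12)
for index, diff, allowed in offenders:
    print(f"index {index}: diff {diff:.3e} exceeds allowed {allowed:.3e}")
print(f"{len(offenders)} of {len(reference)} elements above tolerance")
```

Reporting the count and the margin by which each offender exceeds the bound makes it easier to judge whether a small tolerance bump is reasonable or whether the result is completely off.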
|
I think there is no reason against increasing the tolerance. If the error moves you could also try to run it with the same cache and disabled OpenMP. |
I was working on my project and I found some bugs in the lowering to SDFG. I have tested the change in this PR together with my bug fixes (see #1247, one test fails but it's unrelated) and the CI error we saw on this PR goes away.
|
cscs-ci run distributed |
|
Just adding this as a reminder before merging: we added GT4PY_UNSTRUCTURED_HORIZONTAL_HAS_UNIT_STRIDE=1 to the readme in #1221. I guess this would allow removing that? |
Good point. Yes, we should test this as well. |
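As a rough sketch of why the README entry might become redundant: if the backend default is `True`, an environment variable would only be needed to override it. The helper `env_flag` below is hypothetical and is not how gt4py actually reads `GT4PY_UNSTRUCTURED_HORIZONTAL_HAS_UNIT_STRIDE`; it only illustrates the default-plus-override pattern.

```python
import os

TRUE_VALUES = {"1", "true", "yes", "on"}
FALSE_VALUES = {"0", "false", "no", "off", ""}


def env_flag(name, default):
    """Read a boolean flag from the environment, falling back to `default`."""
    raw = os.environ.get(name)
    if raw is None:
        return default  # variable not set: the built-in default applies
    value = raw.strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"{name}={raw!r} is not a recognized boolean value")


# Assumption: the default is True once this PR is merged, so exporting the
# variable would only matter when someone wants to turn the option off.
unit_stride = env_flag("GT4PY_UNSTRUCTURED_HORIZONTAL_HAS_UNIT_STRIDE", default=True)
print("unit stride enabled:", unit_stride)
```

Testing with the variable unset, as suggested above, would confirm that the new default alone gives the intended behavior.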
|
cscs-ci run distributed |
|
cscs-ci run dace |
|
cscs-ci run distributed |
|
No description provided.