> The bug is not found by CI because we have some benchmarks which are _expected_ to give incorrect results (the `embedded_INCORRECT` benchmarks). Maybe we should disable those benchmarks in CI so that we can fail the build if another example breaks?
Made #1045
in a place where the two have the same effect but sum is likely to be
faster. There is a lot of noise in the benchmarks so it's not
completely obvious it *is* faster.
Fixes https://github.com/microsoft/knossos-ksc/issues/967
When we make a new extension, emit the C++ file only if it has changed. Ninja will then not rebuild the extension.
Gives a 7x speedup on the full tests, including benchmarks, and a 19x speedup on the regular tests.
Sends generated files to build/torch_extensions
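
A minimal sketch of the "emit only if changed" idea, for illustration only. The helper name `write_if_changed` is hypothetical and is not taken from the repository. It relies on Ninja using file modification times to decide what is out of date, so leaving an identical file untouched keeps the extension up to date.

```python
import os


def write_if_changed(path, contents):
    """Write `contents` to `path` only when it differs from what is on disk.

    Returns True if the file was (re)written, False if it was left untouched.
    (Hypothetical helper; the name and signature are assumptions.)
    """
    try:
        with open(path, "r", encoding="utf-8") as f:
            if f.read() == contents:
                # Identical source: do not touch the file, so its
                # modification time stays the same and no rebuild is triggered.
                return False
    except FileNotFoundError:
        # First build: fall through and create the file.
        pass

    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(contents)
    return True
```

Calling it twice with the same generated source writes the file once; the second call returns `False` and leaves the timestamp alone.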
Instead of calling the embedded ks function `vrelu3_ks_embeded_...` it
will be `vrelu3_embedded_ks_...`. This will allow us to add
`vrelu3_embedded_cpp...`.
* Use internal vmap implementation
* Adjust how candidate functions are filtered
Wrapper *values* will sometimes show the internal function name.
Better to filter on the identifier.
* Comment out vrelu3_pytorch_nice for now
Uses functionality currently unsupported by vmap
"RuntimeError: Batching rule not implemented for aten::is_nonzero. We could not generate a fallback."
* Add note about vmap stability
* More explicit call, discussion:
https://github.com/microsoft/knossos-ksc/pull/881#discussion_r654523527
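
To illustrate the point about filtering on the identifier rather than on the wrapped value, here is a small sketch. All names in it (`traced`, `relu_wrapped`, `relu_plain`, and the two `candidates_*` helpers) are made up for the example and are not the repository's code. A wrapper that does not copy metadata reports its inner function's name, so filtering on `__name__` misses it, while filtering on the name it is bound to does not.

```python
def traced(fn):
    """Toy decorator that deliberately does not use functools.wraps."""
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapper


@traced
def relu_wrapped(x):
    return max(x, 0.0)


def relu_plain(x):
    return max(x, 0.0)


# Stand-in for a module's namespace: identifier -> value.
namespace = {
    "relu_wrapped": relu_wrapped,
    "relu_plain": relu_plain,
    "unrelated": len,
}


def candidates_by_value_name(namespace, prefix):
    """Filter on each value's __name__ (fragile: wrappers report 'wrapper')."""
    return sorted(
        name
        for name, fn in namespace.items()
        if callable(fn) and getattr(fn, "__name__", "").startswith(prefix)
    )


def candidates_by_identifier(namespace, prefix):
    """Filter on the identifier the value is bound to."""
    return sorted(
        name
        for name, fn in namespace.items()
        if callable(fn) and name.startswith(prefix)
    )
```

Here `candidates_by_value_name(namespace, "relu")` finds only `relu_plain`, because `relu_wrapped.__name__` is `"wrapper"`, whereas `candidates_by_identifier(namespace, "relu")` finds both.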
For each function foo defined in examples/folder/myeg.py, define:
foo: TorchScript implementation
foo_pt: PyTorch reference implementation, as fast as possible, possibly ugly
foo_bench_configs: return a list of arguments at which to benchmark the function
Then call run-bench myeg foo
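
The naming convention above can be sketched as follows. This is not the `run-bench` implementation: `find_benchmark` and `check_against_reference` are hypothetical helpers, and plain Python functions stand in for the TorchScript and PyTorch versions so the example runs without `torch`.

```python
# Stand-ins for the contents of a hypothetical examples/folder/myeg.py.
def foo(x):
    # Would be the TorchScript implementation.
    return x * x


def foo_pt(x):
    # Would be the PyTorch reference implementation.
    return x ** 2


def foo_bench_configs():
    # The arguments at which to benchmark the function.
    return [1.0, 2.0, 3.0]


def find_benchmark(namespace, name):
    """Look up the three names the convention associates with `name`."""
    return {
        "impl": namespace[name],
        "reference": namespace[name + "_pt"],
        "configs": namespace[name + "_bench_configs"],
    }


def check_against_reference(namespace, name):
    """Compare the implementation with the reference at every config."""
    bench = find_benchmark(namespace, name)
    return [
        bench["impl"](arg) == bench["reference"](arg)
        for arg in bench["configs"]()
    ]
```

With the stand-ins above, `check_against_reference(globals(), "foo")` looks up `foo`, `foo_pt` and `foo_bench_configs` by name and compares the two implementations at each configured argument.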
Some steps towards #725
Co-authored-by: Colin Gravill <colin@gravill.com>
Now that we are one-arg we don't need the latter
Related to https://github.com/microsoft/knossos-ksc/issues/681
Co-authored-by: Tom Ellis <tom-git@jaguarpaw.co.uk>
Co-authored-by: Andrew Fitzgibbon <awf@microsoft.com>