Commit graph

79 commits

Author SHA1 Message Date
Andrew Fitzgibbon 8fa75e67c0
More documentation (#1059)
* Improve intro, add MNIST
* Generate relu.cpp
* Add nnModule, mnist, Lambda
2021-09-09 12:17:04 +01:00
Andrew Fitzgibbon a4e2d73c15
Fix sqrl compile failure (#1057)
* Ensure script fails if one run fails
* Fix convert_to_ks per #1048
2021-09-07 09:05:23 +01:00
toelli-msft bf4c417c5f
sqrl upper bounds (#968)
Co-authored-by: Tom Ellis <tom-git@jaguarpaw.co.uk>
Co-authored-by: Andrew Fitzgibbon <awf@microsoft.com>
2021-09-06 19:04:44 +01:00
Andrew Fitzgibbon 9a87228a54
Rename, add comments, per #1023 (#1048) 2021-09-02 15:22:07 +01:00
Andrew Fitzgibbon c482f9c6b4
Doc: Add "under construction" notice 2021-09-01 20:47:02 +01:00
David Collier 0327e1a4ae
Correct order of arguments in vrelu3 aten implementation (#1034)
> The bug is not found by CI because we have some benchmarks which are _expected_ to give incorrect results (the `embedded_INCORRECT` benchmarks). Maybe we should disable those benchmarks in CI so that we can fail the build if another example breaks?

Made #1045
2021-09-01 10:11:14 +01:00
Andrew Fitzgibbon a79b395997
Implement vmap (#1019) 2021-08-27 22:04:37 +01:00
Andrew Fitzgibbon a26a38f67d
WIP Documentation tidies (#1012)
* Update AST cpp
* Move glossary into sphinx
* rm cg-todos.txt
* Restructure doc folders
2021-08-24 13:44:26 +01:00
David Collier 2d00e379c6
Fix PyTorch implementation of sqrl (#1027) 2021-08-24 11:17:53 +01:00
Tom Ellis 150f3d3c73 Remove TupledAD 2021-08-18 14:50:08 +01:00
Andrew Fitzgibbon c529d57017
Initial plumbing for vmap (#1010)
https://github.com/microsoft/knossos-ksc/pull/1010
2021-08-17 14:34:55 +01:00
David Collier 3310934e34
Rename revmap to map2 (#1016) 2021-08-17 09:08:26 +01:00
Tom Ellis d222e25ce7 Float sqrt(2.0) outside loop
Doesn't actually speed things up
2021-08-11 12:51:04 +01:00
Tom Ellis adf0f22e0f Add vgelu_embedded_cpp_aten 2021-08-11 12:51:04 +01:00
David Collier 57f32e9f4a
Add #includes to generated C++ (#1000) 2021-08-06 17:02:04 +01:00
David Collier 7f05afab62
Remove special case for ATen benchmark (#1001) 2021-08-06 16:58:59 +01:00
David Collier 146aeced61
Remove call to renamed function (#989) 2021-07-28 17:59:09 +01:00
Andrew Fitzgibbon ddd91c4a2a
vmap interface to elementwise ops (#970) 2021-07-28 16:59:30 +01:00
Tom Ellis 937d4ea8c3 Use sum in sqrl instead of mean
in a place where the two have the same effect but sum is likely to be
faster.  There is a lot of noise in the benchmarks so it's not
completely obvious it *is* faster.

Fixes https://github.com/microsoft/knossos-ksc/issues/967
2021-07-26 10:01:58 +01:00
Andrew Fitzgibbon 6cff51b0a5
Knossos.register, and delayed compilation (#960) 2021-07-24 13:38:12 +01:00
Andrew Fitzgibbon 0daf53e905
Clean pytorch_nice doc (#971) 2021-07-24 13:15:23 +01:00
Tom Ellis e79ac1db9f
Embedded C++ gelu benchmarks 2021-07-23 10:32:02 +01:00
Tom Ellis 0ab1a292db
Allow to set compiler flags in benchmarks and extend relu3 2021-07-21 10:42:14 +01:00
Andrew Fitzgibbon a498df8a59
Move pytorch examples higher (#957)
PyTorch examples were hidden below the optimized examples
2021-07-19 22:21:21 +01:00
David Collier 1305e89be8
Use torch::tensor instead of ks::tensor in entry points (#931) 2021-07-16 14:13:37 +01:00
David Collier 82aea2c764
Generate C++ entry points (#925) 2021-07-15 10:54:08 +01:00
David Collier e232149ee1
Rename module entry points (#947) 2021-07-13 15:57:56 +01:00
Andrew Fitzgibbon f179c8e485
Code health: torch.autograd Function, torch zeros, jax version (#945) 2021-07-13 15:22:16 +01:00
Andrew Fitzgibbon 90012065ec
Compile changed files only (#944)
When we make a new extension, emit the C++ file only if it has changed. Ninja will then not rebuild the extension.

Gives a 7x speedup on full tests including benchmark, and a 19x speedup on regular tests

Sends generated files to build/torch_extensions
2021-07-13 12:35:14 +01:00
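The idea in "Compile changed files only" (#944) is to rewrite a generated source file only when its contents differ, so a timestamp-based build tool such as Ninja sees nothing new and skips the rebuild. A minimal sketch of that idea follows. The helper name `write_if_changed` and the example path are hypothetical; this is not the repository's actual code.

```python
from pathlib import Path


def write_if_changed(path: Path, content: str) -> bool:
    """Write `content` to `path` only if it differs from the file on disk.

    Returns True if the file was written, False if it was left untouched.
    Leaving the file untouched keeps its modification time, which is what
    lets a timestamp-based build tool skip the rebuild.
    """
    if path.exists() and path.read_text() == content:
        return False
    # Create the output directory on first use (hypothetical layout).
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content)
    return True
```

For example, calling `write_if_changed(Path("build/torch_extensions/foo.cpp"), source)` twice with the same `source` writes the file the first time and leaves it alone the second.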
Tom Ellis e115c770cf relu3 upper bounds 2021-07-12 14:27:11 +01:00
David Collier 76afa818d3 Move common parts of elementwise kernel into new header 2021-07-09 17:11:59 +01:00
David Collier 4f043d7343 Make boilerplate for elementwise operations reusable 2021-07-09 17:11:59 +01:00
David Collier 224041d409 Only support tensor elements of type ks_float 2021-07-09 17:11:59 +01:00
David Collier 940532dc50 Move checking of arguments into .cu file 2021-07-09 17:11:59 +01:00
Tom Ellis 2efb8dd94b Change embedding prefix to support embedding C++ in the future
Instead of calling the embedded ks function `vrelu3_ks_embeded_...` it
will be `vrelu3_embedded_ks_...`.  This will allow us to add
`vrelu3_embedded_cpp...`.
2021-06-22 16:38:19 +01:00
Colin Gravill bafd9c1dea
Use internal PyTorch vmap for vrelu3 (#881)
* Use internal vmap implementation

* Adjust how candidates functions are filtered
Wrappers *values* will sometimes show internal function name.
Better to filter over the identifier

* Comment vrelu3_pytorch_nice for now
Uses functionality currently unsupported by vmap
"RuntimeError: Batching rule not implemented for aten::is_nonzero. We could not generate a fallback."

* Add note about vmap stability

* More explicit call, discussion:
https://github.com/microsoft/knossos-ksc/pull/881#discussion_r654523527
2021-06-21 16:14:04 +02:00
Tom Ellis 205daf422a Support "ks_embedded" benchmarks and add two of them 2021-06-15 18:59:10 +01:00
Tom Ellis a7cd2db5cf Complete vgelu example 2021-06-15 17:59:45 +01:00
Tom Ellis 1b48afacfe Use explicit floating point literal
Otherwise it is sent to ksc as an Integer
2021-06-15 17:58:46 +01:00
David Collier 14a1563a81 Comment out largest sqrl example due to OOM issue 2021-06-11 09:40:25 +01:00
Colin Gravill fe0771d745
Allow pytest benchmark to measure with or without CUDA transfer times (#854)
Allow benchmarking with and without CUDA transfers
Move device responsibility out of example
2021-06-07 09:28:51 +01:00
David Collier 663a5c07af
Alternative version of ts2mod based on recursively compiling needed functions (#838)
Co-authored-by: Andrew Fitzgibbon <awf@microsoft.com>
2021-06-03 16:21:31 +01:00
David Collier f20f21813a
Support tensor rank either 1 or 2 in CUDA vrelu3 (#815) 2021-05-24 15:24:33 +01:00
David Collier 7392c718ae
Add handwritten CUDA vrelu3 to benchmarks (#785) 2021-05-24 12:00:46 +01:00
Colin Gravill edb05342ad
Assorted linting fixes (#790)
* Flake8 linting fixes
* Spelling fixes
* Add imports
* Remove unused imports
* Apply autoflake, unused import mainly, but few redundant pass statements as well
2021-05-20 13:31:16 +01:00
Colin Gravill 35b9910851
Remove Black line length override (#774)
* Test removing Black line length override
i.e. make it 88
2021-05-12 15:10:41 +01:00
Colin Gravill 2f20004af4
Apply Black formatting (#757)
* Apply Black formatting

* Ensure Black formatting on PR
2021-05-06 16:21:12 +01:00
Andrew Fitzgibbon 387c9228cb
KSC Benchmarking (#645)
For each function foo defined in examples/folder/myeg.py, define:

foo: Torchscript implementation
foo_pt: PyTorch reference implementation, as fast as possible, possibly ugly
foo_bench_configs: return a list of arguments at which to benchmark the function
Then call run-bench myeg foo

Some steps towards #725 

Co-authored-by: Colin Gravill <colin@gravill.com>
2021-04-22 16:09:56 +01:00
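The naming convention in "KSC Benchmarking" (#645) can be illustrated with a small sketch. It assumes plain Python lists in place of tensors, so that it runs without PyTorch. The bodies of `foo`, `foo_pt` and `foo_bench_configs` are stand-ins, and `find_benchmark` is a hypothetical helper, not the repository's `run-bench` implementation.

```python
import math
from typing import Callable, Dict, List


def foo(xs: List[float]) -> float:
    # Stand-in for the implementation under test (TorchScript in the repository).
    total = 0.0
    for x in xs:
        total += x * x
    return total


def foo_pt(xs: List[float]) -> float:
    # Stand-in for the reference implementation (PyTorch in the repository).
    return math.fsum(x * x for x in xs)


def foo_bench_configs() -> List[List[float]]:
    # Returns the list of arguments at which to benchmark the function.
    return [[0.5] * n for n in (4, 16)]


def find_benchmark(namespace: Dict[str, object], name: str) -> Dict[str, Callable]:
    """Hypothetical helper: look up `name`, `name_pt` and `name_bench_configs`."""
    wanted = {
        "impl": name,
        "reference": f"{name}_pt",
        "configs": f"{name}_bench_configs",
    }
    missing = [n for n in wanted.values() if not callable(namespace.get(n))]
    if missing:
        raise KeyError(f"missing benchmark definitions: {missing}")
    return {role: namespace[n] for role, n in wanted.items()}
```

A runner built on this would call `configs()` once and then time `impl` and `reference` on each returned argument, checking that the two agree.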
Tom Ellis 32d3dc7980
Change edefs to take a single type rather than a list of types (#685)
Now that we are one-arg we don't need the latter

Related to https://github.com/microsoft/knossos-ksc/issues/681

Co-authored-by: Tom Ellis <tom-git@jaguarpaw.co.uk>
Co-authored-by: Andrew Fitzgibbon <awf@microsoft.com>
2021-04-01 10:12:10 +01:00
David Collier 00e33b8b2e
Add hand-optimized ks for GMM using buildFromSparse (#600) 2021-03-09 11:58:45 +00:00