ML model compiler for Cortex-M4F

Latest commit 9aa4a94604 by Michal Moskal: "allow creating releases" (2024-03-08 10:34:28 -08:00)

ML4F - Machine Learning model compiler for Cortex-M4F

ML4F takes a Keras sequential model as input and compiles it directly to ARM Thumb machine code for Cortex-M4F and more capable cores (M7, M33, etc.). Latency is typically an order of magnitude lower than with the TensorFlow Lite for Microcontrollers interpreter (for float32 models).

The input model generally needs to be in TensorFlow.js format, but the command-line tool can invoke Python scripts to convert from .h5 or .pb models. In the compiled output, weights can be stored as float32 or float16.
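The sketch below uses NumPy to estimate what choosing float16 weights costs and saves for a single weight tensor. It does not reproduce ML4F's own conversion or how it handles float16 values at run time. The function name and the random test tensor are hypothetical.

```python
import numpy as np

def float16_storage_report(weights):
    """Compare float32 vs float16 storage for one weight tensor (NumPy sketch only)."""
    w32 = np.asarray(weights, dtype=np.float32)
    w16 = w32.astype(np.float16)  # round to the nearest half-precision value
    # worst-case absolute rounding error introduced by the float16 copy
    max_abs_err = float(np.max(np.abs(w16.astype(np.float32) - w32)))
    return w32.nbytes, w16.nbytes, max_abs_err

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.1, size=(64, 32))  # hypothetical Dense-layer weights
bytes32, bytes16, err = float16_storage_report(weights)
print(bytes32, bytes16, err)
```

For this 64x32 tensor the first two numbers are 8192 and 4096 bytes: float16 halves weight storage. The rounding error stays small for weights of this magnitude, but it is worth re-checking accuracy with --eval after switching to float16.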

The following operators are supported:

Conv1D
Conv2D
DepthwiseConv1D
DepthwiseConv2D
MaxPooling1D
MaxPooling2D
AveragePooling1D
AveragePooling2D
Dense
Softmax
Activation
BatchNormalization

Plus some no-ops:

InputLayer
Dropout
Flatten
Reshape
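Before compiling, it can help to list any layers outside the two lists above. The following is a minimal sketch that walks a parsed TF.js model.json.

The helper names are hypothetical, and it makes one assumption: layers appear as dicts with a class_name key inside "layers" lists, as in Keras model configs. Because the exact nesting of model.json is not specified here, it searches recursively and also descends into nested Sequential layers. The ml4f tool itself remains the authoritative check.

```python
import json

SUPPORTED = {
    # compiled operators listed in this README
    "Conv1D", "Conv2D", "DepthwiseConv1D", "DepthwiseConv2D",
    "MaxPooling1D", "MaxPooling2D", "AveragePooling1D", "AveragePooling2D",
    "Dense", "Softmax", "Activation", "BatchNormalization",
    # no-ops
    "InputLayer", "Dropout", "Flatten", "Reshape",
}

def layer_class_names(node):
    """Collect class_name of every leaf layer found under any "layers" list."""
    names = []
    if isinstance(node, dict):
        layers = node.get("layers")
        if isinstance(layers, list):
            for layer in layers:
                nested = layer_class_names(layer.get("config", {}))
                if nested:
                    names.extend(nested)  # container such as a nested Sequential
                else:
                    names.append(layer["class_name"])
            return names
        for value in node.values():
            names.extend(layer_class_names(value))
    elif isinstance(node, list):
        for item in node:
            names.extend(layer_class_names(item))
    return names

def unsupported_layers(names):
    """Layer types not in the README lists, deduplicated, in first-seen order."""
    return list(dict.fromkeys(n for n in names if n not in SUPPORTED))

def check_model_json(path):
    with open(path) as f:
        return unsupported_layers(layer_class_names(json.load(f)))
```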

Feel free to suggest other operators that would be useful (ideally with example models) via GitHub Issues.

Usage

npm i -g ml4f
ml4f my-model

A typical invocation looks like this:

ml4f           --basename model-float32 my-model.h5
ml4f --float16 --basename model-float16 built/converted.tfjs

The first line compiles my-model.h5 with float32 weights and writes the results to built/model-float32.*. The second line compiles with float16 weights, reusing the temporary file built/converted.tfjs created by the first line to speed things up, since Python TensorFlow is very slow to load. Its results are in built/model-float16.*.
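If you drive this two-step build from a Python training pipeline, a minimal sketch might look like the code below. It uses only the flags shown above (--basename, --float16). The helper names are hypothetical, and the code assumes it runs from the directory where ml4f writes built/.

```python
import shlex
import shutil
import subprocess

def ml4f_commands(model_path):
    """Return the two invocations shown above as argument lists."""
    return [
        # float32 weights; the .h5 model goes through the Python converter
        ["ml4f", "--basename", "model-float32", model_path],
        # float16 weights, reusing the converted model left by the first run
        ["ml4f", "--float16", "--basename", "model-float16", "built/converted.tfjs"],
    ]

def run_both(model_path):
    """Run both compilations in order; raise if ml4f is missing or a step fails."""
    if shutil.which("ml4f") is None:
        raise RuntimeError("ml4f not found on PATH (install with: npm i -g ml4f)")
    for cmd in ml4f_commands(model_path):
        print("$", shlex.join(cmd))
        subprocess.run(cmd, check=True)
```

Calling run_both("my-model.h5") runs the commands in order. The float16 step must run second, because it depends on the converted file.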

Run ml4f --help for more info.

You can also use ML4F as a library in the browser; in that case it only accepts TF.js models.

Evaluating models

You can pass the --eval test.json option to evaluate the model on given input data; this prints the confusion matrix and accuracy. The test.json file has two fields, x and y. The field x contains a batch of input tensors and y a batch of output tensors, with proper nesting. For example, for an input of shape 2x3 and an output of shape 4:

{ 
  "x": [
    [ [ 0.1, 0.2, -0.3 ], [ 0.2, -0.22, 0 ] ],
    [ [ -0.1, 0.3, 0.1 ], [ 0.32, 0.2, 1 ] ]
  ],
  "y": [
      [ 0, 1, 0, 0 ],
      [ 1, 0, 0, 0 ]
  ]
}
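A mistake in the nesting is easy to make, so a quick sanity check before running --eval may save time. The minimal sketch below makes two assumptions: x and y must have the same batch size, and each must be a regular (non-ragged) array. The function names are hypothetical and not part of ML4F.

```python
import json
import numpy as np

def eval_shapes(data):
    """Return (input shape, output shape) per example; raise on bad nesting."""
    # dtype=float32 makes NumPy raise ValueError on ragged nesting
    x = np.asarray(data["x"], dtype=np.float32)
    y = np.asarray(data["y"], dtype=np.float32)
    if x.shape[0] != y.shape[0]:
        raise ValueError(f"batch size mismatch: {x.shape[0]} inputs vs {y.shape[0]} outputs")
    return x.shape[1:], y.shape[1:]

def check_eval_file(path):
    with open(path) as f:
        return eval_shapes(json.load(f))
```

For the example above, eval_shapes returns ((2, 3), (4,)), matching the 2x3 input and the output of size 4.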

If you have data as NumPy arrays, you can use the following snippet to save it as JSON:

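import json
import numpy as np

class NumpyEncoder(json.JSONEncoder):
    """JSON encoder that converts NumPy arrays to nested Python lists."""
    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        return super().default(obj)

# xs_test, ys_test: NumPy arrays with the test inputs and expected outputs
with open('test.json', 'w') as outfile:
    json.dump({"x": xs_test, "y": ys_test}, outfile, cls=NumpyEncoder)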

Evaluation stats look like the following:

Accuracy: 0.9560
  245    0    1    2
    6   84    4    0
    3    2   73    0
    4    0    0   76

model: 12.75k; code: 2.46k (19.3%); arena: 4.38k; test 0.00k
total cycles: 225149 (2.680ms at 84MHz)
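The summary numbers can be recomputed from the raw output, which is a quick way to read them. Accuracy is the diagonal of the confusion matrix (correct predictions) divided by the total number of samples. This holds whichever axis holds the true labels. Latency is the cycle count divided by the clock frequency. The sketch below applies both to the values printed above; the function names are illustrative.

```python
import numpy as np

def accuracy_from_confusion(matrix):
    """Accuracy = correctly classified samples (diagonal) / all samples."""
    m = np.asarray(matrix)
    return np.trace(m) / m.sum()

def latency_ms(cycles, clock_hz):
    """Convert a cycle count to milliseconds at the given clock frequency."""
    return cycles / clock_hz * 1000.0

confusion = [
    [245, 0, 1, 2],
    [6, 84, 4, 0],
    [3, 2, 73, 0],
    [4, 0, 0, 76],
]
print(f"{accuracy_from_confusion(confusion):.4f}")  # 0.9560 (478 / 500)
print(f"{latency_ms(225149, 84_000_000):.3f}ms")    # 2.680ms
```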

Architecture

The models are loaded using the TensorFlow.js library. Each layer is first compiled separately, and the generated code is run in simulation: a JavaScript function is generated in which each line corresponds to a single assembly instruction. The simulated results are compared with the output of the same layer run in TensorFlow.js. This validation step can be disabled with the --no-validate option. Finally, the layers are composed and the final binary code is generated.
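The simulate-and-compare idea can be illustrated with a toy example, written here in Python rather than ML4F's JavaScript. It generates source code for a dot product with one statement per pseudo-instruction, compiles that text into a function, and checks the result against a NumPy reference.

The instruction comments are generic placeholders, not ML4F's actual Thumb output or generator. The function names and the tolerance are assumptions.

```python
import numpy as np

def generate_dot_kernel(n):
    """Emit Python source for a length-n dot product, one line per pseudo-instruction."""
    lines = ["def kernel(x, w):", "    acc = 0.0  # clear accumulator"]
    for i in range(n):
        lines.append(f"    a = x[{i}]  # load input element {i}")
        lines.append(f"    b = w[{i}]  # load weight {i}")
        lines.append("    acc += a * b  # multiply-accumulate")
    lines.append("    return acc")
    source = "\n".join(lines)
    namespace = {}
    exec(source, namespace)  # turn the generated text into a callable
    return namespace["kernel"], source

def validate(n, seed=0):
    """Run the generated kernel and compare it with a NumPy reference result."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    w = rng.standard_normal(n)
    kernel, _ = generate_dot_kernel(n)
    simulated = kernel(x.tolist(), w.tolist())
    reference = float(np.dot(x, w))
    return bool(np.isclose(simulated, reference, rtol=1e-9, atol=1e-12))
```

Generating source text, rather than interpreting an instruction list, keeps a one-to-one mapping between emitted instructions and executed lines. That mapping makes a mismatch easy to trace back to a specific instruction.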

The binary is position-independent and can be loaded from any word-aligned address in flash or RAM. See the sample/ folder for an example of invoking it from C, or check out our MakeCode extension.
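Because the code is position-independent, the only placement constraint stated here is word alignment. On Cortex-M a word is 32 bits, so the start address must be divisible by 4. The minimal sketch below packs a compiled model into a firmware image at the next word-aligned offset.

The helper name is hypothetical. The 0xFF filler mimics erased flash but is an arbitrary choice, and the image is assumed to start at an aligned base address.

```python
def place_word_aligned(image: bytearray, blob: bytes, word: int = 4) -> int:
    """Append blob to image at the next word-aligned offset and return that offset."""
    padding = (-len(image)) % word      # bytes needed to reach a multiple of `word`
    image.extend(b"\xff" * padding)     # filler; 0xFF resembles erased flash
    offset = len(image)
    image.extend(blob)
    return offset

image = bytearray(b"abc")               # 3 bytes of preceding data
offset = place_word_aligned(image, b"\x01\x02\x03\x04")
print(offset)                           # 4
```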

Compiling

yarn install
yarn watch
# in another window
http-server -c1

Then open http://localhost:8080/

You can also run ./ml4f from this folder.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.