ML model compiler for Cortex-M4F
Michal Moskal 0aea2494dc add C generation and argmax models 2022-11-17 15:21:53 -08:00

ML4F - Machine Learning model compiler for Cortex-M4F

ML4F takes a Keras sequential model as input and compiles it directly to ARM Thumb machine code for Cortex-M4F and more capable cores (M7, M33, etc.). Latency is typically an order of magnitude lower than with the TensorFlow Lite for Microcontrollers interpreter (with float32 models).

The input model generally needs to be in TensorFlow.js format, but the command-line tool can invoke Python scripts to convert from .h5 or .pb models. In the compiled model, weights can be stored as either float32 or float16.
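To get a feel for the float32 versus float16 trade-off, the following minimal sketch round-trips a few values through IEEE 754 half precision using only the Python standard library. It does not use ML4F itself; the weight values and helper names are illustrative assumptions, and it only shows that half-precision storage takes half the space at the cost of a small rounding error.

```python
import struct

def to_float16_and_back(value: float) -> float:
    """Round-trip a value through IEEE 754 half precision (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

def storage_bytes(num_weights: int, fmt: str) -> int:
    """Bytes needed to store num_weights values in the given struct format."""
    return num_weights * struct.calcsize(fmt)

# Illustrative weight values only (not taken from any real model).
weights = [0.1, -0.22, 0.32, 1.0]
rounded = [to_float16_and_back(w) for w in weights]

# Largest absolute rounding error introduced by half-precision storage.
max_error = max(abs(a - b) for a, b in zip(weights, rounded))

float32_size = storage_bytes(len(weights), "<f")  # 4 bytes per weight
float16_size = storage_bytes(len(weights), "<e")  # 2 bytes per weight

print(f"float32: {float32_size} bytes, float16: {float16_size} bytes")
print(f"max rounding error: {max_error:.6f}")
```

Whether that rounding error matters depends on the model, so it is worth comparing accuracy for both weight formats on your own test data.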

The following operators are supported:

  • Conv1D
  • Conv2D
  • DepthwiseConv1D
  • DepthwiseConv2D
  • MaxPooling1D
  • MaxPooling2D
  • AveragePooling1D
  • AveragePooling2D
  • Dense
  • Activation
  • BatchNormalization

Plus some no-ops:

  • InputLayer
  • Dropout
  • Flatten
  • Reshape

Feel free to report what other operators might be useful (along with example models) via GitHub Issues.

Usage

npm i -g ml4f
ml4f my-model

A typical invocation might look like this:

ml4f           --basename model-float32 my-model.h5
ml4f --float16 --basename model-float16 built/converted.tfjs

The first line compiles my-model.h5 using float32 weights, with results in built/model-float32.*. The second line compiles with float16 weights, reusing the temporary file created by the first line to speed things up (Python TensorFlow is really slow to load). Its results are in built/model-float16.*.

Run ml4f --help for more info.

You can also use it as a library from a browser (in which case it can only take TF.js models).

Evaluating models

You can pass the --eval test.json option to evaluate the model on the given input data - this will print a confusion matrix and the accuracy. The test.json file has two fields, x and y. The field x contains a batch of input tensors, and y a batch of output tensors, with proper nesting. For example, for an input of shape 2x3 and an output of shape 4:

{ 
  "x": [
    [ [ 0.1, 0.2, -0.3 ], [ 0.2, -0.22, 0 ] ],
    [ [ -0.1, 0.3, 0.1 ], [ 0.32, 0.2, 1 ] ]
  ],
  "y": [
      [ 0, 1, 0, 0 ],
      [ 1, 0, 0, 0 ]
  ]
}
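Before running an evaluation, it can help to check that the nesting really matches the model's input and output shapes. The following is a minimal sketch using only the standard library; the helper names `shape_of` and `check_eval_data` are hypothetical and are not part of ML4F.

```python
import json

def shape_of(nested):
    """Shape of a regularly nested list; only the first element of each level is inspected."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        if not nested:
            break
        nested = nested[0]
    return tuple(shape)

def check_eval_data(data, input_shape, output_shape):
    """Check that x and y have equal batch sizes and that every sample has the expected shape."""
    xs, ys = data["x"], data["y"]
    if len(xs) != len(ys):
        raise ValueError(f"batch size mismatch: {len(xs)} inputs vs {len(ys)} outputs")
    for i, (x, y) in enumerate(zip(xs, ys)):
        if shape_of(x) != tuple(input_shape):
            raise ValueError(f"x[{i}] has shape {shape_of(x)}, expected {tuple(input_shape)}")
        if shape_of(y) != tuple(output_shape):
            raise ValueError(f"y[{i}] has shape {shape_of(y)}, expected {tuple(output_shape)}")
    return len(xs)

# The example from above: a batch of two samples, input 2x3, output 4.
example = json.loads("""
{
  "x": [
    [ [ 0.1, 0.2, -0.3 ], [ 0.2, -0.22, 0 ] ],
    [ [ -0.1, 0.3, 0.1 ], [ 0.32, 0.2, 1 ] ]
  ],
  "y": [
    [ 0, 1, 0, 0 ],
    [ 1, 0, 0, 0 ]
  ]
}
""")

num_samples = check_eval_data(example, input_shape=(2, 3), output_shape=(4,))
print(f"{num_samples} samples look well-formed")
```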

If you have data as NumPy arrays, you can use the following snippet to save it as JSON:

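import json
import numpy as np

class NumpyEncoder(json.JSONEncoder):
    """JSON encoder that serializes NumPy arrays as nested lists."""
    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        return super().default(obj)

# xs_test and ys_test are your own NumPy arrays of inputs and expected outputs
with open('test.json', 'w') as outfile:
    json.dump({"x": xs_test, "y": ys_test}, outfile, cls=NumpyEncoder)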

Evaluation stats look like the following:

Accuracy: 0.9560
  245    0    1    2
    6   84    4    0
    3    2   73    0
    4    0    0   76

model: 12.75k; code: 2.46k (19.3%); arena: 4.38k; test 0.00k
total cycles: 225149 (2.680ms at 84MHz)
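The figures in this report can be cross-checked by hand. The sketch below recomputes them from the numbers shown above: accuracy is the diagonal of the confusion matrix divided by the total sample count, and latency is the cycle count divided by the clock frequency. Reading the 19.3% as the code size relative to the total model size is an assumption that happens to match the arithmetic.

```python
# Confusion matrix copied from the sample output above.
confusion = [
    [245, 0, 1, 2],
    [6, 84, 4, 0],
    [3, 2, 73, 0],
    [4, 0, 0, 76],
]

def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def latency_ms(cycles, clock_hz):
    """Convert a cycle count to milliseconds at the given clock frequency."""
    return cycles / clock_hz * 1000.0

acc = accuracy(confusion)
ms = latency_ms(225149, 84_000_000)
code_share = 2.46 / 12.75  # assumed meaning: code size over total model size

print(f"Accuracy: {acc:.4f}")      # Accuracy: 0.9560
print(f"{ms:.3f}ms at 84MHz")      # 2.680ms at 84MHz
print(f"code share: {code_share:.1%}")  # code share: 19.3%
```

The same `latency_ms` helper gives an estimate for other clock speeds, assuming the cycle count stays the same on the target core.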

Architecture

The models are loaded using the TensorFlow.js library. Each layer is first compiled separately, and the generated code is run in simulation (a JavaScript function is generated, where each line corresponds to a single assembly instruction). The results are compared with the output of the same layer running in TensorFlow.js. This validation can be disabled with the --no-validate option. The layers are then composed and the final binary code is generated.
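The per-layer validation idea can be illustrated with a small sketch. This is not ML4F's implementation (which simulates generated assembly in JavaScript and compares against TensorFlow.js); the function names, the stand-in layers, and the tolerance of 1e-3 are all assumptions chosen for illustration. Each "compiled" layer is run on the same input as its reference layer, and the first mismatch beyond the tolerance is reported.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]

def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two equally sized vectors."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)

def validate_layers(reference: Sequence[Layer], compiled: Sequence[Layer],
                    sample_input: Vector, tolerance: float = 1e-3) -> Vector:
    """Compare each compiled layer against its reference on the same input.

    The reference output is fed forward, so an error in one layer
    does not mask or amplify errors in later layers.
    """
    x = list(sample_input)
    for index, (ref, comp) in enumerate(zip(reference, compiled)):
        expected = ref(x)
        actual = comp(x)
        diff = max_abs_diff(expected, actual)
        if diff > tolerance:
            raise AssertionError(f"layer {index} differs by {diff:.6f}")
        x = expected
    return x

# Stand-in layers (hypothetical): a ReLU and a scaling layer.
def relu(v: Vector) -> Vector:
    return [max(0.0, t) for t in v]

def double(v: Vector) -> Vector:
    return [2.0 * t for t in v]

def double_approx(v: Vector) -> Vector:
    # Mimics a compiled layer with a tiny numerical deviation.
    return [2.0 * t + 1e-5 for t in v]

final = validate_layers([relu, double], [relu, double_approx], [0.1, -0.3, 0.2])
print(final)
```

Validating layer by layer rather than end to end makes it easier to pinpoint which layer's generated code is at fault.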

The binary is position-independent and can be loaded from any word-aligned address in flash or RAM. Look in the sample/ folder for an example invocation from C, or check out our MakeCode extension.

Compiling

yarn install
yarn watch
# in another window
http-server -c1

Then open http://localhost:8080/

You can also run ./ml4f from this folder.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.