### Description
<!-- Describe your changes. -->
Temporarily disable fp16 gemm on CPU because it usually needs a
following Cast which offsets the gain. Need more fp16 operators
implementation and performance tuning.
Also fix a fusion error of LayerNormalization.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
This patch adds a closing curly bracket at the end of `settings.json`.
### Motivation and Context
`settings.json` is just not closed. It was accidentally removed at
4e6ea730d6
### Description
Enhanced SkipLayerNorm by implementing broadcasting for both CPU and
CUDA
### Motivation and Context
The input and skip tensors no longer have to be the same size which
means that it can accept data where the skip shape can be the same size
as the input shape, have a shape of {1, sequence_length, hidden_size},
or {sequence_length, hidden_size}.
---------
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
- Enable pyright and pylint (https://github.com/microsoft/pyright) in CI
- Enable pyright, pylint and bandit by default in VS code
Pylint has some good style checks. pyright is Microsoft's static type checker.