Use accelerator APIs to select the device in setup.py and set the visible
devices env var in runner.py
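A minimal sketch of the idea, using a stub class: `visible_devices_envs` and `set_visible_devices_envs` mirror the accelerator-API method names this change relies on, but the class below is an illustration, not DeepSpeed's real `CUDA_Accelerator`.

```python
import os

# Illustrative stub of the accelerator interface; the real accelerator
# object would come from deepspeed's get_accelerator().
class CUDAAcceleratorStub:
    def visible_devices_envs(self):
        # Env var(s) the runner sets to restrict which devices a rank sees.
        return ["CUDA_VISIBLE_DEVICES"]

    def set_visible_devices_envs(self, current_env, local_accelerator_ids):
        for env_name in self.visible_devices_envs():
            current_env[env_name] = ",".join(map(str, local_accelerator_ids))

accelerator = CUDAAcceleratorStub()
env = dict(os.environ)
accelerator.set_visible_devices_envs(env, [0, 2])
print(env["CUDA_VISIBLE_DEVICES"])  # 0,2
```

Letting the accelerator name its own env vars keeps runner.py free of hard-coded `CUDA_VISIBLE_DEVICES` assumptions when a different accelerator is active.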
---------
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Fixed the Windows build.
Fixes applied:
- Remove some more ops that don't build on Windows.
- Remove the use of symlinks that didn't work correctly and replace with
`shutil.copytree()`.
- Small fixes to make the C++ code compile.
Tested with Python 3.9 and CUDA 12.1.
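The symlink replacement can be sketched as follows; `copy_tree_instead_of_symlink` is a hypothetical helper name, not the function used in the build scripts.

```python
import os
import shutil
import tempfile

# Sketch: replace a directory symlink with a real copy, since creating
# symlinks on Windows often requires elevated privileges.
def copy_tree_instead_of_symlink(src, dst):
    if os.path.islink(dst):
        os.unlink(dst)          # drop a stale symlink if present
    elif os.path.isdir(dst):
        shutil.rmtree(dst)      # drop a stale copy if present
    shutil.copytree(src, dst)   # a real copy needs no symlink rights

# usage with throwaway directories
base = tempfile.mkdtemp()
src = os.path.join(base, "csrc")
os.makedirs(src)
open(os.path.join(src, "op.cpp"), "w").close()
copy_tree_instead_of_symlink(src, os.path.join(base, "deepspeed_csrc"))
```

The copy must be refreshed on each build, which `shutil.rmtree` + `shutil.copytree` handles without the `dirs_exist_ok` flag that only newer Python versions provide.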
---------
Co-authored-by: Costin Eseanu <costineseanu@gmail.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
The order of parameters in the `create_dir_symlink` method looks wrong. Because
of this, we get the error "PermissionError: [WinError 5] Denied access:
'.\\deepspeed\\ops\\csrc'" when installing deepspeed >= 0.4.0 in a Windows
environment.
Please check this out @eltonzheng and @jeffra.
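The pitfall can be illustrated with a sketch (the helper below is not the actual DeepSpeed implementation): `os.symlink(src, dst)` creates `dst` pointing at `src`, so a helper that swaps the two ends up operating on the source tree instead of the link, which is where the access-denied error comes from.

```python
import os
import tempfile

# Sketch of a correctly ordered helper: the first argument is the link
# target, the second is the link to create.
def create_dir_symlink(src, dest):
    if os.path.islink(dest):
        os.remove(dest)  # must remove the link, never the source tree
    os.symlink(src, dest, target_is_directory=True)

base = tempfile.mkdtemp()
src = os.path.join(base, "csrc")
os.makedirs(src)
link = os.path.join(base, "ops_csrc")
create_dir_symlink(src, link)
```

With the arguments reversed, the `os.remove` would target the real `csrc` directory, which Windows refuses with `WinError 5`.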
---------
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
This is a WIP PR that makes op builder detection adapt to accelerator
changes. It is a follow-up to
https://github.com/microsoft/DeepSpeed/issues/5173
Currently, DeepSpeed generates `installed_ops` and `compatible_ops` at
setup time. If the system changes to a different accelerator by DeepSpeed
launch time, these two lists contain incorrect information.
This PR intends to solve this problem with more flexible ops detection.
* For `installed_ops`, DeepSpeed should disable all installed ops if the
accelerator detected at setup time differs from the one detected at launch time.
* For `compatible_ops`, DeepSpeed should refresh the list on each
launch to avoid the impact of an accelerator change.
As a first step, the nv-inference workflow is temporarily changed to emulate
the scenario where the system is set up with CPU_Accelerator and then launched
with CUDA_Accelerator, and CPU_Accelerator is modified so that Intel
Extension for PyTorch and the oneCCL binding for PyTorch are no longer mandatory.
Starting from here we can reconstruct `installed_ops` and `compatible_ops`
to follow the design above.
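The two rules above can be sketched as follows; the function names and data shapes here are illustrative, not the real DeepSpeed internals.

```python
# Rule 1: disable every op compiled at setup time if the accelerator
# active at launch differs from the one seen at setup.
def effective_installed_ops(installed_ops, setup_accelerator, launch_accelerator):
    if setup_accelerator != launch_accelerator:
        return {op: False for op in installed_ops}
    return dict(installed_ops)

# Rule 2: recompute compatibility on every launch so an accelerator
# swap cannot leave stale information behind.
def refresh_compatible_ops(op_builders):
    return {name: builder.is_compatible() for name, builder in op_builders.items()}

installed = {"fused_adam": True, "transformer_inference": True}
print(effective_installed_ops(installed, "cpu", "cuda"))
# every op disabled, because the setup and launch accelerators differ
```

The key point is that neither list is trusted as a setup-time constant: `installed_ops` is gated on an accelerator match, and `compatible_ops` is rebuilt per launch.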
---------
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
#5192 reports an issue with the latest DeepSpeed release (0.13.3)
related to pre-compilation and the recently re-enabled `ninja` support
in #5088. Reverting to disabling `ninja` by default, but users can still
enable it with `DS_ENABLE_NINJA=1` until we can further debug to
understand the problem.
DeepSpeed has too many ops now, and it takes too much time to pre-build
all of them.
I noticed that DeepSpeed disabled `ninja` 4 years ago
(https://github.com/microsoft/DeepSpeed/pull/298), and I think we should
consider enabling it now.
The issue mentioned in https://github.com/microsoft/DeepSpeed/pull/298
can be solved by resolving `include_dirs` to absolute paths.
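A minimal sketch of that fix (the helper name and root parameter are illustrative): ninja invokes the compiler from its own build directory, so relative include paths break unless they are resolved against the source root first.

```python
import os

# Sketch: resolve relative include dirs against the repo root so they
# remain valid regardless of ninja's working directory.
def absolute_include_dirs(include_dirs, root):
    return [
        d if os.path.isabs(d) else os.path.abspath(os.path.join(root, d))
        for d in include_dirs
    ]

dirs = absolute_include_dirs(["csrc/includes", "/usr/include"], "/repo")
print(dirs)  # ['/repo/csrc/includes', '/usr/include']
```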
---------
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Logan Adams <loadams@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
* skip cpu support unimplemented error and update cpu inference workflow
* add torch.bfloat16 to cuda_accelerator
* remove UtilsBuilder skip
* fused adam can build
* use cpu adam to implement fused adam
* enable zero stage 1 and 2 for synchronized accelerator (a.k.a. CPU)
* remove unused parameters
* remove skip FusedAdamBuilder; add suported_dtypes
* fix format
* Revert "fix format"
Revert "remove skip FusedAdamBuilder; add suported_dtypes"
Revert "remove unused parameters"
Revert "enable zero stage 1 and 2 for synchronized accelerator (a.k.a. CPU)"
Revert "use cpu adam to implement fused adam"
Revert "fused adam can build"
---------
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Ma, Guokai <guokai.ma@intel.com>
* Abstract accelerator (step 2)
* more flex op_builder path for both installation and runtime
* add SpatialInferenceBuilder into cuda_accelerator.py
* use reflection to make cuda_accelerator adapt to CUDA op builder change automatically
* clean up deepspeed/__init__.py
* add comments in cuda_accelerator for no torch path
* Update deepspeed/env_report.py
Change env_report.py according to suggestion
Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>
* reduce the range of try...except for better code clarity
* Add porting for deepspeed/ops/random_ltd/dropping_utils.py
* move accelerator to top directory and create symlink under deepspeed
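The reflection idea in the list above can be sketched like this; the module and class names are made up for illustration, though DeepSpeed scans its op_builder package in a similar spirit.

```python
import inspect
import types

# Build a fake module standing in for the op_builder package.
mod = types.ModuleType("fake_op_builder")

class FusedAdamBuilder:       # would be picked up
    pass

class HelperClass:            # would be ignored (no "Builder" suffix)
    pass

mod.FusedAdamBuilder = FusedAdamBuilder
mod.HelperClass = HelperClass

# Discover builder classes by naming convention instead of a hard-coded
# list, so new CUDA op builders are adopted automatically.
def discover_builders(module):
    return {
        name: obj
        for name, obj in inspect.getmembers(module, inspect.isclass)
        if name.endswith("Builder")
    }

print(sorted(discover_builders(mod)))  # ['FusedAdamBuilder']
```

Because discovery is driven by the module contents at import time, adding or removing an op builder needs no edit to cuda_accelerator.py.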
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* remove any cupy install when setting up environments
* revert previous changes to run on cu111 runners
* fix for when no cupy is installed
* remove cupy uninstall for workflows not using latest torch version
* update to cu116 for inference tests
* fix pip uninstall line
* move python environment list to after DS install
* remove cupy uninstall
* re-add --forked
* fix how we get cupy version (should be based on nvcc version)
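A sketch of deriving the cupy package name from `nvcc --version` output; the function name is hypothetical, and the `cupy-cudaXYZ` naming follows cupy's CUDA-versioned wheel convention.

```python
import re

# Sketch: parse the CUDA release from nvcc's version banner and map it
# to the matching CUDA-specific cupy wheel name.
def cupy_package_for_nvcc(nvcc_output):
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if not match:
        raise RuntimeError("could not parse nvcc version output")
    major, minor = match.groups()
    return f"cupy-cuda{major}{minor}"

sample = "Cuda compilation tools, release 11.6, V11.6.124"
print(cupy_package_for_nvcc(sample))  # cupy-cuda116
```

Keying off nvcc rather than the installed torch build avoids picking a cupy wheel compiled against a different CUDA toolkit than the one on the runner.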
* Fix build issues on Windows
* small fix to compile with the new version of Microsoft C++ Build Tools
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
* Fix typos in docs/
* Fix typos in code comments and output strings
* Fix typos in the code itself
* Fix typos in tests/
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
* aio: test for libaio with various package managers
* aio: note typical tool used to install libaio package
* setup: abort with error if cannot build requested op
* setup: define op_envvar to return op build environment variable
* setup: call is_compatible once for each op
* setup: only print suggestion to disable op when its envvar not set
* setup: add method to abort from fatal error
* Revert "setup: add method to abort from fatal error"
This reverts commit 0e4cde6b0a.
* setup: add method to abort from fatal error
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
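The libaio items above can be sketched as a lookup from detected package manager to the package name to suggest; the mapping and helper name are illustrative, not the exact check used in the op builder.

```python
import shutil

# Typical libaio development package names per package manager
# (illustrative; adjust for your distribution).
LIBAIO_PACKAGES = {
    "apt": "libaio-dev",
    "dnf": "libaio-devel",
    "yum": "libaio-devel",
    "zypper": "libaio-devel",
    "pacman": "libaio",
}

def libaio_install_hint():
    # Suggest the install command for the first package manager found
    # on PATH, falling back to a generic message.
    for manager, package in LIBAIO_PACKAGES.items():
        if shutil.which(manager):
            return f"{manager} install {package}"
    return "install the libaio development package for your distribution"

print(libaio_install_hint())
```

Printing a concrete install command alongside the fatal-error abort gives users an actionable message instead of a bare compilation failure.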