Labs
The labs consist of basic labs and advanced labs, designed from the perspective of system research.
Students implement and optimize system modules using mainstream and recent frameworks, platforms, and tools, so that they build the ability to solve practical problems rather than only learning how to use tools.
Target users
- Junior and senior college students
- Graduate students
Experimental design goals
This experimental course is designed from the perspective of system research. By operating and applying mainstream and recent frameworks, platforms, and tools, students are encouraged to implement and optimize system modules, improving their ability to solve practical problems rather than just learning how to use the tools.
Experimental design features
- Provide a unified framework, platform, and tools.
- Design operable experiment content.
- Keep the experiment content general, so that each university can deepen and extend it to suit its own characteristics.
- Get started with practical engineering projects and deepen the understanding of AI systems.
Contents
Basic Labs
|Lab No.|Lab Name|Remarks|
|---|---|---|
|Prerequisites|Setup Environment|Set up the environment for the experiments|
|Lab 1|A simple end-to-end AI example, from a system perspective|Understand the systems from debug info and system logs|
|Lab 2|Customize operators|Design and implement a customized operator (both forward and backward) in Python|
|Lab 3|CUDA implementation|Add a CUDA implementation for the customized operator|
|Lab 4|AllReduce implementation|Improve AllReduce on Horovod: implement lossy compression (3LC) on GPU for low-bandwidth networks|
|Lab 5|Configure containers for customized training and inference|Configure containers|
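
To give a feel for what "both forward and backward" means in Lab 2, here is a minimal sketch in plain Python. It is not the lab's reference solution and does not use any deep learning framework: the class `MyLinear` and the helpers `transpose` and `matmul` are hypothetical names chosen for illustration, and the operator is assumed to be a simple matrix product `y = x @ w` on nested lists.

```python
def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    b_cols = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in b_cols] for row in a]


class MyLinear:
    """Hypothetical custom operator y = x @ w with a hand-written gradient."""

    def forward(self, x, w):
        # Save the inputs; backward needs them to compute the gradients.
        self.x, self.w = x, w
        return matmul(x, w)

    def backward(self, grad_y):
        # Chain rule for y = x @ w:
        #   dL/dx = dL/dy @ w^T
        #   dL/dw = x^T @ dL/dy
        grad_x = matmul(grad_y, transpose(self.w))
        grad_w = matmul(transpose(self.x), grad_y)
        return grad_x, grad_w


x = [[1.0, 2.0], [3.0, 4.0]]
w = [[0.5, -1.0], [2.0, 0.0]]

op = MyLinear()
y = op.forward(x, w)

# Assume the loss is the sum of all outputs, so dL/dy is all ones.
grad_y = [[1.0, 1.0], [1.0, 1.0]]
grad_x, grad_w = op.backward(grad_y)
```

In a real framework the same two methods would be registered with the autograd system, and Lab 3 would replace the Python `matmul` with a CUDA kernel.
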
Advanced Labs
|Lab No.|Lab Name|Remarks|
|---|---|---|
|Lab 6|Scheduling and resource management system|Get familiar with OpenPAI or KubeFlow|
|Lab 7|Distributed training|Try different kinds of AllReduce implementations|
|Lab 8|AutoML|Search for a new neural network structure for Image/NLP tasks|
|Lab 9|RL Systems|Configure and get familiar with one of the following RL Systems: RLlib, …|
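
As background for Lab 7, the sketch below simulates one well-known AllReduce variant, ring AllReduce, in a single Python process. It is an illustration only, not the implementation used by Horovod or by the lab: the function name `ring_allreduce` is hypothetical, workers are modeled as plain lists, and the buffer length is assumed to be divisible by the number of workers.

```python
def ring_allreduce(buffers):
    """Simulate a sum ring all-reduce over equally sized per-worker lists."""
    n = len(buffers)
    size = len(buffers[0])
    assert size % n == 0, "sketch assumes the buffer splits evenly into n chunks"
    chunk = size // n
    data = [list(b) for b in buffers]  # work on copies, one list per worker

    def seg(c):
        # Slice covering chunk index c.
        return slice(c * chunk, (c + 1) * chunk)

    # Phase 1, reduce-scatter: in step s, worker r sends chunk (r - s) % n
    # to its right neighbor, which adds it to its own copy of that chunk.
    for s in range(n - 1):
        sends = [(r, (r - s) % n, data[r][seg((r - s) % n)]) for r in range(n)]
        for r, c, payload in sends:
            dst = (r + 1) % n
            data[dst][seg(c)] = [a + b for a, b in zip(data[dst][seg(c)], payload)]

    # Now worker r holds the fully reduced chunk (r + 1) % n.
    # Phase 2, all-gather: in step s, worker r forwards chunk (r + 1 - s) % n
    # to its right neighbor, which overwrites its own copy.
    for s in range(n - 1):
        sends = [(r, (r + 1 - s) % n, data[r][seg((r + 1 - s) % n)]) for r in range(n)]
        for r, c, payload in sends:
            data[(r + 1) % n][seg(c)] = payload

    return data


result = ring_allreduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
```

Each worker sends and receives only one chunk per step, which is why this pattern uses link bandwidth evenly; comparing it against other schemes, such as a tree-based reduction, is the kind of experiment Lab 7 points to.
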