Mirror of https://github.com/microsoft/pai.git
Microsoft FrameworkLauncher
FrameworkLauncher (or Launcher for short) is built to run Large-Scale Long-Running Services inside YARN Containers without making changes to the Services themselves. It also supports Batch Jobs, such as TensorFlow and CNTK jobs.
Features
- High Availability
  - All Launcher and Hadoop components are Recoverable and Work Preserving, so User Services are designed for No Down Time: they keep running uninterrupted when our components shut down, crash, are upgraded, or suffer any kind of prolonged outage.
  - Launcher tolerates many unexpected errors and has a well-defined Failure Model covering dependent component shutdown, machine errors, network errors, configuration errors, environment errors, corrupted internal data, etc.
  - User Services are guaranteed to Retry on Transient Failures, to Migrate to another Machine per User's Request, etc.
- High Usability
  - No User code changes are needed to run an existing executable inside a Container. Users only need to set up the FrameworkDescription in JSON format.
  - A REST API is supported.
  - Work Preserving FrameworkDescription Update, such as changing TaskNumber or adding a TaskRole on the fly.
  - Migrate a running Task per User's Request.
  - Override the default ApplicationProgress per User's Request.
- Services Requirements
  - Versioned Service Deployment
  - ServiceDiscovery
  - AntiaffinityAllocation: Services running on different Machines
- Batch Jobs Requirements
  - GPU as a Resource
  - Port as a Resource
  - GangAllocation: Start Services together
  - KillAllOnAnyCompleted and KillAllOnAnyServiceCompleted
  - Framework Tree Management: DeleteOnParentDeleted, StopOnParentStopped
  - DataPartition
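The High Usability items above describe a FrameworkDescription in JSON format that can be updated on the fly, for example by changing TaskNumber. The following Python sketch only illustrates that idea. Every field name in it (`version`, `taskRoles`, `taskNumber`, `taskService`, `entryPoint`), the `Worker` role, and the `./run_service.sh` entry point are assumptions made for illustration; the User Manual defines the real schema.

```python
import json


def build_framework_description(task_role, task_number, entry_point):
    """Return an illustrative FrameworkDescription as a plain dict.

    All field names are hypothetical placeholders, not the documented schema.
    """
    if task_number < 1:
        raise ValueError("task_number must be at least 1")
    return {
        "version": 1,
        "taskRoles": {
            task_role: {
                "taskNumber": task_number,
                "taskService": {"version": 1, "entryPoint": entry_point},
            }
        },
    }


def scale_task_role(description, task_role, task_number):
    """Return a copy of the description with a different TaskNumber."""
    updated = json.loads(json.dumps(description))  # deep copy via JSON round trip
    updated["taskRoles"][task_role]["taskNumber"] = task_number
    return updated


# Hypothetical role name and entry point, used only to show the JSON shape.
description = build_framework_description("Worker", 2, "./run_service.sh")
print(json.dumps(description, indent=2))
```

Keeping the description as plain data means an update such as a new TaskNumber is just a modified copy of the same document, which could then be submitted again.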
Build and Start
Dependencies
Compile-time dependencies:
- Apache Maven
- JDK 1.8+
Run-time dependencies:
- Hadoop 2.7.2 with YARN-7481 is required to support GPU as a Resource and Port as a Resource. If you do not need these features, any Hadoop 2.7+ is fine.
- Apache Zookeeper
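Before building, it can help to confirm that the compile-time dependencies listed above are on `PATH`. The sketch below is one possible check, not part of Launcher. It assumes that Maven is invoked as `mvn` and that finding `javac` indicates a JDK; it does not verify the JDK version or any run-time dependency.

```python
import shutil

# Executable name -> dependency name as listed in this README.
# The executable names are assumptions about a typical installation.
REQUIRED_TOOLS = {"mvn": "Apache Maven", "javac": "JDK 1.8+"}


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names of dependencies whose executable is not on PATH."""
    return [name for exe, name in tools.items() if which(exe) is None]


missing = missing_tools()
if missing:
    print("Missing compile-time dependencies: " + ", ".join(missing))
else:
    print("Maven and a JDK compiler were found on PATH.")
```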
Build Launcher Distribution
The Launcher Distribution is built into the folder .\dist.
Windows cmd line:
.\build.bat
GNU/Linux cmd line:
./build.sh
Start Launcher Service
Build the Launcher Distribution before starting the Launcher Service.
Windows cmd line:
.\dist\start.bat
GNU/Linux cmd line:
./dist/start.sh
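The build and start steps above differ only in which script is run on each platform. As a minimal sketch, assuming it is run from the Launcher source folder, the following Python helper picks the matching scripts and runs the build before the start. The helper names are hypothetical, and wrapping the Windows scripts in `cmd /c` is an assumption about how to invoke a batch file from Python.

```python
import platform
import subprocess


def launcher_commands(system=None):
    """Return (build_command, start_command) for the given OS name."""
    system = system or platform.system()
    if system == "Windows":
        # Batch files are run through cmd.exe.
        return (["cmd", "/c", ".\\build.bat"],
                ["cmd", "/c", ".\\dist\\start.bat"])
    return ["./build.sh"], ["./dist/start.sh"]


def build_and_start(system=None, run=subprocess.run):
    """Build the distribution, then start the service.

    check=True raises CalledProcessError if the build fails,
    so the start script is never run without a distribution.
    """
    build_cmd, start_cmd = launcher_commands(system)
    run(build_cmd, check=True)
    run(start_cmd, check=True)


# Show what would be run on this machine without executing anything.
print(launcher_commands())
```

Calling `build_and_start()` executes both steps. The `run` parameter exists so that the sequence can be exercised with a stand-in function instead of real processes.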
User Manual
See the User Manual to learn how to use the Launcher Service to launch a Framework.