Introduction

Generic batch processing framework for managing the orchestration, dispatch, fault tolerance, and monitoring of arbitrary work items against many endpoints. Extensible via dependency injection. Worker endpoints can be local, remote, containers, cloud APIs, different processes, or even just different listener sockets in the same process.

Includes examples against Azure Cognitive Service containers for ML eval workloads.

Consuming

The framework can be built on via the template method pattern and dependency injection. One simply needs to provide concrete implementations of the following types (an illustrative sketch follows the list):

WorkItemRequest: Encapsulates all the details needed by the WorkItemProcessor to process a work item.

WorkItemResult: Representation of the outcome of an attempt to process a WorkItemRequest.

WorkItemProcessor: Provides implementation on how to process a WorkItemRequest against an endpoint.

BatchRequest: Represents a batch of work items to do. Produces a collection of WorkItemRequests.

BatchConfig: Details needed for a BatchRequest to produce the collection of WorkItemRequests.

BatchRunSummarizer: Implements a near-real-time status updater based on WorkItemResults as the batch progresses.

EndpointStatusChecker: Specifies how to determine whether an endpoint is healthy and ready to take on work from a WorkItemProcessor.
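
As a sketch of how these pieces fit together, the example below outlines a trivial word-counting consumer. It is only a sketch: all WordCount* names are hypothetical, and the method names shown (process, to_work_item_requests, is_healthy) are stand-ins rather than the batchkit base-class signatures, so consult the batchkit source for the actual abstract interfaces.

```python
# Illustrative sketch only. The real abstract base classes live in the batchkit
# package; the method names used here are hypothetical stand-ins and not the
# library's actual signatures.
from dataclasses import dataclass
from typing import List


@dataclass
class WordCountRequest:            # would derive from batchkit's WorkItemRequest
    filepath: str                  # everything the processor needs for one work item


@dataclass
class WordCountResult:             # would derive from batchkit's WorkItemResult
    filepath: str
    word_count: int
    succeeded: bool


class WordCountProcessor:          # would derive from batchkit's WorkItemProcessor
    def process(self, request: WordCountRequest, endpoint: str) -> WordCountResult:
        # A real consumer would dispatch the work item to `endpoint`;
        # here the work is done locally to keep the sketch self-contained.
        try:
            with open(request.filepath, encoding="utf-8") as f:
                count = len(f.read().split())
            return WordCountResult(request.filepath, count, succeeded=True)
        except OSError:
            return WordCountResult(request.filepath, 0, succeeded=False)


@dataclass
class WordCountBatchConfig:        # would derive from batchkit's BatchConfig
    filepaths: List[str]           # details needed to expand the batch


class WordCountBatchRequest:       # would derive from batchkit's BatchRequest
    def __init__(self, config: WordCountBatchConfig):
        self.config = config

    def to_work_item_requests(self) -> List[WordCountRequest]:
        # Expand the batch into its individual work items using the config.
        return [WordCountRequest(p) for p in self.config.filepaths]


class AlwaysHealthyChecker:        # would derive from batchkit's EndpointStatusChecker
    def is_healthy(self, endpoint: str) -> bool:
        # A real checker would probe the endpoint (e.g. an HTTP health route).
        return True
```

A BatchRunSummarizer implementation would follow the same pattern, aggregating WordCountResults into periodic status updates as the run progresses.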

The Speech Batch Kit is currently our prime example of consuming the framework.

The batchkit framework is published as an ordinary PyPI package. See available versions here: https://pypi.org/project/batchkit

Dev Environment

This project is developed for and consumed in Linux environments. Consumers also use WSL2, and other POSIX platforms may be compatible but are untested. For development and deployment outside of a container, we recommend using a Python virtual environment and installing the dependencies from requirements.txt. The Speech Batch Kit example builds a container.

Tests

This project uses both unit tests (run-tests) and stress tests (run-stress-tests) for functional verification.

Building

There are currently three artifacts:

  • The PyPI package of the batchkit framework.

  • The PyPI package batchkit-examples-speechsdk.

  • The Docker container image for speech-batch-kit.

Examples

Speech Batch Kit

The Speech Batch Kit (batchkit_examples/speech_sdk) uses the framework to produce a tool that can be used for transcription of very large numbers of audio files against Azure Cognitive Service Speech containers or cloud endpoints.

For an introduction, see the Azure Cognitive Services page.

For detailed information, see the Speech Batch Kit's README.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repositories using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.