microsoft/rDSN: Robust Distributed System Nucleus (rDSN) is an open framework for quickly building and managing high performance and robust distributed systems.

HX L 359bf38c48 Use std::snprintf instead of sprintf. (#226 )		2024-05-13 18:33:16 +08:00
bin	Some build fixes. (#223 )	2024-05-13 11:15:43 +08:00
deploy/docker	…
doc	fix bug in http message parser; fix documentation	2016-07-29 16:41:35 +08:00
ext	Some build fixes. (#223 )	2024-05-13 11:15:43 +08:00
include	Use std::snprintf instead of sprintf. (#226 )	2024-05-13 18:33:16 +08:00
resources	add rDSN full intro	2016-12-02 12:26:36 +08:00
src	Fix a warning that memset clears an object of non-trivial type. (#224 )	2024-05-13 11:32:31 +08:00
tutorial	- add scenario configurations	2016-10-12 16:23:51 +08:00
.gitignore	Support building with VS 2019 and Travis-CI Windows environment (#217 )	2019-04-08 16:20:07 +08:00
.gitmodules	add tools.log.monitor plugin and update all external plugins to their latest commits	2016-09-29 12:34:02 +08:00
.travis.yml	Support building with VS 2019 and Travis-CI Windows environment (#217 )	2019-04-08 16:20:07 +08:00
CMakeLists.txt	enable cross platform config.onecluster.ini	2016-10-11 13:10:40 +08:00
LICENSE	…
README.md	Update README.md	2016-12-02 12:37:57 +08:00
Release Note.txt	…
SECURITY.md	Microsoft mandatory file	2023-06-12 19:14:02 +00:00
appveyor.yml	fix zk start routine so they are friendly to appveyor	2016-08-15 09:37:35 +08:00
compile_thrift.py	enable compile_thrift.py running on Windows as well	2016-07-01 11:33:30 +08:00
run.cmd	Fix the issue that rDSN source directory contains blanks.	2017-08-13 21:37:41 +08:00
run.sh	move scripts to bin dir so they can be used by rDSN plugins	2016-08-10 21:12:29 +08:00

README.md

Robust Distributed System Nucleus (rDSN) is a framework for quickly building robust distributed systems. It has a microkernel for pluggable components, including applications, distributed frameworks, devops tools, and local runtime/resource providers, enabling their independent development and seamless integration. The project was originally developed for Microsoft Bing, and has since been adopted in production both inside and outside Microsoft.

[PPT]rDSN Full Introduction
What are the existing modules I can immediately use?
What scenarios are enabled by combining these modules differently?
How does rDSN build robustness?
Related papers
[Case] RocksDB made replicated using rDSN!
[Tutorial] A one-box cluster demo to understand how rDSN helps with service registration, deployment, monitoring, etc.
[Tutorial] Build a counter service with built-in tools (e.g., codegen, auto-test, fault injection, bug replay, tracing)
[Tutorial] Build a scalable and reliable counter service with built-in replication support
API Reference
Installation

Existing pluggable modules (and growing)

The core of rDSN is a service kernel. Using its Service API and Tool API, developers can build and plug in many different application, framework, tool, and local runtime modules, so that these modules seamlessly benefit each other. Here is an incomplete list of the pluggable modules.

Pluggable modules	Description	Release
dsn.core	rDSN service kernel	1.0.0
dsn.dist.service.stateless	scale-out and fail-over for stateless services (e.g., micro services)	1.0.0
dsn.dist.service.stateful.type1	scale-out, replicate, and fail-over for stateful services (e.g., storage)	1.0.0
dsn.dist.service.meta_server	membership, load balance, and machine pool management for the above service frameworks	1.0.0
dsn.dist.uri.resolver	a client-side helper module that resolves service URL to target machine	1.0.0
dsn.dist.traffic.router	fine-grain RPC request routing/splitting/forking to multiple services (e.g., A/B test)	todo
dsn.tools.common	deployment runtime (e.g., network, aio, lock, timer, perf counters, loggers) for both Windows and Linux; simple toollets, such as tracer, profiler, and fault-injector	1.0.0
dsn.tools.nfs	an implementation of remote file copy based on rpc and aio	1.0.0
dsn.tools.emulator	an emulation runtime for whole distributed system emulation with auto-test, replay, global state checking, etc.	1.0.0
dsn.tools.hpc	high performance counterparts for the modules as implemented in tools.common	todo
dsn.tools.explorer	extracts task-level dependencies automatically	1.0.0
dsn.tools.log.monitor	collects critical logs (e.g., log-level >= WARNING) in a cluster	1.0.0
dsn.app.simple_kv	an example application module	1.0.0
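
The kernel-plus-modules idea described above can be illustrated with a minimal C++ sketch. This is not the rDSN Service API or Tool API: `Module`, `Kernel`, `register_module`, `load`, and `SimpleKv` are hypothetical names chosen for illustration, and the only assumption is that modules register themselves by name and the kernel instantiates the ones a deployment asks for.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface: every pluggable module exposes a name and a start hook.
// These names are illustrative only and are NOT the rDSN Service API or Tool API.
struct Module {
    virtual ~Module() = default;
    virtual std::string name() const = 0;
    virtual void start() = 0;
};

// A toy "kernel": modules register a factory under a name, and the kernel
// instantiates only the ones a deployment asks for.
class Kernel {
public:
    using Factory = std::function<std::unique_ptr<Module>()>;

    void register_module(const std::string& name, Factory factory) {
        assert(factory && "factory must be callable");
        factories_[name] = std::move(factory);
    }

    // Instantiate and start the requested modules, in the order given.
    // Returns the names that were started; unknown names are skipped.
    std::vector<std::string> load(const std::vector<std::string>& wanted) {
        std::vector<std::string> started;
        for (const auto& name : wanted) {
            auto it = factories_.find(name);
            if (it == factories_.end()) continue;
            auto module = it->second();
            module->start();
            started.push_back(module->name());
            loaded_.push_back(std::move(module));
        }
        return started;
    }

private:
    std::map<std::string, Factory> factories_;
    std::vector<std::unique_ptr<Module>> loaded_;
};

// Example module; it only records that it was started.
struct SimpleKv : Module {
    bool running = false;
    std::string name() const override { return "dsn.app.simple_kv"; }
    void start() override { running = true; }
};
```

In this sketch, a caller would register a factory for each available module once, then pass the list of module names for a given deployment to `load`. Because modules only meet through the kernel, they can be developed and swapped independently.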

Scenarios by different module combination and configuration

rDSN provides flexible configuration so that developers can combine and configure the modules differently to enable different scenarios. All modules are loaded by dsn.svchost, a common process runner in rDSN, with the given configuration file. The following table lists some examples (note that dsn.core is always required and is therefore omitted from the Modules column).

Scenarios	Modules	Config	Demo
logic correctness development	dsn.app.simple_kv + dsn.tools.emulator + dsn.tools.common	config	todo
logic correctness with failure	dsn.app.simple_kv + dsn.tools.emulator + dsn.tools.common	config	todo
performance tuning	dsn.app.simple_kv + dsn.tools.common	config	todo
progressive performance tuning	dsn.app.simple_kv + dsn.tools.common + dsn.tools.emulator	config	todo
Paxos enabled stateful service	dsn.app.simple_kv + dsn.tools.common + dsn.tools.emulator + dsn.dist.uri.resolver + dsn.dist.service.meta_server + dsn.dist.service.stateful.type1	config	todo

There are many more possibilities. rDSN provides a web portal that enables quick deployment of these scenarios in a cluster and allows easy operation through simple clicks as well as rich visualization. Deployment scenarios are defined here, and developers can add more on demand.
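
As a small illustration of how the Modules column in the scenario table reads, the sketch below expands a cell such as `dsn.app.simple_kv + dsn.tools.common` into a full module list, adding `dsn.core` because the table omits it. `expand_modules` and `trim` are hypothetical helpers written for this example; rDSN itself takes its module set from the configuration file given to dsn.svchost, whose format is not shown here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Trim spaces and tabs from both ends of a string.
static std::string trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Expand a "Modules" cell such as "dsn.app.simple_kv + dsn.tools.common"
// into the full module list. dsn.core is prepended because the table omits it.
// Hypothetical helper for illustration; rDSN itself reads a configuration file.
std::vector<std::string> expand_modules(const std::string& cell) {
    std::vector<std::string> modules{"dsn.core"};
    std::string::size_type begin = 0;
    while (begin <= cell.size()) {
        auto end = cell.find('+', begin);
        if (end == std::string::npos) end = cell.size();
        const std::string name = trim(cell.substr(begin, end - begin));
        // Skip empty pieces and avoid listing dsn.core twice.
        if (!name.empty() && name != "dsn.core") modules.push_back(name);
        begin = end + 1;
    }
    assert(modules.front() == "dsn.core");  // postcondition: core always comes first
    return modules;
}
```

For the "performance tuning" row, this would yield `dsn.core`, `dsn.app.simple_kv`, and `dsn.tools.common`, in that order.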

How does rDSN build robustness?

reduced system complexity via microkernel architecture: applications, frameworks (e.g., replication, scale-out, fail-over), local runtime libraries (e.g., network libraries, locks), and tools are all pluggable modules into a microkernel to enable independent development and seamless integration (therefore modules are reusable and transparently benefit each other)

flexible configuration with global deploy-time view: tailor the module instances and their connections on demand with configurable system complexity and resource allocation (e.g., run all nodes in one simulator for testing, allocate CPU resources appropriately for avoiding resource contention, debug with progressively added system complexity)

transparent tooling support: dedicated tool API for tool development; built-in plugged tools for understanding, testing, debugging, and monitoring the upper applications and frameworks

auto-handled distributed system challenges: built-in frameworks to achieve scalability, reliability, availability, and consistency etc. for the applications
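
The "transparent tooling" point above can be sketched as an interception layer between the application and the runtime. The code below is an assumption-laden illustration, not the rDSN Tool API: `TaskRunner`, `Hook`, `make_tracer`, and `make_fault_injector` are hypothetical names. It shows how a tracer and a fault injector could observe or drop tasks without any change to application code.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical task runner with an interception point. The names here are
// illustrative and are NOT the rDSN Tool API.
class TaskRunner {
public:
    // A hook sees the task name before it runs; returning false drops the task.
    using Hook = std::function<bool(const std::string& task_name)>;

    void add_hook(Hook hook) {
        assert(hook && "hook must be callable");
        hooks_.push_back(std::move(hook));
    }

    // Runs the task unless some hook vetoes it. Returns true if the task ran.
    // Application code calls run() the same way whether or not tools are installed.
    bool run(const std::string& task_name, const std::function<void()>& body) {
        for (const auto& hook : hooks_) {
            if (!hook(task_name)) return false;
        }
        body();
        return true;
    }

private:
    std::vector<Hook> hooks_;
};

// Tracer toollet: records every task name it observes and never drops anything.
// The caller must keep `trace` alive for as long as the hook is installed.
TaskRunner::Hook make_tracer(std::vector<std::string>& trace) {
    return [&trace](const std::string& name) {
        trace.push_back(name);
        return true;
    };
}

// Fault-injector toollet: deterministically drops every n-th task it observes.
TaskRunner::Hook make_fault_injector(int n) {
    assert(n > 0);
    int seen = 0;
    return [n, seen](const std::string&) mutable {
        ++seen;
        return seen % n != 0;
    };
}
```

With a tracer and `make_fault_injector(2)` installed, the first task submitted runs and the second is dropped, while the tracer records both names. A deterministic drop rule is used here so that a failing run can be repeated exactly; a real fault injector would more likely be driven by configuration.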

Research papers

rDSN borrows ideas from many research works, both our own and others', and tries to make them real in production in a coherent way; we greatly appreciate the researchers who did this work.

Failure Recovery: When the Cure Is Worse Than the Disease, HotOS'13
SEDA: An Architecture for Well-Conditioned, Scalable Internet Services, SOSP'01
PacificA: replication in log-based distributed storage systems, MSR Tech Report
Rex: Replication at the Speed of Multi-core, Eurosys'14
Arming Cloud Services with Task Aspects, MSR Tech Report
D3S: Debugging Deployed Distributed Systems, NSDI'08
MoDIST: Transparent Model Checking of Unmodified Distributed Systems, NSDI'09
G2: A Graph Processing System for Diagnosing Distributed Systems, USENIX ATC'11
R2: An Application-Level Kernel for Record and Replay, OSDI'08
WiDS: an Integrated Toolkit for Distributed System Development, HotOS'05

License and Support

rDSN is provided on Windows and Linux under the MIT open source license. You can use the "issues" tab in GitHub to report bugs.