Benchmark

Summary

  • JavaScript execution in napajs is on par with node when using the same version of V8, which is expected.
  • zone.execute scales linearly with the number of workers, which is expected.
  • The overhead of calling zone.execute from node is around 0.1ms after warm-up; zone.executeSync is around 0.2ms. The cost of using an anonymous function is negligible.
  • transport.marshall cost on small plain JavaScript values is about 3x that of JSON.stringify.
  • The overhead of store.set and store.get is around 0.06ms plus the transport overhead on the objects.

Napa vs. Node on JavaScript execution

Please refer to node-napa-perf-comparison.ts.

| node time (ms) | napa time (ms) |
| -------------- | -------------- |
| 3026.76        | 3025.81        |
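
For reference, below is a minimal sketch of how such a comparison can be driven from node. The workload, zone name, and worker count are illustrative assumptions, not the exact code in node-napa-perf-comparison.ts.

```ts
import * as napa from 'napajs';

// Illustrative CPU-bound workload; the real benchmark runs a fixed amount of JavaScript work.
function workload(): number {
    let sum = 0;
    for (let i = 0; i < 10000000; i++) {
        sum += i % 7;
    }
    return sum;
}

const zone = napa.zone.create('perf-comparison', { workers: 1 });

async function compare(): Promise<void> {
    let start = Date.now();
    workload();                            // run directly on the node thread
    console.log(`node time: ${Date.now() - start} ms`);

    start = Date.now();
    await zone.execute(workload, []);      // run the same function on a napa worker
    console.log(`napa time: ${Date.now() - start} ms`);
}

compare();
```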

Linear scalability

zone.execute scales linearly with the number of workers. We performed 1M CRC32 calls on a 1024-character string on each worker; the numbers are below. We still need to understand why running more workers in parallel can beat running fewer workers.

|           | node         | napa - 1 worker | napa - 2 workers | napa - 4 workers | napa - 8 workers |
| --------- | ------------ | --------------- | ---------------- | ---------------- | ---------------- |
| time (ms) | 8,649.521600 | 6146.98         | 4912.57          | 4563.48          | 6168.41          |
| cpu%      | ~15%         | ~15%            | ~27%             | ~55%             | ~99%             |

Please refer to execute-scalability.ts for test details.
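
A sketch of how such a scalability run could be reproduced is shown below; the worker count and the crc-32 npm package are assumptions for illustration, and execute-scalability.ts remains the authoritative test.

```ts
import * as napa from 'napajs';

const NUM_WORKERS = 4;   // vary 1, 2, 4, 8 to reproduce the table columns
const zone = napa.zone.create('scalability', { workers: NUM_WORKERS });

// Stand-in for the benchmark's CRC32 loop; 'crc-32' is an assumed npm package,
// required inside the function because the body executes on a napa worker.
function crcLoop(repeat: number): void {
    const crc32 = require('crc-32');
    const text = 'x'.repeat(1024);
    for (let i = 0; i < repeat; i++) {
        crc32.str(text);
    }
}

async function run(): Promise<void> {
    const start = Date.now();
    // One batch of 1M calls per worker; with linear scaling the elapsed time stays roughly flat
    // as NUM_WORKERS grows, while total CPU usage grows proportionally.
    const batches: Promise<any>[] = [];
    for (let i = 0; i < NUM_WORKERS; i++) {
        batches.push(zone.execute(crcLoop, [1000000]));
    }
    await Promise.all(batches);
    console.log(`${NUM_WORKERS} workers: ${Date.now() - start} ms`);
}

run();
```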

Execute overhead

The overhead of zone.execute includes

  1. marshalling cost of arguments in caller thread.
  2. queuing time before a worker can execute.
  3. unmarshalling cost of arguments in target worker.
  4. marshalling cost of return value from target worker.
  5. (executeSync only) internal delay while waiting for execution to finish.
  6. queuing time before caller callback is notified.
  7. unmarshalling cost of return value in caller thread.

In this section we examine #2, #5 and #6, so we use an empty function with no arguments and no return value.

Transport overhead (#1, #3, #4 and #7) varies with the size and complexity of the payload and is benchmarked separately in the Transport overhead section.

Please refer to execute-overhead.ts for test details.
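
A minimal sketch of this measurement is given below, assuming the napajs zone API; the repeat count is arbitrary and execute-overhead.ts is the authoritative test.

```ts
import * as napa from 'napajs';

const zone = napa.zone.create('overhead', { workers: 1 });

// Empty function: no arguments and no return value, so transport cost (#1, #3, #4, #7)
// is excluded and only queuing/notification overhead (#2, #6) remains.
function noop(): void {}

async function measure(repeat: number): Promise<void> {
    const start = Date.now();
    for (let i = 0; i < repeat; i++) {
        await zone.execute(noop, []);      // sequential calls, so each await isolates one round trip
    }
    const elapsed = Date.now() - start;
    console.log(`${repeat} calls: ${elapsed} ms (${(elapsed / repeat).toFixed(3)} ms per call)`);
}

measure(10000);
```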

Overhead after warm-up

Average overhead is around 0.06ms to 0.12ms for zone.execute, and around 0.16ms for zone.executeSync.

| repeat | zone.execute (ms) | zone.executeSync (ms) |
| ------ | ----------------- | --------------------- |
| 200    | 24.932            | 31.55                 |
| 5000   | 456.893           | 905.972               |
| 10000  | 810.687           | 1799.866              |
| 50000  | 3387.361          | 8169.023              |

*10000 calls of zone.executeSync on an anonymous function take 1780.241ms. The gap is within the range of benchmark noise.

Overhead during warm-up:

| Sequence of call | Time (ms) |
| ---------------- | --------- |
| 1                | 6.040     |
| 2                | 4.065     |
| 3                | 5.250     |
| 4                | 4.652     |
| 5                | 1.572     |
| 6                | 1.366     |
| 7                | 1.403     |
| 8                | 1.213     |
| 9                | 0.450     |
| 10               | 0.324     |
| 11               | 0.193     |
| 12               | 0.238     |
| 13               | 0.191     |
| 14               | 0.230     |
| 15               | 0.203     |
| 16               | 0.188     |
| 17               | 0.188     |
| 18               | 0.181     |
| 19               | 0.185     |
| 20               | 0.182     |

Transport overhead

The overhead of transport.marshall includes

  1. overhead of needing a replacer callback during JSON.stringify. (even an empty callback slows down JSON.stringify significantly)
  2. traversal of every value during JSON.stringify, to check the value type and get the cid to put into the payload.
    • a. If the value doesn't need special care.
    • b. If the value is a transportable object that needs special care.

2.b depends on the individual transportable class and varies from class to class, so we examine only #1 and #2.a in this test.

The overhead of transport.unmarshall includes

  1. overhead of needing a reviver callback during JSON.parse.
  2. traversal of every value during JSON.parse, to check whether an object has a _cid property.

We likewise evaluate only #1 and #2.a in this test.

Please refer to transport-overhead.ts for test details.
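
The replacer cost (#1) can be observed with plain JSON alone. The sketch below also times transport.marshall and transport.unmarshall; the payload shape and repeat count are illustrative, and obtaining the context via napa.transport.createTransportContext() is an assumption about the transport API.

```ts
import * as napa from 'napajs';

const REPEAT = 1000;
const payload = { values: Array.from({ length: 100 }, (_, i) => i) };  // roughly a "1 level - 100 integers" payload

function time(label: string, fn: () => void): void {
    const start = process.hrtime();
    for (let i = 0; i < REPEAT; i++) {
        fn();
    }
    const [sec, nsec] = process.hrtime(start);
    console.log(`${label}: ${(sec * 1e3 + nsec / 1e6).toFixed(2)} ms`);
}

// #1: even a pass-through replacer callback slows JSON.stringify down noticeably.
time('JSON.stringify', () => JSON.stringify(payload));
time('JSON.stringify + replacer', () => JSON.stringify(payload, (key, value) => value));

// transport.marshall/unmarshall add the per-value type/_cid checks (#2) on top of that.
const context = napa.transport.createTransportContext();   // assumed API for obtaining a TransportContext
const marshalled = napa.transport.marshall(payload, context);

time('transport.marshall', () => napa.transport.marshall(payload, context));
time('JSON.parse', () => JSON.parse(marshalled));
time('transport.unmarshall', () => napa.transport.unmarshall(marshalled, context));
```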

*All operations are repeated 1000 times.

| payload type                       | size  | JSON.stringify (ms) | transport.marshall (ms) | JSON.parse (ms) | transport.unmarshall (ms) |
| ---------------------------------- | ----- | ------------------- | ----------------------- | --------------- | ------------------------- |
| 1 level - 10 integers              | 91    | 4.90                | 18.05 (3.68x)           | 3.50            | 17.98 (5.14x)             |
| 1 level - 100 integers             | 1081  | 65.45               | 92.78 (1.42x)           | 20.45           | 122.25 (5.98x)            |
| 10 level - 2 integers              | 18415 | 654.40              | 2453.37 (3.75x)         | 995.02          | 2675.72 (2.69x)           |
| 2 level - 10 integers              | 991   | 19.74               | 66.82 (3.39x)           | 27.85           | 138.45 (4.97x)            |
| 3 level - 5 integers               | 1396  | 33.66               | 146.33 (4.35x)          | 51.54           | 189.07 (3.67x)            |
| 1 level - 10 strings - length 10   | 201   | 3.81                | 10.17 (2.67x)           | 9.46            | 20.81 (2.20x)             |
| 1 level - 100 strings - length 10  | 2191  | 76.53               | 115.74 (1.51x)          | 77.71           | 181.24 (2.33x)            |
| 2 level - 10 strings - length 10   | 2091  | 30.15               | 97.65 (3.24x)           | 95.51           | 213.20 (2.23x)            |
| 3 level - 5 strings - length 10    | 2646  | 41.95               | 155.42 (3.71x)          | 123.82          | 227.90 (1.84x)            |
| 1 level - 10 strings - length 100  | 1101  | 7.74                | 12.19 (1.57x)           | 17.34           | 29.83 (1.72x)             |
| 1 level - 100 strings - length 100 | 11191 | 66.17               | 112.83 (1.71x)          | 197.67          | 282.63 (1.43x)            |
| 2 level - 10 strings - length 100  | 11091 | 68.46               | 149.99 (2.19x)          | 202.85          | 298.19 (1.47x)            |
| 3 level - 5 strings - length 100   | 13896 | 89.46               | 208.21 (2.33x)          | 265.25          | 418.42 (1.58x)            |
| 1 level - 10 booleans              | 126   | 2.84                | 8.14 (2.87x)            | 3.06            | 14.20 (4.65x)             |
| 1 level - 100 booleans             | 1341  | 20.28               | 59.36 (2.93x)           | 21.59           | 121.15 (5.61x)            |
| 2 level - 10 booleans              | 1341  | 23.92               | 89.62 (3.75x)           | 31.84           | 137.92 (4.33x)            |
| 3 level - 5 booleans               | 1821  | 36.15               | 138.24 (3.82x)          | 55.71           | 195.50 (3.51x)            |

Store access overhead

The overhead of store.set includes

  1. overhead of calling transport.marshall on the value.
  2. overhead of putting the marshalled data and transport context into a C++ map (with exclusive_lock).

The overhead of store.get includes

  1. overhead of getting the marshalled data and transport context from the C++ map (with shared_lock).
  2. overhead of calling transport.unmarshall on the marshalled data.

For store.set, the numbers below indicate that the cost beyond marshalling is around 0.07~0.4ms, varying with payload size (10B to 18KB). store.get takes a bit more: 0.06~0.9ms over the same payload size range. If the value in the store is not updated frequently, it's always good to cache it in the JavaScript world.

Please refer to store-overhead.ts for test details.
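
Below is a sketch of the set/get pattern being measured, plus the caching suggestion; the store id, key, and payload are illustrative.

```ts
import * as napa from 'napajs';

const store = napa.store.create('bench-store');   // illustrative store id
const payload = { values: Array.from({ length: 100 }, (_, i) => i) };
const REPEAT = 1000;

let start = Date.now();
for (let i = 0; i < REPEAT; i++) {
    store.set('config', payload);                  // marshall + insert into the C++ map (exclusive lock)
}
console.log(`store.set: ${Date.now() - start} ms`);

start = Date.now();
for (let i = 0; i < REPEAT; i++) {
    store.get('config');                           // lookup in the C++ map (shared lock) + unmarshall
}
console.log(`store.get: ${Date.now() - start} ms`);

// If the value rarely changes, read it once and keep the unmarshalled object in JavaScript
// instead of paying the get + unmarshall cost on every access.
const cached = store.get('config');
console.log(cached.values.length);
```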

*All operations are repeated 1000 times.

| payload type                       | size  | transport.marshall (ms) | store.set (ms) | transport.unmarshall (ms) | store.get (ms) |
| ---------------------------------- | ----- | ----------------------- | -------------- | ------------------------- | -------------- |
| 1 level - 1 integers               | 10    | 2.54                    | 73.85          | 3.98                      | 65.57          |
| 1 level - 10 integers              | 91    | 8.27                    | 98.55          | 17.23                     | 90.89          |
| 1 level - 100 integers             | 1081  | 97.10                   | 185.31         | 144.75                    | 274.39         |
| 10 level - 2 integers              | 18415 | 2525.18                 | 2973.17        | 3093.06                   | 3927.80        |
| 2 level - 10 integers              | 991   | 71.22                   | 174.01         | 154.76                    | 276.04         |
| 3 level - 5 integers               | 1396  | 127.06                  | 219.73         | 182.27                    | 337.59         |
| 1 level - 10 strings - length 10   | 201   | 14.43                   | 79.68          | 31.28                     | 84.71          |
| 1 level - 100 strings - length 10  | 2191  | 104.40                  | 212.44         | 173.32                    | 239.09         |
| 2 level - 10 strings - length 10   | 2091  | 79.54                   | 188.72         | 189.29                    | 252.83         |
| 3 level - 5 strings - length 10    | 2646  | 155.14                  | 257.78         | 276.22                    | 342.95         |
| 1 level - 10 strings - length 100  | 1101  | 15.22                   | 89.84          | 30.87                     | 88.18          |
| 1 level - 100 strings - length 100 | 11191 | 119.89                  | 284.05         | 287.17                    | 403.77         |
| 2 level - 10 strings - length 100  | 11091 | 137.10                  | 299.32         | 244.13                    | 297.12         |
| 3 level - 5 strings - length 100   | 13896 | 183.84                  | 310.89         | 285.80                    | 363.50         |
| 1 level - 10 booleans              | 126   | 5.74                    | 49.89          | 22.69                     | 97.27          |
| 1 level - 100 booleans             | 1341  | 57.41                   | 157.80         | 106.30                    | 218.05         |
| 2 level - 10 booleans              | 1341  | 76.93                   | 150.25         | 104.02                    | 185.82         |
| 3 level - 5 booleans               | 1821  | 102.47                  | 171.44         | 150.42                    | 207.27         |