# Multiverso Torch Binding API

## init(sync)

Initialize multiverso.

This should be called only once, before training, at the very beginning of the
program.

If `sync` is `true`, a sync server will be created. Otherwise an async server
will be created.

If a sync server is created, you *must* make sure every process calls
`add` and `get` in the same order and the same number of times. Otherwise some
processes will block. In sync server mode, every `get` call returns
*exactly the same results*.
If an async server is created, the restrictions of a sync server do not apply,
but `get` is not guaranteed to return the same results in every process.
If you want to get the same results in async server mode, you should use
`barrier` and `get` with the argument `sync` set to `true` to sync the
processes.

## barrier()

Set a barrier for all workers.

Each worker waits at the barrier until all workers have reached it.

## shutdown()

Shut down multiverso.

This should be called only once, after training has finished, at the very end
of the program.
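
The three calls above are typically used together. The following is a minimal sketch of that lifecycle, not a definitive implementation. It assumes the binding is loaded with `require 'multiverso'` (the module name is an assumption, since this document does not state it) and that one copy of the script runs per worker process.

```lua
-- Assumption: the binding is exposed as a Lua module named 'multiverso'.
local multiverso = require 'multiverso'

-- Call once, before anything else. false: async server, true: sync server.
multiverso.init(false)

-- ... create table handlers and set up the model here ...

-- Wait until every worker has finished its setup phase.
multiverso.barrier()

-- ... training loop goes here ...

-- Call once, after all training work is done.
multiverso.shutdown()
```

Placing `barrier()` between setup and training is a design choice in this sketch: it keeps a fast worker from starting to train before slower workers have finished initializing.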

## num_workers()

Return the total number of workers.

## worker_id()

Return the id (zero-based index) of the current worker.
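
One common use of `worker_id()` together with `num_workers()` is to give each worker its own slice of the training data. The sketch below illustrates the arithmetic in plain Lua. `shard_range` is a hypothetical helper, not part of multiverso, and the sample count is made up for illustration.

```lua
-- Hypothetical helper (not part of multiverso): split n samples into
-- contiguous, nearly equal slices, one per worker.
-- worker_id is zero-based; the returned sample indices are one-based.
local function shard_range(worker_id, num_workers, n)
  local base = math.floor(n / num_workers)
  local extra = n % num_workers
  -- The first `extra` workers each take one additional sample.
  local first = worker_id * base + math.min(worker_id, extra) + 1
  local count = base + (worker_id < extra and 1 or 0)
  return first, first + count - 1
end

-- Show how 10 samples would be divided among 3 workers.
for id = 0, 2 do
  local first, last = shard_range(id, 3, 10)
  print(string.format('worker %d: samples %d..%d', id, first, last))
end
```

In a real script the first two arguments would come from `worker_id()` and `num_workers()` instead of the loop used here.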

## TableHandler

`TableHandler` is an interface to sync different kinds of values.

In most cases, you will sync models (for initialization) and
gradients (during training) so that multiverso can manage the models for you
in distributed environments. Currently, two types of `TableHandler` are
supported: `ArrayTableHandler` and `MatrixTableHandler`.

### ArrayTableHandler

`ArrayTableHandler` is used to sync array-like (one-dimensional) values.

Although the model tends to be a matrix, when using the `torch.nn` package we can
get the flattened parameters and gradients with
[module.getParameters()](https://github.com/torch/nn/blob/master/doc/module.md#flatparameters-flatgradparameters-getparameters).
So in most cases, we should use `ArrayTableHandler` instead of the
`MatrixTableHandler` introduced later in this document.

#### ArrayTableHandler:new(size, init_value)

Create an `ArrayTableHandler` for syncing array-like (one-dimensional) values.

`size` should be a `number` equal to the size of the value we want to sync.

If `init_value` is `nil`, the table is initialized with zeros; otherwise
the table is initialized with `init_value`.
*Notice*: Only the `init_value` from the master will be used!

#### ArrayTableHandler:add(data, sync)

Add array-like (one-dimensional) data to the server.

`data` should be a `torch.Tensor` or a Lua `table`. During training,
`data` should be the gradients (delta value). The size of `data` must be equal
to the size specified at initialization.

`sync` should be a boolean value; the default is `false`. If `sync` is
`true`, this call blocks on IO until it finishes. Otherwise it
returns immediately.

#### ArrayTableHandler:get()

Get the array-like (one-dimensional) value from the server.

The returned value is a `torch.Tensor`. Usually, we use
[Tensor:copy()](https://github.com/torch/torch7/blob/master/doc/tensor.md#self-copytensor)
to assign the value to the desired destination.
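
Putting `new`, `add`, and `get` together, a training loop might look like the sketch below. It is an illustration under stated assumptions, not the reference usage: the module name `multiverso`, the toy `nn.Linear` model, the random data, the learning rate, and the choice to push the negatively scaled gradient as the delta are all assumptions made for this example. It requires `torch`, `nn`, and the multiverso binding to be installed.

```lua
require 'torch'
require 'nn'
-- Assumption: the binding is exposed as a Lua module named 'multiverso'.
local multiverso = require 'multiverso'

multiverso.init(false)  -- async server

-- Toy model, for illustration only.
local model = nn.Linear(10, 2)
local criterion = nn.MSECriterion()
local params, grad_params = model:getParameters()  -- flattened views

-- One table entry per parameter, initialized from the master's params.
local tbh = multiverso.ArrayTableHandler:new(params:size(1), params)
multiverso.barrier()     -- wait until every worker has created its handler
params:copy(tbh:get())   -- start from the model stored on the server

local learning_rate = 0.01  -- assumed value
for step = 1, 100 do
  -- Random data stands in for a real dataset.
  local input, target = torch.randn(10), torch.randn(2)

  model:zeroGradParameters()
  local output = model:forward(input)
  criterion:forward(output, target)
  model:backward(input, criterion:backward(output, target))

  -- Assumption: the server adds the pushed delta to the stored value,
  -- so a gradient descent step is sent as -learning_rate * gradient.
  tbh:add(torch.mul(grad_params, -learning_rate))
  params:copy(tbh:get())  -- pull the merged model back
end

multiverso.shutdown()
```

Because `getParameters()` returns tensors that share storage with the model, copying the result of `get()` into `params` updates the model in place.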

### MatrixTableHandler

`MatrixTableHandler` is used to sync matrix-like (two-dimensional) values.

#### MatrixTableHandler:new(num_row, num_col, init_value)

Create a `MatrixTableHandler` for syncing matrix-like (two-dimensional) values.

`num_row` should be the number of rows and `num_col` should be the
number of columns. Both should be a `number` equal to the exact size of
the value we want to sync.

If `init_value` is `nil`, the table is initialized with zeros; otherwise
the table is initialized with `init_value`.
*Notice*: if `init_value` differs between processes, the average of
the values will be used.

#### MatrixTableHandler:add(data, row_ids, sync)

Add matrix-like (two-dimensional) data to the server.

As with `ArrayTableHandler`, `data` should be a
`torch.Tensor` or a Lua `table`, and we should pass the gradients (delta value),
not the exact value. `row_ids` is an optional parameter; when specified, it
should be an array of `row_id` numbers. In that case multiverso only
updates the values in those rows, and the size of `data` should be equal to the
size of the values we want to update.

`sync` should be a boolean value; the default is `false`. If `sync` is
`true`, this call blocks on IO until it finishes. Otherwise it
returns immediately.

#### MatrixTableHandler:get(row_ids)

Get the matrix-like (two-dimensional) value from the server.

`row_ids` is an optional parameter, and the interface works the same way as
`ArrayTableHandler` when `row_ids` is not specified. When we pass an array
of `row_id` numbers, we only get the values from the specified rows. In that
case we cannot do a single `Tensor:copy()` but have to handle the value manually.
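
As an illustration of handling the partial result manually, the sketch below writes fetched rows back into a local copy of the full matrix. `scatter_rows` is a hypothetical helper, not part of multiverso. It assumes that row ids are zero-based, that the i-th returned row corresponds to the i-th entry of `row_ids`, and that the result can be indexed as `rows[i][j]`; none of these is confirmed by this document. Plain Lua tables stand in for tensors so the example runs without Torch.

```lua
-- Hypothetical helper (not part of multiverso): copy the rows returned by
-- get(row_ids) into the matching rows of a local full-size matrix.
-- Assumes zero-based row ids and that rows[i] belongs to row_ids[i].
local function scatter_rows(dest, row_ids, rows, num_col)
  for i, row_id in ipairs(row_ids) do
    for j = 1, num_col do
      dest[row_id + 1][j] = rows[i][j]  -- +1: Lua containers are one-based
    end
  end
  return dest
end

-- Plain Lua tables stand in for the local weights and the fetched rows.
local weights = { {0, 0}, {0, 0}, {0, 0} }
local fetched = { {1, 2}, {3, 4} }  -- pretend result of get({0, 2})
scatter_rows(weights, {0, 2}, fetched, 2)
print(weights[1][1], weights[1][2], weights[3][1], weights[3][2])
```

Element-wise indexing keeps the helper independent of the container type. With real tensors, copying whole rows at once would likely be faster.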