Multiverso Torch Binding API
init(sync)
Initialize multiverso.
This should be called only once, at the beginning of the whole project, before training starts.
If sync is true, a sync server will be created. Otherwise an async server will be created.
If a sync server is created, you must make sure every process calls add and get in the same order and the same number of times. Otherwise some processes will be blocked. In sync server mode, all get calls return exactly the same results.
If an async server is created, the restrictions of a sync server do not apply, but there is no guarantee that get returns the same results in every process. If you want to get the same results in async server mode, you should use barrier and get with the argument sync set to true to sync the processes.
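The ordering rule for sync server mode can be sketched as follows. This is only an illustration, not part of the binding: tbh stands for a table handler (ArrayTableHandler or MatrixTableHandler, described later in this document), and training_loop, num_iters and compute_delta are hypothetical names.

```lua
-- Sketch of a loop that is safe in sync server mode.
-- `tbh` is a table handler; `training_loop`, `num_iters` and
-- `compute_delta` are hypothetical names used for illustration.
local function training_loop(tbh, num_iters, compute_delta)
  local latest
  for iter = 1, num_iters do
    -- Every worker runs this body the same number of times and in the
    -- same order: one add followed by one get. Guarding either call with
    -- a per-worker condition (for example "only on worker 0") would leave
    -- the other workers blocked in sync server mode.
    tbh:add(compute_delta(iter))
    latest = tbh:get()
  end
  return latest
end
```

The important point is that num_iters and the call sequence are identical on every process; what compute_delta returns may differ per worker.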
barrier()
Set a barrier for all workers to wait.
Workers will wait until all workers reach a specific barrier.
shutdown()
Shut down multiverso.
This should be called only once after finishing training at the end of the whole project.
num_workers()
Return the total number of workers.
worker_id()
Return the id (zero-based index) of the current worker.
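The process-level calls above can be combined as in the following minimal sketch. It assumes the binding is loaded with require 'multiverso' and exposes these calls as plain functions on the returned module; neither detail is stated in this section, so the sketch takes the loaded module as the parameter mv. run_worker and train_fn are hypothetical names.

```lua
-- Sketch: wrap a training run in the init/shutdown lifecycle.
-- `mv` is the loaded binding module, e.g. `local mv = require 'multiverso'`
-- (the module name is an assumption; it is not given in this section).
-- `run_worker` and `train_fn` are hypothetical names.
local function run_worker(mv, use_sync_server, train_fn)
  -- Call init exactly once, before any training starts.
  mv.init(use_sync_server)

  local worker_id = mv.worker_id()      -- zero-based index of this worker
  local num_workers = mv.num_workers()  -- total number of workers

  train_fn(worker_id, num_workers)

  -- Wait until every worker has finished training.
  mv.barrier()
  -- Call shutdown exactly once, after training has finished.
  mv.shutdown()
  return worker_id, num_workers
end
```

The final barrier is a design choice of this sketch, so that no worker shuts down while others are still training.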
TableHandler
TableHandler is an interface to sync different kinds of values.
In most cases, you are supposed to sync models (for initialization) and gradients (during training) so as to let multiverso help you manage the models in distributed environments. Currently, two types of TableHandler are supported, namely ArrayTableHandler and MatrixTableHandler.
ArrayTableHandler
ArrayTableHandler is used to sync array-like (one-dimensional) values.
Although the model tends to be a matrix, when using the torch.nn package we can get the flattened parameters and gradients with module.getParameters(). So in most cases, we should use ArrayTableHandler instead of MatrixTableHandler, which is introduced below.
ArrayTableHandler:new(size, init_value)
Create an ArrayTableHandler for syncing array-like (one-dimensional) values.
The size should be a number equal to the size of the value we want to sync.
If init_value is nil, the table is initialized with zeros; otherwise it is initialized with init_value. Note: only the init_value from the master will be used!
ArrayTableHandler:add(data, sync)
Add array-like (one-dimensional) data to the server.
The data should be a torch.Tensor or a Lua table. During training, the data should be the gradients (delta values). The size of data must be equal to the size specified at initialization.
sync should be a boolean value. The default value is false. If sync is true, this call blocks on IO until it finishes. Otherwise it returns immediately.
ArrayTableHandler:get()
Get the array-like (one-dimensional) value from the server.
The value we get will be a torch.Tensor. Usually, we are supposed to use Tensor:copy() to assign the value to the desired destination.
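A typical use of ArrayTableHandler might look like the sketch below. It is an illustration under assumptions, not the binding's prescribed usage: mv stands for the loaded binding module (assumed to come from require 'multiverso', with the handler reachable as mv.ArrayTableHandler), params is the flattened one-dimensional parameter tensor, and init_table and sync_step are hypothetical names. How delta is computed is left to the caller.

```lua
-- `mv` is the loaded binding module (assumed: require 'multiverso').
-- `params` is the flattened 1-D parameter tensor, e.g. the first value
-- returned by getParameters(). Function names here are hypothetical.

-- Create the table and load the initial model from the server.
local function init_table(mv, params)
  local tbh = mv.ArrayTableHandler:new(params:size(1))
  mv.barrier()              -- wait until every worker has created its table
  params:copy(tbh:get())    -- start from the server's copy of the model
  return tbh
end

-- Push a local update (delta) and pull the current server value.
local function sync_step(tbh, params, delta)
  tbh:add(delta)            -- sync defaults to false, so this returns at once
  params:copy(tbh:get())    -- overwrite local parameters with the result
end
```

Calling init_table once after init and sync_step once per training iteration keeps the local parameters aligned with the server's copy.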
MatrixTableHandler
MatrixTableHandler is used to sync matrix-like (two-dimensional) values.
MatrixTableHandler:new(num_row, num_col, init_value)
Create a MatrixTableHandler for syncing matrix-like (two-dimensional) values.
The num_row should be the number of rows and the num_col should be the number of columns. Both of them should be numbers equal to the exact size of the value we want to sync.
If init_value is nil, the table is initialized with zeros; otherwise it is initialized with init_value. Note: if the init_value differs between processes, the average of them will be used.
MatrixTableHandler:add(data, row_ids, sync)
Add matrix-like (two-dimensional) data to the server.
As with ArrayTableHandler, the data should be a torch.Tensor or a Lua table, and we should pass the gradients (delta values), not the exact values. The row_ids is an optional parameter; when specified, it should be an array of row_id numbers. In that case multiverso only updates the values in those rows, and the size of data should be equal to the size of the values we want to update.
sync should be a boolean value. The default value is false. If sync is true, this call blocks on IO until it finishes. Otherwise it returns immediately.
MatrixTableHandler:get(row_ids)
Get the matrix-like (two-dimensional) value from the server.
The row_ids is an optional parameter, and the interface works the same way as ArrayTableHandler when row_ids is not specified. But when we pass an array of row_id numbers, we only get the values from those rows. In this case, we cannot use a single Tensor:copy() and have to deal with the values manually.
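One way to handle a partial get manually is sketched below. It rests on assumptions this section does not confirm: that row_ids is a Lua array table, that the returned value has one row per requested id in the requested order and can be indexed as rows[i], and that the caller knows how a server row id maps to a row of the local matrix. pull_rows and to_local_index are hypothetical names.

```lua
-- Fetch selected rows and write each one into a local 2-D tensor.
-- Assumptions (not confirmed by this document): `row_ids` is a Lua array
-- table, the result holds one row per requested id in the same order, and
-- `to_local_index` (hypothetical) maps a server row id to a row index of
-- `local_matrix`.
local function pull_rows(tbh, local_matrix, row_ids, to_local_index)
  local rows = tbh:get(row_ids)
  for i, row_id in ipairs(row_ids) do
    -- Indexing a 2-D tensor with one number selects a row; copy into it.
    local_matrix[to_local_index(row_id)]:copy(rows[i])
  end
end
```

Rows that were not requested are left untouched in local_matrix.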