distributed_word_embedding

History

Guoqing Liu (MSR Student-Person Consulting) 856ce6c648 add some comments		2015-11-09 12:45:33 +08:00
Readme.txt	input arguments format doc	2015-11-08 18:22:59 +08:00
run.bat	add some comments	2015-11-09 12:45:33 +08:00

Readme.txt

 Usage:
-size: word embedding size, e.g. 300
-train_file: the training corpus file, e.g. enwik2014
-read_vocab: the file from which to read the vocabulary counts
-binary: 0 or 1, whether to write the embedding vectors in binary format
-cbow: 0 or 1, default 1, whether to use cbow, otherwise skip-gram
-alpha: initial learning rate, usually set to 0.025
-output: the output file to store all the embedding vectors
-window: the window size
-sample: the sub-sample size, usually set to 0
-hs: 0 or 1, default 1, whether to use hierarchical softmax, otherwise negative-sampling
-negative: the negative word count in negative sampling, please set it to 0 when -hs = 1
-threads: the number of threads to run on one machine
-min_count: words with a lower frequency than min_count are removed from the dictionary
-epoch: the number of epochs
-stopwords: 0 or 1, whether to avoid training stop words
-sw_file: the stop words file storing all the stop words, valid when -stopwords = 1
-use_adagrad: 0 or 1, whether to use adagrad to adjust learning rate
-data_block_size: default 1MB, the maximum number of bytes a data block will store
-max_preload_data_size: default 8GB, the maximum data size (bytes) that multiverse_WordEmbedding will preload
-num_servers: default 0, a parameter of multiverso. As a special case, 0 indicates that all processes are servers
-num_aggregator: default 1, number of aggregation threads in a process
-max_delay: default 0, the delay bound (max staleness)
-num_lock: default 100, number of locks in Locked option
-is_pipeline: 0 or 1, whether to use pipeline
-lock_option: default 0, lock option. 0: the threads do not write and there is no contention; 1: there is no lock for thread contention; 2: normal lock for thread contention
-server_endpoint_file: default "", server ZMQ socket endpoint file in the MPI-free version
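
Example (a sketch only, not part of the tool): the arguments above can be assembled into a command line as shown below. The executable name, the helper build_command, and the file names vocab.txt and vectors.txt are assumptions made for illustration; the Readme does not name the binary, so substitute the one you actually built. The values for -window, -threads, -min_count and -epoch are placeholders, not recommended settings. The only rule checked is the one stated above, that -negative should be 0 when -hs = 1.

```python
# Hypothetical helper: turns the documented options into an argv-style list.
# EXECUTABLE is an assumed name; the Readme does not say what the binary is called.
EXECUTABLE = "distributed_word_embedding"


def build_command(options, executable=EXECUTABLE):
    """Return a list such as [executable, "-size", "300", ...]."""
    # The Readme asks for -negative 0 whenever hierarchical softmax is enabled.
    if options.get("hs") == 1 and options.get("negative", 0) != 0:
        raise ValueError("-negative must be 0 when -hs = 1")
    argv = [executable]
    for name, value in options.items():
        argv += ["-" + name, str(value)]
    return argv


options = {
    "size": 300,                # example value from the Readme
    "train_file": "enwik2014",  # example value from the Readme
    "read_vocab": "vocab.txt",  # hypothetical file name
    "output": "vectors.txt",    # hypothetical file name
    "binary": 0,
    "cbow": 1,
    "alpha": 0.025,             # usual value according to the Readme
    "sample": 0,                # usual value according to the Readme
    "hs": 1,
    "negative": 0,              # must be 0 because hs is 1
    "window": 5,                # placeholder value
    "threads": 4,               # placeholder value
    "min_count": 5,             # placeholder value
    "epoch": 1,                 # placeholder value
}

command = build_command(options)
print(" ".join(command))
```

The printed line can be pasted into a script such as run.bat, or the list can be passed to subprocess.run(command) from Python.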