hlums
aeb9486a6a
Updated wikigold utils to be consistent with other datasets.
2019-06-23 21:57:04 +00:00
hlums
7c35f670d2
Added probabilities output to BERT token classifier.
2019-06-23 21:55:51 +00:00
hlums
7b827d6d5b
Updated ner token preprocessing for Chinese text.
2019-06-23 21:53:06 +00:00
hlums
480a08f544
Updated NER notebook with new tokenizer api.
2019-06-23 21:51:34 +00:00
Said Bleik
35fc04c383
Merge pull request #113 from microsoft/hlu/two_sequence_utils_and_XNLI_notebook
Hlu/two sequence utils and xnli notebook
2019-06-21 17:39:07 -04:00
Said Bleik
9c3a95159c
add sequential loader test
2019-06-21 17:37:03 -04:00
Hong Lu
fb3f7ddef6
Moved _truncate_seq_pair outside of if else block.
2019-06-21 14:48:10 -04:00
Said Bleik
399707e747
meh
2019-06-21 14:43:09 -04:00
Said Bleik
65f1c82a81
arg name change
2019-06-21 14:41:39 -04:00
Said Bleik
8b56eec5cd
updated defaults for predict's output
2019-06-21 14:24:40 -04:00
Said Bleik
aec2ffbfe7
add namedtuple preds output
2019-06-21 13:17:04 -04:00
Said Bleik
136cadf0fe
lm name changes
2019-06-21 13:16:43 -04:00
Hong Lu
c53edc41b3
Added training and prediction time to notebook.
2019-06-21 10:50:20 -04:00
Said Bleik
c0951fae7c
rem data_loader
2019-06-20 14:44:18 -04:00
Said Bleik
d53c17e1ed
minor edit to preds
2019-06-20 14:37:47 -04:00
Said Bleik
fad2564604
added optional prob dist predictions
2019-06-20 14:23:09 -04:00
Hong Lu
2010daf637
Removed redundant code.
2019-06-20 14:08:40 -04:00
Hong Lu
76fa9d7ed4
Fixed formatting.
2019-06-20 14:01:48 -04:00
Said Bleik
4388e6c588
add whole-word pretrained models
2019-06-20 13:28:10 -04:00
hlums
ed3415b320
Updated utils of XNLI dataset.
2019-06-19 21:06:38 +00:00
hlums
593bb4eb5d
Added convert_to_unicode helper function.
2019-06-19 21:06:15 +00:00
hlums
946a687729
Resolved conflict with staging.
2019-06-19 21:05:53 +00:00
Said Bleik
b407204c79
add sequential loader
2019-06-19 16:12:20 -04:00
Said Bleik
5245afafdd
add dask data loader
2019-06-19 14:54:33 -04:00
Casey Hong
236a64e9d8
update stsbenchmark 📓
2019-06-18 17:20:47 -04:00
Janhavi Mahajan
074bca3619
black formatter
2019-06-18 16:49:08 -04:00
Janhavi Mahajan
7f1bb8a039
bug fix: sts-benchmark has extra tabs in some rows which caused incorrect reading of pandas df or azureml dataflow object
2019-06-18 16:49:08 -04:00
hlums
e09a54512a
Resolved conflict and merged from staging.
2019-06-18 15:54:55 +00:00
hlums
7cba858a42
Updated docstring.
2019-06-18 14:57:49 +00:00
hlums
e5b5af4c78
Added warmup and support for two-sequence classification.
2019-06-17 22:40:24 +00:00
Miguel González-Fierro
818f2ef3d3
Merge pull request #78 from microsoft/liqun-first-pull
GenSen on AML deep dive notebook (sentence similarity)
2019-06-17 23:46:44 +02:00
Miguel González-Fierro
800ab9ac00
Merge pull request #103 from microsoft/bleik
update xnli dataset utils
2019-06-17 23:40:56 +02:00
Liqun Shao
0af38f7ccb
make changes on notebook based on the structure change
2019-06-17 16:57:09 -04:00
Liqun Shao
cc0fabcd55
add docstring for training on gpu only
2019-06-17 16:57:09 -04:00
Liqun Shao
70733e9d77
change the structure
2019-06-17 16:57:09 -04:00
Abhiram E
f7a14146ea
Fixed model state empty bug
2019-06-17 16:57:09 -04:00
Liqun Shao
cc6cf46a28
format the files
2019-06-17 16:57:09 -04:00
Liqun Shao
a7e0555235
fix the path
2019-06-17 16:57:09 -04:00
Liqun Shao
d200d525fe
remove unwanted logs; fix the bug for getting training time
2019-06-17 16:57:09 -04:00
Liqun Shao
ae2f8f4d2a
Make the following changes to increase the performance of horovod training.
1. Add random seeds for iterators
2. learning rate=lr*hvd.size()
3. sync the optimizer
4. remove DataParallel
2019-06-17 16:57:09 -04:00
Liqun Shao
9e504182c3
fix the issue where min_epoch_loss is not updated during training, so training never stops; min_epoch_loss always equals val_epoch_loss
2019-06-17 16:57:09 -04:00
Liqun Shao
4fa8bcc50f
fix import
2019-06-17 16:57:09 -04:00
Abhiram E
a065ae9bb0
Removed nested path joins
2019-06-17 16:57:09 -04:00
Abhiram E
eb6719d9ec
Minor fix to docstrings
2019-06-17 16:57:08 -04:00
Liqun Shao
0cff6731bb
remove unwanted log
2019-06-17 16:57:08 -04:00
Abhiram E
f8ffc290b7
Refactored gensen train.py
2019-06-17 16:57:08 -04:00
Liqun Shao
4369138371
add docstring to utils.py
2019-06-17 16:57:08 -04:00
Abhiram E
1ab6490348
Resolved comments on the Gensen code
2019-06-17 16:57:08 -04:00
Abhiram E
13ec0a36a9
Moved prints to logging
2019-06-17 16:57:08 -04:00
Abhiram E
f9a0bd1435
Added docstrings to Gensen code and refactored based on code review comments
2019-06-17 16:57:08 -04:00
Liqun Shao
b670abbb88
add original code source to all the code
2019-06-17 16:57:08 -04:00
Liqun Shao
45811ef9b6
add aml explanation
2019-06-17 16:57:08 -04:00
Liqun Shao
349958bafb
1. correct typo in the notebook
2. add header to all the python files
3. add comments for train.py to explain what it does
2019-06-17 16:57:08 -04:00
Liqun Shao
01fdc9c82a
The HyperDrive will --> The HyperDrive run automatically shows...
2019-06-17 16:57:08 -04:00
Liqun Shao
9e6c4680fe
put create workspace in the first place
2019-06-17 16:57:08 -04:00
Liqun Shao
0c79d68381
remove all unnecessary labels
2019-06-17 16:57:08 -04:00
Liqun Shao
2c4cc44839
include imports
2019-06-17 16:57:08 -04:00
Liqun Shao
c7cc976063
1. Move similarity explanation to README
2. Separate model.py into two
3. Remove unnecessary imports
2019-06-17 16:57:08 -04:00
Liqun Shao
9e212bc832
add explanation on tuning results
2019-06-17 16:57:07 -04:00
Liqun Shao
635eab8cb6
change the name of saving model
2019-06-17 16:57:07 -04:00
Liqun Shao
b440661f6a
fix the bug for stopping the training
2019-06-17 16:57:07 -04:00
Liqun Shao
6f3c89f0a7
Fixed the bug on training not converging
2019-06-17 16:57:07 -04:00
Liqun Shao
ead07a6551
remove auto loader and save the best model state
2019-06-17 16:56:25 -04:00
Liqun Shao
786f8de629
change the stopping condition to when the validation loss is small
2019-06-17 16:56:25 -04:00
Liqun Shao
fde487ea89
use adam optimizer instead of SGD
2019-06-17 16:56:25 -04:00
Liqun Shao
c5687f9159
add README to gensen repo
2019-06-17 16:56:25 -04:00
Liqun Shao
3c43c229b2
Resolved conflicts
2019-06-17 16:56:25 -04:00
Liqun Shao
5137d91c46
add horovod distributed training to the gensen model and make the training stop with small validation loss
2019-06-17 16:56:25 -04:00
Abhiram E
2d5bfe6862
Refactored gensen related files by Maluuba
2019-06-17 16:56:24 -04:00
Liqun Shao
51ccc72cd3
first push
2019-06-17 16:56:24 -04:00
Said Bleik
f12aabd5b0
add xnli dataset utils
2019-06-17 12:25:01 -04:00
Hong Lu
dfb8553c5b
Resolved conflict and merged staging.
2019-06-17 12:01:10 -04:00
Said Bleik
a514025f5d
Merge pull request #99 from microsoft/bleik
ar TC example
2019-06-14 16:21:03 -04:00
Said Bleik
0929c37d56
removed unused arg
2019-06-14 16:18:50 -04:00
Said Bleik
b0ead86bf2
added dataset utils
2019-06-14 15:11:17 -04:00
Said Bleik
51c22b9607
minor fixes
2019-06-13 22:26:29 -04:00
Said Bleik
9e85c2923f
added missing imports
2019-06-13 15:30:53 -04:00
Hong Lu
4de4ece15c
Changed python version in pre-commit-config back to 3.6
2019-06-13 14:46:57 -04:00
Ubuntu
5438d76596
Added test code for NER utils.
2019-06-13 18:25:27 +00:00
Hong Lu
6d671b6221
Started adding test code for NER.
2019-06-12 15:33:12 -04:00
Casey Hong
e031d3d225
suppress nltk messages
2019-06-12 12:37:43 -04:00
Said Bleik
98a7071294
Merge pull request #85 from microsoft/casey-senteval
SentEval examples (local and with azureml support)
2019-06-11 14:39:51 -04:00
Casey Hong
ba3ba5b5a8
use azureml_utils for workspace creation
2019-06-11 13:05:39 -04:00
Casey Hong
e5b12c6f32
resolve merge conflicts
2019-06-11 11:45:30 -04:00
Chaoyu Guan
f0d6a2f55c
rename files and revise README for #62
2019-06-08 13:15:23 +00:00
Chaoyu Guan
f4f3591668
add explain-NLP-model part for issue #62
2019-06-08 12:51:44 +00:00
Hong Lu
26fcc3cbe4
Added random seed option to wikigold util function.
2019-06-07 17:32:49 -04:00
Hong Lu
049ddf6442
Added BERT prefix to classifier names and some minor docstring updates.
2019-06-07 17:08:29 -04:00
Hong Lu
4e7ac8adc1
Minor updates in token classifier.
2019-06-07 10:56:49 -04:00
Hong Lu
e40e9636f3
Removed old data utils script.
2019-06-07 10:42:27 -04:00
Hong Lu
a8feb91a89
Removed common_ner.py
2019-06-07 10:35:12 -04:00
Hong Lu
2593620633
Added utility functions for token classification.
2019-06-07 10:34:09 -04:00
Hong Lu
fbf15e64c6
Merge remote-tracking branch 'origin/staging' into hlu/BERT_NER_utils
2019-06-06 16:45:09 -04:00
Said Bleik
b040c481eb
Merge pull request #86 from microsoft/abhiram-requests-fix
Minor fix suggested in Recommenders repo
2019-06-06 16:03:59 -04:00
Abhiram E
802188e115
Minor fix suggested in Recommenders repo
2019-06-06 15:46:17 -04:00
Casey Hong
23d9635230
senteval local and azureml 📓
2019-06-06 10:57:05 -07:00
Abhiram E
f0db07fb3a
Minor change.
2019-06-06 10:20:57 -07:00
Abhiram E
5b1ed5f447
FastText loader - Code changes and unit tests.
1. Added methods to download, extract and load glove vectors.
2. Added unit tests to test the public method.
Other changes
1. Refactored files to add return types to docstrings.
2. Minor changes to path variables.
2019-06-06 10:20:57 -07:00
Abhiram E
2498dbaaa1
Minor changes
2019-06-06 10:18:13 -07:00
abeswara
008bfa2c57
Glove loader - Code changes and unit tests.
1. Added methods to download, extract and load glove vectors.
2. Added unit tests to test the public methods.
Other changes
1. Made download and extract methods private.
2. Refactored Word2vec unit tests to exclude private methods.
2019-06-06 10:16:46 -07:00
abeswara
ae31e05a84
Word2vec loader - Code changes and unit tests.
1. Refactored word2vec loader to perform existing file checks before downloading or extracting.
2. Added unit tests for the load, download and extract functions.
2019-06-06 10:12:29 -07:00
Said Bleik
9269ef5482
merge staging
2019-06-06 13:01:07 -04:00
Said Bleik
c518d6a735
updated tc notebook and some utils
2019-06-05 21:37:16 -04:00
Abhiram E
3ac927edfa
Using tqdm to show progress bar
2019-06-05 13:08:23 -04:00
Abhiram E
0e296b6291
Changed url fetch from urlretrieve to requests
2019-06-04 16:26:35 -04:00
Said Bleik
ee9134d96f
minor updates to seq classification
2019-06-03 10:03:49 -04:00
Hong Lu
9bcad55d20
Updated NER notebook with wikigold data.
2019-05-31 18:44:01 -04:00
Said Bleik
61b66a57aa
updated device utils
2019-05-31 16:08:36 -04:00
Hong Lu
320b08d9af
Removed BERT image.
2019-05-31 13:46:34 -04:00
Said Bleik
1a96bce557
added missing assignment
2019-05-30 10:42:30 -04:00
Hong Lu
aaf0114cd7
Removed old scripts.
2019-05-29 14:57:23 -04:00
Hong Lu
52bd027555
Added helper function for postprocessing token classification results.
2019-05-29 14:39:58 -04:00
Said Bleik
5a81055e70
updated device utils and bert seq classifier
2019-05-28 23:16:19 -04:00
Abhiram E
36d7411bec
Fix to limit the memory usage when using fasttext embedding loaders. Code changes to use the simpler version
2019-05-28 12:04:57 -04:00
Hong Lu
52cc16fb9b
Updated token classifier api.
2019-05-24 18:09:56 -04:00
Hong Lu
5258c9cd7e
Added some utility functions to the common script. Will be merged with common.py later.
2019-05-24 18:09:04 -04:00
Casey Hong
1cd36ccff7
fix snli noblank bug and add preprocessing tests
2019-05-21 23:00:56 -04:00
Said Bleik
63e546ab3c
updated preprocessing, utils, classification
2019-05-21 16:45:23 -04:00
Hong Lu
2473e1a75c
Black auto formatting.
2019-05-20 18:53:57 -04:00
Hong Lu
3d1c1862d9
Removed old data utils script.
2019-05-20 14:08:39 -04:00
Hong Lu
4a41ec41e8
Added a constant file.
2019-05-20 14:00:12 -04:00
Hong Lu
1393c74fb3
Minor updates for data class updates.
2019-05-20 13:59:38 -04:00
Hong Lu
9919a7bd35
Removed InputFeature class. Use namedtuple instead of class for input data.
2019-05-20 13:58:54 -04:00
Hong Lu
e81138ad08
Changed optimizer and number of epochs configuration.
2019-05-20 13:58:16 -04:00
Said Bleik
49bb116474
update seq classifier
2019-05-17 10:04:46 -04:00
Hong Lu
eef85dea41
Consolidated all configuration classes into a single class.
2019-05-16 18:11:21 -04:00
Hong Lu
7ca29691ae
Consolidated some utility functions into BertTokenClassifier.
2019-05-16 18:10:47 -04:00
Hong Lu
d87dfbc2af
Minor edits and added docstring.
2019-05-16 18:10:14 -04:00
Hong Lu
14543fbd52
Added yaml configuration file for NER example.
2019-05-16 18:08:50 -04:00
Abhiram E
52d720e9bf
Added option to limit number of word vectors for glove and word2vec
2019-05-15 00:22:37 -04:00
Janhavi Mahajan
1ed2c4dc0a
feat(bug fix) updated snli notebook to use to_lowercase_all() instead of to_lowercase(), which expects a column name list. Fixed to_lowercase returning None when a column name list is not passed
2019-05-13 18:14:31 -04:00
Said Bleik
e9c17a961e
update BERTSequenceClassifier and notebook
2019-05-13 15:18:21 -04:00
Said Bleik
7430e3b178
updated BERTSequenceClassifier + documentation
2019-05-13 14:38:54 -04:00
Said Bleik
7d2d74f975
BERTSequenceClassifier
2019-05-13 16:31:58 +00:00
Janhavi Mahajan
bb5764a56a
feat(code fix) rm_nltk_stop_words now expects sentences and stop_word column names
2019-05-10 16:50:34 -04:00
Janhavi Mahajan
197d771208
feat(code review comments) generalize nltk utils tokenize, remove_stop_words to more than 2 sentences
2019-05-10 16:27:48 -04:00
Janhavi Mahajan
6e3523810a
feat(code review) fix to_nltk_tokens, add to_lowercase_all and to_lowercase as per Said's comments
2019-05-10 16:27:48 -04:00
Abhiram E
49595b8666
Moved urls to module constants for pretrained embedding utils.
2019-05-09 14:58:12 -04:00
Casey Hong
faf924b45b
token_cols bugfix
2019-05-09 14:58:11 -04:00
Abhiram E
6ba272308b
Minor change.
2019-05-09 14:58:11 -04:00
Abhiram E
2502d91e1b
FastText loader - Code changes and unit tests.
1. Added methods to download, extract and load glove vectors.
2. Added unit tests to test the public method.
Other changes
1. Refactored files to add return types to docstrings.
2. Minor changes to path variables.
2019-05-09 14:58:11 -04:00
Abhiram E
4e480026a0
Minor changes
2019-05-09 14:58:10 -04:00
abeswara
8025b4449d
Glove loader - Code changes and unit tests.
1. Added methods to download, extract and load glove vectors.
2. Added unit tests to test the public methods.
Other changes
1. Made download and extract methods private.
2. Refactored Word2vec unit tests to exclude private methods.
2019-05-09 14:58:10 -04:00
abeswara
8408d7cce2
Word2vec loader - Code changes and unit tests.
1. Refactored word2vec loader to perform existing file checks before downloading or extracting.
2. Added unit tests for the load, download and extract functions.
2019-05-09 14:58:10 -04:00
Abhiram E
9895dd41d7
Reformatted files
2019-05-09 14:58:10 -04:00
Abhiram E
47ada0d03c
Added support to download and extract word2vec pretrained vectors
2019-05-09 14:58:10 -04:00
Abhiram E
48adc4f619
Initial commit for word embeddings
2019-05-09 14:58:10 -04:00
miguelgfierro
3c3ce8c14a
got timer from recommenders
2019-05-09 17:25:44 +01:00
Hong Lu
2af4d4a008
Moved notebooks to example folder.
2019-05-07 10:22:48 -04:00
Hong Lu
6e5b060e08
Added utils path to system path.
2019-05-07 10:20:03 -04:00