nlp-recipes

Commit graph

Author	SHA1	Message	Date
hlums	aeb9486a6a	Updated wikigold utils to be consistent with other datasets.	2019-06-23 21:57:04 +00:00
hlums	7c35f670d2	Added probabilities output to BERT token classifier.	2019-06-23 21:55:51 +00:00
hlums	7b827d6d5b	Updated ner token preprocessing for Chinese text.	2019-06-23 21:53:06 +00:00
hlums	480a08f544	Updated NER notebook with new tokenizer api.	2019-06-23 21:51:34 +00:00
Said Bleik	35fc04c383	Merge pull request #113 from microsoft/hlu/two_sequence_utils_and_XNLI_notebook Hlu/two sequence utils and xnli notebook	2019-06-21 17:39:07 -04:00
Said Bleik	9c3a95159c	add sequential loader test	2019-06-21 17:37:03 -04:00
Hong Lu	fb3f7ddef6	Moved _truncate_seq_pair outside of if else block.	2019-06-21 14:48:10 -04:00
Said Bleik	399707e747	meh	2019-06-21 14:43:09 -04:00
Said Bleik	65f1c82a81	arg name change	2019-06-21 14:41:39 -04:00
Said Bleik	8b56eec5cd	updated defaults for predict's output	2019-06-21 14:24:40 -04:00
Said Bleik	aec2ffbfe7	add namedtuple preds output	2019-06-21 13:17:04 -04:00
Said Bleik	136cadf0fe	lm name changes	2019-06-21 13:16:43 -04:00
Hong Lu	c53edc41b3	Added training and prediction time to notebook.	2019-06-21 10:50:20 -04:00
Said Bleik	c0951fae7c	rem data_loader	2019-06-20 14:44:18 -04:00
Said Bleik	d53c17e1ed	minor edit to preds	2019-06-20 14:37:47 -04:00
Said Bleik	fad2564604	added optional prob dist predictions	2019-06-20 14:23:09 -04:00
Hong Lu	2010daf637	Removed redundant code.	2019-06-20 14:08:40 -04:00
Hong Lu	76fa9d7ed4	Fixed formatting.	2019-06-20 14:01:48 -04:00
Said Bleik	4388e6c588	add whole-word pretrained models	2019-06-20 13:28:10 -04:00
hlums	ed3415b320	Updated utils of XNLI dataset.	2019-06-19 21:06:38 +00:00
hlums	593bb4eb5d	Added convert_to_unicode helper function.	2019-06-19 21:06:15 +00:00
hlums	946a687729	Resolved confict with staging.	2019-06-19 21:05:53 +00:00
Said Bleik	b407204c79	add sequential loader	2019-06-19 16:12:20 -04:00
Said Bleik	5245afafdd	add dask data loader	2019-06-19 14:54:33 -04:00
Casey Hong	236a64e9d8	update stsbenchmark 📓	2019-06-18 17:20:47 -04:00
Janhavi Mahajan	074bca3619	black formatter	2019-06-18 16:49:08 -04:00
Janhavi Mahajan	7f1bb8a039	bug fix: sts-benchmark has extra tabs in some rows which caused incorrect reading of pandas df or azureml dataflow object	2019-06-18 16:49:08 -04:00
hlums	e09a54512a	Resolved conflict and merged from staging.	2019-06-18 15:54:55 +00:00
hlums	7cba858a42	Updated docstring.	2019-06-18 14:57:49 +00:00
hlums	e5b5af4c78	Added warmup and support for two-sequence classification.	2019-06-17 22:40:24 +00:00
Miguel González-Fierro	818f2ef3d3	Merge pull request #78 from microsoft/liqun-first-pull GenSen on AML deep dive notebook (sentence similarity)	2019-06-17 23:46:44 +02:00
Miguel González-Fierro	800ab9ac00	Merge pull request #103 from microsoft/bleik update xnli dataset utils	2019-06-17 23:40:56 +02:00
Liqun Shao	0af38f7ccb	make changes on notebook based on the structure change	2019-06-17 16:57:09 -04:00
Liqun Shao	cc0fabcd55	add docstring for training on gpu only	2019-06-17 16:57:09 -04:00
Liqun Shao	70733e9d77	change the structure	2019-06-17 16:57:09 -04:00
Abhiram E	f7a14146ea	Fixed model state empty bug	2019-06-17 16:57:09 -04:00
Liqun Shao	cc6cf46a28	format the files	2019-06-17 16:57:09 -04:00
Liqun Shao	a7e0555235	fix the path	2019-06-17 16:57:09 -04:00
Liqun Shao	d200d525fe	remove unwanted logs; fix the bug for getting training time	2019-06-17 16:57:09 -04:00
Liqun Shao	ae2f8f4d2a	Make the following changes to increase the performance of horovod training. 1. Add random seeds for iterators 2. learning rate=lr*hvd.size() 3. sync the optimizer 4. remove DataParallel	2019-06-17 16:57:09 -04:00
Liqun Shao	9e504182c3	fix the issue min_epoch_loss not updated during training, then it will never stop; min_epoch_loss always eqals to val_epoch_loss	2019-06-17 16:57:09 -04:00
Liqun Shao	4fa8bcc50f	fix import	2019-06-17 16:57:09 -04:00
Abhiram E	a065ae9bb0	Removed nested path joins	2019-06-17 16:57:09 -04:00
Abhiram E	eb6719d9ec	Minor fix to docstrings	2019-06-17 16:57:08 -04:00
Liqun Shao	0cff6731bb	remove unwanted log	2019-06-17 16:57:08 -04:00
Abhiram E	f8ffc290b7	Refactored gensen train.py	2019-06-17 16:57:08 -04:00
Liqun Shao	4369138371	add docstring to utils.py	2019-06-17 16:57:08 -04:00
Abhiram E	1ab6490348	Resolved comments on the Gensen code	2019-06-17 16:57:08 -04:00
Abhiram E	13ec0a36a9	Moved prints to logging	2019-06-17 16:57:08 -04:00
Abhiram E	f9a0bd1435	Added docstrings to Gensen code and refactored based on code review comments	2019-06-17 16:57:08 -04:00
Liqun Shao	b670abbb88	add original code source to all the code	2019-06-17 16:57:08 -04:00
Liqun Shao	45811ef9b6	add aml explanation	2019-06-17 16:57:08 -04:00
Liqun Shao	349958bafb	1. correct typo in the notebook 2. add header to all the python files 3. add comments for train.py to explain what does it do	2019-06-17 16:57:08 -04:00
Liqun Shao	01fdc9c82a	The HyperDrive will --> The HyperDrive run automatically shows...	2019-06-17 16:57:08 -04:00
Liqun Shao	9e6c4680fe	put create workspace in the first place	2019-06-17 16:57:08 -04:00
Liqun Shao	0c79d68381	remove all unnecessary labels	2019-06-17 16:57:08 -04:00
Liqun Shao	2c4cc44839	include imports	2019-06-17 16:57:08 -04:00
Liqun Shao	c7cc976063	1. Move similarity explaining to README 2. Separate model.py into two 3. Remove unneccessary imports	2019-06-17 16:57:08 -04:00
Liqun Shao	9e212bc832	add explanation on tuning results	2019-06-17 16:57:07 -04:00
Liqun Shao	635eab8cb6	change the name of saving model	2019-06-17 16:57:07 -04:00
Liqun Shao	b440661f6a	fix the bug for stopping the training	2019-06-17 16:57:07 -04:00
Liqun Shao	6f3c89f0a7	Fixed the bug on training not converging	2019-06-17 16:57:07 -04:00
Liqun Shao	ead07a6551	remove auto loader and save the best model state	2019-06-17 16:56:25 -04:00
Liqun Shao	786f8de629	change the stopping condition to when the validation loss is small	2019-06-17 16:56:25 -04:00
Liqun Shao	fde487ea89	use adam optimizer instead of SGD	2019-06-17 16:56:25 -04:00
Liqun Shao	c5687f9159	add README to gensen repo	2019-06-17 16:56:25 -04:00
Liqun Shao	3c43c229b2	Resolved conflicts	2019-06-17 16:56:25 -04:00
Liqun Shao	5137d91c46	add horovod distributed training to the gensen model and make the training stop with small validation loss	2019-06-17 16:56:25 -04:00
Abhiram E	2d5bfe6862	Refactored gensen related files by Maluuba	2019-06-17 16:56:24 -04:00
Liqun Shao	51ccc72cd3	first push	2019-06-17 16:56:24 -04:00
Said Bleik	f12aabd5b0	add xnli dataset utils	2019-06-17 12:25:01 -04:00
Hong Lu	dfb8553c5b	Resolved conflict and merged staging.	2019-06-17 12:01:10 -04:00
Said Bleik	a514025f5d	Merge pull request #99 from microsoft/bleik ar TC example	2019-06-14 16:21:03 -04:00
Said Bleik	0929c37d56	removed unused arg	2019-06-14 16:18:50 -04:00
Said Bleik	b0ead86bf2	added dataset utils	2019-06-14 15:11:17 -04:00
Said Bleik	51c22b9607	minor fixes	2019-06-13 22:26:29 -04:00
Said Bleik	9e85c2923f	added missing imports	2019-06-13 15:30:53 -04:00
Hong Lu	4de4ece15c	Changed python version in pre-commit-config back to 3.6	2019-06-13 14:46:57 -04:00
Ubuntu	5438d76596	Added test code for NER utils.	2019-06-13 18:25:27 +00:00
Hong Lu	6d671b6221	Started adding test code for NER.	2019-06-12 15:33:12 -04:00
Casey Hong	e031d3d225	suppress nltk messages	2019-06-12 12:37:43 -04:00
Said Bleik	98a7071294	Merge pull request #85 from microsoft/casey-senteval SentEval examples (local and with azureml support)	2019-06-11 14:39:51 -04:00
Casey Hong	ba3ba5b5a8	use azureml_utils for workspace creation	2019-06-11 13:05:39 -04:00
Casey Hong	e5b12c6f32	resolve merge conflicts	2019-06-11 11:45:30 -04:00
Chaoyu Guan	f0d6a2f55c	rename files and revise README for #62	2019-06-08 13:15:23 +00:00
Chaoyu Guan	f4f3591668	add explain-NLP-model part for issue #62	2019-06-08 12:51:44 +00:00
Hong Lu	26fcc3cbe4	Added random seed option to wikigold util function.	2019-06-07 17:32:49 -04:00
Hong Lu	049ddf6442	Added BERT prefix to classifier names and some minor docstring updates.	2019-06-07 17:08:29 -04:00
Hong Lu	4e7ac8adc1	Minor updates in token classifier.	2019-06-07 10:56:49 -04:00
Hong Lu	e40e9636f3	Removed old data utils script.	2019-06-07 10:42:27 -04:00
Hong Lu	a8feb91a89	Removed common_ner.py	2019-06-07 10:35:12 -04:00
Hong Lu	2593620633	Added utility functions for token classification.	2019-06-07 10:34:09 -04:00
Hong Lu	fbf15e64c6	Merge remote-tracking branch 'origin/staging' into hlu/BERT_NER_utils	2019-06-06 16:45:09 -04:00
Said Bleik	b040c481eb	Merge pull request #86 from microsoft/abhiram-requests-fix Minor fix suggested in Recommenders repo	2019-06-06 16:03:59 -04:00
Abhiram E	802188e115	Minor fix suggested in Recommenders repo	2019-06-06 15:46:17 -04:00
Casey Hong	23d9635230	senteval local and azureml 📓	2019-06-06 10:57:05 -07:00
Abhiram E	f0db07fb3a	Minor change.	2019-06-06 10:20:57 -07:00
Abhiram E	5b1ed5f447	FastText loader - Code changes and unit tests. 1. Added methods to download, extract and load glove vectors. 2. Added units test to test the public method. Other changes 1. Refactored files to add return types to docstrings. 2. Minor changes to path variables.	2019-06-06 10:20:57 -07:00
Abhiram E	2498dbaaa1	Minor changes	2019-06-06 10:18:13 -07:00
abeswara	008bfa2c57	Glove loader - Code changes and unit tests. 1. Added methods to download, extract and load glove vectors. 2. Added units tests to test the public methods. Other changes 1. Made download and extract methods private. 2. Refactored Word2vec unit tests to exclude private methods.	2019-06-06 10:16:46 -07:00
abeswara	ae31e05a84	Word2vec loader - Code changes and unit tests. 1. Refactored word2vec loader to perform existing file checks before downloading or extracting. 2. Added units tests to load, download and extract functions.	2019-06-06 10:12:29 -07:00
Said Bleik	9269ef5482	merge staging	2019-06-06 13:01:07 -04:00
Said Bleik	c518d6a735	updated tc notebook and some utils	2019-06-05 21:37:16 -04:00
Abhiram E	3ac927edfa	Using tqdm to show progress bar	2019-06-05 13:08:23 -04:00
Abhiram E	0e296b6291	Changed url fetch from urlretrieve to requests	2019-06-04 16:26:35 -04:00
Said Bleik	ee9134d96f	minor updates to seq classification	2019-06-03 10:03:49 -04:00
Hong Lu	9bcad55d20	Updated NER notebook with wikigold data.	2019-05-31 18:44:01 -04:00
Said Bleik	61b66a57aa	updated device utils	2019-05-31 16:08:36 -04:00
Hong Lu	320b08d9af	Removed BERT image.	2019-05-31 13:46:34 -04:00
Said Bleik	1a96bce557	added missing assignment	2019-05-30 10:42:30 -04:00
Hong Lu	aaf0114cd7	Removed old scripts.	2019-05-29 14:57:23 -04:00
Hong Lu	52bd027555	Added helper function for postprocessing token classification results.	2019-05-29 14:39:58 -04:00
Said Bleik	5a81055e70	updated device utils and bert seq classifier	2019-05-28 23:16:19 -04:00
Abhiram E	36d7411bec	Fix to limit the memory usage when using fasttext embedding loaders. Code changes to use the simpler version	2019-05-28 12:04:57 -04:00
Hong Lu	52cc16fb9b	Updated token classifier api.	2019-05-24 18:09:56 -04:00
Hong Lu	5258c9cd7e	Added some utility functions to the common script. Will be merged with common.py later.	2019-05-24 18:09:04 -04:00
Casey Hong	1cd36ccff7	fix snli noblank bug and add preprocessing tests	2019-05-21 23:00:56 -04:00
Said Bleik	63e546ab3c	updated prerocessing, utils, classification	2019-05-21 16:45:23 -04:00
Hong Lu	2473e1a75c	Black auto formatting.	2019-05-20 18:53:57 -04:00
Hong Lu	3d1c1862d9	Removed old data utils script.	2019-05-20 14:08:39 -04:00
Hong Lu	4a41ec41e8	Added a constant file.	2019-05-20 14:00:12 -04:00
Hong Lu	1393c74fb3	Minor updates for data class updates.	2019-05-20 13:59:38 -04:00
Hong Lu	9919a7bd35	Remived InputFeature class. Use namedtuple instead of class for input data.	2019-05-20 13:58:54 -04:00
Hong Lu	e81138ad08	Changed optimizer and number of epochs configuration.	2019-05-20 13:58:16 -04:00
Said Bleik	49bb116474	update seq classifer	2019-05-17 10:04:46 -04:00
Hong Lu	eef85dea41	Consolidated all configuration classes into a single class.	2019-05-16 18:11:21 -04:00
Hong Lu	7ca29691ae	Consolidated some utility functions into BertTokenClassifier.	2019-05-16 18:10:47 -04:00
Hong Lu	d87dfbc2af	Minor edits and added docstring.	2019-05-16 18:10:14 -04:00
Hong Lu	14543fbd52	Added yaml configuration file for NER example.	2019-05-16 18:08:50 -04:00
Abhiram E	52d720e9bf	Added option to limit number of word vectors for glove and word2vec	2019-05-15 00:22:37 -04:00
Janhavi Mahajan	1ed2c4dc0a	feat(bug fix) updated snli notebook with to_lowercase_all() instead of to_lowercase() that expects a column name list. Fixed None object returning in to_lowercase when column name list is not passed	2019-05-13 18:14:31 -04:00
Said Bleik	e9c17a961e	update BERTSequenceClassifier and notebook	2019-05-13 15:18:21 -04:00
Said Bleik	7430e3b178	updated BERTSequenceClassifier + documentation	2019-05-13 14:38:54 -04:00
Said Bleik	7d2d74f975	BERTSequenceClassifier	2019-05-13 16:31:58 +00:00
Janhavi Mahajan	bb5764a56a	feat(code fix) rm_nltk_stop_words now expects sentences and stop_word column names	2019-05-10 16:50:34 -04:00
Janhavi Mahajan	197d771208	feat(code review comments) generalize nltk utils tokenize, remove_sto_words to more than 2 sentences	2019-05-10 16:27:48 -04:00
Janhavi Mahajan	6e3523810a	feat(code review) fix to_nltk_tokens, add to_lowercase_all and to_lowercase as per said's comments	2019-05-10 16:27:48 -04:00
Abhiram E	49595b8666	Moved urls to module constants for pretrained embedding utils.	2019-05-09 14:58:12 -04:00
Casey Hong	faf924b45b	token_cols bugfix	2019-05-09 14:58:11 -04:00
Abhiram E	6ba272308b	Minor change.	2019-05-09 14:58:11 -04:00
Abhiram E	2502d91e1b	FastText loader - Code changes and unit tests. 1. Added methods to download, extract and load glove vectors. 2. Added units test to test the public method. Other changes 1. Refactored files to add return types to docstrings. 2. Minor changes to path variables.	2019-05-09 14:58:11 -04:00
Abhiram E	4e480026a0	Minor changes	2019-05-09 14:58:10 -04:00
abeswara	8025b4449d	Glove loader - Code changes and unit tests. 1. Added methods to download, extract and load glove vectors. 2. Added units tests to test the public methods. Other changes 1. Made download and extract methods private. 2. Refactored Word2vec unit tests to exclude private methods.	2019-05-09 14:58:10 -04:00
abeswara	8408d7cce2	Word2vec loader - Code changes and unit tests. 1. Refactored word2vec loader to perform existing file checks before downloading or extracting. 2. Added units tests to load, download and extract functions.	2019-05-09 14:58:10 -04:00
Abhiram E	9895dd41d7	Reformated files	2019-05-09 14:58:10 -04:00
Abhiram E	47ada0d03c	Added support to download and extract word2vec pretrained vectors	2019-05-09 14:58:10 -04:00
Abhiram E	48adc4f619	Initial commit for word embeddings	2019-05-09 14:58:10 -04:00
miguelgfierro	3c3ce8c14a	got timer from recommenders	2019-05-09 17:25:44 +01:00
Hong Lu	2af4d4a008	Moved notebooks to example folder.	2019-05-07 10:22:48 -04:00
Hong Lu	6e5b060e08	Added utils path to system path.	2019-05-07 10:20:03 -04:00

293 commits