nlp-recipes

Commit Graph

Author	SHA1	Message	Date
Liqun Shao	b670abbb88	add original code source to all the code	2019-06-17 16:57:08 -04:00
Liqun Shao	45811ef9b6	add aml explanation	2019-06-17 16:57:08 -04:00
Liqun Shao	349958bafb	1. correct typo in the notebook 2. add header to all the python files 3. add comments for train.py to explain what does it do	2019-06-17 16:57:08 -04:00
Liqun Shao	01fdc9c82a	The HyperDrive will --> The HyperDrive run automatically shows...	2019-06-17 16:57:08 -04:00
Liqun Shao	9e6c4680fe	put create workspace in the first place	2019-06-17 16:57:08 -04:00
Liqun Shao	0c79d68381	remove all unnecessary labels	2019-06-17 16:57:08 -04:00
Liqun Shao	2c4cc44839	include imports	2019-06-17 16:57:08 -04:00
Liqun Shao	c7cc976063	1. Move similarity explaining to README 2. Separate model.py into two 3. Remove unneccessary imports	2019-06-17 16:57:08 -04:00
Liqun Shao	9e212bc832	add explanation on tuning results	2019-06-17 16:57:07 -04:00
Liqun Shao	635eab8cb6	change the name of saving model	2019-06-17 16:57:07 -04:00
Liqun Shao	b440661f6a	fix the bug for stopping the training	2019-06-17 16:57:07 -04:00
Liqun Shao	6f3c89f0a7	Fixed the bug on training not converging	2019-06-17 16:57:07 -04:00
Liqun Shao	ead07a6551	remove auto loader and save the best model state	2019-06-17 16:56:25 -04:00
Liqun Shao	786f8de629	change the stopping condition to when the validation loss is small	2019-06-17 16:56:25 -04:00
Liqun Shao	fde487ea89	use adam optimizer instead of SGD	2019-06-17 16:56:25 -04:00
Liqun Shao	c5687f9159	add README to gensen repo	2019-06-17 16:56:25 -04:00
Liqun Shao	3c43c229b2	Resolved conflicts	2019-06-17 16:56:25 -04:00
Liqun Shao	5137d91c46	add horovod distributed training to the gensen model and make the training stop with small validation loss	2019-06-17 16:56:25 -04:00
Abhiram E	2d5bfe6862	Refactored gensen related files by Maluuba	2019-06-17 16:56:24 -04:00
Liqun Shao	51ccc72cd3	first push	2019-06-17 16:56:24 -04:00
Said Bleik	f12aabd5b0	add xnli dataset utils	2019-06-17 12:25:01 -04:00
Hong Lu	dfb8553c5b	Resolved conflict and merged staging.	2019-06-17 12:01:10 -04:00
Said Bleik	a514025f5d	Merge pull request #99 from microsoft/bleik ar TC example	2019-06-14 16:21:03 -04:00
Said Bleik	0929c37d56	removed unused arg	2019-06-14 16:18:50 -04:00
Said Bleik	b0ead86bf2	added dataset utils	2019-06-14 15:11:17 -04:00
Said Bleik	51c22b9607	minor fixes	2019-06-13 22:26:29 -04:00
Said Bleik	9e85c2923f	added missing imports	2019-06-13 15:30:53 -04:00
Hong Lu	4de4ece15c	Changed python version in pre-commit-config back to 3.6	2019-06-13 14:46:57 -04:00
Ubuntu	5438d76596	Added test code for NER utils.	2019-06-13 18:25:27 +00:00
Hong Lu	6d671b6221	Started adding test code for NER.	2019-06-12 15:33:12 -04:00
Casey Hong	e031d3d225	suppress nltk messages	2019-06-12 12:37:43 -04:00
Said Bleik	98a7071294	Merge pull request #85 from microsoft/casey-senteval SentEval examples (local and with azureml support)	2019-06-11 14:39:51 -04:00
Casey Hong	ba3ba5b5a8	use azureml_utils for workspace creation	2019-06-11 13:05:39 -04:00
Casey Hong	e5b12c6f32	resolve merge conflicts	2019-06-11 11:45:30 -04:00
Chaoyu Guan	f0d6a2f55c	rename files and revise README for #62	2019-06-08 13:15:23 +00:00
Chaoyu Guan	f4f3591668	add explain-NLP-model part for issue #62	2019-06-08 12:51:44 +00:00
Hong Lu	26fcc3cbe4	Added random seed option to wikigold util function.	2019-06-07 17:32:49 -04:00
Hong Lu	049ddf6442	Added BERT prefix to classifier names and some minor docstring updates.	2019-06-07 17:08:29 -04:00
Hong Lu	4e7ac8adc1	Minor updates in token classifier.	2019-06-07 10:56:49 -04:00
Hong Lu	e40e9636f3	Removed old data utils script.	2019-06-07 10:42:27 -04:00
Hong Lu	a8feb91a89	Removed common_ner.py	2019-06-07 10:35:12 -04:00
Hong Lu	2593620633	Added utility functions for token classification.	2019-06-07 10:34:09 -04:00
Hong Lu	fbf15e64c6	Merge remote-tracking branch 'origin/staging' into hlu/BERT_NER_utils	2019-06-06 16:45:09 -04:00
Said Bleik	b040c481eb	Merge pull request #86 from microsoft/abhiram-requests-fix Minor fix suggested in Recommenders repo	2019-06-06 16:03:59 -04:00
Abhiram E	802188e115	Minor fix suggested in Recommenders repo	2019-06-06 15:46:17 -04:00
Casey Hong	23d9635230	senteval local and azureml 📓	2019-06-06 10:57:05 -07:00
Abhiram E	f0db07fb3a	Minor change.	2019-06-06 10:20:57 -07:00
Abhiram E	5b1ed5f447	FastText loader - Code changes and unit tests. 1. Added methods to download, extract and load glove vectors. 2. Added units test to test the public method. Other changes 1. Refactored files to add return types to docstrings. 2. Minor changes to path variables.	2019-06-06 10:20:57 -07:00
Abhiram E	2498dbaaa1	Minor changes	2019-06-06 10:18:13 -07:00
abeswara	008bfa2c57	Glove loader - Code changes and unit tests. 1. Added methods to download, extract and load glove vectors. 2. Added units tests to test the public methods. Other changes 1. Made download and extract methods private. 2. Refactored Word2vec unit tests to exclude private methods.	2019-06-06 10:16:46 -07:00
abeswara	ae31e05a84	Word2vec loader - Code changes and unit tests. 1. Refactored word2vec loader to perform existing file checks before downloading or extracting. 2. Added units tests to load, download and extract functions.	2019-06-06 10:12:29 -07:00
Said Bleik	9269ef5482	merge staging	2019-06-06 13:01:07 -04:00
Said Bleik	c518d6a735	updated tc notebook and some utils	2019-06-05 21:37:16 -04:00
Abhiram E	3ac927edfa	Using tqdm to show progress bar	2019-06-05 13:08:23 -04:00
Abhiram E	0e296b6291	Changed url fetch from urlretrieve to requests	2019-06-04 16:26:35 -04:00
Said Bleik	ee9134d96f	minor updates to seq classification	2019-06-03 10:03:49 -04:00
Hong Lu	9bcad55d20	Updated NER notebook with wikigold data.	2019-05-31 18:44:01 -04:00
Said Bleik	61b66a57aa	updated device utils	2019-05-31 16:08:36 -04:00
Hong Lu	320b08d9af	Removed BERT image.	2019-05-31 13:46:34 -04:00
Said Bleik	1a96bce557	added missing assignment	2019-05-30 10:42:30 -04:00
Hong Lu	aaf0114cd7	Removed old scripts.	2019-05-29 14:57:23 -04:00
Hong Lu	52bd027555	Added helper function for postprocessing token classification results.	2019-05-29 14:39:58 -04:00
Said Bleik	5a81055e70	updated device utils and bert seq classifier	2019-05-28 23:16:19 -04:00
Abhiram E	36d7411bec	Fix to limit the memory usage when using fasttext embedding loaders. Code changes to use the simpler version	2019-05-28 12:04:57 -04:00
Hong Lu	52cc16fb9b	Updated token classifier api.	2019-05-24 18:09:56 -04:00
Hong Lu	5258c9cd7e	Added some utility functions to the common script. Will be merged with common.py later.	2019-05-24 18:09:04 -04:00
Casey Hong	1cd36ccff7	fix snli noblank bug and add preprocessing tests	2019-05-21 23:00:56 -04:00
Said Bleik	63e546ab3c	updated prerocessing, utils, classification	2019-05-21 16:45:23 -04:00
Hong Lu	2473e1a75c	Black auto formatting.	2019-05-20 18:53:57 -04:00
Hong Lu	3d1c1862d9	Removed old data utils script.	2019-05-20 14:08:39 -04:00
Hong Lu	4a41ec41e8	Added a constant file.	2019-05-20 14:00:12 -04:00
Hong Lu	1393c74fb3	Minor updates for data class updates.	2019-05-20 13:59:38 -04:00
Hong Lu	9919a7bd35	Remived InputFeature class. Use namedtuple instead of class for input data.	2019-05-20 13:58:54 -04:00
Hong Lu	e81138ad08	Changed optimizer and number of epochs configuration.	2019-05-20 13:58:16 -04:00
Said Bleik	49bb116474	update seq classifer	2019-05-17 10:04:46 -04:00
Hong Lu	eef85dea41	Consolidated all configuration classes into a single class.	2019-05-16 18:11:21 -04:00
Hong Lu	7ca29691ae	Consolidated some utility functions into BertTokenClassifier.	2019-05-16 18:10:47 -04:00
Hong Lu	d87dfbc2af	Minor edits and added docstring.	2019-05-16 18:10:14 -04:00
Hong Lu	14543fbd52	Added yaml configuration file for NER example.	2019-05-16 18:08:50 -04:00
Abhiram E	52d720e9bf	Added option to limit number of word vectors for glove and word2vec	2019-05-15 00:22:37 -04:00
Janhavi Mahajan	1ed2c4dc0a	feat(bug fix) updated snli notebook with to_lowercase_all() instead of to_lowercase() that expects a column name list. Fixed None object returning in to_lowercase when column name list is not passed	2019-05-13 18:14:31 -04:00
Said Bleik	e9c17a961e	update BERTSequenceClassifier and notebook	2019-05-13 15:18:21 -04:00
Said Bleik	7430e3b178	updated BERTSequenceClassifier + documentation	2019-05-13 14:38:54 -04:00
Said Bleik	7d2d74f975	BERTSequenceClassifier	2019-05-13 16:31:58 +00:00
Janhavi Mahajan	bb5764a56a	feat(code fix) rm_nltk_stop_words now expects sentences and stop_word column names	2019-05-10 16:50:34 -04:00
Janhavi Mahajan	197d771208	feat(code review comments) generalize nltk utils tokenize, remove_sto_words to more than 2 sentences	2019-05-10 16:27:48 -04:00
Janhavi Mahajan	6e3523810a	feat(code review) fix to_nltk_tokens, add to_lowercase_all and to_lowercase as per said's comments	2019-05-10 16:27:48 -04:00
Abhiram E	49595b8666	Moved urls to module constants for pretrained embedding utils.	2019-05-09 14:58:12 -04:00
Casey Hong	faf924b45b	token_cols bugfix	2019-05-09 14:58:11 -04:00
Abhiram E	6ba272308b	Minor change.	2019-05-09 14:58:11 -04:00
Abhiram E	2502d91e1b	FastText loader - Code changes and unit tests. 1. Added methods to download, extract and load glove vectors. 2. Added units test to test the public method. Other changes 1. Refactored files to add return types to docstrings. 2. Minor changes to path variables.	2019-05-09 14:58:11 -04:00
Abhiram E	4e480026a0	Minor changes	2019-05-09 14:58:10 -04:00
abeswara	8025b4449d	Glove loader - Code changes and unit tests. 1. Added methods to download, extract and load glove vectors. 2. Added units tests to test the public methods. Other changes 1. Made download and extract methods private. 2. Refactored Word2vec unit tests to exclude private methods.	2019-05-09 14:58:10 -04:00
abeswara	8408d7cce2	Word2vec loader - Code changes and unit tests. 1. Refactored word2vec loader to perform existing file checks before downloading or extracting. 2. Added units tests to load, download and extract functions.	2019-05-09 14:58:10 -04:00
Abhiram E	9895dd41d7	Reformated files	2019-05-09 14:58:10 -04:00
Abhiram E	47ada0d03c	Added support to download and extract word2vec pretrained vectors	2019-05-09 14:58:10 -04:00
Abhiram E	48adc4f619	Initial commit for word embeddings	2019-05-09 14:58:10 -04:00
miguelgfierro	3c3ce8c14a	got timer from recommenders	2019-05-09 17:25:44 +01:00
Hong Lu	2af4d4a008	Moved notebooks to example folder.	2019-05-07 10:22:48 -04:00
Hong Lu	6e5b060e08	Added utils path to system path.	2019-05-07 10:20:03 -04:00
Hong Lu	bd4e805733	Updates to expose BERT objects to the user.	2019-05-07 10:01:55 -04:00
Said Bleik	23dad01abb	Merge pull request #35 from Microsoft/maidap-sentence-similarity Sentence Similarity Datasets with New Folder Structure	2019-05-03 20:49:18 -04:00
Casey Hong	d65afe27f8	make colnames args in preprocess	2019-05-03 16:47:43 -04:00
Hong Lu	b15b0a4dfd	Fixed a few minor issues found during testing.	2019-05-02 17:59:47 -04:00
abeswara	84ac44cbc0	Resolved code review comments	2019-05-02 12:06:52 -04:00
Hong Lu	d5ee6d46cb	Initial check in of bert utility functions.	2019-05-02 10:50:30 -04:00
Said Bleik	10adf59777	update env, yahoo_answers, & classification eval	2019-05-01 22:49:41 +00:00
Janhavi Mahajan	338e606c5e	feat(code review comments) refactoring based on Miguel's comments	2019-05-01 18:40:44 -04:00
Casey Hong	810beb6f2c	organize stsbenchmark under new folder structure	2019-05-01 18:35:02 -04:00
Casey Hong	25a176b2cc	rm_stopwords suffix	2019-04-30 15:05:17 -04:00
Said Bleik	757e7d063d	Merge pull request #28 from Microsoft/maidap-sentence-similarity Sentence similarity dataset	2019-04-30 12:26:04 -04:00
Said Bleik	f2467d5286	folder structure & example utils	2019-04-30 15:51:47 +00:00
Casey Hong	dc4eac5aee	refactor for consistency between snli <=> sts notebooks, add gensen-specific preprocessing for snli	2019-04-29 14:59:52 -04:00
Casey Hong	1aa60a3a00	begin snli-sts consistency refactoring	2019-04-29 14:59:52 -04:00
Janhavi Mahajan	1498bfb853	feat(code refactoring) moving code around as per the new structure decided.	2019-04-26 12:11:55 -04:00
Janhavi Mahajan	ba2ad0cbfa	feat(code reformat) deleted snli from util_nlp	2019-04-26 12:11:55 -04:00
Janhavi Mahajan	f0070819ea	feat(code reformat) Formatting code based on new folder structure	2019-04-26 12:11:55 -04:00
Janhavi Mahajan	4aadf66654	feat(code reformat) moved nltk utils to preprocess.py	2019-04-26 12:11:55 -04:00
Janhavi Mahajan	faa26b3c54	feat(doc strings) fixed doc string format	2019-04-26 12:11:55 -04:00
Janhavi Mahajan	88e5a3d724	feat(code format) formatted file with black	2019-04-26 12:11:55 -04:00
Janhavi Mahajan	c969085424	feat(code format) added doc strings, rewrite clean_snli function	2019-04-26 12:11:55 -04:00
Janhavi Mahajan	44db348fe5	feat(data prep) save dataframe to csv and renamed folder from nltk to nltk_utils	2019-04-26 12:11:55 -04:00
Janhavi Mahajan	6e46eade15	feat(data_prep) SNLI notebook showcasing data prep, Corrected nltk util for column_name	2019-04-26 12:11:55 -04:00
Janhavi Mahajan	3964c04a7c	feat(data prep) NLTK tokenizer util file and notebook, deleted some redundant files, updated snli util with cleaner data prep functions	2019-04-26 12:11:55 -04:00
Janhavi Mahajan	f7b487cfbd	feat(data_prep) Added SNLI dataset prep utility	2019-04-26 12:11:55 -04:00
Abhiram E	84443d478c	Refactored STS notebooks, updated utils_nlp files with the latest code from utils_ss and deleted utils_ss	2019-04-24 17:16:06 -04:00
Abhiram E	ffb38ea42b	Refactored code according to new structure, moved files and modified imports	2019-04-24 15:33:41 -04:00
Abhiram E	d4db5a1860	Resolving code review comments. 1. Refactored and renamed msrpc_load notebook. 2. Removed redundant parameter to load_pandas_df function	2019-04-24 15:05:53 -04:00
Abhiram E	f66ee268c0	Refactoring changes to MSRPC	2019-04-24 15:05:52 -04:00
Abhiram E	b9fce4ae61	Notebooks and Tests 1. Added Jupyter Notebook for MSR-PC dataset quickstart task 2. Added unit tests for downloading the dataset and loading pandas df 3. Changes to MSRPC to take in path to the dataset if it already exists.	2019-04-24 15:05:00 -04:00
Abhiram E	ac0abdfd61	Data loader for MSR PC 1. Added data downloader for MSR PC 2. Added support to clean data and load specified datasets as a pandas dataframe. 3. Updates to environment.yml for newly added packages.	2019-04-24 15:03:41 -04:00
Casey Hong	d20081766d	Add preprocessing notebook	2019-04-24 15:02:26 -04:00
Casey Hong	abacb5d022	Add tokenization with spacy	2019-04-24 15:02:26 -04:00
Casey Hong	b2bed84e0d	Include score column in dataframe	2019-04-24 15:02:26 -04:00
Casey Hong	f06630a55d	Download and clean stsbenchmark data	2019-04-24 15:02:26 -04:00
Casey Hong	819f0a215b	moving files to the sentence_similarity scenario directory	2019-04-24 13:54:53 -04:00
Casey Hong	6793a77608	clip docstring line length at 120	2019-04-22 17:47:59 -04:00
Casey Hong	81980e9eb6	Add and format docstrings	2019-04-22 17:47:59 -04:00
Casey Hong	42a9c11ac7	Add docstrings	2019-04-22 14:14:12 -04:00
Casey Hong	b31b7c3b13	Fix merge conflicts for rebase	2019-04-22 14:14:12 -04:00
Casey Hong	7176d7812e	Create sentence similarity branch	2019-04-18 15:10:46 -04:00
miguelgfierro	2effbfcfcb	cleaning	2019-04-16 19:53:15 +01:00
Richin Jain	2c5b8e587e	Intial commit to put the receipe template in	2019-04-05 13:55:58 -04:00

293 commits