Commit Graph

293 Commits

Author SHA1 Message Date
Liqun Shao b670abbb88 add original code source to all the code 2019-06-17 16:57:08 -04:00
Liqun Shao 45811ef9b6 add aml explanation 2019-06-17 16:57:08 -04:00
Liqun Shao 349958bafb 1. correct typo in the notebook
2. add header to all the python files
3. add comments for train.py to explain what does it do
2019-06-17 16:57:08 -04:00
Liqun Shao 01fdc9c82a The HyperDrive will --> The HyperDrive run automatically shows... 2019-06-17 16:57:08 -04:00
Liqun Shao 9e6c4680fe put create workspace in the first place 2019-06-17 16:57:08 -04:00
Liqun Shao 0c79d68381 remove all unnecessary labels 2019-06-17 16:57:08 -04:00
Liqun Shao 2c4cc44839 include imports 2019-06-17 16:57:08 -04:00
Liqun Shao c7cc976063 1. Move similarity explaining to README
2. Separate model.py into two
3. Remove unnecessary imports
2019-06-17 16:57:08 -04:00
Liqun Shao 9e212bc832 add explanation on tuning results 2019-06-17 16:57:07 -04:00
Liqun Shao 635eab8cb6 change the name of saving model 2019-06-17 16:57:07 -04:00
Liqun Shao b440661f6a fix the bug for stopping the training 2019-06-17 16:57:07 -04:00
Liqun Shao 6f3c89f0a7 Fixed the bug on training not converging 2019-06-17 16:57:07 -04:00
Liqun Shao ead07a6551 remove auto loader and save the best model state 2019-06-17 16:56:25 -04:00
Liqun Shao 786f8de629 change the stopping condition to when the validation loss is small 2019-06-17 16:56:25 -04:00
Liqun Shao fde487ea89 use adam optimizer instead of SGD 2019-06-17 16:56:25 -04:00
Liqun Shao c5687f9159 add README to gensen repo 2019-06-17 16:56:25 -04:00
Liqun Shao 3c43c229b2 Resolved conflicts 2019-06-17 16:56:25 -04:00
Liqun Shao 5137d91c46 add horovod distributed training to the gensen model and make the training stop with small validation loss 2019-06-17 16:56:25 -04:00
Abhiram E 2d5bfe6862 Refactored gensen related files by Maluuba 2019-06-17 16:56:24 -04:00
Liqun Shao 51ccc72cd3 first push 2019-06-17 16:56:24 -04:00
Said Bleik f12aabd5b0 add xnli dataset utils 2019-06-17 12:25:01 -04:00
Hong Lu dfb8553c5b Resolved conflict and merged staging. 2019-06-17 12:01:10 -04:00
Said Bleik a514025f5d
Merge pull request #99 from microsoft/bleik
ar TC example
2019-06-14 16:21:03 -04:00
Said Bleik 0929c37d56 removed unused arg 2019-06-14 16:18:50 -04:00
Said Bleik b0ead86bf2 added dataset utils 2019-06-14 15:11:17 -04:00
Said Bleik 51c22b9607 minor fixes 2019-06-13 22:26:29 -04:00
Said Bleik 9e85c2923f added missing imports 2019-06-13 15:30:53 -04:00
Hong Lu 4de4ece15c Changed python version in pre-commit-config back to 3.6 2019-06-13 14:46:57 -04:00
Ubuntu 5438d76596 Added test code for NER utils. 2019-06-13 18:25:27 +00:00
Hong Lu 6d671b6221 Started adding test code for NER. 2019-06-12 15:33:12 -04:00
Casey Hong e031d3d225 suppress nltk messages 2019-06-12 12:37:43 -04:00
Said Bleik 98a7071294
Merge pull request #85 from microsoft/casey-senteval
SentEval examples (local and with azureml support)
2019-06-11 14:39:51 -04:00
Casey Hong ba3ba5b5a8 use azureml_utils for workspace creation 2019-06-11 13:05:39 -04:00
Casey Hong e5b12c6f32 resolve merge conflicts 2019-06-11 11:45:30 -04:00
Chaoyu Guan f0d6a2f55c rename files and revise README for #62 2019-06-08 13:15:23 +00:00
Chaoyu Guan f4f3591668 add explain-NLP-model part for issue #62 2019-06-08 12:51:44 +00:00
Hong Lu 26fcc3cbe4 Added random seed option to wikigold util function. 2019-06-07 17:32:49 -04:00
Hong Lu 049ddf6442 Added BERT prefix to classifier names and some minor docstring updates. 2019-06-07 17:08:29 -04:00
Hong Lu 4e7ac8adc1 Minor updates in token classifier. 2019-06-07 10:56:49 -04:00
Hong Lu e40e9636f3 Removed old data utils script. 2019-06-07 10:42:27 -04:00
Hong Lu a8feb91a89 Removed common_ner.py 2019-06-07 10:35:12 -04:00
Hong Lu 2593620633 Added utility functions for token classification. 2019-06-07 10:34:09 -04:00
Hong Lu fbf15e64c6 Merge remote-tracking branch 'origin/staging' into hlu/BERT_NER_utils 2019-06-06 16:45:09 -04:00
Said Bleik b040c481eb
Merge pull request #86 from microsoft/abhiram-requests-fix
Minor fix suggested in Recommenders repo
2019-06-06 16:03:59 -04:00
Abhiram E 802188e115 Minor fix suggested in Recommenders repo 2019-06-06 15:46:17 -04:00
Casey Hong 23d9635230 senteval local and azureml 📓 2019-06-06 10:57:05 -07:00
Abhiram E f0db07fb3a Minor change. 2019-06-06 10:20:57 -07:00
Abhiram E 5b1ed5f447 FastText loader - Code changes and unit tests.
1. Added methods to download, extract and load glove vectors.
2. Added units test to test the public method.

Other changes
 1. Refactored files to add return types to docstrings.
 2. Minor changes to path variables.
2019-06-06 10:20:57 -07:00
Abhiram E 2498dbaaa1 Minor changes 2019-06-06 10:18:13 -07:00
abeswara 008bfa2c57 Glove loader - Code changes and unit tests.
1. Added methods to download, extract and load glove vectors.
2. Added units tests to test the public methods.

Other changes
 1. Made download and extract methods private.
 2. Refactored Word2vec unit tests to exclude private methods.
2019-06-06 10:16:46 -07:00
abeswara ae31e05a84 Word2vec loader - Code changes and unit tests.
1. Refactored word2vec loader to perform existing file checks before downloading or extracting.

2. Added units tests to load, download and extract functions.
2019-06-06 10:12:29 -07:00
Said Bleik 9269ef5482 merge staging 2019-06-06 13:01:07 -04:00
Said Bleik c518d6a735 updated tc notebook and some utils 2019-06-05 21:37:16 -04:00
Abhiram E 3ac927edfa Using tqdm to show progress bar 2019-06-05 13:08:23 -04:00
Abhiram E 0e296b6291 Changed url fetch from urlretrieve to requests 2019-06-04 16:26:35 -04:00
Said Bleik ee9134d96f minor updates to seq classification 2019-06-03 10:03:49 -04:00
Hong Lu 9bcad55d20 Updated NER notebook with wikigold data. 2019-05-31 18:44:01 -04:00
Said Bleik 61b66a57aa updated device utils 2019-05-31 16:08:36 -04:00
Hong Lu 320b08d9af Removed BERT image. 2019-05-31 13:46:34 -04:00
Said Bleik 1a96bce557 added missing assignment 2019-05-30 10:42:30 -04:00
Hong Lu aaf0114cd7 Removed old scripts. 2019-05-29 14:57:23 -04:00
Hong Lu 52bd027555 Added helper function for postprocessing token classification results. 2019-05-29 14:39:58 -04:00
Said Bleik 5a81055e70 updated device utils and bert seq classifier 2019-05-28 23:16:19 -04:00
Abhiram E 36d7411bec Fix to limit the memory usage when using fasttext embedding loaders. Code changes to use the simpler version 2019-05-28 12:04:57 -04:00
Hong Lu 52cc16fb9b Updated token classifier api. 2019-05-24 18:09:56 -04:00
Hong Lu 5258c9cd7e Added some utility functions to the common script. Will be merged with common.py later. 2019-05-24 18:09:04 -04:00
Casey Hong 1cd36ccff7 fix snli noblank bug and add preprocessing tests 2019-05-21 23:00:56 -04:00
Said Bleik 63e546ab3c updated preprocessing, utils, classification 2019-05-21 16:45:23 -04:00
Hong Lu 2473e1a75c Black auto formatting. 2019-05-20 18:53:57 -04:00
Hong Lu 3d1c1862d9 Removed old data utils script. 2019-05-20 14:08:39 -04:00
Hong Lu 4a41ec41e8 Added a constant file. 2019-05-20 14:00:12 -04:00
Hong Lu 1393c74fb3 Minor updates for data class updates. 2019-05-20 13:59:38 -04:00
Hong Lu 9919a7bd35 Removed InputFeature class. Use namedtuple instead of class for input data. 2019-05-20 13:58:54 -04:00
Hong Lu e81138ad08 Changed optimizer and number of epochs configuration. 2019-05-20 13:58:16 -04:00
Said Bleik 49bb116474 update seq classifier 2019-05-17 10:04:46 -04:00
Hong Lu eef85dea41 Consolidated all configuration classes into a single class. 2019-05-16 18:11:21 -04:00
Hong Lu 7ca29691ae Consolidated some utility functions into BertTokenClassifier. 2019-05-16 18:10:47 -04:00
Hong Lu d87dfbc2af Minor edits and added docstring. 2019-05-16 18:10:14 -04:00
Hong Lu 14543fbd52 Added yaml configuration file for NER example. 2019-05-16 18:08:50 -04:00
Abhiram E 52d720e9bf Added option to limit number of word vectors for glove and word2vec 2019-05-15 00:22:37 -04:00
Janhavi Mahajan 1ed2c4dc0a feat(bug fix) updated snli notebook with to_lowercase_all() instead of to_lowercase() that expects a column name list. Fixed None object returning in to_lowercase when column name list is not passed 2019-05-13 18:14:31 -04:00
Said Bleik e9c17a961e update BERTSequenceClassifier and notebook 2019-05-13 15:18:21 -04:00
Said Bleik 7430e3b178 updated BERTSequenceClassifier + documentation 2019-05-13 14:38:54 -04:00
Said Bleik 7d2d74f975 BERTSequenceClassifier 2019-05-13 16:31:58 +00:00
Janhavi Mahajan bb5764a56a feat(code fix) rm_nltk_stop_words now expects sentences and stop_word column names 2019-05-10 16:50:34 -04:00
Janhavi Mahajan 197d771208 feat(code review comments) generalize nltk utils tokenize, remove_sto_words to more than 2 sentences 2019-05-10 16:27:48 -04:00
Janhavi Mahajan 6e3523810a feat(code review) fix to_nltk_tokens, add to_lowercase_all and to_lowercase as per said's comments 2019-05-10 16:27:48 -04:00
Abhiram E 49595b8666 Moved urls to module constants for pretrained embedding utils. 2019-05-09 14:58:12 -04:00
Casey Hong faf924b45b token_cols bugfix 2019-05-09 14:58:11 -04:00
Abhiram E 6ba272308b Minor change. 2019-05-09 14:58:11 -04:00
Abhiram E 2502d91e1b FastText loader - Code changes and unit tests.
1. Added methods to download, extract and load glove vectors.
2. Added units test to test the public method.

Other changes
 1. Refactored files to add return types to docstrings.
 2. Minor changes to path variables.
2019-05-09 14:58:11 -04:00
Abhiram E 4e480026a0 Minor changes 2019-05-09 14:58:10 -04:00
abeswara 8025b4449d Glove loader - Code changes and unit tests.
1. Added methods to download, extract and load glove vectors.
2. Added units tests to test the public methods.

Other changes
 1. Made download and extract methods private.
 2. Refactored Word2vec unit tests to exclude private methods.
2019-05-09 14:58:10 -04:00
abeswara 8408d7cce2 Word2vec loader - Code changes and unit tests.
1. Refactored word2vec loader to perform existing file checks before downloading or extracting.

2. Added units tests to load, download and extract functions.
2019-05-09 14:58:10 -04:00
Abhiram E 9895dd41d7 Reformatted files 2019-05-09 14:58:10 -04:00
Abhiram E 47ada0d03c Added support to download and extract word2vec pretrained vectors 2019-05-09 14:58:10 -04:00
Abhiram E 48adc4f619 Initial commit for word embeddings 2019-05-09 14:58:10 -04:00
miguelgfierro 3c3ce8c14a got timer from recommenders 2019-05-09 17:25:44 +01:00
Hong Lu 2af4d4a008 Moved notebooks to example folder. 2019-05-07 10:22:48 -04:00
Hong Lu 6e5b060e08 Added utils path to system path. 2019-05-07 10:20:03 -04:00
Hong Lu bd4e805733 Updates to expose BERT objects to the user. 2019-05-07 10:01:55 -04:00
Said Bleik 23dad01abb
Merge pull request #35 from Microsoft/maidap-sentence-similarity
Sentence Similarity Datasets with New Folder Structure
2019-05-03 20:49:18 -04:00
Casey Hong d65afe27f8 make colnames args in preprocess 2019-05-03 16:47:43 -04:00
Hong Lu b15b0a4dfd Fixed a few minor issues found during testing. 2019-05-02 17:59:47 -04:00
abeswara 84ac44cbc0 Resolved code review comments 2019-05-02 12:06:52 -04:00
Hong Lu d5ee6d46cb Initial check in of bert utility functions. 2019-05-02 10:50:30 -04:00
Said Bleik 10adf59777 update env, yahoo_answers, & classification eval 2019-05-01 22:49:41 +00:00
Janhavi Mahajan 338e606c5e feat(code review comments) refactoring based on Miguel's comments 2019-05-01 18:40:44 -04:00
Casey Hong 810beb6f2c organize stsbenchmark under new folder structure 2019-05-01 18:35:02 -04:00
Casey Hong 25a176b2cc rm_stopwords suffix 2019-04-30 15:05:17 -04:00
Said Bleik 757e7d063d
Merge pull request #28 from Microsoft/maidap-sentence-similarity
Sentence similarity dataset
2019-04-30 12:26:04 -04:00
Said Bleik f2467d5286 folder structure & example utils 2019-04-30 15:51:47 +00:00
Casey Hong dc4eac5aee refactor for consistency between snli <=> sts notebooks, add gensen-specific preprocessing for snli 2019-04-29 14:59:52 -04:00
Casey Hong 1aa60a3a00 begin snli-sts consistency refactoring 2019-04-29 14:59:52 -04:00
Janhavi Mahajan 1498bfb853 feat(code refactoring) moving code around as per the new structure decided. 2019-04-26 12:11:55 -04:00
Janhavi Mahajan ba2ad0cbfa feat(code reformat) deleted snli from util_nlp 2019-04-26 12:11:55 -04:00
Janhavi Mahajan f0070819ea feat(code reformat) Formatting code based on new folder structure 2019-04-26 12:11:55 -04:00
Janhavi Mahajan 4aadf66654 feat(code reformat) moved nltk utils to preprocess.py 2019-04-26 12:11:55 -04:00
Janhavi Mahajan faa26b3c54 feat(doc strings) fixed doc string format 2019-04-26 12:11:55 -04:00
Janhavi Mahajan 88e5a3d724 feat(code format) formatted file with black 2019-04-26 12:11:55 -04:00
Janhavi Mahajan c969085424 feat(code format) added doc strings, rewrite clean_snli function 2019-04-26 12:11:55 -04:00
Janhavi Mahajan 44db348fe5 feat(data prep) save dataframe to csv and renamed folder from nltk to nltk_utils 2019-04-26 12:11:55 -04:00
Janhavi Mahajan 6e46eade15 feat(data_prep) SNLI notebook showcasing data prep, Corrected nltk util for column_name 2019-04-26 12:11:55 -04:00
Janhavi Mahajan 3964c04a7c feat(data prep) NLTK tokenizer util file and notebook, deleted some redundant files, updated snli util with cleaner data prep functions 2019-04-26 12:11:55 -04:00
Janhavi Mahajan f7b487cfbd feat(data_prep) Added SNLI dataset prep utility 2019-04-26 12:11:55 -04:00
Abhiram E 84443d478c Refactored STS notebooks, updated utils_nlp files with the latest code from utils_ss and deleted utils_ss 2019-04-24 17:16:06 -04:00
Abhiram E ffb38ea42b Refactored code according to new structure, moved files and modified imports 2019-04-24 15:33:41 -04:00
Abhiram E d4db5a1860 Resolving code review comments.
1. Refactored and renamed msrpc_load notebook.
2. Removed redundant parameter to load_pandas_df function
2019-04-24 15:05:53 -04:00
Abhiram E f66ee268c0 Refactoring changes to MSRPC 2019-04-24 15:05:52 -04:00
Abhiram E b9fce4ae61 Notebooks and Tests
1. Added Jupyter Notebook for MSR-PC dataset quickstart task
2. Added unit tests for downloading the dataset and loading pandas df
3. Changes to MSRPC to take in path to the dataset if it already exists.
2019-04-24 15:05:00 -04:00
Abhiram E ac0abdfd61 Data loader for MSR PC
1. Added data downloader for MSR PC
2. Added support to clean data and load specified datasets as a
pandas dataframe.
3. Updates to environment.yml for newly added packages.
2019-04-24 15:03:41 -04:00
Casey Hong d20081766d Add preprocessing notebook 2019-04-24 15:02:26 -04:00
Casey Hong abacb5d022 Add tokenization with spacy 2019-04-24 15:02:26 -04:00
Casey Hong b2bed84e0d Include score column in dataframe 2019-04-24 15:02:26 -04:00
Casey Hong f06630a55d Download and clean stsbenchmark data 2019-04-24 15:02:26 -04:00
Casey Hong 819f0a215b moving files to the sentence_similarity scenario directory 2019-04-24 13:54:53 -04:00
Casey Hong 6793a77608 clip docstring line length at 120 2019-04-22 17:47:59 -04:00
Casey Hong 81980e9eb6 Add and format docstrings 2019-04-22 17:47:59 -04:00
Casey Hong 42a9c11ac7 Add docstrings 2019-04-22 14:14:12 -04:00
Casey Hong b31b7c3b13 Fix merge conflicts for rebase 2019-04-22 14:14:12 -04:00
Casey Hong 7176d7812e Create sentence similarity branch 2019-04-18 15:10:46 -04:00
miguelgfierro 2effbfcfcb cleaning 2019-04-16 19:53:15 +01:00
Richin Jain 2c5b8e587e Initial commit to put the recipe template in 2019-04-05 13:55:58 -04:00