Commit Graph

2242 Commits

Author SHA1 Message Date
Hong Lu 52cc16fb9b Updated token classifier api. 2019-05-24 18:09:56 -04:00
Hong Lu 5258c9cd7e Added some utility functions to the common script. Will be merged with common.py later. 2019-05-24 18:09:04 -04:00
Richin Jain 620f3ebe8c Adding component governance tool to build pipeline. 2019-05-24 15:12:19 -04:00
miguelgfierro 3c1708a21d readme update 📝 2019-05-24 18:32:28 +00:00
miguelgfierro c8fc93d4b6 🐛 2019-05-24 17:57:53 +00:00
miguelgfierro f0936bd9b1 added papermill 2019-05-24 16:54:17 +00:00
miguelgfierro 0f2fcd4f83 added new notebooks 2019-05-24 15:00:18 +00:00
miguelgfierro f03f712cfa added data integration tests with notebooks 2019-05-24 14:26:36 +00:00
miguelgfierro 03b3b387a6 refactoring tests 2019-05-24 14:06:22 +01:00
Said Bleik 2daeb1716e Merge pull request #64 from microsoft/casey-gensen-noblank: Gensen noblank bugfix + Add preprocessing tests 2019-05-22 15:14:42 -04:00
Casey Hong a1da16f391 use fixture directly 2019-05-22 12:42:01 -04:00
Casey Hong 1cd36ccff7 fix snli noblank bug and add preprocessing tests 2019-05-21 23:00:56 -04:00
Said Bleik 63e546ab3c updated prerocessing, utils, classification 2019-05-21 16:45:23 -04:00
Hong Lu 2473e1a75c Black auto formatting. 2019-05-20 18:53:57 -04:00
Hong Lu 3d1c1862d9 Removed old data utils script. 2019-05-20 14:08:39 -04:00
Hong Lu 4a41ec41e8 Added a constant file. 2019-05-20 14:00:12 -04:00
Hong Lu 1393c74fb3 Minor updates for data class updates. 2019-05-20 13:59:38 -04:00
Hong Lu 9919a7bd35 Remived InputFeature class. Use namedtuple instead of class for input data. 2019-05-20 13:58:54 -04:00
Hong Lu e81138ad08 Changed optimizer and number of epochs configuration. 2019-05-20 13:58:16 -04:00
Said Bleik 49bb116474 update seq classifer 2019-05-17 10:04:46 -04:00
Hong Lu eef85dea41 Consolidated all configuration classes into a single class. 2019-05-16 18:11:21 -04:00
Hong Lu 7ca29691ae Consolidated some utility functions into BertTokenClassifier. 2019-05-16 18:10:47 -04:00
Hong Lu d87dfbc2af Minor edits and added docstring. 2019-05-16 18:10:14 -04:00
Hong Lu 2732da2717 Updated NER notebook with new BertTokenClassifier class. 2019-05-16 18:09:40 -04:00
Hong Lu 14543fbd52 Added yaml configuration file for NER example. 2019-05-16 18:08:50 -04:00
Miguel González-Fierro 9bd941d2f8 Merge pull request #61 from microsoft/abhiram-gensim-limit: Added option to limit number of word vectors for glove and word2vec 2019-05-16 13:04:41 +01:00
Abhiram E ce6d783adf Separated the asserts in tests 2019-05-15 10:51:56 -04:00
Miguel González-Fierro 7aa740606d Merge pull request #59 from microsoft/issue_template: issue template 2019-05-15 15:17:51 +01:00
Abhiram E 52d720e9bf Added option to limit number of word vectors for glove and word2vec 2019-05-15 00:22:37 -04:00
miguelgfierro a5144f2626 issue template 2019-05-14 12:21:40 +01:00
Said Bleik 33da65e0a3 Merge pull request #58 from microsoft/janhavi-update-snliNB: [Fix] SNLI notebook and preprocess.py 2019-05-13 21:45:01 -04:00
Janhavi Mahajan 1ed2c4dc0a feat(bug fix) updated snli notebook with to_lowercase_all() instead of to_lowercase() that expects a column name list. Fixed None object returning in to_lowercase when column name list is not passed 2019-05-13 18:14:31 -04:00
Said Bleik e9c17a961e update BERTSequenceClassifier and notebook 2019-05-13 15:18:21 -04:00
Said Bleik 7430e3b178 updated BERTSequenceClassifier + documentation 2019-05-13 14:38:54 -04:00
Said Bleik 7d2d74f975 BERTSequenceClassifier 2019-05-13 16:31:58 +00:00
Said Bleik 07ca05dd04 Merge pull request #47 from microsoft/maidap-sentence-similarity: Baseline model notebook and embeddings trainer notebook 2019-05-11 01:09:28 +00:00
Janhavi Mahajan 9338f40cdc Merge pull request #57 from microsoft/janhavi-fix-preprocessing-file: Preprocess utils 2019-05-10 16:53:37 -04:00
Janhavi Mahajan bb5764a56a feat(code fix) rm_nltk_stop_words now expects sentences and stop_word column names 2019-05-10 16:50:34 -04:00
Janhavi Mahajan 197d771208 feat(code review comments) generalize nltk utils tokenize, remove_sto_words to more than 2 sentences 2019-05-10 16:27:48 -04:00
Janhavi Mahajan 6e3523810a feat(code review) fix to_nltk_tokens, add to_lowercase_all and to_lowercase as per said's comments 2019-05-10 16:27:48 -04:00
Courtney Cochrane 2058c77a2c Changing structure 2019-05-10 10:04:13 -04:00
Courtney Cochrane 0812acf1be Change based folder structure adjusment 2019-05-10 10:00:57 -04:00
Courtney Cochrane b1b5ec1b97 Add in timer to time each of the embedding trainers 2019-05-09 15:16:13 -04:00
Courtney Cochrane 17dee0d7a3 Add detail to data loading part, add links to original papers 2019-05-09 14:58:12 -04:00
Courtney Cochrane 12125a350d add links to original model papers 2019-05-09 14:58:12 -04:00
Casey Hong 4517c0697c persist clean_snli 2019-05-09 14:58:12 -04:00
Abhiram E 49595b8666 Moved urls to module constants for pretrained embedding utils. 2019-05-09 14:58:12 -04:00
Courtney Cochrane bc2f51193c revert Readme to latest branch changes 2019-05-09 14:58:12 -04:00
Courtney Cochrane aca561829f README edit 2019-05-09 14:58:12 -04:00
Courtney Cochrane d1b99225a4 Integrate fastText embedding loader and small edits 2019-05-09 14:58:12 -04:00