Commit graph

2209 Commits

Author SHA1 Message Date
Said Bleik 7430e3b178 updated BERTSequenceClassifier + documentation 2019-05-13 14:38:54 -04:00
Said Bleik 7d2d74f975 BERTSequenceClassifier 2019-05-13 16:31:58 +00:00
Said Bleik 07ca05dd04
Merge pull request #47 from microsoft/maidap-sentence-similarity
Baseline model notebook and embeddings trainer notebook
2019-05-11 01:09:28 +00:00
Janhavi Mahajan 9338f40cdc
Merge pull request #57 from microsoft/janhavi-fix-preprocessing-file
Preprocess utils
2019-05-10 16:53:37 -04:00
Janhavi Mahajan bb5764a56a feat(code fix) rm_nltk_stop_words now expects sentences and stop_word column names 2019-05-10 16:50:34 -04:00
Janhavi Mahajan 197d771208 feat(code review comments) generalize nltk utils tokenize, remove_sto_words to more than 2 sentences 2019-05-10 16:27:48 -04:00
Janhavi Mahajan 6e3523810a feat(code review) fix to_nltk_tokens, add to_lowercase_all and to_lowercase as per said's comments 2019-05-10 16:27:48 -04:00
Courtney Cochrane 2058c77a2c Changing structure 2019-05-10 10:04:13 -04:00
Courtney Cochrane 0812acf1be Change based folder structure adjusment 2019-05-10 10:00:57 -04:00
Courtney Cochrane b1b5ec1b97 Add in timer to time each of the embedding trainers 2019-05-09 15:16:13 -04:00
Courtney Cochrane 17dee0d7a3 Add detail to data loading part, add links to original papers 2019-05-09 14:58:12 -04:00
Courtney Cochrane 12125a350d add links to original model papers 2019-05-09 14:58:12 -04:00
Casey Hong 4517c0697c persist clean_snli 2019-05-09 14:58:12 -04:00
Abhiram E 49595b8666 Moved urls to module constants for pretrained embedding utils. 2019-05-09 14:58:12 -04:00
Courtney Cochrane bc2f51193c revert Readme to latest branch changes 2019-05-09 14:58:12 -04:00
Courtney Cochrane aca561829f README edit 2019-05-09 14:58:12 -04:00
Courtney Cochrane d1b99225a4 Integrate fastText embedding loader and small edits 2019-05-09 14:58:12 -04:00
Courtney Cochrane be929e2408 Edits to import statements and grammar edits 2019-05-09 14:58:12 -04:00
Courtney Cochrane 54fc9c1334 Adding infastText loader 2019-05-09 14:58:12 -04:00
Courtney Cochrane 6a07b6f36e Adding file to new place in folder structure 2019-05-09 14:58:12 -04:00
Courtney Cochrane 2aaa9dd955 New file structure 2019-05-09 14:58:12 -04:00
Courtney Cochrane 9de6161e5e Revert back README after take out my comments 2019-05-09 14:58:12 -04:00
Courtney Cochrane b75b612195 Revert back README after take out my comments 2019-05-09 14:58:11 -04:00
Courtney Cochrane 45dfcd8d62 Changes to paths and revert back mistaken change to gitignore 2019-05-09 14:58:11 -04:00
Courtney Cochrane 60877c0d05 Move embeddings notebook into embeddings folder 2019-05-09 14:58:11 -04:00
Courtney Cochrane 10650b6e50 Changing file location based on new folder structure 2019-05-09 14:58:11 -04:00
Courtney Cochrane 9fb013824f Edits to embedding and baselines notebooks and integration of glove embedding loader 2019-05-09 14:58:11 -04:00
Courtney Cochrane 5d8583601c Edits to embedding trainer and baselines 2019-05-09 14:58:11 -04:00
Courtney Cochrane 1f52f797b4 Adding link to word embeddings 2019-05-09 14:58:11 -04:00
Courtney Cochrane f2d17e438a Cleaning README of AS IS statements that we don't need aymore 2019-05-09 14:58:11 -04:00
irshaffe 7d83aac99b Update baseline_deep_dive.ipynb
Removed typo of s
2019-05-09 14:58:11 -04:00
Courtney Cochrane 0ed1eeaf16 moved code to new folder structure 2019-05-09 14:58:11 -04:00
Courtney Cochrane 22c2d5db34 Edits on baselines and finished glove embedding trainer 2019-05-09 14:58:11 -04:00
Courtney Cochrane 92c81e0c98 Small change to baselines notebook 2019-05-09 14:58:11 -04:00
Courtney Cochrane 37608031ff Fix preprocessing in baselines, move files to new folder structure, and creating embedding trainer notebook 2019-05-09 14:58:11 -04:00
Courtney Cochrane 9f594e27dc Spacing in README 2019-05-09 14:58:11 -04:00
Courtney Cochrane 848d400a2a Spacing in README 2019-05-09 14:58:11 -04:00
Courtney Cochrane 5811935143 Spacing in README 2019-05-09 14:58:11 -04:00
Courtney Cochrane 22e5cdf37c Black formatting and embedding attribution in README 2019-05-09 14:58:11 -04:00
Courtney Cochrane 9e249aa209 Add fastText and reorganize notebook 2019-05-09 14:58:11 -04:00
Courtney Cochrane 8a0cb41d8b Adding processing util and edits 2019-05-09 14:58:11 -04:00
Courtney Cochrane 8ae3d9db93 Baseline model deep dive notebook 2019-05-09 14:58:11 -04:00
Casey Hong b4b405ad9d Download and clean stsbenchmark data 2019-05-09 14:58:11 -04:00
Casey Hong faf924b45b token_cols bugfix 2019-05-09 14:58:11 -04:00
Abhiram E b955e53d0d Added smoke tests to verify extracted sizes of pretrained vectors 2019-05-09 14:58:11 -04:00
Abhiram E 6ba272308b Minor change. 2019-05-09 14:58:11 -04:00
Abhiram E 2502d91e1b FastText loader - Code changes and unit tests.
1. Added methods to download, extract and load glove vectors.
2. Added units test to test the public method.

Other changes
 1. Refactored files to add return types to docstrings.
 2. Minor changes to path variables.
2019-05-09 14:58:11 -04:00
Abhiram E 4e480026a0 Minor changes 2019-05-09 14:58:10 -04:00
abeswara 8025b4449d Glove loader - Code changes and unit tests.
1. Added methods to download, extract and load glove vectors.
2. Added units tests to test the public methods.

Other changes
 1. Made download and extract methods private.
 2. Refactored Word2vec unit tests to exclude private methods.
2019-05-09 14:58:10 -04:00
abeswara 8203b0150d Word2vec loader - Code changes and unit tests.
1. Refactored word2vec loader to perform existing file checks before downloading or extracting.

2. Added units tests to load, download and extract functions.
2019-05-09 14:58:10 -04:00