Said Bleik
7430e3b178
updated BERTSequenceClassifier + documentation
2019-05-13 14:38:54 -04:00
Said Bleik
7d2d74f975
BERTSequenceClassifier
2019-05-13 16:31:58 +00:00
Said Bleik
07ca05dd04
Merge pull request #47 from microsoft/maidap-sentence-similarity
Baseline model notebook and embeddings trainer notebook
2019-05-11 01:09:28 +00:00
Janhavi Mahajan
9338f40cdc
Merge pull request #57 from microsoft/janhavi-fix-preprocessing-file
Preprocess utils
2019-05-10 16:53:37 -04:00
Janhavi Mahajan
bb5764a56a
feat(code fix) rm_nltk_stop_words now expects sentences and stop_word column names
2019-05-10 16:50:34 -04:00
Janhavi Mahajan
197d771208
feat(code review comments) generalize nltk utils tokenize, remove_sto_words to more than 2 sentences
2019-05-10 16:27:48 -04:00
Janhavi Mahajan
6e3523810a
feat(code review) fix to_nltk_tokens, add to_lowercase_all and to_lowercase as per Said's comments
2019-05-10 16:27:48 -04:00
Courtney Cochrane
2058c77a2c
Changing structure
2019-05-10 10:04:13 -04:00
Courtney Cochrane
0812acf1be
Change based on folder structure adjustment
2019-05-10 10:00:57 -04:00
Courtney Cochrane
b1b5ec1b97
Add in timer to time each of the embedding trainers
2019-05-09 15:16:13 -04:00
Courtney Cochrane
17dee0d7a3
Add detail to data loading part, add links to original papers
2019-05-09 14:58:12 -04:00
Courtney Cochrane
12125a350d
add links to original model papers
2019-05-09 14:58:12 -04:00
Casey Hong
4517c0697c
persist clean_snli
2019-05-09 14:58:12 -04:00
Abhiram E
49595b8666
Moved urls to module constants for pretrained embedding utils.
2019-05-09 14:58:12 -04:00
Courtney Cochrane
bc2f51193c
revert Readme to latest branch changes
2019-05-09 14:58:12 -04:00
Courtney Cochrane
aca561829f
README edit
2019-05-09 14:58:12 -04:00
Courtney Cochrane
d1b99225a4
Integrate fastText embedding loader and small edits
2019-05-09 14:58:12 -04:00
Courtney Cochrane
be929e2408
Edits to import statements and grammar edits
2019-05-09 14:58:12 -04:00
Courtney Cochrane
54fc9c1334
Adding in fastText loader
2019-05-09 14:58:12 -04:00
Courtney Cochrane
6a07b6f36e
Adding file to new place in folder structure
2019-05-09 14:58:12 -04:00
Courtney Cochrane
2aaa9dd955
New file structure
2019-05-09 14:58:12 -04:00
Courtney Cochrane
9de6161e5e
Revert README after taking out my comments
2019-05-09 14:58:12 -04:00
Courtney Cochrane
b75b612195
Revert README after taking out my comments
2019-05-09 14:58:11 -04:00
Courtney Cochrane
45dfcd8d62
Changes to paths and revert back mistaken change to gitignore
2019-05-09 14:58:11 -04:00
Courtney Cochrane
60877c0d05
Move embeddings notebook into embeddings folder
2019-05-09 14:58:11 -04:00
Courtney Cochrane
10650b6e50
Changing file location based on new folder structure
2019-05-09 14:58:11 -04:00
Courtney Cochrane
9fb013824f
Edits to embedding and baselines notebooks and integration of glove embedding loader
2019-05-09 14:58:11 -04:00
Courtney Cochrane
5d8583601c
Edits to embedding trainer and baselines
2019-05-09 14:58:11 -04:00
Courtney Cochrane
1f52f797b4
Adding link to word embeddings
2019-05-09 14:58:11 -04:00
Courtney Cochrane
f2d17e438a
Cleaning README of AS IS statements that we don't need anymore
2019-05-09 14:58:11 -04:00
irshaffe
7d83aac99b
Update baseline_deep_dive.ipynb
Removed typo of s
2019-05-09 14:58:11 -04:00
Courtney Cochrane
0ed1eeaf16
moved code to new folder structure
2019-05-09 14:58:11 -04:00
Courtney Cochrane
22c2d5db34
Edits on baselines and finished glove embedding trainer
2019-05-09 14:58:11 -04:00
Courtney Cochrane
92c81e0c98
Small change to baselines notebook
2019-05-09 14:58:11 -04:00
Courtney Cochrane
37608031ff
Fix preprocessing in baselines, move files to new folder structure, and creating embedding trainer notebook
2019-05-09 14:58:11 -04:00
Courtney Cochrane
9f594e27dc
Spacing in README
2019-05-09 14:58:11 -04:00
Courtney Cochrane
848d400a2a
Spacing in README
2019-05-09 14:58:11 -04:00
Courtney Cochrane
5811935143
Spacing in README
2019-05-09 14:58:11 -04:00
Courtney Cochrane
22e5cdf37c
Black formatting and embedding attribution in README
2019-05-09 14:58:11 -04:00
Courtney Cochrane
9e249aa209
Add fastText and reorganize notebook
2019-05-09 14:58:11 -04:00
Courtney Cochrane
8a0cb41d8b
Adding processing util and edits
2019-05-09 14:58:11 -04:00
Courtney Cochrane
8ae3d9db93
Baseline model deep dive notebook
2019-05-09 14:58:11 -04:00
Casey Hong
b4b405ad9d
Download and clean stsbenchmark data
2019-05-09 14:58:11 -04:00
Casey Hong
faf924b45b
token_cols bugfix
2019-05-09 14:58:11 -04:00
Abhiram E
b955e53d0d
Added smoke tests to verify extracted sizes of pretrained vectors
2019-05-09 14:58:11 -04:00
Abhiram E
6ba272308b
Minor change.
2019-05-09 14:58:11 -04:00
Abhiram E
2502d91e1b
FastText loader - Code changes and unit tests.
1. Added methods to download, extract and load glove vectors.
2. Added unit test to test the public method.
Other changes
1. Refactored files to add return types to docstrings.
2. Minor changes to path variables.
2019-05-09 14:58:11 -04:00
Abhiram E
4e480026a0
Minor changes
2019-05-09 14:58:10 -04:00
abeswara
8025b4449d
Glove loader - Code changes and unit tests.
1. Added methods to download, extract and load glove vectors.
2. Added unit tests to test the public methods.
Other changes
1. Made download and extract methods private.
2. Refactored Word2vec unit tests to exclude private methods.
2019-05-09 14:58:10 -04:00
abeswara
8203b0150d
Word2vec loader - Code changes and unit tests.
1. Refactored word2vec loader to perform existing file checks before downloading or extracting.
2. Added unit tests to load, download and extract functions.
2019-05-09 14:58:10 -04:00