Commit graph

195 commits

Author SHA1 Message Date
Casey Hong e5b12c6f32 resolve merge conflicts 2019-06-11 11:45:30 -04:00
Casey Hong 23d9635230 senteval local and azureml 📓 2019-06-06 10:57:05 -07:00
Abhiram E f0db07fb3a Minor change. 2019-06-06 10:20:57 -07:00
Abhiram E 5b1ed5f447 FastText loader - Code changes and unit tests.
1. Added methods to download, extract and load glove vectors.
2. Added units test to test the public method.

Other changes
 1. Refactored files to add return types to docstrings.
 2. Minor changes to path variables.
2019-06-06 10:20:57 -07:00
Abhiram E 2498dbaaa1 Minor changes 2019-06-06 10:18:13 -07:00
abeswara 008bfa2c57 Glove loader - Code changes and unit tests.
1. Added methods to download, extract and load glove vectors.
2. Added units tests to test the public methods.

Other changes
 1. Made download and extract methods private.
 2. Refactored Word2vec unit tests to exclude private methods.
2019-06-06 10:16:46 -07:00
abeswara c9006c8b65 Word2vec loader - Code changes and unit tests.
1. Refactored word2vec loader to perform existing file checks before downloading or extracting.

2. Added units tests to load, download and extract functions.
2019-06-06 10:13:10 -07:00
abeswara ae31e05a84 Word2vec loader - Code changes and unit tests.
1. Refactored word2vec loader to perform existing file checks before downloading or extracting.

2. Added units tests to load, download and extract functions.
2019-06-06 10:12:29 -07:00
Said Bleik c391c0bba7
Merge pull request #84 from microsoft/abhiram-requests-fix
Using tqdm to show progress bar
2019-06-05 13:27:51 -04:00
Abhiram E 3ac927edfa Using tqdm to show progress bar 2019-06-05 13:08:23 -04:00
Said Bleik 30edbfe28f
Merge pull request #82 from microsoft/abhiram-requests-fix
Changed url fetch from urlretrieve to requests
2019-06-04 16:51:53 -04:00
Said Bleik 9e716a60b3
Merge pull request #81 from microsoft/setup
Create conda generator and setup.md
2019-06-04 16:34:27 -04:00
Abhiram E 0e296b6291 Changed url fetch from urlretrieve to requests 2019-06-04 16:26:35 -04:00
miguelgfierro 6464d246e8 gitignore and conda file 2019-06-04 14:51:35 +01:00
miguelgfierro 0c24b8d7ab 🐛 2019-06-04 11:37:44 +01:00
miguelgfierro c06fa8e170 updated setup 📝 2019-06-04 11:29:40 +01:00
miguelgfierro 8c239a61cb updated setup 📝 2019-06-04 11:29:25 +01:00
miguelgfierro eaf24ac5d5 conda file 2019-06-04 11:22:35 +01:00
miguelgfierro af9f671645 update setup 2019-06-04 11:09:15 +01:00
Said Bleik ba716d109a
Merge pull request #70 from microsoft/datasets
Datasets
2019-05-28 13:39:39 -04:00
Said Bleik 96b3015096
Merge pull request #72 from microsoft/abhiram-embedding-fix
Fix to limit the memory usage when using fasttext embedding loaders
2019-05-28 13:02:39 -04:00
Abhiram E 36d7411bec Fix to limit the memory usage when using fasttext embedding loaders. Code changes to use the simpler version 2019-05-28 12:04:57 -04:00
miguelgfierro 7ffc3cb6f6 Merge branch 'datasets' of https://github.com/Microsoft/NLP into datasets 2019-05-28 16:06:52 +01:00
miguelgfierro 835492509b minor 🐛 in readme 2019-05-28 15:57:56 +01:00
miguelgfierro aee2197db4 add bigger tolerance 2019-05-28 13:58:05 +00:00
miguelgfierro 403457b3c3 refactor 💥 2019-05-28 13:38:02 +01:00
Said Bleik 2dc37f87eb
Merge pull request #67 from microsoft/test
Test
2019-05-24 23:34:03 -04:00
Said Bleik 4fa1aa8bcd
Merge pull request #66 from microsoft/rijai/componentgovernance
Adding component governance tool to build pipeline.
2019-05-24 23:31:16 -04:00
Richin Jain 620f3ebe8c Adding component governance tool to build pipeline. 2019-05-24 15:12:19 -04:00
miguelgfierro 3c1708a21d readme update 📝 2019-05-24 18:32:28 +00:00
miguelgfierro c8fc93d4b6 🐛 2019-05-24 17:57:53 +00:00
miguelgfierro f0936bd9b1 added papermill 2019-05-24 16:54:17 +00:00
miguelgfierro 0f2fcd4f83 added new notebooks 2019-05-24 15:00:18 +00:00
miguelgfierro f03f712cfa added data integration tests with notebooks 2019-05-24 14:26:36 +00:00
miguelgfierro 03b3b387a6 refactoring tests 2019-05-24 14:06:22 +01:00
Said Bleik 2daeb1716e
Merge pull request #64 from microsoft/casey-gensen-noblank
Gensen noblank bugfix + Add preprocessing tests
2019-05-22 15:14:42 -04:00
Casey Hong a1da16f391 use fixture directly 2019-05-22 12:42:01 -04:00
Casey Hong 1cd36ccff7 fix snli noblank bug and add preprocessing tests 2019-05-21 23:00:56 -04:00
Miguel González-Fierro 9bd941d2f8
Merge pull request #61 from microsoft/abhiram-gensim-limit
Added option to limit number of word vectors for glove and word2vec
2019-05-16 13:04:41 +01:00
Abhiram E ce6d783adf Separated the asserts in tests 2019-05-15 10:51:56 -04:00
Miguel González-Fierro 7aa740606d
Merge pull request #59 from microsoft/issue_template
issue template
2019-05-15 15:17:51 +01:00
Abhiram E 52d720e9bf Added option to limit number of word vectors for glove and word2vec 2019-05-15 00:22:37 -04:00
miguelgfierro a5144f2626 issue template 2019-05-14 12:21:40 +01:00
Said Bleik 33da65e0a3
Merge pull request #58 from microsoft/janhavi-update-snliNB
[Fix] SNLI notebook and preprocess.py
2019-05-13 21:45:01 -04:00
Janhavi Mahajan 1ed2c4dc0a feat(bug fix) updated snli notebook with to_lowercase_all() instead of to_lowercase() that expects a column name list. Fixed None object returning in to_lowercase when column name list is not passed 2019-05-13 18:14:31 -04:00
Said Bleik 07ca05dd04
Merge pull request #47 from microsoft/maidap-sentence-similarity
Baseline model notebook and embeddings trainer notebook
2019-05-11 01:09:28 +00:00
Janhavi Mahajan 9338f40cdc
Merge pull request #57 from microsoft/janhavi-fix-preprocessing-file
Preprocess utils
2019-05-10 16:53:37 -04:00
Janhavi Mahajan bb5764a56a feat(code fix) rm_nltk_stop_words now expects sentences and stop_word column names 2019-05-10 16:50:34 -04:00
Janhavi Mahajan 197d771208 feat(code review comments) generalize nltk utils tokenize, remove_sto_words to more than 2 sentences 2019-05-10 16:27:48 -04:00
Janhavi Mahajan 6e3523810a feat(code review) fix to_nltk_tokens, add to_lowercase_all and to_lowercase as per said's comments 2019-05-10 16:27:48 -04:00