* OpenAI embeddings with GPU based KNN
Added OpenAI embeddings with GPU-based KNN using NVIDIA RAPIDS
* Added databricks rapids ml init script
Added init script to install RAPIDS ML using CUDA 11.8
* No outputs
Removed outputs
* Added testing code
With GPU KNN notebook test code
* Added GPU test code
Added GPU test code to OpenAI with KNN notebook
* Fixed extra bracket
* Removed extra bracket
* Update DatabricksUtilities.scala
Corrected brackets
* Update SynapseTests.scala
Removed tab
* Update SynapseTests.scala
Removed trailing whitespace
* Corrected Style errors
* Corrected init script location
* Removed invalid escape character
Fixed style errors
* Removed Fine-tune
Suggested by Mark
* removed init script text
* Update core/src/test/scala/com/microsoft/azure/synapse/ml/nbtest/DatabricksUtilities.scala
Added "Fine-tune" again
Co-authored-by: Mark Hamilton <mhamilton723@gmail.com>
* Update DatabricksUtilities.scala
Added Fine-tune back
* Reverse style changes
* Corrected style
* Added GPUInitScript to createClusterInPool
Create cluster using init script
* Corrected to have a separate Rapids test
* Update DatabricksRapidsTests.scala
Corrected parameters
* Update DatabricksUtilities.scala
Added Rapids cluster name
* Update DatabricksRapidsTests.scala
Reduced number of nodes to 1
* Update DatabricksRapidsTests.scala
Fixed imports
---------
Co-authored-by: Alexander Spiridonov <b-aleksandrs@microsoft.com>
Co-authored-by: Mark Hamilton <mhamilton723@gmail.com>
Co-authored-by: Alexander <157773158+bvonodiripsa@users.noreply.github.com>
* Estimators for diff-in-diff, synthetic control and synthetic diff-in-diff
* add more params
* refactor
Signed-off-by: Jason Wang <jasonwang_83@hotmail.com>
* adding unit tests for linalg
* more unit tests
* Unit test for DiffInDiffEstimator
* more unit tests
* unit test for SyntheticControlEstimator
* unit test for SyntheticDiffInDiffEstimator
* logClass
* Python code gen
Signed-off-by: Jason Wang <jasonwang_83@hotmail.com>
* pyspark wrapper
* expose loss history
* fix bugs for synthetic control
* fix time effects for synthetic control estimator
* fix unit test
* add notebook
* fixing indexing logic
* add file headers
Signed-off-by: Jason Wang <jasonwang_83@hotmail.com>
* Add feature name to logClass call
* more scalastyle fixes
* More scalastyle and unit test fixes
* Python style fix
Signed-off-by: Jason Wang <jasonwang_83@hotmail.com>
* fix unit test
* fix more Python style issues
* python style fix
* fix unit test
* Update core/src/main/scala/com/microsoft/azure/synapse/ml/causal/DiffInDiffEstimator.scala
Co-authored-by: Mark Hamilton <mhamilton723@gmail.com>
* Update core/src/main/scala/com/microsoft/azure/synapse/ml/causal/SyntheticControlEstimator.scala
Co-authored-by: Mark Hamilton <mhamilton723@gmail.com>
* Update core/src/main/scala/com/microsoft/azure/synapse/ml/causal/SyntheticDiffInDiffEstimator.scala
Co-authored-by: Mark Hamilton <mhamilton723@gmail.com>
* addressing comments
* extract some constants to findUnusedColumn
* Expose zeta as an optional parameter; also return the RMSE for unit-weight and time-weight fitting
* Replace constant TimeIdxCol and UnitIdxCol with findUnusedColumn
Signed-off-by: Jason Wang <jasonwang_83@hotmail.com>
* typo
* Adding notebook to sidebar
* fix bad merge
* address code review comments
* Update docs/Explore Algorithms/Causal Inference/Quickstart - Synthetic difference in differences.ipynb
Co-authored-by: Mark Hamilton <mhamilton723@gmail.com>
* clean synapse widget output state
* remove invalid image links
---------
Signed-off-by: Jason Wang <jasonwang_83@hotmail.com>
Co-authored-by: Mark Hamilton <mhamilton723@gmail.com>
* add vector column option
* add the vector option
* vector fields are added and code compiles, untested
* fix bug on checkparity when the index exists
* add FloatType to edm-spark type conversions
* fix synonymmap
* core functionality works
* add no nested field vector check
* add vector validation check
* modify vector columns behavior when column doesn't exist in df schema
* add another test
* clean up the unit test file
* add more tests
* add openai embedding pipeline test
* address comments
* address comments
* address comments
* update notebook
* change index name in notebook
* WIP: docs
* Update image interpretability notebook to use ONNX model
* Update link to sample notebook.
* Adding notebooks and model input/output methods to python wrapper.
* Remove unused code
* Update link in md
* Fixing E2E test; update notebook
* Performance improvement
* More perf improvement for String type tensor creation
* Update notebooks