* add basic support for Spark dataframe
add support for SynapseML LightGBM model
update to pyspark>=3.2.0 to leverage pandas_on_Spark API
* clean code, add TODOs
* add sample_train_data for pyspark.pandas dataframe, fix bugs
* improve some functions, fix bugs
* fix dict changed size during iteration
* update model predict
* update LightGBM model, update test
* update SynapseML LightGBM params
* update synapseML and tests
* update TODOs
* Added support for roc_auc for spark models
* Added support for score of spark estimator
* Added test for automl score of spark estimator
* Added cv support for pyspark.pandas dataframe
* Update test, fix bugs
* Added tests
* Updated docs, tests, added a notebook
* Fix bugs in non-spark env
* Fix bugs and improve tests
* Fix uninstall pyspark
* Fix test errors
* Fix java.lang.OutOfMemoryError: Java heap space
* Fix test_performance
* Update test_sparkml to test_0sparkml to use the expected spark conf
* Remove unnecessary widgets in notebook
* Fix iloc java.lang.StackOverflowError
* fix pre-commit
* Added params check for spark dataframes
* Refactor code for train_test_split to a function
* Update train_test_split_pyspark
* Refactor if-else, remove unnecessary code
* Remove y from predict, remove mem control from n_iter compute
* Update workflow
* Improve _split_pyspark
* Fix test failure of too short training time
* Fix typos, improve docstrings
* Fix index errors of pandas_on_spark, add spark loss metric
* Fix typo of ndcgAtK
* Update NDCG metrics and tests
* Remove unnecessary logger
* Use cache and count to ensure consistent indexes
* refactor for merging main
* fix errors from refactor
* Updated SparkLightGBMEstimator and cache
* Updated config2params
* Remove unused import
* Fix unknown parameters
* Update default_estimator_list
* Add unit tests for spark metrics
* merging
* clean commit
* Delete mylearner.py
This file is not needed.
* fix py4j import error
* more tolerant cancelling time
* fix problems following suggestions
* Update flaml/tune/spark/utils.py
Co-authored-by: Li Jiang <bnujli@gmail.com>
* remove redundant model
* Update test/spark/custom_mylearner.py
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
* add docstr
* revert change in gitignore
* Update test/spark/custom_mylearner.py
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
---------
Co-authored-by: Li Jiang <bnujli@gmail.com>
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
* pickle the AutoML object
* get best model per estimator
* test deberta
* stateless API
* Add Gitter badge (#41)
* prevent divide by zero
* test roberta
* BlendSearchTuner
Co-authored-by: Chi Wang (MSR) <chiw@microsoft.com>
Co-authored-by: The Gitter Badger <badger@gitter.im>
* v0.2.2
separate the HPO part into the module flaml.tune
enhanced implementation of FLOW^2, CFO and BlendSearch
support parallel tuning using ray tune
add support for sample_weight and generic fit arguments
enable mlflow logging
Co-authored-by: Chi Wang (MSR) <chiw@microsoft.com>
Co-authored-by: qingyun-wu <qw2ky@virginia.edu>