Roadmap for Upcoming Features
Chi Wang edited this page 2023-04-05 07:40:03 -07:00

Generative AI

This is the biggest area of focus for the current development.

Hyperparameter optimization and model selection for generation

Support for GPT-4 and GPT-3.5 models is in preview in 1.2.0.

Autonomous AI

Build a higher-level framework to automate AI functions in solving complex tasks.

Predictive Modeling

Spark dataframe (preview in 1.2.0)

flaml.automl currently requires the data to be loaded as a pandas dataframe or a numpy array. Supporting Spark dataframes will let flaml.automl handle larger-than-memory datasets. flaml.tune has no limitation on dataset size or format because it works with any user-defined training function.
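To illustrate why a tuner built around a user-defined training function imposes no constraint on data size or format, here is a minimal plain-Python stand-in (this is not the actual flaml.tune API): the tuner only calls the user's function and compares the returned metrics, so data loading — pandas, numpy, or Spark — stays entirely inside the user's function.

```python
import random

def tune(evaluation_function, search_space, num_samples=20, seed=0):
    """Minimal random-search stand-in for a tuner: it only calls the
    user's function and compares returned losses, so it never touches
    the training data itself."""
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(num_samples):
        config = {k: rng.uniform(lo, hi) for k, (lo, hi) in search_space.items()}
        loss = evaluation_function(config)  # data loading happens inside here
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss

def train(config):
    # The user may load a pandas, numpy, or Spark dataset here;
    # the tuner never sees it. A toy quadratic stands in for training loss.
    return (config["lr"] - 0.1) ** 2

best, loss = tune(train, {"lr": (0.0, 1.0)})
```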

Multi-modal model

Suggested in https://github.com/microsoft/FLAML/discussions/279. Suggestions about concrete use cases, estimators and hyperparameter search spaces are welcome.

Multivariate time series forecasting

Requested in https://github.com/microsoft/FLAML/issues/204. We currently support two types of multivariate time series forecasting.

Improve efficiency for multi-tasking

https://github.com/microsoft/FLAML/issues/277. The current solution is to fit multiple single-output models, which becomes slow when the number of tasks is large. The same issue applies to time series forecasting when the data contains multiple time series (differentiated by categorical columns). This is an important real-world problem for many organizations, including large corporations such as Meta and Microsoft. Please reach out if you are interested in research or development on this issue.
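The current workaround can be sketched as follows. MeanRegressor is a toy stand-in for a real single-output estimator; the wrapper shows why cost grows linearly with the number of tasks — one full fit per output column.

```python
class MeanRegressor:
    """Trivial single-output model standing in for a real estimator."""
    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self
    def predict(self, X):
        return [self.mean_ for _ in X]

class MultiOutputWrapper:
    """Fits one independent single-output model per task (column of Y).
    Training cost grows linearly with the number of tasks, which is why
    this approach is slow when there are many tasks."""
    def __init__(self, base_cls):
        self.base_cls = base_cls
    def fit(self, X, Y):
        n_tasks = len(Y[0])
        self.models_ = [
            self.base_cls().fit(X, [row[t] for row in Y])
            for t in range(n_tasks)
        ]
        return self
    def predict(self, X):
        preds = [m.predict(X) for m in self.models_]
        return [list(cols) for cols in zip(*preds)]

X = [[0], [1], [2], [3]]
Y = [[0, 10], [1, 10], [2, 10], [3, 10]]
model = MultiOutputWrapper(MeanRegressor).fit(X, Y)
pred = model.predict([[5]])  # → [[1.5, 10.0]]
```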

How to decide the value of time_budget

A recurring question is how to decide the value of time_budget.

Our current recommendation is at https://github.com/microsoft/FLAML/wiki/Time-budget. Any improvement on it will benefit many users, and it is a good research problem too. Please reach out if you'd like to contribute. A concrete idea to implement is https://github.com/microsoft/FLAML/issues/710.
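As one illustrative, hypothetical heuristic (not FLAML's actual recommendation or the content of the linked issue): run with an increasing budget and stop doubling once the best loss no longer improves meaningfully. The function names and the stand-in loss values below are assumptions for illustration only.

```python
def pick_time_budget(run_automl, start_budget=60, tol=1e-3, max_budget=3600):
    """Double the time budget until the best loss stops improving by
    more than `tol`, or the budget cap is reached. `run_automl(budget)`
    is assumed to return the best validation loss found in that budget."""
    budget = start_budget
    prev_loss = run_automl(budget)
    while budget * 2 <= max_budget:
        budget *= 2
        loss = run_automl(budget)
        if prev_loss - loss < tol:  # diminishing returns: stop here
            break
        prev_loss = loss
    return budget

# Toy stand-in: loss decays with budget and then plateaus.
losses = {60: 0.30, 120: 0.20, 240: 0.15, 480: 0.1495, 960: 0.149}
chosen = pick_time_budget(lambda b: losses[b], start_budget=60)  # → 480
```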

Decide how many labeled training examples are needed

Asked in https://github.com/microsoft/FLAML/discussions/289#discussioncomment-1690433. It would be a unique feature to integrate sample-size estimation techniques into FLAML. Help wanted from both researchers and developers.
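One well-known family of techniques for this question is learning-curve extrapolation: fit a power law error(n) ≈ a·n^(−b) to errors measured at a few subsample sizes, then invert it to estimate the sample size needed for a target error. The power-law form and helper names below are illustrative assumptions, not FLAML functionality.

```python
import math

def fit_power_law(sizes, errors):
    """Fit error(n) = a * n**(-b) by least squares in log-log space."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - slope * mx)
    return a, -slope  # b = -slope

def examples_needed(a, b, target_error):
    """Invert error = a * n**(-b) to get the sample size for a target."""
    return math.ceil((a / target_error) ** (1.0 / b))

# Errors measured on subsamples of the labeled data (toy numbers,
# roughly following error ~ 4 * n**-0.5).
sizes = [100, 200, 400, 800]
errors = [0.40, 0.283, 0.20, 0.141]
a, b = fit_power_law(sizes, errors)
n_needed = examples_needed(a, b, target_error=0.05)
```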

Prediction quality

Requested in https://github.com/microsoft/FLAML/issues/214 and https://github.com/microsoft/FLAML/issues/355. It is a useful feature for customers such as Meta. Contributions from the community are appreciated.

A related discussion in https://github.com/microsoft/FLAML/discussions/351.
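As one model-agnostic technique that could serve such a feature, split conformal prediction calibrates an interval width from held-out absolute residuals of any fitted point predictor. This is a sketch of the general method, not part of FLAML.

```python
import math

def conformal_interval_width(residuals, alpha=0.1):
    """Split conformal prediction: the (1 - alpha) empirical quantile of
    held-out absolute residuals gives a half-width q such that
    [pred - q, pred + q] covers new targets with ~(1 - alpha) probability
    under exchangeability."""
    scores = sorted(abs(r) for r in residuals)
    n = len(scores)
    # Conformal rank: ceil((n + 1) * (1 - alpha)), clipped to n.
    k = min(n, math.ceil((n + 1) * (1 - alpha)))
    return scores[k - 1]

# Held-out residuals from some fitted point predictor (toy numbers).
residuals = [0.1, -0.2, 0.05, 0.3, -0.15, 0.25, -0.05, 0.12, -0.3, 0.2]
q = conformal_interval_width(residuals, alpha=0.2)
interval = (3.0 - q, 3.0 + q)  # interval around a point prediction of 3.0
```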

Imbalance

The current guidance for handling imbalanced data is at https://microsoft.github.io/FLAML/docs/FAQ#how-does-flaml-handle-imbalanced-data-unequal-distribution-of-target-classes-in-classification-task. Contributions that improve performance under label imbalance are welcome. One example idea comes from https://github.com/microsoft/FLAML/discussions/27: throw a warning to let the user know about class imbalance before training, and, if imbalance is detected, wrap the classifiers with BalancedBaggingClassifier etc. to overcome it.
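That warning idea could be sketched as follows (the function name and the ratio threshold are illustrative choices, not FLAML's API):

```python
import warnings
from collections import Counter

def check_class_imbalance(y, ratio_threshold=0.2):
    """Warn before training if the rarest class is much smaller than the
    most frequent one. Returns True when imbalance is detected, so the
    caller can e.g. switch to a balanced-bagging wrapper."""
    counts = Counter(y)
    minority = min(counts.values())
    majority = max(counts.values())
    if minority / majority < ratio_threshold:
        warnings.warn(
            f"Class imbalance detected (counts: {dict(counts)}); "
            "consider resampling or a balanced ensemble such as "
            "imbalanced-learn's BalancedBaggingClassifier."
        )
        return True
    return False

imbalanced = check_class_imbalance([0] * 95 + [1] * 5)  # → True
```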

Feature Selection by FLAML

https://github.com/microsoft/FLAML/issues/258. Help wanted.

Support "groups" for catboost

https://github.com/microsoft/FLAML/issues/304. Help wanted.

Visualize feature importance, SHAP/LIME explanation, optimization history

Though we have some partial solutions, there is room for improvement. Contributions from the community are appreciated.

Use early_stop_rounds

https://github.com/microsoft/FLAML/issues/172. We investigated the effectiveness of using early_stop_rounds for lightgbm and xgboost; the results were inconclusive. Suggestions are welcome.

Search space for CatBoost

https://github.com/microsoft/FLAML/issues/144. Help wanted.

ONNX/ONNXML export

Help wanted.

Fair AutoML

Integrate Fair AutoML.