Mirror of https://github.com/microsoft/FLAML.git

precommit: introduce `mdformat` (#1276)

* precommit: introduce `mdformat`
* precommit: apply

This commit is contained in:
Parent: 3de0dc667e
Commit: 165d7467f9
@@ -13,6 +13,7 @@

## Checks

<!-- - I've used [pre-commit](https://microsoft.github.io/FLAML/docs/Contribute#pre-commit) to lint the changes in this PR (note the same in integrated in our CI checks). -->

- [ ] I've included any doc changes needed for https://microsoft.github.io/FLAML/. See https://microsoft.github.io/FLAML/docs/Contribute#documentation to build and test documentation locally.
- [ ] I've added tests (if relevant) corresponding to the changes introduced in this PR.
- [ ] I've made sure all auto checks have passed.
@@ -22,10 +22,21 @@ repos:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: no-commit-to-branch

  - repo: https://github.com/psf/black
    rev: 23.3.0
    hooks:
      - id: black

  - repo: https://github.com/executablebooks/mdformat
    rev: 0.7.17
    hooks:
      - id: mdformat
        additional_dependencies:
          - mdformat-gfm
          - mdformat-black
          - mdformat_frontmatter

  - repo: https://github.com/charliermarsh/ruff-pre-commit
    rev: v0.0.261
    hooks:
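With the `mdformat` hook added, Markdown formatting is enforced alongside the existing hooks. As a sketch of the standard pre-commit workflow (generic tool usage, not commands taken from this diff):

```bash
# one-time setup in a clone of the repository
pip install pre-commit
pre-commit install

# run every configured hook, including mdformat, against all files
pre-commit run --all-files

# or run only the Markdown formatter
pre-commit run mdformat --all-files
```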
NOTICE.md
@@ -1,221 +1,222 @@

# NOTICES

This repository incorporates material as listed below or described in the code.

## Component. Ray.

Code in tune/\[analysis.py, sample.py, trial.py, result.py\],
searcher/\[suggestion.py, variant_generator.py\], and scheduler/trial_scheduler.py is adapted from
https://github.com/ray-project/ray/blob/master/python/ray/tune/

## Open Source License/Copyright Notice.

Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

1. Definitions.

   "License" shall mean the terms and conditions for use, reproduction,
   and distribution as defined by Sections 1 through 9 of this document.

   "Licensor" shall mean the copyright owner or entity authorized by
   the copyright owner that is granting the License.

   "Legal Entity" shall mean the union of the acting entity and all
   other entities that control, are controlled by, or are under common
   control with that entity. For the purposes of this definition,
   "control" means (i) the power, direct or indirect, to cause the
   direction or management of such entity, whether by contract or
   otherwise, or (ii) ownership of fifty percent (50%) or more of the
   outstanding shares, or (iii) beneficial ownership of such entity.

   "You" (or "Your") shall mean an individual or Legal Entity
   exercising permissions granted by this License.

   "Source" form shall mean the preferred form for making modifications,
   including but not limited to software source code, documentation
   source, and configuration files.

   "Object" form shall mean any form resulting from mechanical
   transformation or translation of a Source form, including but
   not limited to compiled object code, generated documentation,
   and conversions to other media types.

   "Work" shall mean the work of authorship, whether in Source or
   Object form, made available under the License, as indicated by a
   copyright notice that is included in or attached to the work
   (an example is provided in the Appendix below).

   "Derivative Works" shall mean any work, whether in Source or Object
   form, that is based on (or derived from) the Work and for which the
   editorial revisions, annotations, elaborations, or other modifications
   represent, as a whole, an original work of authorship. For the purposes
   of this License, Derivative Works shall not include works that remain
   separable from, or merely link (or bind by name) to the interfaces of,
   the Work and Derivative Works thereof.

   "Contribution" shall mean any work of authorship, including
   the original version of the Work and any modifications or additions
   to that Work or Derivative Works thereof, that is intentionally
   submitted to Licensor for inclusion in the Work by the copyright owner
   or by an individual or Legal Entity authorized to submit on behalf of
   the copyright owner. For the purposes of this definition, "submitted"
   means any form of electronic, verbal, or written communication sent
   to the Licensor or its representatives, including but not limited to
   communication on electronic mailing lists, source code control systems,
   and issue tracking systems that are managed by, or on behalf of, the
   Licensor for the purpose of discussing and improving the Work, but
   excluding communication that is conspicuously marked or otherwise
   designated in writing by the copyright owner as "Not a Contribution."

   "Contributor" shall mean Licensor and any individual or Legal Entity
   on behalf of whom a Contribution has been received by Licensor and
   subsequently incorporated within the Work.

2. Grant of Copyright License. Subject to the terms and conditions of
   this License, each Contributor hereby grants to You a perpetual,
   worldwide, non-exclusive, no-charge, royalty-free, irrevocable
   copyright license to reproduce, prepare Derivative Works of,
   publicly display, publicly perform, sublicense, and distribute the
   Work and such Derivative Works in Source or Object form.

3. Grant of Patent License. Subject to the terms and conditions of
   this License, each Contributor hereby grants to You a perpetual,
   worldwide, non-exclusive, no-charge, royalty-free, irrevocable
   (except as stated in this section) patent license to make, have made,
   use, offer to sell, sell, import, and otherwise transfer the Work,
   where such license applies only to those patent claims licensable
   by such Contributor that are necessarily infringed by their
   Contribution(s) alone or by combination of their Contribution(s)
   with the Work to which such Contribution(s) was submitted. If You
   institute patent litigation against any entity (including a
   cross-claim or counterclaim in a lawsuit) alleging that the Work
   or a Contribution incorporated within the Work constitutes direct
   or contributory patent infringement, then any patent licenses
   granted to You under this License for that Work shall terminate
   as of the date such litigation is filed.

4. Redistribution. You may reproduce and distribute copies of the
   Work or Derivative Works thereof in any medium, with or without
   modifications, and in Source or Object form, provided that You
   meet the following conditions:

   (a) You must give any other recipients of the Work or
       Derivative Works a copy of this License; and

   (b) You must cause any modified files to carry prominent notices
       stating that You changed the files; and

   (c) You must retain, in the Source form of any Derivative Works
       that You distribute, all copyright, patent, trademark, and
       attribution notices from the Source form of the Work,
       excluding those notices that do not pertain to any part of
       the Derivative Works; and

   (d) If the Work includes a "NOTICE" text file as part of its
       distribution, then any Derivative Works that You distribute must
       include a readable copy of the attribution notices contained
       within such NOTICE file, excluding those notices that do not
       pertain to any part of the Derivative Works, in at least one
       of the following places: within a NOTICE text file distributed
       as part of the Derivative Works; within the Source form or
       documentation, if provided along with the Derivative Works; or,
       within a display generated by the Derivative Works, if and
       wherever such third-party notices normally appear. The contents
       of the NOTICE file are for informational purposes only and
       do not modify the License. You may add Your own attribution
       notices within Derivative Works that You distribute, alongside
       or as an addendum to the NOTICE text from the Work, provided
       that such additional attribution notices cannot be construed
       as modifying the License.

   You may add Your own copyright statement to Your modifications and
   may provide additional or different license terms and conditions
   for use, reproduction, or distribution of Your modifications, or
   for any such Derivative Works as a whole, provided Your use,
   reproduction, and distribution of the Work otherwise complies with
   the conditions stated in this License.

5. Submission of Contributions. Unless You explicitly state otherwise,
   any Contribution intentionally submitted for inclusion in the Work
   by You to the Licensor shall be under the terms and conditions of
   this License, without any additional terms or conditions.
   Notwithstanding the above, nothing herein shall supersede or modify
   the terms of any separate license agreement you may have executed
   with Licensor regarding such Contributions.

6. Trademarks. This License does not grant permission to use the trade
   names, trademarks, service marks, or product names of the Licensor,
   except as required for reasonable and customary use in describing the
   origin of the Work and reproducing the content of the NOTICE file.

7. Disclaimer of Warranty. Unless required by applicable law or
   agreed to in writing, Licensor provides the Work (and each
   Contributor provides its Contributions) on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
   implied, including, without limitation, any warranties or conditions
   of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
   PARTICULAR PURPOSE. You are solely responsible for determining the
   appropriateness of using or redistributing the Work and assume any
   risks associated with Your exercise of permissions under this License.

8. Limitation of Liability. In no event and under no legal theory,
   whether in tort (including negligence), contract, or otherwise,
   unless required by applicable law (such as deliberate and grossly
   negligent acts) or agreed to in writing, shall any Contributor be
   liable to You for damages, including any direct, indirect, special,
   incidental, or consequential damages of any character arising as a
   result of this License or out of the use or inability to use the
   Work (including but not limited to damages for loss of goodwill,
   work stoppage, computer failure or malfunction, or any and all
   other commercial damages or losses), even if such Contributor
   has been advised of the possibility of such damages.

9. Accepting Warranty or Additional Liability. While redistributing
   the Work or Derivative Works thereof, You may choose to offer,
   and charge a fee for, acceptance of support, warranty, indemnity,
   or other liability obligations and/or rights consistent with this
   License. However, in accepting such obligations, You may act only
   on Your own behalf and on Your sole responsibility, not on behalf
   of any other Contributor, and only if You agree to indemnify,
   defend, and hold each Contributor harmless for any liability
   incurred by, or claims asserted against, such Contributor by reason
   of your accepting any such warranty or additional liability.

END OF TERMS AND CONDITIONS

APPENDIX: How to apply the Apache License to your work.

To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "{}"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright {yyyy} {name of copyright owner}

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

______________________________________________________________________

Code in python/ray/rllib/{evolution_strategies, dqn} adapted from
https://github.com/openai (MIT License)
@@ -240,7 +241,7 @@ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

______________________________________________________________________

Code in python/ray/rllib/impala/vtrace.py from
https://github.com/deepmind/scalable_agent

@@ -251,7 +252,9 @@ Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,

@@ -259,7 +262,8 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

______________________________________________________________________

Code in python/ray/rllib/ars is adapted from https://github.com/modestyachts/ARS

Copyright (c) 2018, ARS contributors (Horia Mania, Aurelia Guy, Benjamin Recht)

@@ -269,11 +273,11 @@ Redistribution and use of ARS in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation and/or
   other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED

@@ -286,5 +290,6 @@ ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

______________________________________________________________________

Code in python/ray/\_private/prometheus_exporter.py is adapted from https://github.com/census-instrumentation/opencensus-python/blob/master/contrib/opencensus-ext-prometheus/opencensus/ext/prometheus/stats_exporter/__init__.py
README.md
@@ -4,8 +4,8 @@
![Python Version](https://img.shields.io/badge/3.8%20%7C%203.9%20%7C%203.10-blue)
[![Downloads](https://pepy.tech/badge/flaml)](https://pepy.tech/project/flaml)
[![](https://img.shields.io/discord/1025786666260111483?logo=discord&style=flat)](https://discord.gg/Cppx2vSPVP)

<!-- [![Join the chat at https://gitter.im/FLAMLer/community](https://badges.gitter.im/FLAMLer/community.svg)](https://gitter.im/FLAMLer/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) -->

# A Fast Library for Automated Machine Learning & Tuning

@@ -24,15 +24,15 @@

:fire: FLAML supports Code-First AutoML & Tuning – Private Preview in [Microsoft Fabric Data Science](https://learn.microsoft.com/en-us/fabric/data-science/).

## What is FLAML

FLAML is a lightweight Python library for efficient automation of machine
learning and AI operations. It automates workflows based on large language models, machine learning models, etc.,
and optimizes their performance.

- FLAML enables building next-gen GPT-X applications based on multi-agent conversations with minimal effort. It simplifies the orchestration, automation and optimization of a complex GPT-X workflow. It maximizes the performance of GPT-X models and augments their weakness.
- For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources. It is easy to customize or extend. Users can find their desired customizability from a smooth range.
- It supports fast and economical automatic tuning (e.g., inference hyperparameters for foundation models, configurations in MLOps/LMOps workflows, pipelines, mathematical/statistical models, algorithms, computing experiments, software configurations), capable of handling large search space with heterogeneous evaluation cost and complex constraints/guidance/early stopping.

FLAML is powered by a series of [research studies](https://microsoft.github.io/FLAML/docs/Research/) from Microsoft Research and collaborators such as Penn State University, Stevens Institute of Technology, University of Washington, and University of Waterloo.

@@ -47,6 +47,7 @@ pip install flaml
```

Minimal dependencies are installed without extra options. You can install extra options based on the feature you need. For example, use the following to install the dependencies needed by the [`autogen`](https://microsoft.github.io/autogen/) package.

```bash
pip install "flaml[autogen]"
```

@@ -56,18 +57,24 @@ Each of the [`notebook examples`](https://github.com/microsoft/FLAML/tree/main/n

## Quickstart

- (New) The [autogen](https://microsoft.github.io/autogen/) package enables the next-gen GPT-X applications with a generic multi-agent conversation framework.
  It offers customizable and conversable agents which integrate LLMs, tools and humans.
  By automating chat among multiple capable agents, one can easily make them collectively perform tasks autonomously or with human feedback, including tasks that require using tools via code. For example,

```python
from flaml import autogen

assistant = autogen.AssistantAgent("assistant")
user_proxy = autogen.UserProxyAgent("user_proxy")
user_proxy.initiate_chat(
    assistant,
    message="Show me the YTD gain of 10 largest technology companies as of today.",
)
# This initiates an automated chat between the two agents to solve the task
```

Autogen also helps maximize the utility out of the expensive LLMs such as ChatGPT and GPT-4. It offers a drop-in replacement of `openai.Completion` or `openai.ChatCompletion` with powerful functionalities like tuning, caching, templating, and filtering. For example, you can optimize generations by LLM with your own tuning data, success metrics and budgets.

```python
# perform tuning
config, analysis = autogen.Completion.tune(

@@ -82,30 +89,32 @@ config, analysis = autogen.Completion.tune(
# perform inference for a test instance
response = autogen.Completion.create(context=test_instance, **config)
```

- With three lines of code, you can start using this economical and fast
  AutoML engine as a [scikit-learn style estimator](https://microsoft.github.io/FLAML/docs/Use-Cases/Task-Oriented-AutoML).

```python
from flaml import AutoML

automl = AutoML()
automl.fit(X_train, y_train, task="classification")
```

- You can restrict the learners and use FLAML as a fast hyperparameter tuning
  tool for XGBoost, LightGBM, Random Forest etc. or a [customized learner](https://microsoft.github.io/FLAML/docs/Use-Cases/Task-Oriented-AutoML#estimator-and-search-space).

```python
automl.fit(X_train, y_train, task="classification", estimator_list=["lgbm"])
```

- You can also run generic hyperparameter tuning for a [custom function](https://microsoft.github.io/FLAML/docs/Use-Cases/Tune-User-Defined-Function).

```python
from flaml import tune
tune.run(evaluation_function, config={…}, low_cost_partial_config={…}, time_budget_s=3600)
```

- [Zero-shot AutoML](https://microsoft.github.io/FLAML/docs/Use-Cases/Zero-Shot-AutoML) allows using the existing training API from lightgbm, xgboost etc. while getting the benefit of AutoML in choosing high-performance hyperparameter configurations per task.

```python
from flaml.default import LGBMRegressor
SECURITY.md
@@ -4,7 +4,7 @@

Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, which include [Microsoft](https://github.com/Microsoft), [Azure](https://github.com/Azure), [DotNet](https://github.com/dotnet), [AspNet](https://github.com/aspnet), [Xamarin](https://github.com/xamarin), and [our GitHub organizations](https://opensource.microsoft.com/).

If you believe you have found a security vulnerability in any Microsoft-owned repository that meets [Microsoft's definition of a security vulnerability](<https://docs.microsoft.com/en-us/previous-versions/tn-archive/cc751383(v=technet.10)>), please report it to us as described below.

## Reporting Security Issues

@@ -18,13 +18,13 @@ You should receive a response within 24 hours. If for some reason you do not, pl

Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue:

- Type of issue (e.g. buffer overflow, SQL injection, cross-site scripting, etc.)
- Full paths of source file(s) related to the manifestation of the issue
- The location of the affected source code (tag/branch/commit or direct URL)
- Any special configuration required to reproduce the issue
- Step-by-step instructions to reproduce the issue
- Proof-of-concept or exploit code (if possible)
- Impact of the issue, including how an attacker might exploit the issue

This information will help us triage your report more quickly.
@@ -4,16 +4,15 @@ This directory contains utility functions used by AutoNLP. Currently we support

Please refer to this [link](https://microsoft.github.io/FLAML/docs/Examples/AutoML-NLP) for examples.

# Troubleshooting fine-tuning HPO for pre-trained language models

The frequent updates of transformers may lead to fluctuations in the results of tuning. To help users quickly troubleshoot the result of AutoNLP when a tuning failure occurs (e.g., failing to reproduce previous results), we have provided the following jupyter notebook:

- [Troubleshooting HPO for fine-tuning pre-trained language models](https://github.com/microsoft/FLAML/blob/main/notebook/research/acl2021.ipynb)

Our findings on troubleshooting fine-tuning the Electra and RoBERTa model for the GLUE dataset can be seen in the following paper published in ACL 2021:

- [An Empirical Study on Hyperparameter Optimization for Fine-Tuning Pre-trained Language Models](https://arxiv.org/abs/2106.09204). Xueqing Liu, Chi Wang. ACL-IJCNLP 2021.

```bibtex
@inproceedings{liu2021hpo,
@@ -14,7 +14,6 @@ estimator.fit(X_train, y_train)
estimator.predict(X_test, y_test)
```

1. Use AutoML.fit(). set `starting_points="data"` and `max_iter=0`.

```python
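# A hypothetical sketch, not the snippet elided by this hunk: per the step
# above, zero-shot defaults are obtained by starting from the data-driven
# default configuration ("data") and running zero search iterations.
from flaml import AutoML

automl = AutoML()
automl_settings = {"task": "classification", "starting_points": "data", "max_iter": 0}
automl.fit(X_train, y_train, **automl_settings)  # assumes X_train, y_train are defined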
@@ -36,10 +35,17 @@ automl.fit(X_train, y_train, **automl_settings)
from flaml.default import preprocess_and_suggest_hyperparams

X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)
(
    hyperparams,
    estimator_class,
    X_transformed,
    y_transformed,
    feature_transformer,
    label_transformer,
) = preprocess_and_suggest_hyperparams("classification", X_train, y_train, "lgbm")
model = estimator_class(**hyperparams)  # estimator_class is LGBMClassifier
model.fit(X_transformed, y_train)  # LGBMClassifier can handle raw labels
X_test = feature_transformer.transform(X_test)  # preprocess test data

@@ -172,7 +178,7 @@ Change "binary" into "multiclass" or "regression" for the other tasks.

For more technical details, please check our research paper.

- [Mining Robust Default Configurations for Resource-constrained AutoML](https://arxiv.org/abs/2202.09927). Moe Kayali, Chi Wang. arXiv preprint arXiv:2202.09927 (2022).

```bibtex
@article{Kayali2022default,
@@ -4,7 +4,8 @@ FLAML includes *ChaCha* which is an automatic hyperparameter tuning solution for

For more technical details about *ChaCha*, please check our paper.

- [ChaCha for Online AutoML](https://www.microsoft.com/en-us/research/publication/chacha-for-online-automl/). Qingyun Wu, Chi Wang, John Langford, Paul Mineiro and Marco Rossi. ICML 2021.

```
@inproceedings{wu2021chacha,
    title={ChaCha for online AutoML},

@@ -23,8 +24,9 @@ An example of online namespace interactions tuning in VW:
```python
# require: pip install flaml[vw]
from flaml import AutoVW

"""create an AutoVW instance for tuning namespace interactions"""
autovw = AutoVW(max_live_model_num=5, search_space={"interactions": AutoVW.AUTOMATIC})
```

An example of online tuning of both namespace interactions and learning rate in VW:

@@ -33,12 +35,18 @@ An example of online tuning of both namespace interactions and learning rate in
# require: pip install flaml[vw]
from flaml import AutoVW
from flaml.tune import loguniform

"""create an AutoVW instance for tuning namespace interactions and learning rate"""
# set up the search space and init config
search_space_nilr = {
    "interactions": AutoVW.AUTOMATIC,
    "learning_rate": loguniform(lower=2e-10, upper=1.0),
}
init_config_nilr = {"interactions": set(), "learning_rate": 0.5}
# create an AutoVW instance
autovw = AutoVW(
    max_live_model_num=5, search_space=search_space_nilr, init_config=init_config_nilr
)
```

A user can use the resulting AutoVW instance `autovw` in a similar way to a vanilla Vowpal Wabbit instance, i.e., `pyvw.vw`, to perform online learning by iteratively calling its `predict(data_example)` and `learn(data_example)` functions at each data example.
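For instance, the online loop could look like the following minimal sketch (here `data_stream` and the VW-format example strings are illustrative placeholders, not part of the original README):

```python
from flaml import AutoVW

autovw = AutoVW(max_live_model_num=5, search_space={"interactions": AutoVW.AUTOMATIC})

for data_example in data_stream:  # e.g. lines in Vowpal Wabbit text format
    prediction = autovw.predict(data_example)  # predict before the label is revealed
    autovw.learn(data_example)  # then update the live models online
```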
@@ -5,45 +5,47 @@ It can be used standalone, or together with ray tune or nni. Please find detaile

Below are some quick examples.

- Example for sequential tuning (recommended when compute resource is limited and each trial can consume all the resources):

```python
# require: pip install flaml[blendsearch]
from flaml import tune
import time


def evaluate_config(config):
    """evaluate a hyperparameter configuration"""
    # we use a toy example with 2 hyperparameters
    metric = (round(config["x"]) - 85000) ** 2 - config["x"] / config["y"]
    # usually the evaluation takes a non-negligible cost
    # and the cost could be related to certain hyperparameters
    # in this example, we assume it's proportional to x
    time.sleep(config["x"] / 100000)
    # use tune.report to report the metric to optimize
    tune.report(metric=metric)


analysis = tune.run(
    evaluate_config,  # the function to evaluate a config
    config={
        "x": tune.lograndint(lower=1, upper=100000),
        "y": tune.randint(lower=1, upper=100000),
    },  # the search space
    low_cost_partial_config={"x": 1},  # an initial (partial) config with low cost
    metric="metric",  # the name of the metric used for optimization
    mode="min",  # the optimization mode, 'min' or 'max'
    num_samples=-1,  # the maximal number of configs to try, -1 means infinite
    time_budget_s=60,  # the time budget in seconds
    local_dir="logs/",  # the local directory to store logs
    # verbose=0,  # verbosity
    # use_ray=True,  # uncomment when performing parallel tuning using ray
)

print(analysis.best_trial.last_result)  # the best trial's result
print(analysis.best_config)  # the best config
```

- Example for using ray tune's API:

```python
# require: pip install flaml[blendsearch,ray]

@@ -51,36 +53,39 @@ from ray import tune as raytune
from flaml import CFO, BlendSearch
import time


def evaluate_config(config):
    """evaluate a hyperparameter configuration"""
    # we use a toy example with 2 hyperparameters
    metric = (round(config["x"]) - 85000) ** 2 - config["x"] / config["y"]
    # usually the evaluation takes a non-negligible cost
    # and the cost could be related to certain hyperparameters
    # in this example, we assume it's proportional to x
    time.sleep(config["x"] / 100000)
    # use tune.report to report the metric to optimize
    tune.report(metric=metric)


# provide a time budget (in seconds) for the tuning process
time_budget_s = 60
# provide the search space
config_search_space = {
    "x": tune.lograndint(lower=1, upper=100000),
    "y": tune.randint(lower=1, upper=100000),
}
# provide the low cost partial config
low_cost_partial_config = {"x": 1}

# set up CFO
cfo = CFO(low_cost_partial_config=low_cost_partial_config)

# set up BlendSearch
blendsearch = BlendSearch(
    metric="metric",
    mode="min",
    space=config_search_space,
    low_cost_partial_config=low_cost_partial_config,
    time_budget_s=time_budget_s,
)
# NOTE: when using BlendSearch as a search_alg in ray tune, you need to
# configure the 'time_budget_s' for BlendSearch accordingly such that

@@ -89,28 +94,28 @@ blendsearch = BlendSearch(
# automatically in flaml.

analysis = raytune.run(
    evaluate_config,  # the function to evaluate a config
    config=config_search_space,
    metric="metric",  # the name of the metric used for optimization
    mode="min",  # the optimization mode, 'min' or 'max'
    num_samples=-1,  # the maximal number of configs to try, -1 means infinite
    time_budget_s=time_budget_s,  # the time budget in seconds
    local_dir="logs/",  # the local directory to store logs
    search_alg=blendsearch,  # or cfo
)

print(analysis.best_trial.last_result)  # the best trial's result
print(analysis.best_config)  # the best config
```

- Example for using NNI: An example of using BlendSearch with NNI can be seen in [test](https://github.com/microsoft/FLAML/tree/main/test/nni). CFO can be used as well in a similar manner. To run the example, first make sure you have [NNI](https://nni.readthedocs.io/en/stable/) installed, then run:

```shell
$nnictl create --config ./config.yml
```

- For more examples, please check out
  [notebooks](https://github.com/microsoft/FLAML/tree/main/notebook/).

`flaml` offers two HPO methods: CFO and BlendSearch.
`flaml.tune` uses BlendSearch by default.

@@ -185,16 +190,16 @@ tune.run(...
)
```

- Recommended scenario: cost-related hyperparameters exist, a low-cost
  initial point is known, and the search space is complex such that local search
  is prone to be stuck at local optima.

- Suggestion about using larger search space in BlendSearch:
  In hyperparameter optimization, a larger search space is desirable because it is more likely to include the optimal configuration (or one of the optimal configurations) in hindsight. However the performance (especially anytime performance) of most existing HPO methods is undesirable if the cost of the configurations in the search space has a large variation. Thus hand-crafted small search spaces (with relatively homogeneous cost) are often used in practice for these methods, which is subject to idiosyncrasy. BlendSearch combines the benefits of local search and global search, which enables a smart (economical) way of deciding where to explore in the search space even though it is larger than necessary. This allows users to specify a larger search space in BlendSearch, which is often easier and a better practice than narrowing down the search space by hand.

For more technical details, please check our papers.

- [Frugal Optimization for Cost-related Hyperparameters](https://arxiv.org/abs/2005.01571). Qingyun Wu, Chi Wang, Silu Huang. AAAI 2021.

```bibtex
@inproceedings{wu2021cfo,

@@ -205,7 +210,7 @@ For more technical details, please check our papers.
}
```

- [Economical Hyperparameter Optimization With Blended Search Strategy](https://www.microsoft.com/en-us/research/publication/economical-hyperparameter-optimization-with-blended-search-strategy/). Chi Wang, Qingyun Wu, Silu Huang, Amin Saied. ICLR 2021.

```bibtex
@inproceedings{wang2021blendsearch,
@@ -1,4 +1,5 @@
Please find tutorials on FLAML below:

- [PyData Seattle 2023](flaml-tutorial-pydata-23.md)
- [A hands-on tutorial on FLAML presented at KDD 2022](flaml-tutorial-kdd-22.md)
- [A lab forum on FLAML at AAAI 2023](flaml-tutorial-aaai-23.md)
@ -15,8 +15,8 @@ For the most up-to-date information, see the [AAAI'23 Program Agenda](https://aa

## What Will You Learn?

- What FLAML is and how to use FLAML to
  - find accurate ML models with low computational resources for common ML tasks
  - tune hyperparameters generically
- How to leverage the flexible and rich customization choices
  - finish the last mile for deployment
  - create new applications
@ -29,39 +29,43 @@ For the most up-to-date information, see the [AAAI'23 Program Agenda](https://aa

- Overview of AutoML and FLAML
- Basic usages of FLAML
  - Task-oriented AutoML
    - [Documentation](https://microsoft.github.io/FLAML/docs/Use-Cases/Task-Oriented-AutoML)
    - [Notebook: A classification task with AutoML](https://github.com/microsoft/FLAML/blob/tutorial-aaai23/notebook/automl_classification.ipynb); [Open In Colab](https://colab.research.google.com/github/microsoft/FLAML/blob/tutorial-aaai23/notebook/automl_classification.ipynb)
  - Tune User-Defined-functions with FLAML
    - [Documentation](https://microsoft.github.io/FLAML/docs/Use-Cases/Tune-User-Defined-Function)
    - [Notebook: Tune user-defined function](https://github.com/microsoft/FLAML/blob/tutorial-aaai23/notebook/tune_demo.ipynb); [Open In Colab](https://colab.research.google.com/github/microsoft/FLAML/blob/tutorial-aaai23/notebook/tune_demo.ipynb)
  - Zero-shot AutoML
    - [Documentation](https://microsoft.github.io/FLAML/docs/Use-Cases/Zero-Shot-AutoML)
    - [Notebook: Zeroshot AutoML](https://github.com/microsoft/FLAML/blob/tutorial-aaai23/notebook/zeroshot_lightgbm.ipynb); [Open In Colab](https://colab.research.google.com/github/microsoft/FLAML/blob/tutorial-aaai23/notebook/zeroshot_lightgbm.ipynb)
  - [ML.NET demo](https://learn.microsoft.com/dotnet/machine-learning/tutorials/predict-prices-with-model-builder)

Break (15m)

### **Part 2. Deep Dive into FLAML**

- The Science Behind FLAML’s Success
  - [Economical hyperparameter optimization methods in FLAML](https://microsoft.github.io/FLAML/docs/Use-Cases/Tune-User-Defined-Function/#hyperparameter-optimization-algorithm)
  - [Other research in FLAML](https://microsoft.github.io/FLAML/docs/Research)
- Maximize the Power of FLAML through Customization and Advanced Functionalities
  - [Notebook: Customize your AutoML with FLAML](https://github.com/microsoft/FLAML/blob/tutorial-aaai23/notebook/customize_your_automl_with_flaml.ipynb); [Open In Colab](https://colab.research.google.com/github/microsoft/FLAML/blob/tutorial-aaai23/notebook/customize_your_automl_with_flaml.ipynb)
  - [Notebook: Further acceleration of AutoML with FLAML](https://github.com/microsoft/FLAML/blob/tutorial-aaai23/notebook/further_acceleration_of_automl_with_flaml.ipynb); [Open In Colab](https://colab.research.google.com/github/microsoft/FLAML/blob/tutorial-aaai23/notebook/further_acceleration_of_automl_with_flaml.ipynb)
  - [Notebook: Neural network model tuning with FLAML](https://github.com/microsoft/FLAML/blob/tutorial-aaai23/notebook/tune_pytorch.ipynb); [Open In Colab](https://colab.research.google.com/github/microsoft/FLAML/blob/tutorial-aaai23/notebook/tune_pytorch.ipynb)

### **Part 3. New features in FLAML**

- Natural language processing
  - [Notebook: AutoML for NLP tasks](https://github.com/microsoft/FLAML/blob/tutorial-aaai23/notebook/automl_nlp.ipynb); [Open In Colab](https://colab.research.google.com/github/microsoft/FLAML/blob/tutorial-aaai23/notebook/automl_nlp.ipynb)
- Time Series Forecasting
  - [Notebook: AutoML for Time Series Forecast tasks](https://github.com/microsoft/FLAML/blob/tutorial-aaai23/notebook/automl_time_series_forecast.ipynb); [Open In Colab](https://colab.research.google.com/github/microsoft/FLAML/blob/tutorial-aaai23/notebook/automl_time_series_forecast.ipynb)
- Targeted Hyperparameter Optimization With Lexicographic Objectives
  - [Documentation](https://microsoft.github.io/FLAML/docs/Use-Cases/Tune-User-Defined-Function/#lexicographic-objectives)
  - [Notebook: Find accurate and fast neural networks with lexicographic objectives](https://github.com/microsoft/FLAML/blob/tutorial-aaai23/notebook/tune_lexicographic.ipynb); [Open In Colab](https://colab.research.google.com/github/microsoft/FLAML/blob/tutorial-aaai23/notebook/tune_lexicographic.ipynb)
- Online AutoML
  - [Notebook: Online AutoML with Vowpal Wabbit](https://github.com/microsoft/FLAML/blob/tutorial-aaai23/notebook/autovw.ipynb); [Open In Colab](https://colab.research.google.com/github/microsoft/FLAML/blob/tutorial-aaai23/notebook/autovw.ipynb)
- Fair AutoML

### Challenges and open problems

@ -26,23 +26,23 @@ For the most up-to-date information, see the [SIGKDD'22 Program Agenda](https://

- Overview of AutoML and FLAML
- Task-oriented AutoML with FLAML
  - [Notebook: A classification task with AutoML](https://github.com/microsoft/FLAML/blob/tutorial/notebook/automl_classification.ipynb); [Open In Colab](https://colab.research.google.com/github/microsoft/FLAML/blob/tutorial/notebook/automl_classification.ipynb)
  - [Notebook: A regression task with AutoML using LightGBM as the learner](https://github.com/microsoft/FLAML/blob/tutorial/notebook/automl_lightgbm.ipynb); [Open In Colab](https://colab.research.google.com/github/microsoft/FLAML/blob/tutorial/notebook/automl_lightgbm.ipynb)
  - [ML.NET demo](https://docs.microsoft.com/dotnet/machine-learning/tutorials/predict-prices-with-model-builder)
- Tune user defined functions with FLAML
  - [Notebook: Basic tuning procedures and advanced tuning options](https://github.com/microsoft/FLAML/blob/tutorial/notebook/tune_demo.ipynb); [Open In Colab](https://colab.research.google.com/github/microsoft/FLAML/blob/tutorial/notebook/tune_demo.ipynb)
  - [Notebook: Tune pytorch](https://github.com/microsoft/FLAML/blob/tutorial/notebook/tune_pytorch.ipynb); [Open In Colab](https://colab.research.google.com/github/microsoft/FLAML/blob/tutorial/notebook/tune_pytorch.ipynb)
- Q & A

### Part 2

- Zero-shot AutoML
  - [Notebook: Zeroshot AutoML](https://github.com/microsoft/FLAML/blob/tutorial/notebook/zeroshot_lightgbm.ipynb); [Open In Colab](https://colab.research.google.com/github/microsoft/FLAML/blob/tutorial/notebook/zeroshot_lightgbm.ipynb)
- Time series forecasting
  - [Notebook: AutoML for Time Series Forecast tasks](https://github.com/microsoft/FLAML/blob/tutorial/notebook/automl_time_series_forecast.ipynb); [Open In Colab](https://colab.research.google.com/github/microsoft/FLAML/blob/tutorial/notebook/automl_time_series_forecast.ipynb)
- Natural language processing
  - [Notebook: AutoML for NLP tasks](https://github.com/microsoft/FLAML/blob/tutorial/notebook/automl_nlp.ipynb); [Open In Colab](https://colab.research.google.com/github/microsoft/FLAML/blob/tutorial/notebook/automl_nlp.ipynb)
- Online AutoML
  - [Notebook: Online AutoML with Vowpal Wabbit](https://github.com/microsoft/FLAML/blob/tutorial/notebook/autovw.ipynb); [Open In Colab](https://colab.research.google.com/github/microsoft/FLAML/blob/tutorial/notebook/autovw.ipynb)
- Fair AutoML
- Challenges and open problems

@ -19,22 +19,26 @@ In this session, we will provide an in-depth and hands-on tutorial on Automated

## Tutorial Outline

### **Part 1. Overview**

- Overview of AutoML & Hyperparameter Tuning

### **Part 2. Introduction to FLAML**

- Introduction to FLAML
- AutoML and Hyperparameter Tuning with FLAML
  - [Notebook: AutoML with FLAML Library](https://github.com/microsoft/FLAML/blob/d047c79352a2b5d32b72f4323dadfa2be0db8a45/notebook/automl_flight_delays.ipynb)
  - [Notebook: Hyperparameter Tuning with FLAML](https://github.com/microsoft/FLAML/blob/d047c79352a2b5d32b72f4323dadfa2be0db8a45/notebook/tune_synapseml.ipynb)

### **Part 3. Deep Dive into FLAML**

- Advanced Functionalities
  - Parallelization with Apache Spark
    - [Notebook: FLAML AutoML on Apache Spark](https://github.com/microsoft/FLAML/blob/d047c79352a2b5d32b72f4323dadfa2be0db8a45/notebook/automl_bankrupt_synapseml.ipynb)

### **Part 4. New features in FLAML**

- Targeted Hyperparameter Optimization With Lexicographic Objectives
  - [Notebook: Tune models with lexicographic preference across objectives](https://github.com/microsoft/FLAML/blob/7ae410c8eb967e2084b2e7dbe7d5fa2145a44b79/notebook/tune_lexicographic.ipynb)
- OpenAI GPT-3, GPT-4 and ChatGPT tuning
  - [Notebook: Use FLAML to Tune OpenAI Models](https://github.com/microsoft/FLAML/blob/a0b318b12ee8288db54b674904655307f9e201c2/notebook/autogen_openai_completion.ipynb)
  - [Notebook: Use FLAML to Tune ChatGPT](https://github.com/microsoft/FLAML/blob/a0b318b12ee8288db54b674904655307f9e201c2/notebook/autogen_chatgpt_gpt4.ipynb)

@ -2,13 +2,13 @@

This project welcomes and encourages all forms of contributions, including but not limited to:

- Pushing patches.
- Code review of pull requests.
- Documentation, examples and test cases.
- Readability improvement, e.g., improvement on docstr and comments.
- Community participation in [issues](https://github.com/microsoft/FLAML/issues), [discussions](https://github.com/microsoft/FLAML/discussions), and [discord](https://discord.gg/7ZVfhbTQZ5).
- Tutorials, blog posts, talks that promote the project.
- Sharing application scenarios and/or related research.

You can take a look at the [Roadmap for Upcoming Features](https://github.com/microsoft/FLAML/wiki/Roadmap-for-Upcoming-Features) to identify potential things to work on.

@ -41,8 +41,10 @@ feedback:

- Please include your **operating system type and version number**, as well as
  your **Python, flaml, scikit-learn versions**. The version of flaml
  can be found by running the following code snippet:

```python
import flaml

print(flaml.__version__)
```

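The snippet above only reports the flaml version; since the bullet also asks for the Python and scikit-learn versions, a quick way to gather all three at once (assuming scikit-learn is installed) is:

```python
import sys

import flaml
import sklearn

print(sys.version)  # Python version
print(flaml.__version__)  # flaml version
print(sklearn.__version__)  # scikit-learn version
```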
@ -50,7 +52,6 @@ print(flaml.__version__)

appropriate code blocks**. See [Creating and highlighting code blocks](https://help.github.com/articles/creating-and-highlighting-code-blocks)
for more details.

## Becoming a Reviewer

There is currently no formal reviewer solicitation process. Current reviewers identify reviewers from active contributors. If you are willing to become a reviewer, you are welcome to let us know on discord.

@ -87,7 +88,7 @@ Run `pre-commit install` to install pre-commit into your git hooks. Before you c

### Coverage

Any code you commit should not decrease coverage. To run all unit tests, install the \[test\] option under FLAML/:

```bash
pip install -e."[test]"
```

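Unit tests are usually launched from the shell with `pytest`; an equivalent Python entry point is sketched below, where the `test` folder name is an assumption about the repository layout.

```python
import pytest

# Run the whole test suite (assumes the tests live under ./test).
raise SystemExit(pytest.main(["-q", "test"]))
```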
@ -4,5 +4,6 @@

Please find documentation about this feature [here](https://microsoft.github.io/autogen/docs/Use-Cases/#enhanced-inference).

Links to notebook examples:

- [Optimize for Code Generation](https://github.com/microsoft/FLAML/blob/main/notebook/autogen_openai_completion.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/autogen_openai_completion.ipynb)
- [Optimize for Math](https://github.com/microsoft/FLAML/blob/main/notebook/autogen_chatgpt_gpt4.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/autogen_chatgpt_gpt4.ipynb)

@ -2,7 +2,8 @@

### Prerequisites

Install the \[automl\] option.

```bash
pip install "flaml[automl]"
```

@ -18,14 +19,13 @@ automl = AutoML()
|
|||
# Specify automl goal and constraint
|
||||
automl_settings = {
|
||||
"time_budget": 1, # in seconds
|
||||
"metric": 'accuracy',
|
||||
"task": 'classification',
|
||||
"metric": "accuracy",
|
||||
"task": "classification",
|
||||
"log_file_name": "iris.log",
|
||||
}
|
||||
X_train, y_train = load_iris(return_X_y=True)
|
||||
# Train with labeled input data
|
||||
automl.fit(X_train=X_train, y_train=y_train,
|
||||
**automl_settings)
|
||||
automl.fit(X_train=X_train, y_train=y_train, **automl_settings)
|
||||
# Predict
|
||||
print(automl.predict_proba(X_train))
|
||||
# Print the best model
|
||||
|
@ -33,6 +33,7 @@ print(automl.model.estimator)
|
|||
```
|
||||
|
||||
#### Sample of output
|
||||
|
||||
```
|
||||
[flaml.automl: 11-12 18:21:44] {1485} INFO - Data split method: stratified
|
||||
[flaml.automl: 11-12 18:21:44] {1489} INFO - Evaluation method: cv
|
||||
|
|
|
@ -2,7 +2,8 @@

### Requirements

This example requires GPU. Install the \[automl,hf\] option:

```bash
pip install "flaml[automl,hf]"
```

@ -31,9 +32,11 @@ automl_settings = {
|
|||
"output_dir": "data/output/" # if model_path is not set, the default model is facebook/muppet-roberta-base: https://huggingface.co/facebook/muppet-roberta-base
|
||||
}
|
||||
}, # setting the huggingface arguments: output directory
|
||||
"gpu_per_trial": 1, # set to 0 if no GPU is available
|
||||
"gpu_per_trial": 1, # set to 0 if no GPU is available
|
||||
}
|
||||
automl.fit(X_train=X_train, y_train=y_train, X_val=X_val, y_val=y_val, **automl_settings)
|
||||
automl.fit(
|
||||
X_train=X_train, y_train=y_train, X_val=X_val, y_val=y_val, **automl_settings
|
||||
)
|
||||
automl.predict(X_test)
|
||||
```
|
||||
|
||||
|
@ -68,12 +71,8 @@ if os.path.exists("data/output/"):
|
|||
from flaml import AutoML
|
||||
from datasets import load_dataset
|
||||
|
||||
train_dataset = (
|
||||
load_dataset("glue", "stsb", split="train").to_pandas()
|
||||
)
|
||||
dev_dataset = (
|
||||
load_dataset("glue", "stsb", split="train").to_pandas()
|
||||
)
|
||||
train_dataset = load_dataset("glue", "stsb", split="train").to_pandas()
|
||||
dev_dataset = load_dataset("glue", "stsb", split="train").to_pandas()
|
||||
custom_sent_keys = ["sentence1", "sentence2"]
|
||||
label_key = "label"
|
||||
X_train = train_dataset[custom_sent_keys]
|
||||
|
@ -90,10 +89,10 @@ automl_settings = {
|
|||
}
|
||||
automl_settings["fit_kwargs_by_estimator"] = { # setting the huggingface arguments
|
||||
"transformer": {
|
||||
"model_path": "google/electra-small-discriminator", # if model_path is not set, the default model is facebook/muppet-roberta-base: https://huggingface.co/facebook/muppet-roberta-base
|
||||
"output_dir": "data/output/", # setting the output directory
|
||||
"model_path": "google/electra-small-discriminator", # if model_path is not set, the default model is facebook/muppet-roberta-base: https://huggingface.co/facebook/muppet-roberta-base
|
||||
"output_dir": "data/output/", # setting the output directory
|
||||
"fp16": False,
|
||||
} # setting whether to use FP16
|
||||
} # setting whether to use FP16
|
||||
}
|
||||
automl.fit(
|
||||
X_train=X_train, y_train=y_train, X_val=X_val, y_val=y_val, **automl_settings
|
||||
|
@ -117,12 +116,8 @@ automl.fit(
|
|||
from flaml import AutoML
|
||||
from datasets import load_dataset
|
||||
|
||||
train_dataset = (
|
||||
load_dataset("xsum", split="train").to_pandas()
|
||||
)
|
||||
dev_dataset = (
|
||||
load_dataset("xsum", split="validation").to_pandas()
|
||||
)
|
||||
train_dataset = load_dataset("xsum", split="train").to_pandas()
|
||||
dev_dataset = load_dataset("xsum", split="validation").to_pandas()
|
||||
custom_sent_keys = ["document"]
|
||||
label_key = "summary"
|
||||
|
||||
|
@ -139,17 +134,18 @@ automl_settings = {
|
|||
"task": "summarization",
|
||||
"metric": "rouge1",
|
||||
}
|
||||
automl_settings["fit_kwargs_by_estimator"] = { # setting the huggingface arguments
|
||||
automl_settings["fit_kwargs_by_estimator"] = { # setting the huggingface arguments
|
||||
"transformer": {
|
||||
"model_path": "t5-small", # if model_path is not set, the default model is t5-small: https://huggingface.co/t5-small
|
||||
"output_dir": "data/output/", # setting the output directory
|
||||
"model_path": "t5-small", # if model_path is not set, the default model is t5-small: https://huggingface.co/t5-small
|
||||
"output_dir": "data/output/", # setting the output directory
|
||||
"fp16": False,
|
||||
} # setting whether to use FP16
|
||||
} # setting whether to use FP16
|
||||
}
|
||||
automl.fit(
|
||||
X_train=X_train, y_train=y_train, X_val=X_val, y_val=y_val, **automl_settings
|
||||
)
|
||||
```
|
||||
|
||||
#### Sample Output
|
||||
|
||||
```
|
||||
|
@ -234,7 +230,15 @@ train_dataset = {
|
|||
],
|
||||
"tokens": [
|
||||
[
|
||||
"EU", "rejects", "German", "call", "to", "boycott", "British", "lamb", ".",
|
||||
"EU",
|
||||
"rejects",
|
||||
"German",
|
||||
"call",
|
||||
"to",
|
||||
"boycott",
|
||||
"British",
|
||||
"lamb",
|
||||
".",
|
||||
],
|
||||
["Peter", "Blackburn"],
|
||||
],
|
||||
|
@ -244,18 +248,14 @@ dev_dataset = {
|
|||
"ner_tags": [
|
||||
["O"],
|
||||
],
|
||||
"tokens": [
|
||||
["1996-08-22"]
|
||||
],
|
||||
"tokens": [["1996-08-22"]],
|
||||
}
|
||||
test_dataset = {
|
||||
"id": ["0"],
|
||||
"ner_tags": [
|
||||
["O"],
|
||||
],
|
||||
"tokens": [
|
||||
['.']
|
||||
],
|
||||
"tokens": [["."]],
|
||||
}
|
||||
custom_sent_keys = ["tokens"]
|
||||
label_key = "ner_tags"
|
||||
|
@ -273,17 +273,18 @@ automl_settings = {
|
|||
"time_budget": 10,
|
||||
"task": "token-classification",
|
||||
"fit_kwargs_by_estimator": {
|
||||
"transformer":
|
||||
{
|
||||
"output_dir": "data/output/"
|
||||
# if model_path is not set, the default model is facebook/muppet-roberta-base: https://huggingface.co/facebook/muppet-roberta-base
|
||||
}
|
||||
"transformer": {
|
||||
"output_dir": "data/output/"
|
||||
# if model_path is not set, the default model is facebook/muppet-roberta-base: https://huggingface.co/facebook/muppet-roberta-base
|
||||
}
|
||||
}, # setting the huggingface arguments: output directory
|
||||
"gpu_per_trial": 1, # set to 0 if no GPU is available
|
||||
"metric": "seqeval:overall_f1"
|
||||
"metric": "seqeval:overall_f1",
|
||||
}
|
||||
|
||||
automl.fit(X_train=X_train, y_train=y_train, X_val=X_val, y_val=y_val, **automl_settings)
|
||||
automl.fit(
|
||||
X_train=X_train, y_train=y_train, X_val=X_val, y_val=y_val, **automl_settings
|
||||
)
|
||||
automl.predict(X_test)
|
||||
```
|
||||
|
||||
|
@ -294,35 +295,39 @@ from flaml import AutoML
|
|||
import pandas as pd
|
||||
|
||||
train_dataset = {
|
||||
"id": ["0", "1"],
|
||||
"ner_tags": [
|
||||
[3, 0, 7, 0, 0, 0, 7, 0, 0],
|
||||
[1, 2],
|
||||
"id": ["0", "1"],
|
||||
"ner_tags": [
|
||||
[3, 0, 7, 0, 0, 0, 7, 0, 0],
|
||||
[1, 2],
|
||||
],
|
||||
"tokens": [
|
||||
[
|
||||
"EU",
|
||||
"rejects",
|
||||
"German",
|
||||
"call",
|
||||
"to",
|
||||
"boycott",
|
||||
"British",
|
||||
"lamb",
|
||||
".",
|
||||
],
|
||||
"tokens": [
|
||||
[
|
||||
"EU", "rejects", "German", "call", "to", "boycott", "British", "lamb", ".",
|
||||
],
|
||||
["Peter", "Blackburn"],
|
||||
],
|
||||
}
|
||||
["Peter", "Blackburn"],
|
||||
],
|
||||
}
|
||||
dev_dataset = {
|
||||
"id": ["0"],
|
||||
"ner_tags": [
|
||||
[0],
|
||||
],
|
||||
"tokens": [
|
||||
["1996-08-22"]
|
||||
],
|
||||
"tokens": [["1996-08-22"]],
|
||||
}
|
||||
test_dataset = {
|
||||
"id": ["0"],
|
||||
"ner_tags": [
|
||||
[0],
|
||||
],
|
||||
"tokens": [
|
||||
['.']
|
||||
],
|
||||
"tokens": [["."]],
|
||||
}
|
||||
custom_sent_keys = ["tokens"]
|
||||
label_key = "ner_tags"
|
||||
|
@ -340,18 +345,29 @@ automl_settings = {
|
|||
"time_budget": 10,
|
||||
"task": "token-classification",
|
||||
"fit_kwargs_by_estimator": {
|
||||
"transformer":
|
||||
{
|
||||
"output_dir": "data/output/",
|
||||
# if model_path is not set, the default model is facebook/muppet-roberta-base: https://huggingface.co/facebook/muppet-roberta-base
|
||||
"label_list": [ "O","B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC", "B-MISC", "I-MISC" ]
|
||||
}
|
||||
"transformer": {
|
||||
"output_dir": "data/output/",
|
||||
# if model_path is not set, the default model is facebook/muppet-roberta-base: https://huggingface.co/facebook/muppet-roberta-base
|
||||
"label_list": [
|
||||
"O",
|
||||
"B-PER",
|
||||
"I-PER",
|
||||
"B-ORG",
|
||||
"I-ORG",
|
||||
"B-LOC",
|
||||
"I-LOC",
|
||||
"B-MISC",
|
||||
"I-MISC",
|
||||
],
|
||||
}
|
||||
}, # setting the huggingface arguments: output directory
|
||||
"gpu_per_trial": 1, # set to 0 if no GPU is available
|
||||
"metric": "seqeval:overall_f1"
|
||||
"metric": "seqeval:overall_f1",
|
||||
}
|
||||
|
||||
automl.fit(X_train=X_train, y_train=y_train, X_val=X_val, y_val=y_val, **automl_settings)
|
||||
automl.fit(
|
||||
X_train=X_train, y_train=y_train, X_val=X_val, y_val=y_val, **automl_settings
|
||||
)
|
||||
automl.predict(X_test)
|
||||
```
|
||||
|
||||
|
|
|
@ -2,7 +2,8 @@

### Prerequisites

Install the \[automl\] option.

```bash
pip install "flaml[automl]"
```

@ -16,11 +17,14 @@ from flaml import AutoML

X_train, y_train = fetch_openml(name="credit-g", return_X_y=True, as_frame=False)
y_train = y_train.cat.codes
# not a real learning to rank dataset
groups = [200] * 4 + [100] * 2  # group counts
automl = AutoML()
automl.fit(
    X_train,
    y_train,
    groups=groups,
    task="rank",
    time_budget=10,  # in seconds
)
```

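As a hedged follow-up that is not part of the original snippet, the fitted ranker can then score the grouped rows with `predict`; a higher score means the item is ranked higher within its group.

```python
# Illustrative only: relevance scores for the training rows.
print(automl.predict(X_train))
```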
@ -2,7 +2,8 @@
|
|||
|
||||
### Prerequisites
|
||||
|
||||
Install the [automl] option.
|
||||
Install the \[automl\] option.
|
||||
|
||||
```bash
|
||||
pip install "flaml[automl]"
|
||||
```
|
||||
|
@ -18,14 +19,13 @@ automl = AutoML()
|
|||
# Specify automl goal and constraint
|
||||
automl_settings = {
|
||||
"time_budget": 1, # in seconds
|
||||
"metric": 'r2',
|
||||
"task": 'regression',
|
||||
"metric": "r2",
|
||||
"task": "regression",
|
||||
"log_file_name": "california.log",
|
||||
}
|
||||
X_train, y_train = fetch_california_housing(return_X_y=True)
|
||||
# Train with labeled input data
|
||||
automl.fit(X_train=X_train, y_train=y_train,
|
||||
**automl_settings)
|
||||
automl.fit(X_train=X_train, y_train=y_train, **automl_settings)
|
||||
# Predict
|
||||
print(automl.predict(X_train))
|
||||
# Print the best model
|
||||
|
@ -95,7 +95,9 @@ from sklearn.multioutput import MultiOutputRegressor

X, y = make_regression(n_targets=3)

# split into train and test data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)

# train the model
model = MultiOutputRegressor(AutoML(task="regression", time_budget=60))

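# Assumed continuation (not shown in this excerpt): fit one AutoML regressor per
# target, then evaluate on the held-out split.
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # mean R^2 across the three targets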
@ -2,7 +2,8 @@
|
|||
|
||||
### Prerequisites
|
||||
|
||||
Install the [automl,ts_forecast] option.
|
||||
Install the \[automl,ts_forecast\] option.
|
||||
|
||||
```bash
|
||||
pip install "flaml[automl,ts_forecast]"
|
||||
```
|
||||
|
@ -13,16 +14,18 @@ pip install "flaml[automl,ts_forecast]"
|
|||
import numpy as np
|
||||
from flaml import AutoML
|
||||
|
||||
X_train = np.arange('2014-01', '2022-01', dtype='datetime64[M]')
|
||||
X_train = np.arange("2014-01", "2022-01", dtype="datetime64[M]")
|
||||
y_train = np.random.random(size=84)
|
||||
automl = AutoML()
|
||||
automl.fit(X_train=X_train[:84], # a single column of timestamp
|
||||
y_train=y_train, # value for each timestamp
|
||||
period=12, # time horizon to forecast, e.g., 12 months
|
||||
task='ts_forecast', time_budget=15, # time budget in seconds
|
||||
log_file_name="ts_forecast.log",
|
||||
eval_method="holdout",
|
||||
)
|
||||
automl.fit(
|
||||
X_train=X_train[:84], # a single column of timestamp
|
||||
y_train=y_train, # value for each timestamp
|
||||
period=12, # time horizon to forecast, e.g., 12 months
|
||||
task="ts_forecast",
|
||||
time_budget=15, # time budget in seconds
|
||||
log_file_name="ts_forecast.log",
|
||||
eval_method="holdout",
|
||||
)
|
||||
print(automl.predict(X_train[84:]))
|
||||
```
|
||||
|
||||
|
@ -246,32 +249,40 @@ import statsmodels.api as sm
|
|||
|
||||
data = sm.datasets.co2.load_pandas().data
|
||||
# data is given in weeks, but the task is to predict monthly, so use monthly averages instead
|
||||
data = data['co2'].resample('MS').mean()
|
||||
data = data["co2"].resample("MS").mean()
|
||||
data = data.bfill().ffill() # makes sure there are no missing values
|
||||
data = data.to_frame().reset_index()
|
||||
num_samples = data.shape[0]
|
||||
time_horizon = 12
|
||||
split_idx = num_samples - time_horizon
|
||||
train_df = data[:split_idx] # train_df is a dataframe with two columns: timestamp and label
|
||||
X_test = data[split_idx:]['index'].to_frame() # X_test is a dataframe with dates for prediction
|
||||
y_test = data[split_idx:]['co2'] # y_test is a series of the values corresponding to the dates for prediction
|
||||
train_df = data[
|
||||
:split_idx
|
||||
] # train_df is a dataframe with two columns: timestamp and label
|
||||
X_test = data[split_idx:][
|
||||
"index"
|
||||
].to_frame() # X_test is a dataframe with dates for prediction
|
||||
y_test = data[split_idx:][
|
||||
"co2"
|
||||
] # y_test is a series of the values corresponding to the dates for prediction
|
||||
|
||||
from flaml import AutoML
|
||||
|
||||
automl = AutoML()
|
||||
settings = {
|
||||
"time_budget": 10, # total running time in seconds
|
||||
"metric": 'mape', # primary metric for validation: 'mape' is generally used for forecast tasks
|
||||
"task": 'ts_forecast', # task type
|
||||
"log_file_name": 'CO2_forecast.log', # flaml log file
|
||||
"metric": "mape", # primary metric for validation: 'mape' is generally used for forecast tasks
|
||||
"task": "ts_forecast", # task type
|
||||
"log_file_name": "CO2_forecast.log", # flaml log file
|
||||
"eval_method": "holdout", # validation method can be chosen from ['auto', 'holdout', 'cv']
|
||||
"seed": 7654321, # random seed
|
||||
}
|
||||
|
||||
automl.fit(dataframe=train_df, # training data
|
||||
label='co2', # label column
|
||||
period=time_horizon, # key word argument 'period' must be included for forecast task)
|
||||
**settings)
|
||||
automl.fit(
|
||||
dataframe=train_df, # training data
|
||||
label="co2", # label column
|
||||
period=time_horizon, # key word argument 'period' must be included for forecast task)
|
||||
**settings
|
||||
)
|
||||
```
|
||||
|
||||
#### Sample output
|
||||
|
@ -417,16 +428,17 @@ The example plotting code requires matplotlib.
|
|||
flaml_y_pred = automl.predict(X_test)
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
plt.plot(X_test, y_test, label='Actual level')
|
||||
plt.plot(X_test, flaml_y_pred, label='FLAML forecast')
|
||||
plt.xlabel('Date')
|
||||
plt.ylabel('CO2 Levels')
|
||||
plt.plot(X_test, y_test, label="Actual level")
|
||||
plt.plot(X_test, flaml_y_pred, label="FLAML forecast")
|
||||
plt.xlabel("Date")
|
||||
plt.ylabel("CO2 Levels")
|
||||
plt.legend()
|
||||
```
|
||||
|
||||
![png](images/CO2.png)
|
||||
|
||||
### Multivariate Time Series (Forecasting with Exogenous Variables)
|
||||
|
||||
```python
|
||||
import pandas as pd
|
||||
|
||||
|
@ -444,6 +456,7 @@ multi_df["precip"] = multi_df["precip"].fillna(method="ffill")
|
|||
multi_df = multi_df[:-2] # last two rows are NaN for 'demand' column so remove them
|
||||
multi_df = multi_df.reset_index()
|
||||
|
||||
|
||||
# Using temperature values create categorical values
|
||||
# where 1 denotes daily tempurature is above monthly average and 0 is below.
|
||||
def get_monthly_avg(data):
|
||||
|
@ -452,8 +465,10 @@ def get_monthly_avg(data):
|
|||
data = data.agg({"temp": "mean"})
|
||||
return data
|
||||
|
||||
|
||||
monthly_avg = get_monthly_avg(multi_df).to_dict().get("temp")
|
||||
|
||||
|
||||
def above_monthly_avg(date, temp):
|
||||
month = date.month
|
||||
if temp > monthly_avg.get(month):
|
||||
|
@ -461,6 +476,7 @@ def above_monthly_avg(date, temp):
|
|||
else:
|
||||
return 0
|
||||
|
||||
|
||||
multi_df["temp_above_monthly_avg"] = multi_df.apply(
|
||||
lambda x: above_monthly_avg(x["timeStamp"], x["temp"]), axis=1
|
||||
)
|
||||
|
@ -536,6 +552,7 @@ print(automl.predict(multi_X_test))
|
|||
```
|
||||
|
||||
### Forecasting Discrete Variables
|
||||
|
||||
```python
|
||||
from hcrystalball.utils import get_sales_data
|
||||
import numpy as np
|
||||
|
@ -557,7 +574,10 @@ discrete_X_train, discrete_X_test = (
|
|||
discrete_train_df[["Date", "Open", "Promo", "Promo2"]],
|
||||
discrete_test_df[["Date", "Open", "Promo", "Promo2"]],
|
||||
)
|
||||
discrete_y_train, discrete_y_test = discrete_train_df["above_mean_sales"], discrete_test_df["above_mean_sales"]
|
||||
discrete_y_train, discrete_y_test = (
|
||||
discrete_train_df["above_mean_sales"],
|
||||
discrete_test_df["above_mean_sales"],
|
||||
)
|
||||
|
||||
# initialize AutoML instance
|
||||
automl = AutoML()
|
||||
|
@ -572,10 +592,9 @@ settings = {
|
|||
}
|
||||
|
||||
# train the model
|
||||
automl.fit(X_train=discrete_X_train,
|
||||
y_train=discrete_y_train,
|
||||
**settings,
|
||||
period=time_horizon)
|
||||
automl.fit(
|
||||
X_train=discrete_X_train, y_train=discrete_y_train, **settings, period=time_horizon
|
||||
)
|
||||
|
||||
# make predictions
|
||||
discrete_y_pred = automl.predict(discrete_X_test)
|
||||
|
@ -713,6 +732,7 @@ def get_stalliion_data():
|
|||
)
|
||||
return data, special_days
|
||||
|
||||
|
||||
data, special_days = get_stalliion_data()
|
||||
time_horizon = 6 # predict six months
|
||||
training_cutoff = data["time_idx"].max() - time_horizon
|
||||
|
|
|
@ -2,7 +2,8 @@
|
|||
|
||||
### Prerequisites for this example
|
||||
|
||||
Install the [automl] option.
|
||||
Install the \[automl\] option.
|
||||
|
||||
```bash
|
||||
pip install "flaml[automl] matplotlib openml"
|
||||
```
|
||||
|
@ -14,15 +15,15 @@ from flaml import AutoML
|
|||
from flaml.automl.data import load_openml_dataset
|
||||
|
||||
# Download [houses dataset](https://www.openml.org/d/537) from OpenML. The task is to predict median price of the house in the region based on demographic composition and a state of housing market in the region.
|
||||
X_train, X_test, y_train, y_test = load_openml_dataset(dataset_id=537, data_dir='./')
|
||||
X_train, X_test, y_train, y_test = load_openml_dataset(dataset_id=537, data_dir="./")
|
||||
|
||||
automl = AutoML()
|
||||
settings = {
|
||||
"time_budget": 60, # total running time in seconds
|
||||
"metric": 'r2', # primary metrics for regression can be chosen from: ['mae','mse','r2']
|
||||
"estimator_list": ['lgbm'], # list of ML learners; we tune lightgbm in this example
|
||||
"task": 'regression', # task type
|
||||
"log_file_name": 'houses_experiment.log', # flaml log file
|
||||
"metric": "r2", # primary metrics for regression can be chosen from: ['mae','mse','r2']
|
||||
"estimator_list": ["lgbm"], # list of ML learners; we tune lightgbm in this example
|
||||
"task": "regression", # task type
|
||||
"log_file_name": "houses_experiment.log", # flaml log file
|
||||
"seed": 7654321, # random seed
|
||||
}
|
||||
automl.fit(X_train=X_train, y_train=y_train, **settings)
|
||||
|
@ -78,9 +79,9 @@ automl.fit(X_train=X_train, y_train=y_train, **settings)
|
|||
#### Retrieve best config
|
||||
|
||||
```python
|
||||
print('Best hyperparmeter config:', automl.best_config)
|
||||
print('Best r2 on validation data: {0:.4g}'.format(1-automl.best_loss))
|
||||
print('Training duration of best run: {0:.4g} s'.format(automl.best_config_train_time))
|
||||
print("Best hyperparmeter config:", automl.best_config)
|
||||
print("Best r2 on validation data: {0:.4g}".format(1 - automl.best_loss))
|
||||
print("Training duration of best run: {0:.4g} s".format(automl.best_config_train_time))
|
||||
print(automl.model.estimator)
|
||||
# Best hyperparmeter config: {'n_estimators': 363, 'num_leaves': 216, 'min_child_samples': 42, 'learning_rate': 0.09100963138990374, 'log_max_bin': 8, 'colsample_bytree': 0.8025848209352517, 'reg_alpha': 0.001113000336715291, 'reg_lambda': 76.50614276906414}
|
||||
# Best r2 on validation data: 0.8436
|
||||
|
@ -96,15 +97,17 @@ print(automl.model.estimator)
|
|||
|
||||
```python
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
plt.barh(automl.feature_names_in_, automl.feature_importances_)
|
||||
```
|
||||
|
||||
![png](../Use-Cases/images/feature_importance.png)
|
||||
|
||||
#### Compute predictions of testing dataset
|
||||
|
||||
```python
|
||||
y_pred = automl.predict(X_test)
|
||||
print('Predicted labels', y_pred)
|
||||
print("Predicted labels", y_pred)
|
||||
# Predicted labels [143391.65036562 245535.13731811 153171.44071629 ... 184354.52735963
|
||||
# 235510.49470445 282617.22858956]
|
||||
```
|
||||
|
@ -114,9 +117,9 @@ print('Predicted labels', y_pred)
|
|||
```python
|
||||
from flaml.automl.ml import sklearn_metric_loss_score
|
||||
|
||||
print('r2', '=', 1 - sklearn_metric_loss_score('r2', y_pred, y_test))
|
||||
print('mse', '=', sklearn_metric_loss_score('mse', y_pred, y_test))
|
||||
print('mae', '=', sklearn_metric_loss_score('mae', y_pred, y_test))
|
||||
print("r2", "=", 1 - sklearn_metric_loss_score("r2", y_pred, y_test))
|
||||
print("mse", "=", sklearn_metric_loss_score("mse", y_pred, y_test))
|
||||
print("mae", "=", sklearn_metric_loss_score("mae", y_pred, y_test))
|
||||
# r2 = 0.8505434326526395
|
||||
# mse = 1975592613.138005
|
||||
# mae = 29471.536046068788
|
||||
|
@ -132,7 +135,7 @@ lgbm.fit(X_train, y_train)
|
|||
y_pred = lgbm.predict(X_test)
|
||||
from flaml.automl.ml import sklearn_metric_loss_score
|
||||
|
||||
print('default lgbm r2', '=', 1 - sklearn_metric_loss_score('r2', y_pred, y_test))
|
||||
print("default lgbm r2", "=", 1 - sklearn_metric_loss_score("r2", y_pred, y_test))
|
||||
# default lgbm r2 = 0.8296179648694404
|
||||
```
|
||||
|
||||
|
@ -152,6 +155,7 @@ plt.ylabel('Validation r2')
|
|||
plt.step(time_history, 1 - np.array(best_valid_loss_history), where='post')
|
||||
plt.show()
|
||||
```
|
||||
|
||||
![png](images/lgbm_curve.png)
|
||||
|
||||
### Use a customized LightGBM learner
|
||||
|
@ -199,8 +203,8 @@ class MyLGBM(LGBMEstimator):
|
|||
|
||||
```python
|
||||
automl = AutoML()
|
||||
automl.add_learner(learner_name='my_lgbm', learner_class=MyLGBM)
|
||||
settings["estimator_list"] = ['my_lgbm'] # change the estimator list
|
||||
automl.add_learner(learner_name="my_lgbm", learner_class=MyLGBM)
|
||||
settings["estimator_list"] = ["my_lgbm"] # change the estimator list
|
||||
automl.fit(X_train=X_train, y_train=y_train, **settings)
|
||||
```
|
||||
|
||||
|
|
|
@ -2,7 +2,8 @@
|
|||
|
||||
### Prerequisites for this example
|
||||
|
||||
Install the [automl] option.
|
||||
Install the \[automl\] option.
|
||||
|
||||
```bash
|
||||
pip install "flaml[automl] matplotlib openml"
|
||||
```
|
||||
|
@ -14,15 +15,17 @@ from flaml import AutoML
|
|||
from flaml.automl.data import load_openml_dataset
|
||||
|
||||
# Download [houses dataset](https://www.openml.org/d/537) from OpenML. The task is to predict median price of the house in the region based on demographic composition and a state of housing market in the region.
|
||||
X_train, X_test, y_train, y_test = load_openml_dataset(dataset_id=537, data_dir='./')
|
||||
X_train, X_test, y_train, y_test = load_openml_dataset(dataset_id=537, data_dir="./")
|
||||
|
||||
automl = AutoML()
|
||||
settings = {
|
||||
"time_budget": 60, # total running time in seconds
|
||||
"metric": 'r2', # primary metrics for regression can be chosen from: ['mae','mse','r2']
|
||||
"estimator_list": ['xgboost'], # list of ML learners; we tune XGBoost in this example
|
||||
"task": 'regression', # task type
|
||||
"log_file_name": 'houses_experiment.log', # flaml log file
|
||||
"metric": "r2", # primary metrics for regression can be chosen from: ['mae','mse','r2']
|
||||
"estimator_list": [
|
||||
"xgboost"
|
||||
], # list of ML learners; we tune XGBoost in this example
|
||||
"task": "regression", # task type
|
||||
"log_file_name": "houses_experiment.log", # flaml log file
|
||||
"seed": 7654321, # random seed
|
||||
}
|
||||
automl.fit(X_train=X_train, y_train=y_train, **settings)
|
||||
|
@ -101,9 +104,9 @@ automl.fit(X_train=X_train, y_train=y_train, **settings)
|
|||
#### Retrieve best config
|
||||
|
||||
```python
|
||||
print('Best hyperparmeter config:', automl.best_config)
|
||||
print('Best r2 on validation data: {0:.4g}'.format(1-automl.best_loss))
|
||||
print('Training duration of best run: {0:.4g} s'.format(automl.best_config_train_time))
|
||||
print("Best hyperparmeter config:", automl.best_config)
|
||||
print("Best r2 on validation data: {0:.4g}".format(1 - automl.best_loss))
|
||||
print("Training duration of best run: {0:.4g} s".format(automl.best_config_train_time))
|
||||
print(automl.model.estimator)
|
||||
# Best hyperparmeter config: {'n_estimators': 473, 'max_leaves': 35, 'max_depth': 0, 'min_child_weight': 0.001, 'learning_rate': 0.26865031351923346, 'subsample': 0.9718245679598786, 'colsample_bylevel': 0.7421362469066445, 'colsample_bytree': 1.0, 'reg_alpha': 0.06824336834995245, 'reg_lambda': 250.9654222583276}
|
||||
# Best r2 on validation data: 0.8384
|
||||
|
@ -128,13 +131,14 @@ import matplotlib.pyplot as plt
|
|||
|
||||
plt.barh(automl.feature_names_in_, automl.feature_importances_)
|
||||
```
|
||||
|
||||
![png](images/xgb_feature_importance.png)
|
||||
|
||||
#### Compute predictions of testing dataset
|
||||
|
||||
```python
|
||||
y_pred = automl.predict(X_test)
|
||||
print('Predicted labels', y_pred)
|
||||
print("Predicted labels", y_pred)
|
||||
# Predicted labels [139062.95 237622. 140522.03 ... 182125.5 252156.36 264884.5 ]
|
||||
```
|
||||
|
||||
|
@ -143,9 +147,9 @@ print('Predicted labels', y_pred)
|
|||
```python
|
||||
from flaml.automl.ml import sklearn_metric_loss_score
|
||||
|
||||
print('r2', '=', 1 - sklearn_metric_loss_score('r2', y_pred, y_test))
|
||||
print('mse', '=', sklearn_metric_loss_score('mse', y_pred, y_test))
|
||||
print('mae', '=', sklearn_metric_loss_score('mae', y_pred, y_test))
|
||||
print("r2", "=", 1 - sklearn_metric_loss_score("r2", y_pred, y_test))
|
||||
print("mse", "=", sklearn_metric_loss_score("mse", y_pred, y_test))
|
||||
print("mae", "=", sklearn_metric_loss_score("mae", y_pred, y_test))
|
||||
# r2 = 0.8456494234135888
|
||||
# mse = 2040284106.2781258
|
||||
# mae = 30212.830996680445
|
||||
|
@ -161,7 +165,7 @@ xgb.fit(X_train, y_train)
|
|||
y_pred = xgb.predict(X_test)
|
||||
from flaml.automl.ml import sklearn_metric_loss_score
|
||||
|
||||
print('default xgboost r2', '=', 1 - sklearn_metric_loss_score('r2', y_pred, y_test))
|
||||
print("default xgboost r2", "=", 1 - sklearn_metric_loss_score("r2", y_pred, y_test))
|
||||
# default xgboost r2 = 0.8265451174596482
|
||||
```
|
||||
|
||||
|
@ -181,6 +185,7 @@ plt.ylabel('Validation r2')
|
|||
plt.step(time_history, 1 - np.array(best_valid_loss_history), where='post')
|
||||
plt.show()
|
||||
```
|
||||
|
||||
![png](images/xgb_curve.png)
|
||||
|
||||
### Use a customized XGBoost learner
|
||||
|
@ -204,28 +209,26 @@ from flaml.automl.model import XGBoostEstimator
|
|||
|
||||
|
||||
class MyXGB1(XGBoostEstimator):
|
||||
'''XGBoostEstimator with the logregobj function as the objective function
|
||||
'''
|
||||
"""XGBoostEstimator with the logregobj function as the objective function"""
|
||||
|
||||
def __init__(self, **config):
|
||||
super().__init__(objective=logregobj, **config)
|
||||
|
||||
|
||||
class MyXGB2(XGBoostEstimator):
|
||||
'''XGBoostEstimator with 'reg:squarederror' as the objective function
|
||||
'''
|
||||
"""XGBoostEstimator with 'reg:squarederror' as the objective function"""
|
||||
|
||||
def __init__(self, **config):
|
||||
super().__init__(objective='reg:gamma', **config)
|
||||
super().__init__(objective="reg:gamma", **config)
|
||||
```
|
||||
|
||||
#### Add the customized learners and tune them
|
||||
|
||||
```python
|
||||
automl = AutoML()
|
||||
automl.add_learner(learner_name='my_xgb1', learner_class=MyXGB1)
|
||||
automl.add_learner(learner_name='my_xgb2', learner_class=MyXGB2)
|
||||
settings["estimator_list"] = ['my_xgb1', 'my_xgb2'] # change the estimator list
|
||||
automl.add_learner(learner_name="my_xgb1", learner_class=MyXGB1)
|
||||
automl.add_learner(learner_name="my_xgb2", learner_class=MyXGB2)
|
||||
settings["estimator_list"] = ["my_xgb1", "my_xgb2"] # change the estimator list
|
||||
automl.fit(X_train=X_train, y_train=y_train, **settings)
|
||||
```
|
||||
|
||||
|
|
|
@ -6,7 +6,7 @@ Flamlized estimators automatically use data-dependent default hyperparameter con
|
|||
|
||||
### Prerequisites
|
||||
|
||||
This example requires the [autozero] option.
|
||||
This example requires the \[autozero\] option.
|
||||
|
||||
```bash
|
||||
pip install flaml[autozero] lightgbm openml
|
||||
|
@ -56,6 +56,7 @@ print(hyperparams)
|
|||
```
|
||||
|
||||
#### Sample output
|
||||
|
||||
```
|
||||
load dataset from ./openml_ds537.pkl
|
||||
Dataset name: houses
|
||||
|
@ -83,7 +84,11 @@ X_train, X_test, y_train, y_test = load_openml_dataset(dataset_id=1169, data_dir
|
|||
xgb = XGBClassifier()
|
||||
xgb.fit(X_train, y_train)
|
||||
y_pred = xgb.predict(X_test)
|
||||
print("flamlized xgb accuracy", "=", 1 - sklearn_metric_loss_score("accuracy", y_pred, y_test))
|
||||
print(
|
||||
"flamlized xgb accuracy",
|
||||
"=",
|
||||
1 - sklearn_metric_loss_score("accuracy", y_pred, y_test),
|
||||
)
|
||||
print(xgb)
|
||||
```
|
||||
|
||||
|
|
|
@ -2,16 +2,22 @@ FLAML can be used together with AzureML. On top of that, using mlflow and ray is
|
|||
|
||||
### Prerequisites
|
||||
|
||||
Install the [automl,azureml] option.
|
||||
Install the \[automl,azureml\] option.
|
||||
|
||||
```bash
|
||||
pip install "flaml[automl,azureml]"
|
||||
```
|
||||
|
||||
Setup a AzureML workspace:
|
||||
|
||||
```python
|
||||
from azureml.core import Workspace
|
||||
|
||||
ws = Workspace.create(name='myworkspace', subscription_id='<azure-subscription-id>', resource_group='myresourcegroup')
|
||||
ws = Workspace.create(
|
||||
name="myworkspace",
|
||||
subscription_id="<azure-subscription-id>",
|
||||
resource_group="myresourcegroup",
|
||||
)
|
||||
```
|
||||
|
||||
### Enable mlflow in AzureML workspace
|
||||
|
@ -49,10 +55,14 @@ with mlflow.start_run() as run: # create a mlflow run
|
|||
The metrics in the run will be automatically logged in an experiment named "flaml" in your AzureML workspace. They can be retrieved by `mlflow.search_runs`:
|
||||
|
||||
```python
|
||||
mlflow.search_runs(experiment_ids=[experiment.experiment_id], filter_string="params.learner = 'xgboost'")
|
||||
mlflow.search_runs(
|
||||
experiment_ids=[experiment.experiment_id],
|
||||
filter_string="params.learner = 'xgboost'",
|
||||
)
|
||||
```
|
||||
|
||||
The logged model can be loaded and used to make predictions:
|
||||
|
||||
```python
|
||||
automl = mlflow.sklearn.load_model(f"{run.info.artifact_uri}/automl")
|
||||
print(automl.predict(X_test))
|
||||
|
@ -75,13 +85,18 @@ ray_environment_name = "aml-ray-cpu"
|
|||
ray_environment_dockerfile_path = "./Docker/Dockerfile-cpu"
|
||||
|
||||
# Build CPU image for Ray
|
||||
ray_cpu_env = Environment.from_dockerfile(name=ray_environment_name, dockerfile=ray_environment_dockerfile_path)
|
||||
ray_cpu_env = Environment.from_dockerfile(
|
||||
name=ray_environment_name, dockerfile=ray_environment_dockerfile_path
|
||||
)
|
||||
ray_cpu_env.register(workspace=ws)
|
||||
ray_cpu_build_details = ray_cpu_env.build(workspace=ws)
|
||||
|
||||
import time
|
||||
|
||||
while ray_cpu_build_details.status not in ["Succeeded", "Failed"]:
|
||||
print(f"Awaiting completion of ray CPU environment build. Current status is: {ray_cpu_build_details.status}")
|
||||
print(
|
||||
f"Awaiting completion of ray CPU environment build. Current status is: {ray_cpu_build_details.status}"
|
||||
)
|
||||
time.sleep(10)
|
||||
```
|
||||
|
||||
|
@ -105,20 +120,23 @@ if compute_target_name in ws.compute_targets:
|
|||
print("Found compute target; using it:", compute_target_name)
|
||||
else:
|
||||
raise Exception(
|
||||
"Found compute target but it is in state", compute_target.provisioning_state)
|
||||
"Found compute target but it is in state",
|
||||
compute_target.provisioning_state,
|
||||
)
|
||||
else:
|
||||
print("creating a new compute target...")
|
||||
provisioning_config = AmlCompute.provisioning_configuration(
|
||||
vm_size=compute_target_size,
|
||||
min_nodes=0,
|
||||
max_nodes=node_count)
|
||||
vm_size=compute_target_size, min_nodes=0, max_nodes=node_count
|
||||
)
|
||||
|
||||
# Create the cluster
|
||||
compute_target = ComputeTarget.create(ws, compute_target_name, provisioning_config)
|
||||
|
||||
# Can poll for a minimum number of nodes and for a specific timeout.
|
||||
# If no min node count is provided it will use the scale settings for the cluster
|
||||
compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)
|
||||
compute_target.wait_for_completion(
|
||||
show_output=True, min_node_count=None, timeout_in_minutes=20
|
||||
)
|
||||
|
||||
# For a more detailed view of current AmlCompute status, use get_status()
|
||||
print(compute_target.get_status().serialize())
|
||||
|
|
|
@ -2,7 +2,8 @@ As FLAML's AutoML module can be used a transformer in the Sklearn's pipeline we
|
|||
|
||||
### Prerequisites
|
||||
|
||||
Install the [automl] option.
|
||||
Install the \[automl\] option.
|
||||
|
||||
```bash
|
||||
pip install "flaml[automl] openml"
|
||||
```
|
||||
|
@ -14,7 +15,8 @@ from flaml.automl.data import load_openml_dataset
|
|||
|
||||
# Download [Airlines dataset](https://www.openml.org/d/1169) from OpenML. The task is to predict whether a given flight will be delayed, given the information of the scheduled departure.
|
||||
X_train, X_test, y_train, y_test = load_openml_dataset(
|
||||
dataset_id=1169, data_dir='./', random_state=1234, dataset_format='array')
|
||||
dataset_id=1169, data_dir="./", random_state=1234, dataset_format="array"
|
||||
)
|
||||
```
|
||||
|
||||
### Create a pipeline
|
||||
|
@ -26,17 +28,15 @@ from sklearn.impute import SimpleImputer
|
|||
from sklearn.preprocessing import StandardScaler
|
||||
from flaml import AutoML
|
||||
|
||||
set_config(display='diagram')
|
||||
set_config(display="diagram")
|
||||
|
||||
imputer = SimpleImputer()
|
||||
standardizer = StandardScaler()
|
||||
automl = AutoML()
|
||||
|
||||
automl_pipeline = Pipeline([
|
||||
("imputuer",imputer),
|
||||
("standardizer", standardizer),
|
||||
("automl", automl)
|
||||
])
|
||||
automl_pipeline = Pipeline(
|
||||
[("imputuer", imputer), ("standardizer", standardizer), ("automl", automl)]
|
||||
)
|
||||
automl_pipeline
|
||||
```
|
||||
|
||||
|
@ -52,9 +52,7 @@ automl_settings = {
|
|||
"estimator_list": ["xgboost", "catboost", "lgbm"],
|
||||
"log_file_name": "airlines_experiment.log", # flaml log file
|
||||
}
|
||||
pipeline_settings = {
|
||||
f"automl__{key}": value for key, value in automl_settings.items()
|
||||
}
|
||||
pipeline_settings = {f"automl__{key}": value for key, value in automl_settings.items()}
|
||||
automl_pipeline.fit(X_train, y_train, **pipeline_settings)
|
||||
```
|
||||
|
||||
|
@ -63,10 +61,10 @@ automl_pipeline.fit(X_train, y_train, **pipeline_settings)
|
|||
```python
|
||||
automl = automl_pipeline.steps[2][1]
|
||||
# Get the best config and best learner
|
||||
print('Best ML leaner:', automl.best_estimator)
|
||||
print('Best hyperparmeter config:', automl.best_config)
|
||||
print('Best accuracy on validation data: {0:.4g}'.format(1 - automl.best_loss))
|
||||
print('Training duration of best run: {0:.4g} s'.format(automl.best_config_train_time))
|
||||
print("Best ML leaner:", automl.best_estimator)
|
||||
print("Best hyperparmeter config:", automl.best_config)
|
||||
print("Best accuracy on validation data: {0:.4g}".format(1 - automl.best_loss))
|
||||
print("Training duration of best run: {0:.4g} s".format(automl.best_config_train_time))
|
||||
```
|
||||
|
||||
[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/integrate_sklearn.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/integrate_sklearn.ipynb)
|
||||
|
|
|
@ -1,6 +1,7 @@
|
|||
# Integrate - Spark
|
||||
|
||||
FLAML has integrated Spark for distributed training. There are two main aspects of integration with Spark:
|
||||
|
||||
- Use Spark ML estimators for AutoML.
|
||||
- Use Spark to run training in parallel spark jobs.
|
||||
|
||||
|
@ -15,6 +16,7 @@ For Spark estimators, AutoML only consumes Spark data. FLAML provides a convenie
|
|||
This utility function takes data in the form of a `pandas.DataFrame` or `pyspark.sql.DataFrame` and converts it into a pandas-on-spark dataframe. It also takes `pandas.Series` or `pyspark.sql.DataFrame` and converts it into a [pandas-on-spark](https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/index.html) series. If you pass in a `pyspark.pandas.DataFrame`, it will not make any changes.
|
||||
|
||||
This function also accepts optional arguments `index_col` and `default_index_type`.
|
||||
|
||||
- `index_col` is the column name to use as the index, default is None.
|
||||
- `default_index_type` is the default index type, default is "distributed-sequence". More info about the default index type can be found in the official Spark [documentation](https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/options.html#default-index-type). A minimal sketch of passing these options is shown below.
|
||||
|
||||
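A minimal sketch of passing these options (assuming an existing `pyspark.sql.DataFrame` named `spark_df`; the `"index"` column name is illustrative):

```python
from flaml.automl.spark.utils import to_pandas_on_spark

# spark_df is assumed to be an existing pyspark.sql.DataFrame with an "index" column
psdf = to_pandas_on_spark(
    spark_df, index_col="index", default_index_type="distributed-sequence"
)
```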
|
@ -23,10 +25,13 @@ Here is an example code snippet for Spark Data:
|
|||
```python
|
||||
import pandas as pd
|
||||
from flaml.automl.spark.utils import to_pandas_on_spark
|
||||
|
||||
# Creating a dictionary
|
||||
data = {"Square_Feet": [800, 1200, 1800, 1500, 850],
|
||||
"Age_Years": [20, 15, 10, 7, 25],
|
||||
"Price": [100000, 200000, 300000, 240000, 120000]}
|
||||
data = {
|
||||
"Square_Feet": [800, 1200, 1800, 1500, 850],
|
||||
"Age_Years": [20, 15, 10, 7, 25],
|
||||
"Price": [100000, 200000, 300000, 240000, 120000],
|
||||
}
|
||||
|
||||
# Creating a pandas DataFrame
|
||||
dataframe = pd.DataFrame(data)
|
||||
|
@ -39,8 +44,10 @@ psdf = to_pandas_on_spark(dataframe)
|
|||
To use Spark ML models you need to format your data appropriately. Specifically, use [`VectorAssembler`](https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.feature.VectorAssembler.html) to merge all feature columns into a single vector column.
|
||||
|
||||
Here is an example of how to use it:
|
||||
|
||||
```python
|
||||
from pyspark.ml.feature import VectorAssembler
|
||||
|
||||
columns = psdf.columns
|
||||
feature_cols = [col for col in columns if col != label]
|
||||
featurizer = VectorAssembler(inputCols=feature_cols, outputCol="features")
|
||||
|
@ -50,10 +57,13 @@ psdf = featurizer.transform(psdf.to_spark(index_col="index"))["index", "features
|
|||
Later, when conducting the experiment, use your pandas-on-spark data like non-Spark data and pass it using `X_train, y_train` or `dataframe, label`.
|
||||
|
||||
### Estimators
|
||||
|
||||
#### Model List
|
||||
|
||||
- `lgbm_spark`: The class for fine-tuning Spark version LightGBM models, using [SynapseML](https://microsoft.github.io/SynapseML/docs/features/lightgbm/about/) API.
|
||||
|
||||
#### Usage
|
||||
|
||||
First, prepare your data in the required format as described in the previous section.
|
||||
|
||||
By including the models you intend to try in the `estimator_list` argument to `flaml.automl`, FLAML will start trying configurations for these models. If your input is Spark data, FLAML will also use estimators with the `_spark` postfix by default, even if you haven't specified them.
|
||||
|
@ -62,6 +72,7 @@ Here is an example code snippet using SparkML models in AutoML:
|
|||
|
||||
```python
|
||||
import flaml
|
||||
|
||||
# prepare your data in pandas-on-spark format as we previously mentioned
|
||||
|
||||
automl = flaml.AutoML()
|
||||
|
@ -79,24 +90,25 @@ automl.fit(
|
|||
)
|
||||
```
|
||||
|
||||
|
||||
[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/automl_bankrupt_synapseml.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/automl_bankrupt_synapseml.ipynb)
|
||||
|
||||
## Parallel Spark Jobs
|
||||
|
||||
You can activate Spark as the parallel backend during parallel tuning in both [AutoML](/docs/Use-Cases/Task-Oriented-AutoML#parallel-tuning) and [Hyperparameter Tuning](/docs/Use-Cases/Tune-User-Defined-Function#parallel-tuning), by setting `use_spark` to `True`. FLAML will dispatch your job to the distributed Spark backend using [`joblib-spark`](https://github.com/joblib/joblib-spark).
|
||||
|
||||
Please note that you should not set `use_spark` to `True` when applying AutoML and Tuning for Spark Data. This is because only SparkML models will be used for Spark Data in AutoML and Tuning. As SparkML models run in parallel, there is no need to distribute them with `use_spark` again.
|
||||
|
||||
All the Spark-related arguments are stated below. These arguments are available in both Hyperparameter Tuning and AutoML:
|
||||
|
||||
|
||||
- `use_spark`: boolean, default=False | Whether to use spark to run the training in parallel spark jobs. This can be used to accelerate training on large models and large datasets, but will incur more overhead in time and thus slow down training in some cases. GPU training is not supported yet when use_spark is True. For Spark clusters, by default, we will launch one trial per executor. However, sometimes we want to launch more trials than the number of executors (e.g., local mode). In this case, we can set the environment variable `FLAML_MAX_CONCURRENT` to override the detected `num_executors`. The final number of concurrent trials will be the minimum of `n_concurrent_trials` and `num_executors`.
|
||||
- `n_concurrent_trials`: int, default=1 | The number of concurrent trials. When n_concurrent_trials > 1, FLAML performs parallel tuning.
|
||||
- `force_cancel`: boolean, default=False | Whether to forcibly cancel Spark jobs if the search time exceeds the time budget. Spark jobs include parallel tuning jobs and Spark-based model training jobs.
|
||||
|
||||
An example code snippet for using parallel Spark jobs:
|
||||
|
||||
```python
|
||||
import flaml
|
||||
|
||||
automl_experiment = flaml.AutoML()
|
||||
automl_settings = {
|
||||
"time_budget": 30,
|
||||
|
@ -104,7 +116,7 @@ automl_settings = {
|
|||
"task": "regression",
|
||||
"n_concurrent_trials": 2,
|
||||
"use_spark": True,
|
||||
"force_cancel": True, # Activating the force_cancel option can immediately halt Spark jobs once they exceed the allocated time_budget.
|
||||
"force_cancel": True, # Activating the force_cancel option can immediately halt Spark jobs once they exceed the allocated time_budget.
|
||||
}
|
||||
|
||||
automl_experiment.fit(
|
||||
|
@ -114,5 +126,4 @@ automl.fit(
|
|||
)
|
||||
```
|
||||
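The same Spark-related arguments are available when tuning a user-defined function with `flaml.tune`. Below is a minimal sketch; the objective function and search space are illustrative placeholders, and setting `FLAML_MAX_CONCURRENT` is only needed when you want more concurrent trials than the detected number of executors (e.g., in local mode).

```python
import os

from flaml import tune


def evaluate_config(config):
    # placeholder objective; replace with your real training/evaluation logic
    return {"score": (config["x"] - 1) ** 2}


# optional: override the detected number of executors
os.environ["FLAML_MAX_CONCURRENT"] = "2"

analysis = tune.run(
    evaluate_config,
    config={"x": tune.uniform(0, 10)},
    metric="score",
    mode="min",
    num_samples=-1,
    time_budget_s=30,
    use_spark=True,
    n_concurrent_trials=2,
    force_cancel=True,
)
```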
|
||||
|
||||
[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/integrate_spark.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/integrate_spark.ipynb)
|
||||
|
|
|
@ -1,6 +1,6 @@
|
|||
# Tune - AzureML pipeline
|
||||
|
||||
This example uses flaml to tune an Azure ML pipeline that fits a lightgbm classifier on the [sklearn breast cancer dataset](https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)).
|
||||
This example uses flaml to tune an Azure ML pipeline that fits a lightgbm classifier on the [sklearn breast cancer dataset](<https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)>).
|
||||
If you already have an Azure ML pipeline, you can use the approach to tune your pipeline with flaml.
|
||||
|
||||
## Prepare for tuning
|
||||
|
@ -51,7 +51,7 @@ Dataset.File.upload_directory(
|
|||
overwrite=True,
|
||||
)
|
||||
|
||||
dataset = Dataset.File.from_files(path=(datastore, 'classification_data'))
|
||||
dataset = Dataset.File.from_files(path=(datastore, "classification_data"))
|
||||
```
|
||||
|
||||
### Configurations for the pipeline
|
||||
|
@ -124,7 +124,6 @@ def tune_pipeline(concurrent_run=1):
|
|||
mode = "max"
|
||||
num_samples = 2
|
||||
|
||||
|
||||
if concurrent_run > 1:
|
||||
import ray # For parallel tuning
|
||||
|
||||
|
@ -158,8 +157,7 @@ The interaction between FLAML and AzureML pipeline jobs is in `tuner_func.run_wi
|
|||
|
||||
```python
|
||||
def run_with_config(config: dict):
|
||||
"""Run the pipeline with a given config dict
|
||||
"""
|
||||
"""Run the pipeline with a given config dict"""
|
||||
|
||||
# pass the hyperparameters to AzureML jobs by overwriting the config file.
|
||||
overrides = [f"{key}={value}" for key, value in config.items()]
|
||||
|
@ -174,25 +172,25 @@ def run_with_config(config: dict):
|
|||
while not stop:
|
||||
# get status
|
||||
status = run._core_run.get_status()
|
||||
print(f'status: {status}')
|
||||
print(f"status: {status}")
|
||||
|
||||
# get metrics
|
||||
metrics = run._core_run.get_metrics(recursive=True)
|
||||
if metrics:
|
||||
run_metrics = list(metrics.values())
|
||||
|
||||
new_metric = run_metrics[0]['eval_binary_error']
|
||||
new_metric = run_metrics[0]["eval_binary_error"]
|
||||
|
||||
if type(new_metric) == list:
|
||||
new_metric = new_metric[-1]
|
||||
|
||||
print(f'eval_binary_error: {new_metric}')
|
||||
print(f"eval_binary_error: {new_metric}")
|
||||
|
||||
tune.report(eval_binary_error=new_metric)
|
||||
|
||||
time.sleep(5)
|
||||
|
||||
if status == 'FAILED' or status == 'Completed':
|
||||
if status == "FAILED" or status == "Completed":
|
||||
stop = True
|
||||
|
||||
print("The run is terminated.")
|
||||
|
|
|
@ -9,6 +9,7 @@ It may be easier to use that API unless you have special requirements not handle
|
|||
### Requirements
|
||||
|
||||
This example requires GPU. Install dependencies:
|
||||
|
||||
```bash
|
||||
pip install torch transformers datasets "flaml[blendsearch,ray]"
|
||||
```
|
||||
|
@ -24,6 +25,7 @@ MODEL_NAME = "distilbert-base-uncased"
|
|||
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True)
|
||||
COLUMN_NAME = "sentence"
|
||||
|
||||
|
||||
def tokenize(examples):
|
||||
return tokenizer(examples[COLUMN_NAME], truncation=True)
|
||||
```
|
||||
|
@ -38,6 +40,7 @@ from transformers import AutoModelForSequenceClassification
|
|||
TASK = "cola"
|
||||
NUM_LABELS = 2
|
||||
|
||||
|
||||
def train_distilbert(config: dict):
|
||||
# Load CoLA dataset and apply tokenizer
|
||||
cola_raw = datasets.load_dataset("glue", TASK)
|
||||
|
@ -55,7 +58,7 @@ def train_distilbert(config: dict):
|
|||
return metric.compute(predictions=predictions, references=labels)
|
||||
|
||||
training_args = TrainingArguments(
|
||||
output_dir='.',
|
||||
output_dir=".",
|
||||
do_eval=False,
|
||||
disable_tqdm=True,
|
||||
logging_steps=20000,
|
||||
|
@ -96,12 +99,12 @@ We are now ready to define our search. This includes:
|
|||
```python
|
||||
max_num_epoch = 64
|
||||
search_space = {
|
||||
# You can mix constants with search space objects.
|
||||
"num_train_epochs": flaml.tune.loguniform(1, max_num_epoch),
|
||||
"learning_rate": flaml.tune.loguniform(1e-6, 1e-4),
|
||||
"adam_epsilon": flaml.tune.loguniform(1e-9, 1e-7),
|
||||
"adam_beta1": flaml.tune.uniform(0.8, 0.99),
|
||||
"adam_beta2": flaml.tune.loguniform(98e-2, 9999e-4),
|
||||
# You can mix constants with search space objects.
|
||||
"num_train_epochs": flaml.tune.loguniform(1, max_num_epoch),
|
||||
"learning_rate": flaml.tune.loguniform(1e-6, 1e-4),
|
||||
"adam_epsilon": flaml.tune.loguniform(1e-9, 1e-7),
|
||||
"adam_beta1": flaml.tune.uniform(0.8, 0.99),
|
||||
"adam_beta2": flaml.tune.loguniform(98e-2, 9999e-4),
|
||||
}
|
||||
|
||||
# optimization objective
|
||||
|
@ -131,9 +134,10 @@ analysis = flaml.tune.run(
|
|||
space=search_space,
|
||||
metric=HP_METRIC,
|
||||
mode=MODE,
|
||||
low_cost_partial_config={"num_train_epochs": 1}),
|
||||
low_cost_partial_config={"num_train_epochs": 1},
|
||||
),
|
||||
resources_per_trial={"gpu": num_gpus, "cpu": num_cpus},
|
||||
local_dir='logs/',
|
||||
local_dir="logs/",
|
||||
num_samples=num_samples,
|
||||
time_budget_s=time_budget_s,
|
||||
use_ray=True,
|
||||
|
@ -141,6 +145,7 @@ analysis = flaml.tune.run(
|
|||
```
|
||||
|
||||
This will run tuning for one hour. At the end we will see a summary.
|
||||
|
||||
```
|
||||
== Status ==
|
||||
Memory usage on this node: 32.0/251.6 GiB
|
||||
|
|
|
@ -5,6 +5,7 @@
|
|||
```bash
|
||||
pip install "flaml>=1.1.0" thop torchvision torch
|
||||
```
|
||||
|
||||
Tuning multiple objectives with Lexicographic preference is a new feature added in version 1.1.0 and is subject to change in future versions.
|
||||
|
||||
## Tuning accurate and efficient neural networks with lexicographic preference
|
||||
|
@ -100,8 +101,6 @@ def eval_model(model, valid_loader):
|
|||
return np.log2(flops), 1 - accuracy, params
|
||||
```
|
||||
|
||||
|
||||
|
||||
### Evaluation function
|
||||
|
||||
```python
|
||||
|
@ -116,6 +115,7 @@ def evaluate_function(configuration):
|
|||
```
|
||||
|
||||
### Search space
|
||||
|
||||
```python
|
||||
search_space = {
|
||||
"n_layers": tune.randint(lower=1, upper=3),
|
||||
|
@ -133,7 +133,6 @@ search_space = {
|
|||
### Launch the tuning process
|
||||
|
||||
```python
|
||||
|
||||
# Low cost initial point
|
||||
low_cost_partial_config = {
|
||||
"n_layers": 1,
|
||||
|
@ -155,10 +154,10 @@ analysis = tune.run(
|
|||
evaluate_function,
|
||||
num_samples=-1,
|
||||
time_budget_s=100,
|
||||
config=search_space, # search space of NN
|
||||
config=search_space, # search space of NN
|
||||
use_ray=False,
|
||||
lexico_objectives=lexico_objectives,
|
||||
low_cost_partial_config=low_cost_partial_config, # low cost initial point
|
||||
low_cost_partial_config=low_cost_partial_config, # low cost initial point
|
||||
)
|
||||
```
|
||||
|
||||
|
|
|
@ -5,6 +5,7 @@ This example uses flaml to tune a pytorch model on CIFAR10.
|
|||
## Prepare for tuning
|
||||
|
||||
### Requirements
|
||||
|
||||
```bash
|
||||
pip install torchvision "flaml[blendsearch,ray]"
|
||||
```
|
||||
|
@ -24,7 +25,6 @@ import torchvision.transforms as transforms
|
|||
|
||||
|
||||
class Net(nn.Module):
|
||||
|
||||
def __init__(self, l1=120, l2=84):
|
||||
super(Net, self).__init__()
|
||||
self.conv1 = nn.Conv2d(3, 6, 5)
|
||||
|
@ -48,16 +48,17 @@ class Net(nn.Module):
|
|||
|
||||
```python
|
||||
def load_data(data_dir="data"):
|
||||
transform = transforms.Compose([
|
||||
transforms.ToTensor(),
|
||||
transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
|
||||
])
|
||||
transform = transforms.Compose(
|
||||
[transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
|
||||
)
|
||||
|
||||
trainset = torchvision.datasets.CIFAR10(
|
||||
root=data_dir, train=True, download=True, transform=transform)
|
||||
root=data_dir, train=True, download=True, transform=transform
|
||||
)
|
||||
|
||||
testset = torchvision.datasets.CIFAR10(
|
||||
root=data_dir, train=False, download=True, transform=transform)
|
||||
root=data_dir, train=False, download=True, transform=transform
|
||||
)
|
||||
|
||||
return trainset, testset
|
||||
```
|
||||
|
@ -67,10 +68,11 @@ def load_data(data_dir="data"):
|
|||
```python
|
||||
from ray import tune
|
||||
|
||||
|
||||
def train_cifar(config, checkpoint_dir=None, data_dir=None):
|
||||
if "l1" not in config:
|
||||
logger.warning(config)
|
||||
net = Net(2**config["l1"], 2**config["l2"])
|
||||
net = Net(2 ** config["l1"], 2 ** config["l2"])
|
||||
|
||||
device = "cpu"
|
||||
if torch.cuda.is_available():
|
||||
|
@ -94,20 +96,25 @@ def train_cifar(config, checkpoint_dir=None, data_dir=None):
|
|||
|
||||
test_abs = int(len(trainset) * 0.8)
|
||||
train_subset, val_subset = random_split(
|
||||
trainset, [test_abs, len(trainset) - test_abs])
|
||||
trainset, [test_abs, len(trainset) - test_abs]
|
||||
)
|
||||
|
||||
trainloader = torch.utils.data.DataLoader(
|
||||
train_subset,
|
||||
batch_size=int(2**config["batch_size"]),
|
||||
batch_size=int(2 ** config["batch_size"]),
|
||||
shuffle=True,
|
||||
num_workers=4)
|
||||
num_workers=4,
|
||||
)
|
||||
valloader = torch.utils.data.DataLoader(
|
||||
val_subset,
|
||||
batch_size=int(2**config["batch_size"]),
|
||||
batch_size=int(2 ** config["batch_size"]),
|
||||
shuffle=True,
|
||||
num_workers=4)
|
||||
num_workers=4,
|
||||
)
|
||||
|
||||
for epoch in range(int(round(config["num_epochs"]))): # loop over the dataset multiple times
|
||||
for epoch in range(
|
||||
int(round(config["num_epochs"]))
|
||||
): # loop over the dataset multiple times
|
||||
running_loss = 0.0
|
||||
epoch_steps = 0
|
||||
for i, data in enumerate(trainloader, 0):
|
||||
|
@ -128,8 +135,10 @@ def train_cifar(config, checkpoint_dir=None, data_dir=None):
|
|||
running_loss += loss.item()
|
||||
epoch_steps += 1
|
||||
if i % 2000 == 1999: # print every 2000 mini-batches
|
||||
print("[%d, %5d] loss: %.3f" % (epoch + 1, i + 1,
|
||||
running_loss / epoch_steps))
|
||||
print(
|
||||
"[%d, %5d] loss: %.3f"
|
||||
% (epoch + 1, i + 1, running_loss / epoch_steps)
|
||||
)
|
||||
running_loss = 0.0
|
||||
|
||||
# Validation loss
|
||||
|
@ -156,8 +165,7 @@ def train_cifar(config, checkpoint_dir=None, data_dir=None):
|
|||
# parameter in future iterations.
|
||||
with tune.checkpoint_dir(step=epoch) as checkpoint_dir:
|
||||
path = os.path.join(checkpoint_dir, "checkpoint")
|
||||
torch.save(
|
||||
(net.state_dict(), optimizer.state_dict()), path)
|
||||
torch.save((net.state_dict(), optimizer.state_dict()), path)
|
||||
|
||||
tune.report(loss=(val_loss / val_steps), accuracy=correct / total)
|
||||
print("Finished Training")
|
||||
|
@ -170,7 +178,8 @@ def _test_accuracy(net, device="cpu"):
|
|||
trainset, testset = load_data()
|
||||
|
||||
testloader = torch.utils.data.DataLoader(
|
||||
testset, batch_size=4, shuffle=False, num_workers=2)
|
||||
testset, batch_size=4, shuffle=False, num_workers=2
|
||||
)
|
||||
|
||||
correct = 0
|
||||
total = 0
|
||||
|
@ -202,20 +211,22 @@ load_data(data_dir) # Download data for all trials before starting the run
|
|||
```python
|
||||
max_num_epoch = 100
|
||||
config = {
|
||||
"l1": tune.randint(2, 9), # log transformed with base 2
|
||||
"l2": tune.randint(2, 9), # log transformed with base 2
|
||||
"l1": tune.randint(2, 9), # log transformed with base 2
|
||||
"l2": tune.randint(2, 9), # log transformed with base 2
|
||||
"lr": tune.loguniform(1e-4, 1e-1),
|
||||
"num_epochs": tune.loguniform(1, max_num_epoch),
|
||||
"batch_size": tune.randint(1, 5) # log transformed with base 2
|
||||
"batch_size": tune.randint(1, 5), # log transformed with base 2
|
||||
}
|
||||
```
|
||||
|
||||
### Budget and resource constraints
|
||||
|
||||
```python
|
||||
time_budget_s = 600 # time budget in seconds
|
||||
gpus_per_trial = 0.5 # number of gpus for each trial; 0.5 means two training jobs can share one gpu
|
||||
num_samples = 500 # maximal number of trials
|
||||
time_budget_s = 600 # time budget in seconds
|
||||
gpus_per_trial = (
|
||||
0.5 # number of gpus for each trial; 0.5 means two training jobs can share one gpu
|
||||
)
|
||||
num_samples = 500 # maximal number of trials
|
||||
np.random.seed(7654321)
|
||||
```
|
||||
|
||||
|
@ -223,6 +234,7 @@ np.random.seed(7654321)
|
|||
|
||||
```python
|
||||
import time
|
||||
|
||||
start_time = time.time()
|
||||
result = flaml.tune.run(
|
||||
tune.with_parameters(train_cifar, data_dir=data_dir),
|
||||
|
@ -234,10 +246,11 @@ result = flaml.tune.run(
|
|||
min_resource=1,
|
||||
scheduler="asha", # Use asha scheduler to perform early stopping based on intermediate results reported
|
||||
resources_per_trial={"cpu": 1, "gpu": gpus_per_trial},
|
||||
local_dir='logs/',
|
||||
local_dir="logs/",
|
||||
num_samples=num_samples,
|
||||
time_budget_s=time_budget_s,
|
||||
use_ray=True)
|
||||
use_ray=True,
|
||||
)
|
||||
```
|
||||
|
||||
### Check the result
|
||||
|
@ -247,13 +260,18 @@ print(f"#trials={len(result.trials)}")
|
|||
print(f"time={time.time()-start_time}")
|
||||
best_trial = result.get_best_trial("loss", "min", "all")
|
||||
print("Best trial config: {}".format(best_trial.config))
|
||||
print("Best trial final validation loss: {}".format(
|
||||
best_trial.metric_analysis["loss"]["min"]))
|
||||
print("Best trial final validation accuracy: {}".format(
|
||||
best_trial.metric_analysis["accuracy"]["max"]))
|
||||
print(
|
||||
"Best trial final validation loss: {}".format(
|
||||
best_trial.metric_analysis["loss"]["min"]
|
||||
)
|
||||
)
|
||||
print(
|
||||
"Best trial final validation accuracy: {}".format(
|
||||
best_trial.metric_analysis["accuracy"]["max"]
|
||||
)
|
||||
)
|
||||
|
||||
best_trained_model = Net(2**best_trial.config["l1"],
|
||||
2**best_trial.config["l2"])
|
||||
best_trained_model = Net(2 ** best_trial.config["l1"], 2 ** best_trial.config["l2"])
|
||||
device = "cpu"
|
||||
if torch.cuda.is_available():
|
||||
device = "cuda:0"
|
||||
|
@ -261,7 +279,9 @@ if torch.cuda.is_available():
|
|||
best_trained_model = nn.DataParallel(best_trained_model)
|
||||
best_trained_model.to(device)
|
||||
|
||||
checkpoint_value = getattr(best_trial.checkpoint, "dir_or_data", None) or best_trial.checkpoint.value
|
||||
checkpoint_value = (
|
||||
getattr(best_trial.checkpoint, "dir_or_data", None) or best_trial.checkpoint.value
|
||||
)
|
||||
checkpoint_path = os.path.join(checkpoint_value, "checkpoint")
|
||||
|
||||
model_state, optimizer_state = torch.load(checkpoint_path)
|
||||
|
|
|
@ -6,26 +6,24 @@
|
|||
|
||||
### [Guidelines on creating and tuning a custom estimator](Use-Cases/Task-Oriented-AutoML#guidelines-on-tuning-a-custom-estimator)
|
||||
|
||||
|
||||
### About `low_cost_partial_config` in `tune`.
|
||||
|
||||
- Definition and purpose: The `low_cost_partial_config` is a dictionary of subset of the hyperparameter coordinates whose value corresponds to a configuration with known low-cost (i.e., low computation cost for training the corresponding model). The concept of low/high-cost is meaningful in the case where a subset of the hyperparameters to tune directly affects the computation cost for training the model. For example, `n_estimators` and `max_leaves` are known to affect the training cost of tree-based learners. We call this subset of hyperparameters, *cost-related hyperparameters*. In such scenarios, if you are aware of low-cost configurations for the cost-related hyperparameters, you are recommended to set them as the `low_cost_partial_config`. Using the tree-based method example again, since we know that small `n_estimators` and `max_leaves` generally correspond to simpler models and thus lower cost, we set `{'n_estimators': 4, 'max_leaves': 4}` as the `low_cost_partial_config` by default (note that `4` is the lower bound of search space for these two hyperparameters), e.g., in [LGBM](https://github.com/microsoft/FLAML/blob/main/flaml/model.py#L215). Configuring `low_cost_partial_config` helps the search algorithms make more cost-efficient choices.
|
||||
In AutoML, the `low_cost_init_value` in `search_space()` function for each estimator serves the same role.
|
||||
In AutoML, the `low_cost_init_value` in `search_space()` function for each estimator serves the same role.
|
||||
|
||||
- Usage in practice: It is recommended to configure it if there are cost-related hyperparameters in your tuning task and you happen to know the low-cost values for them, but it is not required (It is fine to leave it the default value, i.e., `None`).
|
||||
|
||||
- How does it work: if configured, `low_cost_partial_config` will be used as an initial point of the search. It also affects the search trajectory. For more details about how it plays a role in the search algorithms, please refer to the papers about the search algorithms used: Section 2 of [Frugal Optimization for Cost-related Hyperparameters (CFO)](https://arxiv.org/pdf/2005.01571.pdf) and Section 3 of [Economical Hyperparameter Optimization with Blended Search Strategy (BlendSearch)](https://openreview.net/pdf?id=VbLH04pRA3). A minimal usage sketch is shown after this list.
|
||||
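For instance, a minimal `flaml.tune` sketch with one cost-related hyperparameter (`n_estimators`; the objective function is an illustrative placeholder rather than a real training routine):

```python
from flaml import tune


def evaluate_config(config):
    # placeholder objective; in practice this trains and evaluates a model
    return {"val_loss": 1.0 / config["n_estimators"] + 0.001 * config["n_estimators"]}


analysis = tune.run(
    evaluate_config,
    config={
        "n_estimators": tune.randint(lower=4, upper=1000),
        "learning_rate": tune.loguniform(1e-3, 1.0),
    },
    metric="val_loss",
    mode="min",
    # start the search from a known cheap value of the cost-related hyperparameter
    low_cost_partial_config={"n_estimators": 4},
    time_budget_s=10,
    num_samples=-1,
)
```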
|
||||
|
||||
### How does FLAML handle imbalanced data (unequal distribution of target classes in classification task)?
|
||||
|
||||
Currently FLAML does several things for imbalanced data.
|
||||
|
||||
1. When a class contains fewer than 20 examples, we repeatedly add these examples to the training data until the count is at least 20.
|
||||
2. We use stratified sampling when doing holdout and kf.
|
||||
3. We make sure no class is empty in both training and holdout data.
|
||||
4. We allow users to pass `sample_weight` to `AutoML.fit()`.
|
||||
5. User can customize the weight of each class by setting the `custom_hp` or `fit_kwargs_by_estimator` arguments. For example, the following code sets the weight for pos vs. neg as 2:1 for the RandomForest estimator:
|
||||
1. We use stratified sampling when doing holdout and kf.
|
||||
1. We make sure no class is empty in both training and holdout data.
|
||||
1. We allow users to pass `sample_weight` to `AutoML.fit()`.
|
||||
1. Users can customize the weight of each class by setting the `custom_hp` or `fit_kwargs_by_estimator` arguments. For example, the following code sets the weight for pos vs. neg as 2:1 for the RandomForest estimator:
|
||||
|
||||
```python
|
||||
from flaml import AutoML
|
||||
|
@ -47,35 +45,28 @@ automl_settings["custom_hp"] = {
|
|||
"init_value": 0.5,
|
||||
}
|
||||
},
|
||||
"rf": {
|
||||
"class_weight": {
|
||||
"domain": "balanced",
|
||||
"init_value": "balanced"
|
||||
}
|
||||
}
|
||||
"rf": {"class_weight": {"domain": "balanced", "init_value": "balanced"}},
|
||||
}
|
||||
print(automl.model)
|
||||
```
|
||||
|
||||
|
||||
### How to interpret model performance? Is it possible for me to visualize feature importance, SHAP values, optimization history?
|
||||
|
||||
You can use ```automl.model.estimator.feature_importances_``` to get the `feature_importances_` for the best model found by automl. See an [example](Examples/AutoML-for-XGBoost#plot-feature-importance).
|
||||
You can use `automl.model.estimator.feature_importances_` to get the `feature_importances_` for the best model found by automl. See an [example](Examples/AutoML-for-XGBoost#plot-feature-importance).
|
||||
|
||||
Packages such as `azureml-interpret` and `sklearn.inspection.permutation_importance` can be used on `automl.model.estimator` to explain the selected model.
|
||||
Model explanation is a frequently requested feature, and adding native support for it may be worthwhile. Suggestions/contributions are welcome.
|
||||
|
||||
Optimization history can be checked from the [log](Use-Cases/Task-Oriented-AutoML#log-the-trials). You can also [retrieve the log and plot the learning curve](Use-Cases/Task-Oriented-AutoML#plot-learning-curve).
|
||||
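For example, a minimal sketch of plotting the feature importances of the best model (assuming `automl` has been fitted and `X_train` is a pandas DataFrame):

```python
import matplotlib.pyplot as plt

# feature_importances_ of the best model found by AutoML
importances = automl.model.estimator.feature_importances_
plt.barh(X_train.columns, importances)
plt.xlabel("feature importance")
plt.tight_layout()
plt.show()
```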
|
||||
|
||||
### How to resolve out-of-memory error in `AutoML.fit()`
|
||||
|
||||
* Set `free_mem_ratio` a float between 0 and 1. For example, 0.2 means try to keep free memory above 20% of total memory. Training may be early stopped for memory consumption reason when this is set.
|
||||
* Set `model_history` False.
|
||||
* If your data are already preprocessed, set `skip_transform` False. If you can preprocess the data before the fit starts, this setting can save memory needed for preprocessing in `fit`.
|
||||
* If the OOM error only happens for some particular trials:
|
||||
- set `use_ray` True. This will increase the overhead per trial but can keep the AutoML process running when a single trial fails due to OOM error.
|
||||
- provide a more accurate [`size`](reference/automl/model#size) function for the memory bytes consumption of each config for the estimator causing this error.
|
||||
- modify the [search space](Use-Cases/Task-Oriented-AutoML#a-shortcut-to-override-the-search-space) for the estimators causing this error.
|
||||
- or remove this estimator from the `estimator_list`.
|
||||
* If the OOM error happens when ensembling, consider disabling ensemble, or use a cheaper ensemble option. ([Example](Use-Cases/Task-Oriented-AutoML#ensemble)).
|
||||
- Set `free_mem_ratio` to a float between 0 and 1. For example, 0.2 means trying to keep free memory above 20% of total memory. Training may be stopped early for memory consumption reasons when this is set (see the combined sketch after this list).
|
||||
- Set `model_history` to False.
|
||||
- If your data are already preprocessed, set `skip_transform` to True so that `fit` skips its own preprocessing. If you can preprocess the data before the fit starts, this setting saves the memory needed for preprocessing inside `fit`.
|
||||
- If the OOM error only happens for some particular trials:
|
||||
- set `use_ray` True. This will increase the overhead per trial but can keep the AutoML process running when a single trial fails due to OOM error.
|
||||
- provide a more accurate [`size`](reference/automl/model#size) function for the memory bytes consumption of each config for the estimator causing this error.
|
||||
- modify the [search space](Use-Cases/Task-Oriented-AutoML#a-shortcut-to-override-the-search-space) for the estimators causing this error.
|
||||
- or remove this estimator from the `estimator_list`.
|
||||
- If the OOM error happens when ensembling, consider disabling ensemble, or use a cheaper ensemble option. ([Example](Use-Cases/Task-Oriented-AutoML#ensemble)).
|
||||
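A combined sketch of the memory-related settings above (values are illustrative; it assumes an `AutoML` instance and training data prepared as in the earlier examples):

```python
automl.fit(
    X_train,
    y_train,
    task="classification",
    time_budget=60,
    free_mem_ratio=0.2,  # try to keep at least 20% of total memory free
    model_history=False,  # do not keep every intermediate model in memory
)
```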
|
|
|
@ -8,9 +8,9 @@ and optimizes their performance.
|
|||
|
||||
### Main Features
|
||||
|
||||
* FLAML enables building next-gen GPT-X applications based on multi-agent conversations with minimal effort. It simplifies the orchestration, automation and optimization of a complex GPT-X workflow. It maximizes the performance of GPT-X models and mitigates their weaknesses.
|
||||
* For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources. It is easy to customize or extend.
|
||||
* It supports fast and economical automatic tuning, capable of handling large search space with heterogeneous evaluation cost and complex constraints/guidance/early stopping.
|
||||
- FLAML enables building next-gen GPT-X applications based on multi-agent conversations with minimal effort. It simplifies the orchestration, automation and optimization of a complex GPT-X workflow. It maximizes the performance of GPT-X models and mitigates their weaknesses.
|
||||
- For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources. It is easy to customize or extend.
|
||||
- It supports fast and economical automatic tuning, capable of handling large search space with heterogeneous evaluation cost and complex constraints/guidance/early stopping.
|
||||
|
||||
FLAML is powered by a series of [research studies](/docs/Research) from Microsoft Research and collaborators such as Penn State University, Stevens Institute of Technology, University of Washington, and University of Waterloo.
|
||||
|
||||
|
@ -25,15 +25,21 @@ There are several ways of using flaml:
|
|||
Autogen enables the next-gen GPT-X applications with a generic multi-agent conversation framework.
|
||||
It offers customizable and conversable agents which integrate LLMs, tools, and humans.
|
||||
By automating chat among multiple capable agents, one can easily make them collectively perform tasks autonomously or with human feedback, including tasks that require using tools via code. For example,
|
||||
|
||||
```python
|
||||
from flaml import autogen
|
||||
|
||||
assistant = autogen.AssistantAgent("assistant")
|
||||
user_proxy = autogen.UserProxyAgent("user_proxy")
|
||||
user_proxy.initiate_chat(assistant, message="Show me the YTD gain of 10 largest technology companies as of today.")
|
||||
user_proxy.initiate_chat(
|
||||
assistant,
|
||||
message="Show me the YTD gain of 10 largest technology companies as of today.",
|
||||
)
|
||||
# This initiates an automated chat between the two agents to solve the task
|
||||
```
|
||||
|
||||
Autogen also helps maximize the utility of expensive LLMs such as ChatGPT and GPT-4. It offers a drop-in replacement of `openai.Completion` or `openai.ChatCompletion` with powerful functionalities like tuning, caching, error handling, and templating. For example, you can optimize generations by LLM with your own tuning data, success metrics and budgets.
|
||||
|
||||
```python
|
||||
# perform tuning
|
||||
config, analysis = autogen.Completion.tune(
|
||||
|
@ -55,6 +61,7 @@ With three lines of code, you can start using this economical and fast AutoML en
|
|||
|
||||
```python
|
||||
from flaml import AutoML
|
||||
|
||||
automl = AutoML()
|
||||
automl.fit(X_train, y_train, task="classification", time_budget=60)
|
||||
```
|
||||
|
@ -63,7 +70,14 @@ It automatically tunes the hyperparameters and selects the best model from defau
|
|||
|
||||
```python
|
||||
automl.add_learner("mylgbm", MyLGBMEstimator)
|
||||
automl.fit(X_train, y_train, task="classification", metric=custom_metric, estimator_list=["mylgbm"], time_budget=60)
|
||||
automl.fit(
|
||||
X_train,
|
||||
y_train,
|
||||
task="classification",
|
||||
metric=custom_metric,
|
||||
estimator_list=["mylgbm"],
|
||||
time_budget=60,
|
||||
)
|
||||
```
|
||||
|
||||
#### [Tune user-defined function](/docs/Use-Cases/Tune-User-Defined-Function)
|
||||
|
@ -91,7 +105,9 @@ def train_lgbm(config: dict) -> dict:
|
|||
# load a built-in search space from flaml
|
||||
flaml_lgbm_search_space = LGBMEstimator.search_space(X_train.shape)
|
||||
# specify the search space as a dict from hp name to domain; you can define your own search space the same way
|
||||
config_search_space = {hp: space["domain"] for hp, space in flaml_lgbm_search_space.items()}
|
||||
config_search_space = {
|
||||
hp: space["domain"] for hp, space in flaml_lgbm_search_space.items()
|
||||
}
|
||||
# give guidance about hp values corresponding to low training cost, i.e., {"n_estimators": 4, "num_leaves": 4}
|
||||
low_cost_partial_config = {
|
||||
hp: space["low_cost_init_value"]
|
||||
|
@ -100,10 +116,16 @@ low_cost_partial_config = {
|
|||
}
|
||||
# run the tuning, minimizing mse, with total time budget 3 seconds
|
||||
analysis = tune.run(
|
||||
train_lgbm, metric="mse", mode="min", config=config_search_space,
|
||||
low_cost_partial_config=low_cost_partial_config, time_budget_s=3, num_samples=-1,
|
||||
train_lgbm,
|
||||
metric="mse",
|
||||
mode="min",
|
||||
config=config_search_space,
|
||||
low_cost_partial_config=low_cost_partial_config,
|
||||
time_budget_s=3,
|
||||
num_samples=-1,
|
||||
)
|
||||
```
|
||||
|
||||
Please see this [script](https://github.com/microsoft/FLAML/blob/main/test/tune_example.py) for the complete version of the above example.
|
||||
|
||||
#### [Zero-shot AutoML](/docs/Use-Cases/Zero-Shot-AutoML)
|
||||
|
Then, you can use it just like you use the original `LGBMClassifier`. Your other code can remain unchanged.
|
|||
|
||||
### Where to Go Next?
|
||||
|
||||
* Understand the use cases for [AutoGen](https://microsoft.github.io/autogen/), [Task-oriented AutoML](/docs/Use-Cases/Task-Oriented-Automl), [Tune user-defined function](/docs/Use-Cases/Tune-User-Defined-Function) and [Zero-shot AutoML](/docs/Use-Cases/Zero-Shot-AutoML).
|
||||
* Find code examples under "Examples": from [AutoGen - AgentChat](/docs/Examples/AutoGen-AgentChat) to [Tune - PyTorch](/docs/Examples/Tune-PyTorch).
|
||||
* Learn about [research](/docs/Research) around FLAML and check [blogposts](/blog).
|
||||
* Chat on [Discord](https://discord.gg/Cppx2vSPVP).
|
||||
- Understand the use cases for [AutoGen](https://microsoft.github.io/autogen/), [Task-oriented AutoML](/docs/Use-Cases/Task-Oriented-Automl), [Tune user-defined function](/docs/Use-Cases/Tune-User-Defined-Function) and [Zero-shot AutoML](/docs/Use-Cases/Zero-Shot-AutoML).
|
||||
- Find code examples under "Examples": from [AutoGen - AgentChat](/docs/Examples/AutoGen-AgentChat) to [Tune - PyTorch](/docs/Examples/Tune-PyTorch).
|
||||
- Learn about [research](/docs/Research) around FLAML and check [blogposts](/blog).
|
||||
- Chat on [Discord](https://discord.gg/Cppx2vSPVP).
|
||||
|
||||
If you like our project, please give it a [star](https://github.com/microsoft/FLAML/stargazers) on GitHub. If you are interested in contributing, please read [Contributor's Guide](/docs/Contribute).
|
||||
|
||||
|
|
|
@ -9,6 +9,7 @@ pip install flaml
|
|||
```
|
||||
|
||||
or conda:
|
||||
|
||||
```
|
||||
conda install flaml -c conda-forge
|
||||
```
|
||||
|
@ -29,23 +30,32 @@ pip install "flaml[automl]"
|
|||
|
||||
#### Extra learners/models
|
||||
|
||||
* openai models
|
||||
- openai models
|
||||
|
||||
```bash
|
||||
pip install "flaml[openai]"
|
||||
```
|
||||
* catboost
|
||||
|
||||
- catboost
|
||||
|
||||
```bash
|
||||
pip install "flaml[catboost]"
|
||||
```
|
||||
* vowpal wabbit
|
||||
|
||||
- vowpal wabbit
|
||||
|
||||
```bash
|
||||
pip install "flaml[vw]"
|
||||
```
|
||||
* time series forecaster: prophet, statsmodels
|
||||
|
||||
- time series forecaster: prophet, statsmodels
|
||||
|
||||
```bash
|
||||
pip install "flaml[forecast]"
|
||||
```
|
||||
* huggingface transformers
|
||||
|
||||
- huggingface transformers
|
||||
|
||||
```bash
|
||||
pip install "flaml[hf]"
|
||||
```
|
||||
|
@ -53,7 +63,7 @@ pip install "flaml[hf]"
|
|||
#### Notebook
|
||||
|
||||
To run the [notebook examples](https://github.com/microsoft/FLAML/tree/main/notebook),
|
||||
install flaml with the [notebook] option:
|
||||
install flaml with the \[notebook\] option:
|
||||
|
||||
```bash
|
||||
pip install "flaml[notebook]"
|
||||
|
@ -61,12 +71,16 @@ pip install "flaml[notebook]"
|
|||
|
||||
#### Distributed tuning
|
||||
|
||||
* ray
|
||||
- ray
|
||||
|
||||
```bash
|
||||
pip install "flaml[ray]"
|
||||
```
|
||||
* spark
|
||||
|
||||
- spark
|
||||
|
||||
> *Spark support is added in v1.1.0*
|
||||
|
||||
```bash
|
||||
pip install "flaml[spark]>=1.1.0"
|
||||
```
|
||||
|
@ -75,6 +89,7 @@ For cloud platforms such as [Azure Synapse](https://azure.microsoft.com/en-us/pr
|
|||
But you may also need to install `Spark` manually when setting up your own environment.
|
||||
For the latest Ubuntu systems, you can install the Spark 3.3.0 standalone version with the script below.
|
||||
For more details of installing Spark, please refer to [Spark Doc](https://spark.apache.org/docs/latest/api/python/getting_started/install.html).
|
||||
|
||||
```bash
|
||||
sudo apt-get update && sudo apt-get install -y --allow-downgrades --allow-change-held-packages --no-install-recommends \
|
||||
ca-certificates-java ca-certificates openjdk-17-jdk-headless \
|
||||
|
@ -87,28 +102,36 @@ export PYTHONPATH=/spark/python/lib/py4j-0.10.9.5-src.zip:/spark/python
|
|||
export PATH=$PATH:$SPARK_HOME/bin
|
||||
```
|
||||
|
||||
* nni
|
||||
- nni
|
||||
|
||||
```bash
|
||||
pip install "flaml[nni]"
|
||||
```
|
||||
* blendsearch
|
||||
|
||||
- blendsearch
|
||||
|
||||
```bash
|
||||
pip install "flaml[blendsearch]"
|
||||
```
|
||||
|
||||
* synapse
|
||||
- synapse
|
||||
|
||||
> *To install flaml in Azure Synapse and similar cloud platforms*
|
||||
|
||||
```bash
|
||||
pip install flaml[synapse]
|
||||
```
|
||||
|
||||
#### Test and Benchmark
|
||||
|
||||
* test
|
||||
- test
|
||||
|
||||
```bash
|
||||
pip install flaml[test]
|
||||
```
|
||||
* benchmark
|
||||
|
||||
- benchmark
|
||||
|
||||
```bash
|
||||
pip install flaml[benchmark]
|
||||
```
|
||||
|
|
|
@ -2,7 +2,7 @@
|
|||
|
||||
For technical details, please check our research publications.
|
||||
|
||||
* [FLAML: A Fast and Lightweight AutoML Library](https://www.microsoft.com/en-us/research/publication/flaml-a-fast-and-lightweight-automl-library/). Chi Wang, Qingyun Wu, Markus Weimer, Erkang Zhu. MLSys 2021.
|
||||
- [FLAML: A Fast and Lightweight AutoML Library](https://www.microsoft.com/en-us/research/publication/flaml-a-fast-and-lightweight-automl-library/). Chi Wang, Qingyun Wu, Markus Weimer, Erkang Zhu. MLSys 2021.
|
||||
|
||||
```bibtex
|
||||
@inproceedings{wang2021flaml,
|
||||
|
@ -13,7 +13,7 @@ For technical details, please check our research publications.
|
|||
}
|
||||
```
|
||||
|
||||
* [Frugal Optimization for Cost-related Hyperparameters](https://arxiv.org/abs/2005.01571). Qingyun Wu, Chi Wang, Silu Huang. AAAI 2021.
|
||||
- [Frugal Optimization for Cost-related Hyperparameters](https://arxiv.org/abs/2005.01571). Qingyun Wu, Chi Wang, Silu Huang. AAAI 2021.
|
||||
|
||||
```bibtex
|
||||
@inproceedings{wu2021cfo,
|
||||
|
@ -24,7 +24,7 @@ For technical details, please check our research publications.
|
|||
}
|
||||
```
|
||||
|
||||
* [Economical Hyperparameter Optimization With Blended Search Strategy](https://www.microsoft.com/en-us/research/publication/economical-hyperparameter-optimization-with-blended-search-strategy/). Chi Wang, Qingyun Wu, Silu Huang, Amin Saied. ICLR 2021.
|
||||
- [Economical Hyperparameter Optimization With Blended Search Strategy](https://www.microsoft.com/en-us/research/publication/economical-hyperparameter-optimization-with-blended-search-strategy/). Chi Wang, Qingyun Wu, Silu Huang, Amin Saied. ICLR 2021.
|
||||
|
||||
```bibtex
|
||||
@inproceedings{wang2021blendsearch,
|
||||
|
@ -35,7 +35,7 @@ For technical details, please check our research publications.
|
|||
}
|
||||
```
|
||||
|
||||
* [An Empirical Study on Hyperparameter Optimization for Fine-Tuning Pre-trained Language Models](https://aclanthology.org/2021.acl-long.178.pdf). Susan Xueqing Liu, Chi Wang. ACL 2021.
|
||||
- [An Empirical Study on Hyperparameter Optimization for Fine-Tuning Pre-trained Language Models](https://aclanthology.org/2021.acl-long.178.pdf). Susan Xueqing Liu, Chi Wang. ACL 2021.
|
||||
|
||||
```bibtex
|
||||
@inproceedings{liuwang2021hpolm,
|
||||
|
@ -46,7 +46,7 @@ For technical details, please check our research publications.
|
|||
}
|
||||
```
|
||||
|
||||
* [ChaCha for Online AutoML](https://www.microsoft.com/en-us/research/publication/chacha-for-online-automl/). Qingyun Wu, Chi Wang, John Langford, Paul Mineiro and Marco Rossi. ICML 2021.
|
||||
- [ChaCha for Online AutoML](https://www.microsoft.com/en-us/research/publication/chacha-for-online-automl/). Qingyun Wu, Chi Wang, John Langford, Paul Mineiro and Marco Rossi. ICML 2021.
|
||||
|
||||
```bibtex
|
||||
@inproceedings{wu2021chacha,
|
||||
|
@ -57,7 +57,7 @@ For technical details, please check our research publications.
|
|||
}
|
||||
```
|
||||
|
||||
* [Fair AutoML](https://arxiv.org/abs/2111.06495). Qingyun Wu, Chi Wang. ArXiv preprint arXiv:2111.06495 (2021).
|
||||
- [Fair AutoML](https://arxiv.org/abs/2111.06495). Qingyun Wu, Chi Wang. ArXiv preprint arXiv:2111.06495 (2021).
|
||||
|
||||
```bibtex
|
||||
@inproceedings{wuwang2021fairautoml,
|
||||
|
@ -68,7 +68,7 @@ For technical details, please check our research publications.
|
|||
}
|
||||
```
|
||||
|
||||
* [Mining Robust Default Configurations for Resource-constrained AutoML](https://arxiv.org/abs/2202.09927). Moe Kayali, Chi Wang. ArXiv preprint arXiv:2202.09927 (2022).
|
||||
- [Mining Robust Default Configurations for Resource-constrained AutoML](https://arxiv.org/abs/2202.09927). Moe Kayali, Chi Wang. ArXiv preprint arXiv:2202.09927 (2022).
|
||||
|
||||
```bibtex
|
||||
@inproceedings{kayaliwang2022default,
|
||||
|
@ -79,7 +79,7 @@ For technical details, please check our research publications.
|
|||
}
|
||||
```
|
||||
|
||||
* [Targeted Hyperparameter Optimization with Lexicographic Preferences Over Multiple Objectives](https://openreview.net/forum?id=0Ij9_q567Ma). Shaokun Zhang, Feiran Jia, Chi Wang, Qingyun Wu. ICLR 2023 (notable-top-5%).
|
||||
- [Targeted Hyperparameter Optimization with Lexicographic Preferences Over Multiple Objectives](https://openreview.net/forum?id=0Ij9_q567Ma). Shaokun Zhang, Feiran Jia, Chi Wang, Qingyun Wu. ICLR 2023 (notable-top-5%).
|
||||
|
||||
```bibtex
|
||||
@inproceedings{zhang2023targeted,
|
||||
|
@ -91,7 +91,7 @@ For technical details, please check our research publications.
|
|||
}
|
||||
```
|
||||
|
||||
* [Cost-Effective Hyperparameter Optimization for Large Language Model Generation Inference](https://arxiv.org/abs/2303.04673). Chi Wang, Susan Xueqing Liu, Ahmed H. Awadallah. ArXiv preprint arXiv:2303.04673 (2023).
|
||||
- [Cost-Effective Hyperparameter Optimization for Large Language Model Generation Inference](https://arxiv.org/abs/2303.04673). Chi Wang, Susan Xueqing Liu, Ahmed H. Awadallah. ArXiv preprint arXiv:2303.04673 (2023).
|
||||
|
||||
```bibtex
|
||||
@inproceedings{wang2023EcoOptiGen,
|
||||
|
@ -102,7 +102,7 @@ For technical details, please check our research publications.
|
|||
}
|
||||
```
|
||||
|
||||
* [An Empirical Study on Challenging Math Problem Solving with GPT-4](https://arxiv.org/abs/2306.01337). Yiran Wu, Feiran Jia, Shaokun Zhang, Hangyu Li, Erkang Zhu, Yue Wang, Yin Tat Lee, Richard Peng, Qingyun Wu, Chi Wang. ArXiv preprint arXiv:2306.01337 (2023).
|
||||
- [An Empirical Study on Challenging Math Problem Solving with GPT-4](https://arxiv.org/abs/2306.01337). Yiran Wu, Feiran Jia, Shaokun Zhang, Hangyu Li, Erkang Zhu, Yue Wang, Yin Tat Lee, Richard Peng, Qingyun Wu, Chi Wang. ArXiv preprint arXiv:2306.01337 (2023).
|
||||
|
||||
```bibtex
|
||||
@inproceedings{wu2023empirical,
|
||||
|
|
|
@ -4,21 +4,21 @@
|
|||
|
||||
[`flaml.AutoML`](/docs/reference/automl/automl#automl-objects) is a class for task-oriented AutoML. It can be used as a scikit-learn style estimator with the standard `fit` and `predict` functions. The minimal inputs from users are the training data and the task type.
|
||||
|
||||
* Training data:
|
||||
- numpy array. When the input data are stored in numpy array, they are passed to `fit()` as `X_train` and `y_train`.
|
||||
- pandas dataframe. When the input data are stored in pandas dataframe, they are passed to `fit()` either as `X_train` and `y_train`, or as `dataframe` and `label`.
|
||||
* Tasks (specified via `task`):
|
||||
- 'classification': classification with tabular data.
|
||||
- 'regression': regression with tabular data.
|
||||
- 'ts_forecast': time series forecasting.
|
||||
- 'ts_forecast_classification': time series forecasting for classification.
|
||||
- 'ts_forecast_panel': time series forecasting for panel datasets (multiple time series).
|
||||
- 'rank': learning to rank.
|
||||
- 'seq-classification': sequence classification.
|
||||
- 'seq-regression': sequence regression.
|
||||
- 'summarization': text summarization.
|
||||
- 'token-classification': token classification.
|
||||
- 'multichoice-classification': multichoice classification.
|
||||
- Training data:
|
||||
- numpy array. When the input data are stored in numpy array, they are passed to `fit()` as `X_train` and `y_train`.
|
||||
- pandas dataframe. When the input data are stored in pandas dataframe, they are passed to `fit()` either as `X_train` and `y_train`, or as `dataframe` and `label`.
|
||||
- Tasks (specified via `task`):
|
||||
- 'classification': classification with tabular data.
|
||||
- 'regression': regression with tabular data.
|
||||
- 'ts_forecast': time series forecasting.
|
||||
- 'ts_forecast_classification': time series forecasting for classification.
|
||||
- 'ts_forecast_panel': time series forecasting for panel datasets (multiple time series).
|
||||
- 'rank': learning to rank.
|
||||
- 'seq-classification': sequence classification.
|
||||
- 'seq-regression': sequence regression.
|
||||
- 'summarization': text summarization.
|
||||
- 'token-classification': token classification.
|
||||
- 'multichoice-classification': multichoice classification.
|
||||
|
||||
Two optional inputs are `time_budget` and `max_iter` for searching models and hyperparameters. When both are unspecified, only one model per estimator will be trained (using our [zero-shot](Zero-Shot-AutoML) technique). When `time_budget` is provided, there can be randomness in the result due to runtime variance.
|
||||
|
||||
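For example, a minimal sketch of the two input styles and the optional budget arguments (the data variables and the `"y"` label column are illustrative):

```python
from flaml import AutoML

automl = AutoML()

# numpy arrays (or a pandas DataFrame/Series) passed as X_train and y_train
automl.fit(X_train, y_train, task="classification", time_budget=60)

# equivalently, a pandas DataFrame plus the name of its label column
automl.fit(dataframe=train_df, label="y", task="classification", max_iter=100)
```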
|
@ -28,6 +28,7 @@ A typical way to use `flaml.AutoML`:
|
|||
# Prepare training data
|
||||
# ...
|
||||
from flaml import AutoML
|
||||
|
||||
automl = AutoML()
|
||||
automl.fit(X_train, y_train, task="regression", time_budget=60, **other_settings)
|
||||
# Save the model
|
||||
|
@ -48,43 +49,58 @@ If users provide the minimal inputs only, `AutoML` uses the default settings for
|
|||
|
||||
The optimization metric is specified via the `metric` argument. It can be either a string which refers to a built-in metric, or a user-defined function.
|
||||
|
||||
* Built-in metric.
|
||||
- 'accuracy': 1 - accuracy as the corresponding metric to minimize.
|
||||
- 'log_loss': default metric for multiclass classification.
|
||||
- 'r2': 1 - r2_score as the corresponding metric to minimize. Default metric for regression.
|
||||
- 'rmse': root mean squared error.
|
||||
- 'mse': mean squared error.
|
||||
- 'mae': mean absolute error.
|
||||
- 'mape': mean absolute percentage error.
|
||||
- 'roc_auc': minimize 1 - roc_auc_score. Default metric for binary classification.
|
||||
- 'roc_auc_ovr': minimize 1 - roc_auc_score with `multi_class="ovr"`.
|
||||
- 'roc_auc_ovo': minimize 1 - roc_auc_score with `multi_class="ovo"`.
|
||||
- 'roc_auc_weighted': minimize 1 - roc_auc_score with `average="weighted"`.
|
||||
- 'roc_auc_ovr_weighted': minimize 1 - roc_auc_score with `multi_class="ovr"` and `average="weighted"`.
|
||||
- 'roc_auc_ovo_weighted': minimize 1 - roc_auc_score with `multi_class="ovo"` and `average="weighted"`.
|
||||
- 'f1': minimize 1 - f1_score.
|
||||
- 'micro_f1': minimize 1 - f1_score with `average="micro"`.
|
||||
- 'macro_f1': minimize 1 - f1_score with `average="macro"`.
|
||||
- 'ap': minimize 1 - average_precision_score.
|
||||
- 'ndcg': minimize 1 - ndcg_score.
|
||||
- 'ndcg@k': minimize 1 - ndcg_score@k. k is an integer.
|
||||
* User-defined function.
|
||||
A customized metric function that requires the following (input) signature, and returns the input config’s value in terms of the metric you want to minimize, and a dictionary of auxiliary information at your choice:
|
||||
- Built-in metric.
|
||||
- 'accuracy': 1 - accuracy as the corresponding metric to minimize.
|
||||
- 'log_loss': default metric for multiclass classification.
|
||||
- 'r2': 1 - r2_score as the corresponding metric to minimize. Default metric for regression.
|
||||
- 'rmse': root mean squared error.
|
||||
- 'mse': mean squared error.
|
||||
- 'mae': mean absolute error.
|
||||
- 'mape': mean absolute percentage error.
|
||||
- 'roc_auc': minimize 1 - roc_auc_score. Default metric for binary classification.
|
||||
- 'roc_auc_ovr': minimize 1 - roc_auc_score with `multi_class="ovr"`.
|
||||
- 'roc_auc_ovo': minimize 1 - roc_auc_score with `multi_class="ovo"`.
|
||||
- 'roc_auc_weighted': minimize 1 - roc_auc_score with `average="weighted"`.
|
||||
- 'roc_auc_ovr_weighted': minimize 1 - roc_auc_score with `multi_class="ovr"` and `average="weighted"`.
|
||||
- 'roc_auc_ovo_weighted': minimize 1 - roc_auc_score with `multi_class="ovo"` and `average="weighted"`.
|
||||
- 'f1': minimize 1 - f1_score.
|
||||
- 'micro_f1': minimize 1 - f1_score with `average="micro"`.
|
||||
- 'macro_f1': minimize 1 - f1_score with `average="macro"`.
|
||||
- 'ap': minimize 1 - average_precision_score.
|
||||
- 'ndcg': minimize 1 - ndcg_score.
|
||||
- 'ndcg@k': minimize 1 - ndcg_score@k. k is an integer.
|
||||
- User-defined function.
|
||||
A customized metric function that requires the following (input) signature, and returns the input config’s value in terms of the metric you want to minimize, and a dictionary of auxiliary information at your choice:
|
||||
|
||||
```python
|
||||
def custom_metric(
|
||||
X_val, y_val, estimator, labels,
|
||||
X_train, y_train, weight_val=None, weight_train=None,
|
||||
config=None, groups_val=None, groups_train=None,
|
||||
X_val,
|
||||
y_val,
|
||||
estimator,
|
||||
labels,
|
||||
X_train,
|
||||
y_train,
|
||||
weight_val=None,
|
||||
weight_train=None,
|
||||
config=None,
|
||||
groups_val=None,
|
||||
groups_train=None,
|
||||
):
|
||||
return metric_to_minimize, metrics_to_log
|
||||
```
|
||||
|
||||
For example,
|
||||
|
||||
```python
|
||||
def custom_metric(
|
||||
X_val, y_val, estimator, labels,
|
||||
X_train, y_train, weight_val=None, weight_train=None,
|
||||
X_val,
|
||||
y_val,
|
||||
estimator,
|
||||
labels,
|
||||
X_train,
|
||||
y_train,
|
||||
weight_val=None,
|
||||
weight_train=None,
|
||||
*args,
|
||||
):
|
||||
from sklearn.metrics import log_loss
|
||||
|
@ -103,6 +119,7 @@ def custom_metric(
|
|||
"pred_time": pred_time,
|
||||
}
|
||||
```
|
||||
|
||||
It returns the validation loss penalized by the gap between validation and training loss as the metric to minimize, and three metrics to log: val_loss, train_loss and pred_time. The arguments `config`, `groups_val` and `groups_train` are not used in the function.
|
||||
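Once defined, the custom metric function is passed to `fit` via the `metric` argument, e.g. (a minimal sketch, assuming prepared training data):

```python
from flaml import AutoML

automl = AutoML()
automl.fit(
    X_train,
    y_train,
    task="classification",
    metric=custom_metric,
    time_budget=60,
)
```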
|
||||
### Estimator and search space
|
||||
|
@ -110,34 +127,36 @@ It returns the validation loss penalized by the gap between validation and train
|
|||
The estimator list can contain one or more estimator names, each corresponding to a built-in estimator or a custom estimator. Each estimator has a search space for hyperparameter configurations. FLAML supports both classical machine learning models and deep neural networks.
|
||||
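For example, a minimal sketch restricting the search to two built-in estimators (assuming prepared training data):

```python
from flaml import AutoML

automl = AutoML()
automl.fit(
    X_train,
    y_train,
    task="classification",
    time_budget=60,
    estimator_list=["lgbm", "xgboost"],
)
```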
|
||||
#### Estimator

- Built-in estimator.
  - 'lgbm': LGBMEstimator for task "classification", "regression", "rank", "ts_forecast" and "ts_forecast_classification". Hyperparameters: n_estimators, num_leaves, min_child_samples, learning_rate, log_max_bin (logarithm of (max_bin + 1) with base 2), colsample_bytree, reg_alpha, reg_lambda.
  - 'xgboost': XGBoostSkLearnEstimator for task "classification", "regression", "rank", "ts_forecast" and "ts_forecast_classification". Hyperparameters: n_estimators, max_leaves, min_child_weight, learning_rate, subsample, colsample_bylevel, colsample_bytree, reg_alpha, reg_lambda.
  - 'xgb_limitdepth': XGBoostLimitDepthEstimator for task "classification", "regression", "rank", "ts_forecast" and "ts_forecast_classification". Hyperparameters: n_estimators, max_depth, min_child_weight, learning_rate, subsample, colsample_bylevel, colsample_bytree, reg_alpha, reg_lambda.
  - 'rf': RandomForestEstimator for task "classification", "regression", "ts_forecast" and "ts_forecast_classification". Hyperparameters: n_estimators, max_features, max_leaves, criterion (for classification only). Starting from v1.1.0, it uses a fixed random_state by default.
  - 'extra_tree': ExtraTreesEstimator for task "classification", "regression", "ts_forecast" and "ts_forecast_classification". Hyperparameters: n_estimators, max_features, max_leaves, criterion (for classification only). Starting from v1.1.0, it uses a fixed random_state by default.
  - 'histgb': HistGradientBoostingEstimator for task "classification", "regression", "ts_forecast" and "ts_forecast_classification". Hyperparameters: n_estimators, max_leaves, min_samples_leaf, learning_rate, log_max_bin (logarithm of (max_bin + 1) with base 2), l2_regularization. It uses a fixed random_state by default.
  - 'lrl1': LRL1Classifier (sklearn.LogisticRegression with L1 regularization) for task "classification". Hyperparameters: C.
  - 'lrl2': LRL2Classifier (sklearn.LogisticRegression with L2 regularization) for task "classification". Hyperparameters: C.
  - 'catboost': CatBoostEstimator for task "classification" and "regression". Hyperparameters: early_stopping_rounds, learning_rate, n_estimators.
  - 'kneighbor': KNeighborsEstimator for task "classification" and "regression". Hyperparameters: n_neighbors.
  - 'prophet': Prophet for task "ts_forecast". Hyperparameters: changepoint_prior_scale, seasonality_prior_scale, holidays_prior_scale, seasonality_mode.
  - 'arima': ARIMA for task "ts_forecast". Hyperparameters: p, d, q.
  - 'sarimax': SARIMAX for task "ts_forecast". Hyperparameters: p, d, q, P, D, Q, s.
  - 'holt-winters': Holt-Winters (triple exponential smoothing) model for task "ts_forecast". Hyperparameters: seasonal_perdiods, seasonal, use_boxcox, trend, damped_trend.
  - 'transformer': Huggingface transformer models for task "seq-classification", "seq-regression", "multichoice-classification", "token-classification" and "summarization". Hyperparameters: learning_rate, num_train_epochs, per_device_train_batch_size, warmup_ratio, weight_decay, adam_epsilon, seed.
  - 'temporal_fusion_transformer': TemporalFusionTransformerEstimator for task "ts_forecast_panel". Hyperparameters: gradient_clip_val, hidden_size, hidden_continuous_size, attention_head_size, dropout, learning_rate. There is a [known issue](https://github.com/jdb78/pytorch-forecasting/issues/1145) with pytorch-forecast logging.
- Custom estimator. Use a custom estimator for:
  - tuning an estimator that is not built-in;
  - customizing the search space for a built-in estimator.

#### Guidelines on tuning a custom estimator

To tune a custom estimator that is not built-in, you need to:

1. Build a custom estimator by inheriting [`flaml.automl.model.BaseEstimator`](/docs/reference/automl/model#baseestimator-objects) or a derived class.
   For example, if you have an estimator class with scikit-learn style `fit()` and `predict()` functions, you only need to set `self.estimator_class` to be that class in your constructor.

```python
from flaml.automl.model import SKLearnEstimator
```

@ -184,6 +203,7 @@ In the constructor, we set `self.estimator_class` as `RGFClassifier` or `RGFRegr
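
A sketch of what such a custom estimator can look like, consistent with the `MyRegularizedGreedyForest` name and the `RGFClassifier`/`RGFRegressor` classes mentioned here. The task check, the hyperparameter ranges and the exact `search_space` signature are illustrative assumptions, and the RGF classes come from the `rgf_python` package:

```python
from flaml import tune
from flaml.automl.model import SKLearnEstimator


class MyRegularizedGreedyForest(SKLearnEstimator):
    def __init__(self, task="binary", **config):
        super().__init__(task, **config)
        if task in ("regression", "ts_forecast"):
            from rgf.sklearn import RGFRegressor

            self.estimator_class = RGFRegressor
        else:
            from rgf.sklearn import RGFClassifier

            self.estimator_class = RGFClassifier

    @classmethod
    def search_space(cls, data_size, task):
        # four hyperparameters (three integers, one float), all log-uniform;
        # max_leaf and n_iter carry low_cost_init_value because they drive training cost
        return {
            "max_leaf": {
                "domain": tune.lograndint(lower=4, upper=data_size[0]),
                "init_value": 4,
                "low_cost_init_value": 4,
            },
            "n_iter": {
                "domain": tune.lograndint(lower=1, upper=data_size[0]),
                "init_value": 1,
                "low_cost_init_value": 1,
            },
            "min_samples_leaf": {
                "domain": tune.lograndint(lower=1, upper=20),
                "init_value": 20,
            },
            "learning_rate": {"domain": tune.loguniform(lower=0.01, upper=20.0)},
        }
```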
```python
from flaml import AutoML

automl = AutoML()
automl.add_learner("rgf", MyRegularizedGreedyForest)
```

@ -191,15 +211,17 @@ automl.add_learner("rgf", MyRegularizedGreedyForest)

This registers the `MyRegularizedGreedyForest` class in AutoML, with the name "rgf".

3. Tune the newly added custom estimator in either of the following two ways depending on your needs:

   - tune rgf alone: `automl.fit(..., estimator_list=["rgf"])`; or
   - mix it with other built-in learners: `automl.fit(..., estimator_list=["rgf", "lgbm", "xgboost", "rf"])`.
#### Search space

Each estimator class, built-in or not, must have a `search_space` function. In the `search_space` function, we return a dictionary about the hyperparameters, the keys of which are the names of the hyperparameters to tune, and each value is a set of detailed search configurations about the corresponding hyperparameters represented in a dictionary. A search configuration dictionary includes the following fields:

- `domain`, which specifies the possible values of the hyperparameter and their distribution. Please refer to [more details about the search space domain](Tune-User-Defined-Function#more-details-about-the-search-space-domain).
- `init_value` (optional), which specifies the initial value of the hyperparameter.
- `low_cost_init_value` (optional), which specifies the value of the hyperparameter that is associated with low computation cost. See [cost related hyperparameters](Tune-User-Defined-Function#cost-related-hyperparameters) or [FAQ](/docs/FAQ#about-low_cost_partial_config-in-tune) for more details.

In the example above, we tune four hyperparameters, three integers and one float. They all follow a log-uniform distribution. "max_leaf" and "n_iter" have "low_cost_init_value" specified as their values heavily influence the training cost.
@ -303,9 +325,7 @@ A shortcut to do this is to use the [`custom_hp`](#a-shortcut-to-override-the-se

```python
custom_hp = {
    "xgboost": {
        "monotone_constraints": {"domain": "(1, -1)"}  # fix the domain as a constant
    }
}
```
@ -313,10 +333,12 @@ custom_hp = {

3. Constraints on the models tried in AutoML.

Users can set constraints such as the maximal number of models to try, limit on training time and prediction time per model.

- `train_time_limit`: training time in seconds.
- `pred_time_limit`: prediction time per instance in seconds.

For example,

```python
automl.fit(X_train, y_train, max_iter=100, train_time_limit=1, pred_time_limit=1e-3)
```
@ -328,22 +350,31 @@ When users provide a [custom metric function](#optimization-metric), which retur

Users need to provide a list of such constraints in the following format:
each element in the list is a 3-tuple, in which the first element is the name of the
metric, the second element is the inequality sign chosen from ">=" and "\<=",
and the third element is the constraint value. E.g., `('val_loss', '<=', 0.1)`.

For example,

```python
metric_constraints = [("train_loss", "<=", 0.1), ("val_loss", "<=", 0.1)]
automl.fit(
    X_train,
    y_train,
    max_iter=100,
    train_time_limit=1,
    metric_constraints=metric_constraints,
)
```

### Ensemble

To use stacked ensemble after the model search, set `ensemble=True` or a dict. When `ensemble=True`, the final estimator and `passthrough` in the stacker will be automatically chosen. You can specify a customized final estimator or passthrough option:

- "final_estimator": an instance of the final estimator in the stacker.
- "passthrough": True (default) or False, whether to pass the original features to the stacker.

For example, to use the default stacker settings:

```python
automl.fit(
    X_train, y_train, task="classification",
    ensemble=True,
)
```
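
To customize the final estimator and passthrough, pass a dict instead; a sketch (the `LogisticRegression` final estimator here is only an illustration):

```python
from sklearn.linear_model import LogisticRegression

automl.fit(
    X_train,
    y_train,
    task="classification",
    ensemble={
        "final_estimator": LogisticRegression(),
        "passthrough": False,
    },
)
```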
@ -359,29 +390,32 @@ automl.fit(

By default, flaml decides the resampling automatically according to the data size and the time budget. If you would like to enforce a certain resampling strategy, you can set `eval_method` to be "holdout" or "cv" for holdout or cross-validation.

For holdout, you can also set:

- `split_ratio`: the fraction for validation data, 0.1 by default.
- `X_val`, `y_val`: a separate validation dataset. When they are passed, the validation metrics will be computed against this given validation dataset. If they are not passed, then a validation dataset will be split from the training data and held out from training during the model search. After the model search, flaml will retrain the model with the best configuration on the full training data.
  You can set `retrain_full` to be `False` to skip the final retraining or "budget" to ask flaml to do its best to retrain within the time budget.

For cross-validation, you can also set `n_splits`, the number of folds. By default it is 5. See the sketch below for both options.
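
A brief sketch (the concrete fractions and fold counts are illustrative):

```python
# holdout with a custom validation fraction
automl.fit(X_train, y_train, task="classification", eval_method="holdout", split_ratio=0.2)

# 3-fold cross-validation
automl.fit(X_train, y_train, task="classification", eval_method="cv", n_splits=3)
```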
#### Data split method

flaml relies on the provided task type to infer the default splitting strategy:

- stratified split for classification;
- uniform split for regression;
- time-based split for time series forecasting;
- group-based split for learning to rank.

The data split method for classification can be changed into uniform split by setting `split_type="uniform"`. The data are shuffled when `split_type in ("uniform", "stratified")`.

For both classification and regression tasks, more advanced split configurations are possible:

- time-based split can be enforced if the data are sorted by timestamps, by setting `split_type="time"`,
- group-based splits can be set by using `split_type="group"` while providing the group identifier for each sample through the `groups` argument. This is also shown in an [example notebook](https://github.com/microsoft/FLAML/blob/main/notebook/basics/understanding_cross_validation.ipynb).

More generally, `split_type` can also be set as a custom splitter object, when `eval_method="cv"`. It needs to be an instance of a derived class of scikit-learn
[KFold](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html#sklearn.model_selection.KFold)
and have `split` and `get_n_splits` methods with the same signatures. To disable shuffling, the splitter instance must contain the attribute `shuffle=False`.
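
For example, a sketch of passing a splitter instance (here scikit-learn's `KFold` with `shuffle=False`; the task and fold count are illustrative):

```python
from sklearn.model_selection import KFold

automl.fit(
    X_train,
    y_train,
    task="regression",
    eval_method="cv",
    split_type=KFold(n_splits=5, shuffle=False),
)
```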
### Parallel tuning
@ -392,18 +426,23 @@ FLAML now support two backends for parallel tuning, i.e., `Ray` and `Spark`. You

#### Parallel tuning with Ray

To do parallel tuning with Ray, install the `ray` and `blendsearch` options:

```bash
pip install flaml[ray,blendsearch]
```

`ray` is used to manage the resources. For example,

```python
ray.init(num_cpus=16)
```

allocates 16 CPU cores. Then, when you run:

```python
automl.fit(X_train, y_train, n_jobs=4, n_concurrent_trials=4)
```

flaml will perform 4 trials in parallel, each consuming 4 CPU cores. The parallel tuning uses the [BlendSearch](Tune-User-Defined-Function#blendsearch-economical-hyperparameter-optimization-with-blended-search-strategy) algorithm.
#### Parallel tuning with Spark
@ -411,6 +450,7 @@ flaml will perform 4 trials in parallel, each consuming 4 CPU cores. The paralle

To do parallel tuning with Spark, install the `spark` and `blendsearch` options:

> *Spark support was added in v1.1.0*

```bash
pip install "flaml[spark,blendsearch]>=1.1.0"
```
@ -418,9 +458,11 @@ pip install flaml[spark,blendsearch]>=1.1.0

For more details about installing Spark, please refer to [Installation](/docs/Installation#distributed-tuning).

An example of using Spark for parallel tuning is:

```python
automl.fit(X_train, y_train, n_concurrent_trials=4, use_spark=True)
```

Details about parallel tuning with Spark can be found [here](/docs/Examples/Integrate%20-%20Spark#parallel-spark-jobs). For Spark clusters, by default, we will launch one trial per executor. However, sometimes we want to launch more trials than the number of executors (e.g., local mode). In this case, we can set the environment variable `FLAML_MAX_CONCURRENT` to override the detected `num_executors`. The final number of concurrent trials will be the minimum of `n_concurrent_trials` and `num_executors`. Also, GPU training is not supported yet when `use_spark` is True.

#### **Guidelines on parallel vs sequential tuning**
@ -429,7 +471,7 @@ Details about parallel tuning with Spark could be found [here](/docs/Examples/In

One common motivation for parallel tuning is to save wall-clock time. When sequential tuning and parallel tuning achieve a similar wall-clock time, sequential tuning should be preferred. This is a rule of thumb when the HPO algorithm is sequential by nature (e.g., Bayesian Optimization and FLAML's HPO algorithms CFO and BS). Sequential tuning allows the HPO algorithms to take advantage of the historical trial results. Then the question is **how to estimate the wall-clock time needed by parallel tuning and sequential tuning**?

You can use the following way to roughly estimate the wall-clock time in parallel tuning and sequential tuning: To finish $N$ trials of hyperparameter tuning, i.e., run $N$ hyperparameter configurations, the total wall-clock time needed is $N/k \cdot (SingleTrialTime + Overhead)$, in which $SingleTrialTime$ is the trial time to evaluate a particular hyperparameter configuration, $k$ is the scale of parallelism, e.g., the number of parallel CPU/GPU cores, and $Overhead$ is the computation overhead.
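
As a back-of-the-envelope illustration with assumed numbers (these values are not from any benchmark):

```python
# 100 trials, 60 s per trial, 10 s overhead per trial
N, single_trial_time, overhead = 100, 60, 10

sequential = N / 1 * (single_trial_time + overhead)   # 7000 s, about 117 minutes
parallel_k4 = N / 4 * (single_trial_time + overhead)  # 1750 s, about 29 minutes
# note: in practice the per-trial overhead is usually larger in parallel tuning
```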

In sequential tuning, $k=1$, and in parallel tuning $k>1$. This may suggest that parallel tuning has a shorter wall-clock time. But it is not always the case considering the other two factors $SingleTrialTime$ and $Overhead$:
@ -440,9 +482,10 @@ In sequential tuning, $k=1$, and in parallel tuning $k>1$. This may suggest that

**(2) Considerations on randomness.**

Potential reasons that cause randomness:

1. Parallel tuning: In the case of parallel tuning, the order of trials' finishing time is no longer deterministic. This non-deterministic order, combined with sequential HPO algorithms, leads to a non-deterministic hyperparameter tuning trajectory.

1. Distributed or multi-thread training: Distributed/multi-thread training may introduce randomness in model training, i.e., the trained model with the same hyperparameter may be different because of such randomness. This model-level randomness may be undesirable in some cases.
### Warm start
@ -452,7 +495,12 @@ We can warm start the AutoML by providing starting points of hyperparameter conf

```python
automl1 = AutoML()
automl1.fit(X_train, y_train, time_budget=3600)
automl2 = AutoML()
automl2.fit(
    X_train,
    y_train,
    time_budget=7200,
    starting_points=automl1.best_config_per_estimator,
)
```

`starting_points` is a dictionary or a str to specify the starting hyperparameter config. (1) When it is a dictionary, the keys are the estimator names. If you do not need to specify starting points for an estimator, exclude its name from the dictionary. The value for each key can be either a dictionary or a list of dictionaries, corresponding to one hyperparameter configuration or multiple hyperparameter configurations, respectively. (2) When it is a str: if "data", use data-dependent defaults; if "data:path", use data-dependent defaults which are stored at path; if "static", use data-independent defaults. Please find more details about data-dependent defaults in [zero shot AutoML](Zero-Shot-AutoML#combine-zero-shot-automl-and-hyperparameter-tuning).
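
For instance, a sketch of an explicit `starting_points` dictionary (the particular lgbm hyperparameter values below are only illustrative):

```python
starting_points = {
    "lgbm": {
        "n_estimators": 4,
        "num_leaves": 4,
        "min_child_samples": 20,
        "learning_rate": 0.1,
        "log_max_bin": 8,
        "colsample_bytree": 1.0,
        "reg_alpha": 1 / 1024,
        "reg_lambda": 1.0,
    },
}
automl2.fit(X_train, y_train, time_budget=7200, starting_points=starting_points)
```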
@ -461,6 +509,7 @@ automl2.fit(X_train, y_train, time_budget=7200, starting_points=automl1.best_con

The trials are logged in a file if a `log_file_name` is passed.
Each trial is logged as a json record in one line. The best trial's id is logged in the last line. For example,

```
{"record_id": 0, "iter_per_learner": 1, "logged_metric": null, "trial_time": 0.12717914581298828, "wall_clock_time": 0.1728971004486084, "validation_loss": 0.07333333333333332, "config": {"n_estimators": 4, "num_leaves": 4, "min_child_samples": 20, "learning_rate": 0.09999999999999995, "log_max_bin": 8, "colsample_bytree": 1.0, "reg_alpha": 0.0009765625, "reg_lambda": 1.0}, "learner": "lgbm", "sample_size": 150}
{"record_id": 1, "iter_per_learner": 3, "logged_metric": null, "trial_time": 0.07027268409729004, "wall_clock_time": 0.3756711483001709, "validation_loss": 0.05333333333333332, "config": {"n_estimators": 4, "num_leaves": 4, "min_child_samples": 12, "learning_rate": 0.2677050123105203, "log_max_bin": 7, "colsample_bytree": 1.0, "reg_alpha": 0.001348364934537134, "reg_lambda": 1.4442580148221913}, "learner": "lgbm", "sample_size": 150}
```
@ -472,6 +521,7 @@ Each trial is logged as a json record in one line. The best trial's id is logged

1. flaml will adjust the `n_estimators` for lightgbm etc. according to the remaining budget and check the time budget constraint and stop in several places. Most of the time that makes `fit()` stop before the given budget. Occasionally it may run over the time budget slightly. But the log file always contains the best config info and you can recover the best model up to any time point using `retrain_from_log()`.

We can also use mlflow for logging:

```python
import mlflow

mlflow.set_experiment("flaml")
with mlflow.start_run():
    automl.fit(X_train=X_train, y_train=y_train, **settings)  # metrics are logged to the active mlflow run
```
@ -479,10 +529,13 @@ with mlflow.start_run():

To disable mlflow logging pre-configured in FLAML, set `mlflow_logging=False`:

```python
automl = AutoML(mlflow_logging=False)
```

or

```python
automl.fit(X_train=X_train, y_train=y_train, mlflow_logging=False, **settings)
```
@ -532,20 +585,25 @@ print(automl.model)

```python
print(automl.model.estimator)
"""
LGBMRegressor(colsample_bytree=0.7610534336273627,
              learning_rate=0.41929025492645006, max_bin=255,
              min_child_samples=4, n_estimators=45, num_leaves=4,
              reg_alpha=0.0009765625, reg_lambda=0.009280655005879943,
              verbose=-1)
"""
```

Just like a normal LightGBM model, we can inspect it. For example, we can plot the feature importance:

```python
import matplotlib.pyplot as plt

plt.barh(
    automl.model.estimator.feature_name_, automl.model.estimator.feature_importances_
)
```

![png](images/feature_importance.png)
### Get best configuration
@ -569,6 +627,7 @@ print(automl.best_config_per_estimator)

The `None` value corresponds to the estimators which have not been tried.

Other useful information:

```python
print(automl.best_config_train_time)
# 0.24841618537902832
```
@ -613,18 +672,23 @@ The curve suggests that increasing the time budget may further improve the accur

### How to set time budget

- If you have an exact constraint for the total search time, set it as the time budget.
- If you have flexible time constraints, for example, your desirable time budget is t1=60s, and the longest time budget you can tolerate is t2=3600s, you can try the following two ways:

1. set t1 as the time budget, and check the message in the console log in the end. If the budget is too small, you will see a warning like

   > WARNING - Time taken to find the best model is 91% of the provided time budget and not all estimators' hyperparameter search converged. Consider increasing the time budget.

2. set t2 as the time budget, and also set `early_stop=True`. If the early stopping is triggered, you will see a warning like

   > WARNING - All estimator hyperparameters local search has converged at least once, and the total search time exceeds 10 times the time taken to find the best model.

   > WARNING - Stopping search as early_stop is set to True.

### How much time is needed to find the best model

If you want to get a sense of how much time is needed to find the best model, you can use `max_iter=2` to perform two trials first. The message will be like:

> INFO - iteration 0, current learner lgbm

> INFO - Estimated sufficient time budget=145194s. Estimated necessary time budget=2118s.
@ -10,6 +10,7 @@

## Basic Tuning Procedure

There are three essential steps (assuming the knowledge of the set of hyperparameters to tune) to use `flaml.tune` to finish a basic tuning task:

1. Specify the [tuning objective](#tuning-objective) with respect to the hyperparameters.
1. Specify a [search space](#search-space) of the hyperparameters.
1. Specify [tuning constraints](#tuning-constraints), including constraints on the resource budget to do the tuning, constraints on the configurations, or/and constraints on a (or multiple) particular metric(s).
@ -19,9 +20,10 @@ With these steps, you can [perform a basic tuning task](#put-together) according

### Tuning objective

Related arguments:

- `evaluation_function`: A user-defined evaluation function.
- `metric`: A string of the metric name to optimize for.
- `mode`: A string in \['min', 'max'\] to specify the objective as minimization or maximization.

The first step is to specify your tuning objective.
To do it, you should first specify your evaluation procedure (e.g., perform a machine learning model training and validation) with respect to the hyperparameters in a user-defined function `evaluation_function`.
@ -32,6 +34,7 @@ In the following code, we define an evaluation function with respect to two hype

```python
import time


def evaluate_config(config: dict):
    """evaluate a hyperparameter configuration"""
    score = (config["x"] - 85000) ** 2 - config["x"] / config["y"]
```

@ -44,7 +47,11 @@ def evaluate_config(config: dict):

```python
    # we can return a single float as a score on the input config:
    # return score
    # or, we can return a dictionary that maps metric name to metric value:
    return {
        "score": score,
        "evaluation_cost": faked_evaluation_cost,
        "constraint_metric": config["x"] * config["y"],
    }
```

When the evaluation function returns a dictionary of metrics, you need to specify the name of the metric to optimize via the argument `metric` (this can be skipped when the function is just returning a scalar). In addition, you need to specify a mode of your optimization/tuning task (maximization or minimization) via the argument `mode` by choosing from "min" or "max".
@ -58,14 +65,14 @@ flaml.tune.run(evaluation_function=evaluate_config, metric="score", mode="min",

### Search space

Related arguments:

- `config`: A dictionary to specify the search space.
- `low_cost_partial_config` (optional): A dictionary from a subset of controlled dimensions to the initial low-cost values.
- `cat_hp_cost` (optional): A dictionary from a subset of categorical dimensions to the relative cost of each choice.

The second step is to specify a search space of the hyperparameters through the argument `config`. In the search space, you need to specify valid values for your hyperparameters and can specify how these values are sampled (e.g., from a uniform distribution or a log-uniform distribution).

In the following code example, we include a search space for the two hyperparameters `x` and `y` as introduced above. The valid values for both are integers in the range of \[1, 100000\]. The values for `x` are sampled uniformly in logarithmic space of the specified range (using `tune.lograndint(lower=1, upper=100000)`), and the values for `y` are sampled uniformly in the specified range (using `tune.randint(lower=1, upper=100000)`).
```python
from flaml import tune

# construct a search space for the hyperparameters x and y.
config_search_space = {
    "x": tune.lograndint(lower=1, upper=100000),
    "y": tune.randint(lower=1, upper=100000),
}

# provide the search space to tune.run
tune.run(..., config=config_search_space, ...)
```

#### **Details and guidelines on hyperparameter search space**

The corresponding value of a particular hyperparameter in the search space dictionary is called a *domain*, for example, `tune.randint(lower=1, upper=100000)` is the domain for the hyperparameter `y`.
The domain specifies a *type* and *valid range* to sample parameters from. Supported types include float, integer, and categorical.

- **Categorical hyperparameter**

If it is a categorical hyperparameter, then you should use `tune.choice(possible_choices)` in which `possible_choices` is the list of possible categorical values of the hyperparameter. For example, if you are tuning the optimizer used in model training, and the candidate optimizers are "sgd" and "adam", you should specify the search space in the following way:

```python
{
    "optimizer": tune.choice(["sgd", "adam"]),
}
```

- **Numerical hyperparameter**

If it is a numerical hyperparameter, you need to know whether it takes integer values or float values. In addition, you need to know:

- The range of valid values, i.e., what are the lower limit and upper limit of the hyperparameter value?
- Do you want to sample in linear scale or log scale? It is a common practice to sample in the log scale if the valid value range is large and the evaluation function changes more regularly with respect to the log domain, as shown in the following example for learning rate tuning. In this code example, we set the lower limit and the upper limit of the learning rate to be 1/1024 and 1.0, respectively. We sample in the log space because model performance changes more regularly in the log scale with respect to the learning rate within such a large search range.
@ -103,6 +114,7 @@ If it is a numerical hyperparameter, you need to know whether it takes integer v

```python
{
    "learning_rate": tune.loguniform(lower=1 / 1024, upper=1.0),
}
```

When the search range of learning rate is small, it is more common to sample in the linear scale as shown in the following example,

```python
{
    "learning_rate": tune.uniform(lower=0.01, upper=0.1),  # an illustrative small range
}
```

- Do you have quantization granularity requirements?

When you have a desired quantization granularity for the hyperparameter change, you can use a quantized domain such as `tune.quniform`, `tune.qloguniform` or `tune.qlograndint` to realize the quantization requirement. The following code example helps you realize the need for sampling uniformly in the range of 0.1 and 0.2 with increments of 0.02, i.e., the sampled learning rate can only take values in {0.1, 0.12, 0.14, 0.16, ..., 0.2},

```python
{
    "learning_rate": tune.quniform(lower=0.1, upper=0.2, q=0.02),
}
```
@ -123,14 +135,12 @@ When you have a desired quantization granularity for the hyperparameter change,

You can find the corresponding search space choice in the table below once you have answers to the aforementioned three questions.

|                                | Integer                                                                 | Float                                                                        |
| ------------------------------ | ----------------------------------------------------------------------- | ---------------------------------------------------------------------------- |
| linear scale                   | tune.randint(lower: int, upper: int)                                     | tune.uniform(lower: float, upper: float)                                      |
| log scale                      | tune.lograndint(lower: int, upper: int, base: float = 10)                | tune.loguniform(lower: float, upper: float, base: float = 10)                 |
| linear scale with quantization | tune.qrandint(lower: int, upper: int, q: int = 1)                        | tune.quniform(lower: float, upper: float, q: float = 1)                       |
| log scale with quantization    | tune.qlograndint(lower: int, upper: int, q: int = 1, base: float = 10)   | tune.qloguniform(lower: float, upper: float, q: float = 1, base: float = 10)  |

See the example below for the commonly used types of domains.
@ -138,48 +148,38 @@ See the example below for the commonly used types of domains.

```python
config = {
    # Sample a float uniformly between -5.0 and -1.0
    "uniform": tune.uniform(-5, -1),

    # Sample a float uniformly between 3.2 and 5.4,
    # rounding to increments of 0.2
    "quniform": tune.quniform(3.2, 5.4, 0.2),

    # Sample a float uniformly between 0.0001 and 0.01, while
    # sampling in log space
    "loguniform": tune.loguniform(1e-4, 1e-2),

    # Sample a float uniformly between 0.0001 and 0.1, while
    # sampling in log space and rounding to increments of 0.00005
    "qloguniform": tune.qloguniform(1e-4, 1e-1, 5e-5),

    # Sample a random float from a normal distribution with
    # mean=10 and sd=2
    "randn": tune.randn(10, 2),

    # Sample a random float from a normal distribution with
    # mean=10 and sd=2, rounding to increments of 0.2
    "qrandn": tune.qrandn(10, 2, 0.2),

    # Sample an integer uniformly between -9 (inclusive) and 15 (exclusive)
    "randint": tune.randint(-9, 15),

    # Sample an integer uniformly between -21 (inclusive) and 12 (inclusive (!)),
    # rounding to increments of 3 (includes 12)
    "qrandint": tune.qrandint(-21, 12, 3),

    # Sample an integer uniformly between 1 (inclusive) and 10 (exclusive),
    # while sampling in log space
    "lograndint": tune.lograndint(1, 10),

    # Sample an integer uniformly between 2 (inclusive) and 10 (inclusive (!)),
    # while sampling in log space and rounding to increments of 2
    "qlograndint": tune.qlograndint(2, 10, 2),

    # Sample an option uniformly from the specified choices
    "choice": tune.choice(["a", "b", "c"]),
}
```

<!-- Please refer to [ray.tune](https://docs.ray.io/en/latest/tune/api_docs/search_space.html#overview) for a more comprehensive introduction about possible choices of the domain. -->
#### Cost-related hyperparameters
@ -191,12 +191,12 @@ In this case, designing a search space with proper ranges of the hyperparameter

Our search algorithms are designed to finish the tuning process at a low total cost when the evaluation cost in the search space is heterogeneous.
So in such scenarios, if you are aware of low-cost configurations for the cost-related hyperparameters, you are encouraged to set them as the `low_cost_partial_config`, which is a dictionary of a subset of the hyperparameter coordinates whose value corresponds to a configuration with known low cost. Using the example of the tree-based methods again, since we know that small `n_estimators` and `max_leaves` generally correspond to simpler models and thus lower cost, we set `{'n_estimators': 4, 'max_leaves': 4}` as the `low_cost_partial_config` by default (note that 4 is the lower bound of search space for these two hyperparameters), e.g., in LGBM. Please find more details on how the algorithm works [here](#cfo-frugal-optimization-for-cost-related-hyperparameters).

In addition, if you are aware of the cost relationship between different categorical hyperparameter choices, you are encouraged to provide this information through `cat_hp_cost`. It also helps the search algorithm to reduce the total cost.
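
For example, a sketch of passing such a hint to `tune.run()` (the search space and `train_lgbm` evaluation function here are hypothetical placeholders):

```python
# `train_lgbm` is a hypothetical user-defined evaluation function
analysis = tune.run(
    train_lgbm,
    config={
        "n_estimators": tune.lograndint(lower=4, upper=32768),
        "max_leaves": tune.lograndint(lower=4, upper=32768),
    },
    metric="score",
    mode="min",
    low_cost_partial_config={"n_estimators": 4, "max_leaves": 4},
    time_budget_s=10,
)
```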
### Tuning constraints

Related arguments:

- `time_budget_s`: The time budget in seconds.
- `num_samples`: An integer of the number of configs to try.
- `config_constraints` (optional): A list of config constraints to be satisfied.
@ -215,12 +215,12 @@ flaml.tune.run(..., num_samples=100, ...)

```python
flaml.tune.run(..., time_budget_s=60, num_samples=100, ...)
```

Optionally, you can provide a list of config constraints to be satisfied through the argument `config_constraints` and provide a list of metric constraints to be satisfied through the argument `metric_constraints`. We provide more details about related use cases in the [Advanced Tuning Options](#more-constraints-on-the-tuning) section.

### Put together

After the aforementioned key steps, one is ready to perform a tuning task by calling [`flaml.tune.run()`](/docs/reference/tune/tune#run). Below is a quick sequential tuning example using the pre-defined search space `config_search_space` and a minimization (`mode='min'`) objective for the `score` metric evaluated in `evaluate_config`, using the default search algorithm in flaml. The time budget is 10 seconds (`time_budget_s=10`).

```python
# require: pip install flaml[blendsearch]
analysis = tune.run(
    evaluation_function=evaluate_config,
    config=config_search_space,
    metric="score",
    mode="min",
    num_samples=-1,
    time_budget_s=10,
)
```

### Result analysis

Once the tuning process finishes, it returns an [ExperimentAnalysis](/docs/reference/tune/analysis) object, which provides methods to analyze the tuning.
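
For example (a brief sketch using attributes that also appear in the parallel tuning example later in this article):

```python
print(analysis.best_config)  # the best config found
print(analysis.best_trial.last_result)  # the best trial's result
```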
@ -259,7 +258,7 @@ There are several advanced tuning options worth mentioning.

### More constraints on the tuning

A user can specify constraints on the configurations to be satisfied via the argument `config_constraints`. The `config_constraints` receives a list of such constraints to be satisfied. Specifically, each constraint is a tuple that consists of (1) a function that takes a configuration as input and returns a numerical value; (2) an operation chosen from "\<=", ">=", "\<" or ">"; (3) a numerical threshold.

In the following code example, we constrain the output of `my_model_size`, which takes a configuration as input and outputs a numerical value, to be no larger than 40.

```python
def my_model_size(config):
    return config["n_estimators"] * config["max_leaves"]


analysis = tune.run(
    ...,
    config_constraints=[(my_model_size, "<=", 40)],
)
```

You can also specify a list of metric constraints to be satisfied via the argument `metric_constraints`. Each element in the `metric_constraints` list is a tuple that consists of (1) a string specifying the name of the metric (the metric name must be defined and returned in the user-defined `evaluation_function`); (2) an operation chosen from "\<=" or ">="; (3) a numerical threshold.

In the following code example, we constrain the metric `training_cost` to be no larger than 1 second.
```python
analysis = tune.run(
    ...,
    metric_constraints=[("training_cost", "<=", 1)],
)
```

#### **`config_constraints` vs `metric_constraints`:**

The key difference between these two types of constraints is that the calculation of constraints in `config_constraints` does not rely on the computation procedure in the evaluation function, i.e., in `evaluation_function`. For example, a constraint may depend only on the config itself, as shown in the code example above. Due to this independence, constraints in `config_constraints` will be checked before evaluation. So configurations that do not satisfy `config_constraints` will not be evaluated.
### Parallel tuning
@ -295,9 +295,8 @@ Related arguments:

Details about parallel tuning with Spark can be found [here](/docs/Examples/Integrate%20-%20Spark#parallel-spark-jobs).

You can perform parallel tuning by specifying `use_ray=True` (requiring the flaml\[ray\] option installed) or `use_spark=True`
(requiring the flaml\[spark\] option installed). You can also limit the amount of resources allocated per trial by specifying `resources_per_trial`,
e.g., `resources_per_trial={'cpu': 2}` when `use_ray=True`.
```python
analysis = tune.run(
    evaluation_function=evaluate_config,
    config=config_search_space,
    metric="score",
    mode="min",
    num_samples=-1,  # the maximal number of configs to try, -1 means infinite
    time_budget_s=10,  # the time budget in seconds
    use_ray=True,
    resources_per_trial={"cpu": 2},  # limit resources allocated per trial
)
print(analysis.best_trial.last_result)  # the best trial's result
print(analysis.best_config)  # the best config
```
@ -333,10 +332,10 @@ print(analysis.best_config) # the best config

**A heads-up about computation overhead.** When parallel tuning is used, there will be a certain amount of computation overhead in each trial. In case each trial's original cost is much smaller than the overhead, parallel tuning can underperform sequential tuning. Sequential tuning is recommended when compute resource is limited, and each trial can consume all the resources.

### Trial scheduling

Related arguments:

- `scheduler`: A scheduler for executing the trials.
- `resource_attr`: A string to specify the resource dimension used by the scheduler.
- `min_resource`: A float of the minimal resource to use for the resource_attr.
@ -350,6 +349,7 @@ A scheduler can help manage the trials' execution. It can be used to perform mul

This scheduler is native to the new search algorithms provided by FLAML. In a nutshell, it starts the search with the minimum resource. It switches between HPO with the current resource and increasing the resource for evaluation depending on which leads to faster improvement.

If this scheduler is used, you need to

- Specify a resource dimension. Conceptually a 'resource dimension' is a factor that affects the cost of the evaluation (e.g., sample size, the number of epochs). You need to specify the name of the resource dimension via `resource_attr`. For example, if `resource_attr="sample_size"`, then the config dict passed to the `evaluation_function` would contain a key "sample_size" and its value suggested by the search algorithm. That value should be used in the evaluation function to control the compute cost. The larger the value, the more expensive the evaluation is.

- Provide the lower and upper limit of the resource dimension via `min_resource` and `max_resource`, and optionally provide `reduction_factor`, which determines the magnitude of resource (multiplicative) increase when we decide to increase the resource. A usage sketch follows below.
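
A minimal sketch of wiring these arguments together (`obj_with_resource` is a hypothetical evaluation function that reads `config["sample_size"]` to control its cost; the resource limits and search space are illustrative):

```python
analysis = tune.run(
    obj_with_resource,  # hypothetical evaluation function using config["sample_size"]
    config={
        "n_estimators": tune.lograndint(lower=4, upper=32768),
        "learning_rate": tune.loguniform(lower=1 / 1024, upper=1.0),
    },
    metric="score",
    mode="min",
    scheduler="flaml",
    resource_attr="sample_size",
    min_resource=1000,
    max_resource=100000,
    reduction_factor=2,
    time_budget_s=10,
)
```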
@ -409,8 +409,6 @@ analysis = tune.run(

You can find more details about this scheduler in [this paper](https://arxiv.org/pdf/1911.04706.pdf).

#### 2. A scheduler of the [`TrialScheduler`](https://docs.ray.io/en/latest/tune/api_docs/schedulers.html#tune-schedulers) class from `ray.tune`.

There are a handful of schedulers of this type implemented in `ray.tune`, for example, [ASHA](https://docs.ray.io/en/latest/tune/api_docs/schedulers.html#asha-tune-schedulers-ashascheduler), [HyperBand](https://docs.ray.io/en/latest/tune/api_docs/schedulers.html#tune-original-hyperband), [BOHB](https://docs.ray.io/en/latest/tune/api_docs/schedulers.html#tune-scheduler-bohb), etc.
@ -423,6 +421,7 @@ from ray.tune.schedulers import HyperBandScheduler

```python
from ray.tune.schedulers import HyperBandScheduler

my_scheduler = HyperBandScheduler(time_attr="sample_size", max_t=max_resource, reduction_factor=2)
tune.run(..., scheduler=my_scheduler, ...)
```

- Similar to the case where the `flaml` scheduler is used, you need to specify the resource dimension, use the resource dimension accordingly in your `evaluation_function`, and provide the necessary information needed for scheduling, such as `min_resource`, `max_resource` and `reduction_factor` (depending on the requirements of the specific scheduler).

- Different from the case when the `flaml` scheduler is used, the amount of resources to use at each iteration is not suggested by the search algorithm through the `resource_attr` in a configuration. You need to specify the evaluation schedule explicitly by yourself in the `evaluation_function` and **report intermediate results (using `tune.report()`) accordingly**. In the following code example, we use the ASHA scheduler by setting `scheduler="asha"`. We specify `resource_attr`, `min_resource`, `max_resource` and `reduction_factor` the same way as in the previous example (when "flaml" is used as the scheduler). We perform the evaluation in a customized schedule.
@ -430,12 +429,16 @@ tune.run(.., scheduler=my_scheduler, ...)

- Use ray backend or not? You can choose to use ray backend or not by specifying `use_ray=True` or `use_ray=False`. When ray backend is not used, i.e., `use_ray=False`, you also need to stop the evaluation function by explicitly catching the `StopIteration` exception, as shown in the end of the evaluation function `obj_w_intermediate_report()` in the following code example.

```python
def obj_w_intermediate_report(
    resource_attr, X_train, X_test, y_train, y_test, min_resource, max_resource, config
):
    from lightgbm import LGBMClassifier
    from sklearn.metrics import accuracy_score

    # a customized schedule to perform the evaluation
    eval_schedule = [res for res in range(min_resource, max_resource, 5000)] + [
        max_resource
    ]
    for resource in eval_schedule:
        sampled_X_train = X_train.iloc[:resource]
        sampled_y_train = y_train[:resource]
```

@ -453,11 +456,21 @@ def obj_w_intermediate_report(resource_attr, X_train, X_test, y_train, y_test, m

```python
            # do cleanup operation here
            return


resource_attr = "sample_size"
min_resource = 1000
max_resource = len(y_train)
analysis = tune.run(
    partial(
        obj_w_intermediate_report,
        resource_attr,
        X_train,
        X_test,
        y_train,
        y_test,
        min_resource,
        max_resource,
    ),
    config={
        "n_estimators": tune.lograndint(lower=4, upper=32768),
        "learning_rate": tune.loguniform(lower=1 / 1024, upper=1.0),
```

@ -470,12 +483,12 @@ analysis = tune.run(

```python
    min_resource=min_resource,
    reduction_factor=2,
    time_budget_s=10,
    num_samples=-1,
)
```

- If you would like to do some cleanup operation when the trial is stopped
  by the scheduler, you can do it when you catch the `StopIteration` (when not using ray) or `SystemExit` (when using ray) exception explicitly.
### Warm start
@ -495,17 +508,19 @@ inform `tune.run()`.

```python
def simple_obj(config):
    return config["a"] + config["b"]


from flaml import tune

config_search_space = {
    "a": tune.uniform(lower=0, upper=0.99),
    "b": tune.uniform(lower=0, upper=3),
}

points_to_evaluate = [
    {"b": 0.99, "a": 3},
    {"b": 0.99, "a": 2},
    {"b": 0.80, "a": 3},
    {"b": 0.80, "a": 2},
]
evaluated_rewards = [3.99, 2.99]
```
|
@ -522,11 +537,12 @@ analysis = tune.run(
|
|||
|
||||
### Reproducibility
|
||||
|
||||
By default, there is randomness in our tuning process (for versions <= 0.9.1). If reproducibility is desired, you could manually set a random seed before calling `tune.run()`. For example, in the following code, we call `np.random.seed(100)` to set the random seed.
|
||||
By default, there is randomness in our tuning process (for versions \<= 0.9.1). If reproducibility is desired, you could manually set a random seed before calling `tune.run()`. For example, in the following code, we call `np.random.seed(100)` to set the random seed.
|
||||
With this random seed, running the following code multiple times will generate exactly the same search trajectory. The reproducibility can only be guaranteed in sequential tuning.
|
||||
|
||||
```python
|
||||
import numpy as np
|
||||
|
||||
np.random.seed(100) # This line is not needed starting from version v0.9.2.
|
||||
analysis = tune.run(
|
||||
simple_obj,
|
||||
|
@ -537,12 +553,14 @@ analysis = tune.run(

### Lexicographic Objectives

We support tuning multiple objectives with lexicographic preference by providing the argument `lexico_objectives` for `tune.run()`.
`lexico_objectives` is a dictionary that contains the following fields of key-value pairs:

- `metrics`: a list of optimization objectives with the orders reflecting the priorities/preferences of the objectives.
- `modes`: (optional) a list of optimization modes (each mode either "min" or "max") corresponding to the objectives in the metric list. If not provided, we use "min" as the default mode for all the objectives.
- `tolerances`: (optional) a dictionary to specify the optimality tolerances on objectives. The keys are the metric names (provided in "metrics"), and the values are the absolute/percentage tolerance in the form of numeric/string.
- `targets`: (optional) a dictionary to specify the optimization targets on the objectives. The keys are the metric names (provided in "metric"), and the values are the numerical target values.

In the following example, we want to minimize `val_loss` and `pred_time` of the model, where `val_loss` has the higher priority. The tolerances for `val_loss` and `pred_time` are 0.02 and 0, respectively. We do not have target values for these two objectives, so we set the targets to -inf for both.
@ -551,7 +569,7 @@ lexico_objectives = {}

```python
lexico_objectives = {}
lexico_objectives["metrics"] = ["val_loss", "pred_time"]
lexico_objectives["modes"] = ["min", "min"]
lexico_objectives["tolerances"] = {"val_loss": 0.02, "pred_time": 0.0}
lexico_objectives["targets"] = {"val_loss": -float("inf"), "pred_time": -float("inf")}

# provide the lexico_objectives to tune.run
tune.run(..., search_alg=None, lexico_objectives=lexico_objectives)
```
@@ -562,15 +580,16 @@ We also supports providing percentage tolerance as shown below.

```python
lexico_objectives["tolerances"] = {"val_loss": "10%", "pred_time": "0%"}
```

NOTE:

1. When `lexico_objectives` is not None, the arguments `metric` and `mode` are invalid, and flaml's tune uses CFO as the `search_alg`, which makes any user-provided `search_alg` invalid as well.

2. This is a new feature that will be released in version 1.1.0 and is subject to change in the future version.
1. This is a new feature that will be released in version 1.1.0 and is subject to change in the future version.
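For concreteness, a minimal end-to-end sketch of lexicographic tuning follows. This is an editorial illustration rather than part of the original page: the evaluation function, the search space, and the sample count are assumptions, while the `lexico_objectives` fields mirror the example above.

```python
from flaml import tune


def evaluate_model(config):
    # Placeholder evaluation: in practice, train a model with `config` and
    # measure its validation loss and prediction time.
    val_loss = (config["n_estimators"] - 60) ** 2 / 3600
    pred_time = 0.001 * config["n_estimators"]
    return {"val_loss": val_loss, "pred_time": pred_time}


lexico_objectives = {
    "metrics": ["val_loss", "pred_time"],
    "modes": ["min", "min"],
    "tolerances": {"val_loss": 0.02, "pred_time": 0.0},
    "targets": {"val_loss": -float("inf"), "pred_time": -float("inf")},
}

analysis = tune.run(
    evaluate_model,
    config={"n_estimators": tune.randint(lower=4, upper=512)},
    lexico_objectives=lexico_objectives,
    num_samples=64,
    # metric and mode are omitted: with lexico_objectives they would be ignored.
)
```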

## Hyperparameter Optimization Algorithm

To tune the hyperparameters toward your objective, you will want to use a hyperparameter optimization algorithm which can help suggest hyperparameters with better performance (regarding your objective). `flaml` offers two HPO methods: CFO and BlendSearch. `flaml.tune` uses BlendSearch by default when the option [blendsearch] is installed.
To tune the hyperparameters toward your objective, you will want to use a hyperparameter optimization algorithm which can help suggest hyperparameters with better performance (regarding your objective). `flaml` offers two HPO methods: CFO and BlendSearch. `flaml.tune` uses BlendSearch by default when the option \[blendsearch\] is installed.
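As an illustrative aside (not from the original page), the HPO method can also be selected explicitly by passing a searcher instance as `search_alg`. This is a hedged sketch: the import path and constructor arguments may differ across flaml versions, and the objective and search space are made up for illustration.

```python
from flaml import tune
from flaml import CFO, BlendSearch  # import path may vary, e.g. flaml.tune.searcher.blendsearch


def objective(config):
    return {"score": (config["a"] - 2) ** 2 + config["b"]}


search_space = {"a": tune.randint(lower=1, upper=8), "b": tune.uniform(lower=0.1, upper=1.0)}

# CFO: local search with randomized directions, started from a low-cost configuration.
cfo = CFO(space=search_space, metric="score", mode="min", low_cost_partial_config={"a": 1})
analysis = tune.run(objective, search_alg=cfo, num_samples=50)

# BlendSearch: combines global search with CFO-style local search.
blendsearch = BlendSearch(space=search_space, metric="score", mode="min", low_cost_partial_config={"a": 1})
analysis = tune.run(objective, search_alg=blendsearch, num_samples=50)
```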
|
||||
|
||||
<!-- ![png](images/CFO.png) | ![png](images/BlendSearch.png)
|
||||
:---:|:---: -->
|
||||
|
@@ -592,8 +611,8 @@ FLOW<sup>2</sup> only requires pairwise comparisons between function values to p

The GIFs attached below demonstrate an example search trajectory of FLOW<sup>2</sup> shown in the loss and evaluation cost (i.e., the training time) space respectively. FLOW<sup>2</sup> is used in tuning the # of leaves and the # of trees for XGBoost. The two background heatmaps show the loss and cost distribution of all configurations. The black dots are the points evaluated in FLOW<sup>2</sup>. Black dots connected by lines are points that yield better loss performance when evaluated.

![gif](images/heatmap_loss_cfo_12s.gif) | ![gif](images/heatmap_cost_cfo_12s.gif)
:---:|:---:
| ![gif](images/heatmap_loss_cfo_12s.gif) | ![gif](images/heatmap_cost_cfo_12s.gif) |
| :-------------------------------------: | :-------------------------------------: |

From the demonstration, we can see that (1) FLOW<sup>2</sup> can quickly move toward the low-loss region, showing good convergence properties, and (2) FLOW<sup>2</sup> tends to avoid exploring the high-cost region until necessary.
@@ -643,7 +662,7 @@ In hyperparameter optimization, a larger search space is desirable because it is

For more technical details, please check our papers.

* [Frugal Optimization for Cost-related Hyperparameters](https://arxiv.org/abs/2005.01571). Qingyun Wu, Chi Wang, Silu Huang. AAAI 2021.
- [Frugal Optimization for Cost-related Hyperparameters](https://arxiv.org/abs/2005.01571). Qingyun Wu, Chi Wang, Silu Huang. AAAI 2021.

```bibtex
@inproceedings{wu2021cfo,

@@ -654,7 +673,7 @@ For more technical details, please check our papers.
}
```

* [Economical Hyperparameter Optimization With Blended Search Strategy](https://www.microsoft.com/en-us/research/publication/economical-hyperparameter-optimization-with-blended-search-strategy/). Chi Wang, Qingyun Wu, Silu Huang, Amin Saied. ICLR 2021.
- [Economical Hyperparameter Optimization With Blended Search Strategy](https://www.microsoft.com/en-us/research/publication/economical-hyperparameter-optimization-with-blended-search-strategy/). Chi Wang, Qingyun Wu, Silu Huang, Amin Saied. ICLR 2021.

```bibtex
@inproceedings{wang2021blendsearch,

@@ -665,7 +684,7 @@ For more technical details, please check our papers.
}
```

* [Targeted Hyperparameter Optimization with Lexicographic Preferences Over Multiple Objectives](https://openreview.net/forum?id=0Ij9_q567Ma). Shaokun Zhang, Feiran Jia, Chi Wang, Qingyun Wu. ICLR 2023 (notable-top-5%).
- [Targeted Hyperparameter Optimization with Lexicographic Preferences Over Multiple Objectives](https://openreview.net/forum?id=0Ij9_q567Ma). Shaokun Zhang, Feiran Jia, Chi Wang, Qingyun Wu. ICLR 2023 (notable-top-5%).

```bibtex
@inproceedings{zhang2023targeted,
@@ -3,12 +3,13 @@

`flaml.default` is a package for zero-shot AutoML, or "no-tuning" AutoML. It uses [`flaml.AutoML`](/docs/reference/automl/automl#automl-objects) and [`flaml.default.portfolio`](/docs/reference/default/portfolio) to mine good hyperparameter configurations across different datasets offline, and recommend data-dependent default configurations at runtime without expensive tuning.

Zero-shot AutoML has several benefits:

* The computation cost is just training one model. No tuning is involved.
* The decision of hyperparameter configuration is instant. No overhead to worry about.
* Your code remains the same. No breaking of the existing workflow.
* It requires less input from the user. No need to specify a tuning budget etc.
* All training data are used for, guess what, training. No need to worry about holding a subset of training data for validation (and overfitting the validation data).
* The offline preparation can be customized for a domain and leverage the historical tuning data. No experience is wasted.

- The computation cost is just training one model. No tuning is involved.
- The decision of hyperparameter configuration is instant. No overhead to worry about.
- Your code remains the same. No breaking of the existing workflow.
- It requires less input from the user. No need to specify a tuning budget etc.
- All training data are used for, guess what, training. No need to worry about holding a subset of training data for validation (and overfitting the validation data).
- The offline preparation can be customized for a domain and leverage the historical tuning data. No experience is wasted.

## How to Use at Runtime

@@ -31,10 +32,11 @@ from flaml.default import LGBMRegressor
All the other code remains the same. And you are expected to get an equal or better model in most cases.
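To make the drop-in usage concrete, here is a minimal sketch (an editorial illustration, assuming scikit-learn and lightgbm are installed; the dataset is chosen arbitrarily):

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

from flaml.default import LGBMRegressor  # instead of: from lightgbm import LGBMRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# The flamlized estimator chooses data-dependent default hyperparameters at fit time.
estimator = LGBMRegressor()
estimator.fit(X_train, y_train)
print(estimator.predict(X_test)[:5])
```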

The current list of "flamlized" learners is:
* LGBMClassifier, LGBMRegressor.
* XGBClassifier, XGBRegressor.
* RandomForestClassifier, RandomForestRegressor.
* ExtraTreesClassifier, ExtraTreesRegressor.

- LGBMClassifier, LGBMRegressor.
- XGBClassifier, XGBRegressor.
- RandomForestClassifier, RandomForestRegressor.
- ExtraTreesClassifier, ExtraTreesRegressor.

### What's the magic behind the scene?
@@ -50,7 +52,12 @@ Yes. You can use `suggest_hyperparams()` to find the suggested configuration. Fo
from flaml.default import LGBMRegressor

estimator = LGBMRegressor()
hyperparams, estimator_name, X_transformed, y_transformed = estimator.suggest_hyperparams(X_train, y_train)
(
    hyperparams,
    estimator_name,
    X_transformed,
    y_transformed,
) = estimator.suggest_hyperparams(X_train, y_train)
print(hyperparams)
```
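As a hedged follow-up (not part of the original page), the suggested hyperparameters can also be applied to a plain LightGBM estimator and trained on the transformed data returned above; this mirrors the open-box pattern shown in the next snippet and assumes lightgbm is installed.

```python
import lightgbm as lgb

# Train a vanilla LightGBM regressor with the suggested hyperparameters
# on the transformed features and labels returned by suggest_hyperparams().
model = lgb.LGBMRegressor(**hyperparams)
model.fit(X_transformed, y_transformed)
```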

@@ -60,10 +67,17 @@ If you would like more control over the training, use an equivalent, open-box wa
from flaml.default import preprocess_and_suggest_hyperparams

X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
hyperparams, estimator_class, X_transformed, y_transformed, feature_transformer, label_transformer = preprocess_and_suggest_hyperparams(
    "classification", X_train, y_train, "lgbm"
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)
(
    hyperparams,
    estimator_class,
    X_transformed,
    y_transformed,
    feature_transformer,
    label_transformer,
) = preprocess_and_suggest_hyperparams("classification", X_train, y_train, "lgbm")
model = estimator_class(**hyperparams)  # estimator_class is lightgbm.LGBMClassifier
model.fit(X_transformed, y_train)  # LGBMClassifier can handle raw labels
X_test = feature_transformer.transform(X_test)  # preprocess test data
@@ -79,6 +93,7 @@ Zero Shot AutoML is fast. If tuning from the recommended data-dependent configur

```python
from flaml import AutoML

automl = AutoML()
automl_settings = {
    "task": "classification",

@@ -133,6 +148,7 @@ Read the next section to understand how to generate these files if you would lik
## How to Prepare Offline

This section is intended for:

1. AutoML providers for a particular domain.
1. Data scientists or engineers who need to repeatedly train models for similar tasks with varying training data.