8. MultiBgolearn: Multi-Objective Bayesian Global Optimization#

Note

MultiBgolearn extends Bgolearn to handle multi-objective optimization problems, enabling simultaneous optimization of multiple competing objectives in materials design.

8.1. Overview#

MultiBgolearn is a Python package designed for multi-objective Bayesian global optimization (MOBO), specifically tailored for materials design. It extends the functionality of the Bgolearn package by enabling simultaneous optimization of multiple material properties, making it well suited to real-world applications where trade-offs between competing objectives are common.

Why Multi-Objective Optimization?

In materials design, we often need to optimize multiple properties simultaneously:

  • Strength vs. Ductility: Stronger materials are often more brittle

  • Performance vs. Cost: Better performance usually comes at higher cost

  • Conductivity vs. Thermal Stability: High conductivity materials may be thermally unstable

  • Corrosion Resistance vs. Mechanical Properties: Anti-corrosion treatments may affect strength

MultiBgolearn helps find the optimal trade-offs between these competing objectives.

8.2. Key Features#

8.2.1. 🎯 Multiple MOBO Algorithms#

  • Expected Hypervolume Improvement (EHVI): Maximizes the hypervolume of the objective space dominated by the Pareto front

  • Probability of Improvement (PI): Selects candidates with the highest probability of improving on the best known solutions

  • Upper Confidence Bound (UCB): Balances exploration and exploitation

8.2.2. 🔬 Materials-Focused Design#

  • Simultaneous optimization of multiple material properties

  • Flexible objective handling (maximize/minimize)

  • Bootstrap uncertainty quantification

  • Automatic model selection

8.2.3. 📊 Flexible Surrogate Models#

  • RandomForest

  • GradientBoosting

  • Support Vector Regression (SVR)

  • Gaussian Process

  • Automatic model selection

8.3. Installation#

Install MultiBgolearn using pip:

pip install MultiBgolearn

Or install both packages together:

pip install Bgolearn MultiBgolearn

8.4. Quick Start#

Here’s a simple example of using MultiBgolearn for materials optimization:

from MultiBgolearn import bgo

# Paths to the training dataset and the virtual space (candidate) CSV files
dataset_path = './data/dataset.csv'
VS_path = './virtual_space.csv'

# Set the number of objectives (e.g., 3 for three-objective optimization)
object_num = 3

# Apply Multi-Objective Bayesian Global Optimization
VS_recommended, improvements, index = bgo.fit(
    dataset_path, 
    VS_path, 
    object_num, 
    max_search=True, 
    method='EHVI', 
    assign_model='GaussianProcess', 
    bootstrap=5
)

print(f"Recommended sample index: {index}")
print(f"Expected improvements: {improvements}")

8.5. Data Format#

8.5.1. Dataset Format#

Your dataset should be a CSV file with the following structure:

feature1,feature2,feature3,objective1,objective2,objective3
1.2,3.4,5.6,100.5,0.85,7.2
2.1,4.3,6.5,95.2,0.92,6.8
...

  • Features: Input variables (composition, processing conditions, etc.)

  • Objectives: Target properties to optimize (strength, ductility, cost, etc.)
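
To sanity-check that a file follows this layout, a quick inspection with pandas is enough. The sketch below assumes, as in the example above, that the last object_num columns hold the objectives:

import pandas as pd

# Load the dataset and split it into features and objectives;
# in the layout above, the last `object_num` columns are the objectives
object_num = 3
dataset = pd.read_csv('./data/dataset.csv')
X = dataset.iloc[:, :-object_num]   # feature columns
Y = dataset.iloc[:, -object_num:]   # objective columns
print(f"{len(dataset)} samples, {X.shape[1]} features, {Y.shape[1]} objectives")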

8.5.2. Virtual Space Format#

The virtual space contains candidate points for optimization:

feature1,feature2,feature3
1.5,3.2,5.8
2.3,4.1,6.2
...
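
If you do not already have candidate points, a common way to build a virtual space is a regular grid over the feature ranges. A minimal sketch (the feature names and ranges are illustrative, not prescribed by the package):

import itertools
import numpy as np
import pandas as pd

# Full-factorial grid of candidate points over three features
grid = {
    'feature1': np.linspace(1.0, 3.0, 5),
    'feature2': np.linspace(3.0, 5.0, 5),
    'feature3': np.linspace(5.0, 7.0, 5),
}
virtual_space = pd.DataFrame(list(itertools.product(*grid.values())), columns=list(grid))
virtual_space.to_csv('./virtual_space.csv', index=False)
print(len(virtual_space))  # 5**3 = 125 candidates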

8.6. API Reference#

8.6.1. Main Function: bgo.fit()#

bgo.fit(dataset_path, VS_path, object_num, max_search=True, 
        method='EHVI', assign_model=False, bootstrap=5)

8.6.1.1. Parameters#

Table 8.1 Parameters#

| Parameter    | Type     | Default  | Description                                                    |
|--------------|----------|----------|----------------------------------------------------------------|
| dataset_path | str      | Required | Path to the dataset CSV file                                   |
| VS_path      | str      | Required | Path to the virtual space CSV file                             |
| object_num   | int      | Required | Number of objectives to optimize                               |
| max_search   | bool     | True     | Whether to maximize (True) or minimize (False) the objectives  |
| method       | str      | 'EHVI'   | Optimization method: 'EHVI', 'PI', or 'UCB'                    |
| assign_model | str/bool | False    | Surrogate model name, or False for automatic selection         |
| bootstrap    | int      | 5        | Number of bootstrap iterations                                 |

8.6.1.2. Returns#

Table 8.2 Return Values#

| Variable       | Type  | Description                                          |
|----------------|-------|------------------------------------------------------|
| VS_recommended | array | Recommended data point from the virtual space        |
| improvements   | array | Calculated improvements for each objective           |
| index          | int   | Index of the recommended point in the virtual space  |
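
In practice, index is the most convenient handle: it lets you look up the recommended candidate in the original virtual space file. Continuing from the Quick Start call:

import pandas as pd

# Retrieve the recommended candidate by its row index in the virtual space
virtual_space = pd.read_csv('./virtual_space.csv')
print(virtual_space.iloc[index])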

8.7. Optimization Methods#

8.7.1. Expected Hypervolume Improvement (EHVI)#

EHVI selects the candidate that most increases the hypervolume of the objective space dominated by the current Pareto front. It’s particularly effective for problems with 2-4 objectives.

# Use EHVI for balanced multi-objective optimization
VS_recommended, improvements, index = bgo.fit(
    dataset_path, VS_path, object_num=3,
    method='EHVI',
    assign_model='GaussianProcess'
)

Best for:

  • 2-4 objectives

  • Balanced exploration of Pareto front

  • When you want to maximize dominated volume
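
To build intuition for what EHVI optimizes, the sketch below computes the plain (non-expected) hypervolume improvement of a candidate in two dimensions, assuming maximization and a fixed reference point. This is a conceptual illustration, not MultiBgolearn’s internal implementation:

import numpy as np

def pareto_front(points):
    """Keep only the non-dominated points (maximization)."""
    pts = np.asarray(points, dtype=float)
    keep = [p for p in pts
            if not np.any(np.all(pts >= p, axis=1) & np.any(pts > p, axis=1))]
    return np.array(keep)

def hypervolume_2d(points, ref):
    """Area dominated by `points` relative to the reference point (2 objectives)."""
    front = pareto_front(points)
    front = front[np.argsort(front[:, 0])]    # sweep in order of increasing f1
    hv, f1_edge = 0.0, ref[0]
    for f1, f2 in front:
        hv += (f1 - f1_edge) * (f2 - ref[1])  # add the strip this point dominates
        f1_edge = f1
    return hv

# Illustrative numbers: two observed points and one candidate
observed = np.array([[4.0, 1.0], [2.0, 3.0]])
candidate = np.array([[3.5, 2.5]])
ref = (0.0, 0.0)
hvi = hypervolume_2d(np.vstack([observed, candidate]), ref) - hypervolume_2d(observed, ref)
print(f"Hypervolume improvement: {hvi:.2f}")  # 2.25

EHVI replaces this exact improvement with its expectation under the surrogate model’s predictive distribution.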

8.7.2. Probability of Improvement (PI)#

PI selects points with the highest probability of improving over the best known solution.

# Use PI for improvement-focused optimization
VS_recommended, improvements, index = bgo.fit(
    dataset_path, VS_path, object_num=2,
    method='PI',
    assign_model='RandomForest'
)

Best for:

  • Conservative optimization

  • When improvement probability is important

  • Exploitation-focused search
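
For a single objective with a Gaussian prediction (mean mu, standard deviation sigma), PI has a simple closed form. A minimal sketch assuming maximization (xi is a small improvement margin, an assumption here, not a documented parameter):

from scipy.stats import norm

def probability_of_improvement(mu, sigma, f_best, xi=0.01):
    """P(f(x) > f_best + xi) under a Gaussian prediction (maximization)."""
    return norm.cdf((mu - f_best - xi) / sigma)

# A prediction of 1.2 +/- 0.3 against a best observed value of 1.0
print(probability_of_improvement(1.2, 0.3, 1.0))

How the per-objective probabilities are aggregated across multiple objectives is left to the implementation.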

8.7.3. Upper Confidence Bound (UCB)#

UCB explores points with the highest upper confidence bound, balancing exploration and exploitation.

# Use UCB for exploration-exploitation balance
VS_recommended, improvements, index = bgo.fit(
    dataset_path, VS_path, object_num=3,
    method='UCB',
    assign_model='GradientBoosting'
)

Best for:

  • Noisy objectives

  • When uncertainty matters

  • Balanced exploration-exploitation
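
The UCB score itself is simple: the predicted mean plus a multiple of the predicted uncertainty. A minimal sketch for maximization (kappa, the exploration weight, is an assumption here, not a documented parameter):

import numpy as np

def ucb(mu, sigma, kappa=2.0):
    """Upper confidence bound: mean plus kappa times uncertainty."""
    return mu + kappa * sigma

# Predicted means and uncertainties for three candidates
mu = np.array([0.8, 1.0, 0.9])
sigma = np.array([0.30, 0.05, 0.20])
print(int(np.argmax(ucb(mu, sigma))))  # candidate 0 wins on its exploration bonus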

8.8. Surrogate Models#

8.8.1. Automatic Model Selection#

When assign_model=False, MultiBgolearn automatically selects the best model:

# Automatic model selection
VS_recommended, improvements, index = bgo.fit(
    dataset_path, VS_path, object_num=3,
    assign_model=False  # Auto-select best model
)

8.8.2. Manual Model Selection#

You can specify the surrogate model explicitly:

# Available models
models = [
    'RandomForest',
    'GradientBoosting', 
    'SVR',
    'GaussianProcess'
]

# Use specific model
VS_recommended, improvements, index = bgo.fit(
    dataset_path, VS_path, object_num=3,
    assign_model='GaussianProcess'
)

8.9. Bootstrap Uncertainty Quantification#

MultiBgolearn uses bootstrap sampling to quantify uncertainty in predictions:

# Increase bootstrap iterations for better uncertainty estimation
VS_recommended, improvements, index = bgo.fit(
    dataset_path, VS_path, object_num=3,
    bootstrap=10  # More iterations = better uncertainty estimation
)

Bootstrap Benefits:

  • Robust uncertainty quantification

  • Better handling of noisy data

  • Improved model reliability

  • More confident recommendations
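
Conceptually, bootstrap uncertainty comes from refitting the surrogate on resampled copies of the training data and reading the spread of the resulting predictions as uncertainty. A rough sketch of the idea using scikit-learn (not MultiBgolearn’s internal code; X_train, y_train, and X_candidates are NumPy arrays):

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def bootstrap_predict(X_train, y_train, X_candidates, n_boot=5, seed=0):
    """Mean and std of predictions across models fit on bootstrap resamples."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(X_train), size=len(X_train))  # resample with replacement
        model = RandomForestRegressor(random_state=0).fit(X_train[idx], y_train[idx])
        preds.append(model.predict(X_candidates))
    preds = np.array(preds)
    return preds.mean(axis=0), preds.std(axis=0)

The standard deviation returned here is exactly the kind of uncertainty that acquisition functions such as UCB consume.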

8.10. Practical Example: Alloy Design#

Let’s optimize a three-objective alloy design problem:

import pandas as pd
from MultiBgolearn import bgo

# Prepare data
# Dataset: composition features + 3 objectives (strength, ductility, cost)
dataset = pd.DataFrame({
    'Cu': [2.0, 3.5, 1.8, 4.2],
    'Mg': [1.2, 0.8, 1.5, 0.9],
    'Si': [0.5, 0.7, 0.3, 0.8],
    'Strength': [250, 280, 240, 290],    # Maximize
    'Ductility': [15, 12, 18, 10],      # Maximize  
    'Cost': [100, 120, 95, 130]         # Minimize (convert to maximize: -Cost)
})

# Convert cost to maximization problem
dataset['Cost'] = -dataset['Cost']

# Save dataset
dataset.to_csv('alloy_dataset.csv', index=False)

# Create virtual space (candidate compositions)
virtual_space = pd.DataFrame({
    'Cu': [2.5, 3.0, 3.8, 2.2, 4.0],
    'Mg': [1.0, 1.3, 0.9, 1.4, 1.1],
    'Si': [0.6, 0.4, 0.8, 0.5, 0.7]
})
virtual_space.to_csv('virtual_space.csv', index=False)

# Multi-objective optimization
VS_recommended, improvements, index = bgo.fit(
    'alloy_dataset.csv',
    'virtual_space.csv',
    object_num=3,
    max_search=True,
    method='EHVI',
    assign_model='GaussianProcess',
    bootstrap=5
)

print(f"Recommended alloy composition:")
print(f"Cu: {VS_recommended[0]:.2f}%")
print(f"Mg: {VS_recommended[1]:.2f}%") 
print(f"Si: {VS_recommended[2]:.2f}%")
print(f"Expected improvements: {improvements}")

8.11. Best Practices#

8.11.1. 1. Data Preparation#

  • Ensure sufficient training data (>10 samples per objective)

  • Normalize objectives if they have different scales (see the sketch after this list)

  • Handle missing values appropriately
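
A minimal sketch of objective scaling with scikit-learn, assuming the layout from Section 8.5 with the objectives in the last three columns:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

dataset = pd.read_csv('./data/dataset.csv')
objective_cols = dataset.columns[-3:]  # last 3 columns hold the objectives here

# Rescale each objective to [0, 1] so no single property dominates the search
dataset[objective_cols] = MinMaxScaler().fit_transform(dataset[objective_cols])
dataset.to_csv('./data/dataset_scaled.csv', index=False)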

8.11.2. 2. Method Selection#

  • Use EHVI for 2-4 objectives with balanced exploration

  • Use PI for conservative, improvement-focused search

  • Use UCB for noisy objectives or exploration needs

8.11.3. 3. Model Selection#

  • Start with automatic model selection

  • Use GaussianProcess for smooth, continuous objectives

  • Use RandomForest for discrete or categorical features

  • Use GradientBoosting for complex, nonlinear relationships

8.11.4. 4. Bootstrap Settings#

  • Use 5-10 bootstrap iterations for most problems

  • Increase to 20+ for very noisy data

  • Balance computation time vs. uncertainty quality

8.12. Troubleshooting#

8.12.1. Common Issues#

  1. Convergence Problems

    • Increase bootstrap iterations

    • Try different surrogate models

    • Check data quality and scaling

  2. Poor Recommendations

    • Ensure sufficient training data

    • Verify objective definitions (max vs. min)

    • Consider data preprocessing

  3. Computational Issues

    • Reduce bootstrap iterations

    • Use simpler surrogate models

    • Reduce virtual space size

8.12.2. Performance Tips#

  • Data Size: Keep virtual space manageable (<10,000 points)

  • Objectives: EHVI works best with 2-4 objectives

  • Features: Normalize features to similar scales

  • Iterations: Start with fewer bootstrap iterations for testing

8.13. Next Steps#