Bgolearn: A Computational Framework for Material Design Optimization

Bgolearn is a Bayesian global optimization (BGO) framework for optimizing material compositions and performance metrics. Starting from experimental data, it explores a designed virtual sample space and recommends the most promising candidate compositions. Iterating between recommendation and new experiments or simulations accelerates material discovery.


Features

  • Material Composition Optimization: Identifies the optimal compositions to maximize or minimize performance metrics.
  • Iterative Learning: Continuously updates recommendations by incorporating new experimental data.
  • Versatile Applications: Can be applied to both regression and classification tasks in material design.

Installation

Install Bgolearn via pip:

pip install Bgolearn

Check the installation:

pip show Bgolearn

To update Bgolearn, use:

pip install --upgrade Bgolearn

Usage

Parameters in fit()

fit(data_matrix, Measured_response, virtual_samples, Mission='Regression', Classifier=None, 
    noise_std=None, Kriging_model=None, opt_num=1, min_search=True, CV_test=False)
  • data_matrix: Input training dataset (X).
  • Measured_response: Target response (y) of the training dataset.
  • virtual_samples: Designed virtual samples for optimization.
  • Mission: Task type ('Regression' or 'Classification'), default is 'Regression'.
  • Classifier: Classifier used for classification tasks; defaults to 'GaussianProcess' if not provided. Supported options: 'GaussianProcess', 'LogisticRegression', 'NaiveBayes', 'SVM', 'RandomForest'.
  • noise_std: Optional; added to the diagonal of the kernel matrix to prevent numerical instability.
  • Kriging_model: Optional; supports surrogate models such as SVM, Random Forest, AdaBoost, MLP, or custom user-defined models. A custom model must implement a fit_pre method (see the sketch after this list).
  • opt_num: Number of recommended candidates for the next iteration. Default is 1.
  • min_search: If True, searches for the global minimum; if False, searches for the global maximum.
  • CV_test: Cross-validation option; default is False (no cross-validation). Set to 'LOOCV' for leave-one-out cross-validation or to an integer k for k-fold cross-validation.
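
A minimal sketch of a call that combines several of these options with a user-defined surrogate passed through Kriging_model. The fit_pre signature used here (training inputs and responses plus prediction points in, predicted means and standard deviations out) and the way the instance is passed are assumptions for illustration; check the Bgolearn documentation for the exact interface.

import Bgolearn.BGOsampling as BGOS
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

class MyKrigingModel:
    # Assumed interface: fit_pre(xtrain, ytrain, xtest) -> (mean, std)
    def fit_pre(self, xtrain, ytrain, xtest):
        mdl = RandomForestRegressor(n_estimators=200).fit(xtrain, ytrain)
        # Per-tree predictions give a simple mean and uncertainty estimate
        preds = np.stack([tree.predict(np.asarray(xtest)) for tree in mdl.estimators_])
        return preds.mean(axis=0), preds.std(axis=0)

data = pd.read_csv('data.csv')
x, y = data.iloc[:, :-1], data.iloc[:, -1]
vs = pd.read_csv('virtual_data.csv')

Bgolearn = BGOS.Bgolearn()
Mymodel = Bgolearn.fit(data_matrix=x, Measured_response=y, virtual_samples=vs,
                       Kriging_model=MyKrigingModel(), opt_num=2,
                       min_search=True, CV_test='LOOCV')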

Example: Regression

import Bgolearn.BGOsampling as BGOS
import pandas as pd

# Load data
data = pd.read_csv('data.csv')
x = data.iloc[:, :-1]
y = data.iloc[:, -1]

# Load virtual samples
vs = pd.read_csv('virtual_data.csv')

# Initialize Bgolearn
Bgolearn = BGOS.Bgolearn()

# Fit the model
Mymodel = Bgolearn.fit(data_matrix=x, Measured_response=y, virtual_samples=vs)

# Recommend next candidate using Expected Improvement (EI) method
Mymodel.EI()
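
If you want to work with the result programmatically, the return value of EI() can be captured. The unpacking below mirrors the multi-objective example later in this README and is an assumption about the return signature rather than official documentation:

# Assumed: EI() returns (utility scores over the virtual samples, recommended candidate(s))
scores, candidate = Mymodel.EI()
print(candidate)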

Infill Criteria Methods

  • Expected Improvement (EI): Recommends the next candidate based on the Expected Improvement method. Reference paper.

  • EI with Plugin: Uses the Expected Improvement method with a plug-in estimate of the best observed value. Reference paper.

  • Augmented EI: Uses Augmented Expected Improvement. Reference paper.

  • Expected Quantile Improvement (EQI): Recommends candidates based on Expected Quantile Improvement. Reference paper.

  • Upper Confidence Bound (UCB): Recommends candidates using the Upper Confidence Bound method. Reference paper.

  • Probability of Improvement (PoI): Recommends candidates using the Probability of Improvement method. Reference paper.

  • Predictive Entropy Search (PES): Recommends candidates using Predictive Entropy Search. Reference paper.

  • Knowledge Gradient (KG): Uses the Knowledge Gradient method for candidate recommendation. Reference paper.

Utility Function with Parameters

For methods like Augmented EI, you can provide custom parameters:

Mymodel.Augmented_EI(alpha=1, tao=0)
  • alpha: tradeoff coefficient, default 1.
  • tao: noise standard deviation, default 0.

Mymodel.EQI(beta=0.5, tao_new=0)
  • beta: beta quantile, default 0.5, recommended range [0.2, 0.8].
  • tao_new: noise standard deviation, default 0, recommended range [0, 1].

Mymodel.UCB(alpha=1)
  • alpha: tradeoff coefficient, default 1, recommended range [0, 3].

Mymodel.PoI(tao=0)
  • tao: improvement ratio (>= 0), default 0, recommended range [0, 0.3].

Mymodel.PES(sam_num=500)
  • sam_num: number of optimum samples drawn from p(x*|D), default 500, recommended range [100, 1000].

Mymodel.Knowledge_G(MC_num=50)
  • MC_num: number of Monte Carlo samples, default 50, recommended range [50, 300].
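
The following sketch ties these pieces into the iterative-learning loop described under Features, reusing x, y, vs, and the Bgolearn object from the regression example above. It assumes EI() returns the utility scores and recommended candidate(s) (as in the multi-objective example below); measure_property() is a hypothetical placeholder for your own experiment or simulation:

for iteration in range(5):
    Mymodel = Bgolearn.fit(data_matrix=x, Measured_response=y, virtual_samples=vs)
    _, candidate = Mymodel.EI()                      # recommended composition(s)
    new_y = measure_property(candidate)              # hypothetical experiment / simulation
    # Append the new observation(s) and refit in the next iteration
    x = pd.concat([x, pd.DataFrame(candidate, columns=x.columns)], ignore_index=True)
    y = pd.concat([y, pd.Series(new_y)], ignore_index=True)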

Classification

Bgolearn supports classification tasks using various classifiers:


  • GaussianProcess (default)
  • LogisticRegression
  • NaiveBayes
  • SVM
  • RandomForest

Example for Classification

import Bgolearn.BGOsampling as BGOS
import pandas as pd

# Load data
data = pd.read_csv('data.csv')
x = data.iloc[:, :-1]
y = data.iloc[:, -1]

# Load virtual samples
vs = pd.read_csv('virtual_data.csv')

# Initialize Bgolearn for classification
Bgolearn = BGOS.Bgolearn()

# Fit the model with classification
Mymodel = Bgolearn.fit(data_matrix=x, Measured_response=y, virtual_samples=vs, Mission='Classification', Classifier='SVM')

# Obtain results
Mymodel.Least_cfd()  # Recommend the next candidate by the Least Confidence method

Acquisition Functions for Classification

  • Mymodel.Least_cfd(): Recommends the next candidate by the Least Confidence method.
    (Reference paper: Least Confidence Method, p. 022802-3)

  • Mymodel.Margin_S(): Recommends the next candidate by the Margin Sampling method.
    (Reference paper: Margin Sampling Method, p. 022802-3)

  • Mymodel.Entropy(): Recommends the next candidate by the Entropy-based approach.
    (Reference paper: Entropy-based Approach, p. 022802-3)
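
For reference, the sketch below shows how these three scores are commonly computed from predicted class probabilities; it illustrates the general idea, not Bgolearn's internal implementation:

import numpy as np

# proba: (n_virtual_samples, n_classes) predicted class probabilities,
# e.g. from a scikit-learn classifier's predict_proba()
proba = np.array([[0.6, 0.3, 0.1],
                  [0.4, 0.35, 0.25]])

least_confidence = 1 - proba.max(axis=1)                  # higher = more uncertain
sorted_p = np.sort(proba, axis=1)
margin = sorted_p[:, -1] - sorted_p[:, -2]                # smaller = more uncertain
entropy = -(proba * np.log(proba + 1e-12)).sum(axis=1)    # higher = more uncertain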

Single Target BGO with Multi-Objective Selection

For multi-objective design, the BgoKit module provides a tool for selecting the next candidate by combining the utility scores of several single-target Bgolearn models.

from BgoKit import ToolKit

# vs is the virtual samples
# score_1, score_2 are outputs of Bgolearn
# score_1, _= Mymodel_1.EI() ; score_2, _= Mymodel_2.EI()

Model = ToolKit.MultiOpt(vs, [score_1, score_2])
Model.BiSearch()
Model.plot_distribution()
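
A more complete sketch of the bi-objective workflow, assuming the training data ends with two objective columns and that EI() returns the utility scores as its first value (as in the comments above):

import Bgolearn.BGOsampling as BGOS
from BgoKit import ToolKit
import pandas as pd

data = pd.read_csv('data.csv')           # assumed: last two columns are the two objectives
x = data.iloc[:, :-2]
y1 = data.iloc[:, -2]
y2 = data.iloc[:, -1]
vs = pd.read_csv('virtual_data.csv')

Mymodel_1 = BGOS.Bgolearn().fit(data_matrix=x, Measured_response=y1, virtual_samples=vs)
Mymodel_2 = BGOS.Bgolearn().fit(data_matrix=x, Measured_response=y2, virtual_samples=vs)
score_1, _ = Mymodel_1.EI()
score_2, _ = Mymodel_2.EI()

Model = ToolKit.MultiOpt(vs, [score_1, score_2])
Model.BiSearch()
Model.plot_distribution()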

You can refer to the detailed example for more information.

Efficiency Testing

Bgolearn also supports efficiency comparisons across different acquisition functions. The relative efficiency of an acquisition function can be affected by:

  • The type of optimized function
  • Distribution of training data
  • Fit of the Kriging model
  • Data noise
  • Computational budget

Methods for Efficiency Testing

  1. Trail Method: Compares efficiency by evaluating the optimization path.
  2. Opportunity Cost Method: Assesses efficiency based on opportunity costs.
  3. Probability Density Function (PDF) Method: Uses PDFs to compare function efficiency.
  4. Count Strategy: Evaluates efficiency by counting occurrences of performance improvements.

Example Code for Efficiency Testing

import Bgolearn.BGOsampling as BGOS
import numpy as np
import pandas as pd

# Define the true function
def function(X):
    X = np.array(X)
    Y = 0.013 * X**4 - 0.25 * X**3 + 1.61 * X**2 - 4.1 * X + 8
    return Y

# Load the value space of sampling points
vs = pd.read_csv('virtual_data.csv')

# Initialize Bgolearn
Bgolearn = BGOS.Bgolearn()

# Pass parameters to the function
Mymodel = Bgolearn.test(Ture_fun=function, Def_Domain=vs)

# Compare efficiency using different methods
Mymodel.Trail()
Mymodel.Opp_Cost()
Mymodel.Pdf()
Mymodel.Count()

For more detailed methods, refer to the Reference Paper on Utility Function Efficiency.

Test Parameters

  • Ture_fun: The function to evaluate.
  • Def_Domain: The discrete domain over which the function is evaluated, e.g., numpy.linspace(0, 11, 111).
  • Kriging_model: A callable Kriging model or a pre-defined model.
  • opt_num: Number of recommended candidates for the next iteration.
  • min_search: Indicates whether to search for the global minimum or maximum.
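
For example, the same efficiency test can be run on a one-dimensional analytical domain, reusing the true function and Bgolearn object defined above (a sketch using only the parameters listed in this section):

import numpy as np

domain = np.linspace(0, 11, 111)
Mymodel = Bgolearn.test(Ture_fun=function, Def_Domain=domain, opt_num=1, min_search=True)
Mymodel.Opp_Cost()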

Contributions

Feel free to contribute by opening issues or submitting pull requests.

For questions or suggestions, contact: