15. Multi-Objective Optimization Examples#
Note
This page provides comprehensive examples of multi-objective optimization using MultiBgolearn for materials design problems.
Important
MultiBgolearn Limitation: The current version of MultiBgolearn only supports bi-objective optimization (object_num=2).
All examples on this page use exactly 2 objectives. If you need more than two objectives, use the weighted-sum approach
shown in the single-objective examples with Bgolearn.
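For reference, a minimal weighted-sum sketch (the file name, column names, and weights below are illustrative assumptions, not part of the Bgolearn API):

```python
import pandas as pd

# Hypothetical 3-objective dataset; min-max scale so the weights are comparable
df = pd.read_csv('dataset_3obj.csv')
objs = ['Strength', 'Ductility', 'Cost']
scaled = (df[objs] - df[objs].min()) / (df[objs].max() - df[objs].min())
scaled['Cost'] = 1 - scaled['Cost']  # 'Cost' is minimized, so flip it
# Collapse into a single objective column for Bgolearn
df['Score'] = (scaled * [0.5, 0.3, 0.2]).sum(axis=1)
```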
Warning
Virtual Space Requirements:
- The virtual space (candidate points) should contain enough samples for optimization
- Recommended: at least 100-1000 candidate points
- The virtual space CSV file should contain only feature columns (no objective columns)
- If your virtual space is too small or has format issues, you may encounter an `IndexError`

Example: for a 3-feature alloy (Cu, Mg, Si), your virtual_space.csv should look like:

```
Cu,Mg,Si
1.5,0.5,0.3
1.6,0.6,0.4
...
```

(No Strength or Ductility columns in the virtual space!)
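A quick sanity check along these lines (a sketch; adjust the file name and objective names to your own data) can catch both problems before calling `bgo.fit`:

```python
import pandas as pd

vs = pd.read_csv('virtual_space.csv')  # hypothetical path
assert len(vs) >= 100, f"Virtual space too small: {len(vs)} rows"
assert not {'Strength', 'Ductility'} & set(vs.columns), \
    "Virtual space must contain feature columns only"
```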
## Example 1: Bi-Objective Alloy Design
Optimize an aluminum alloy for strength and ductility simultaneously.
### Problem Setup
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from MultiBgolearn import bgo
# Problem: Optimize Al-Cu-Mg-Si alloy for:
# 1. Maximize Strength (MPa)
# 2. Maximize Ductility (%)
# Note: MultiBgolearn only supports 2 objectives (object_num=2)
```

15.1. Prepare Training Data#

```python
# Historical experimental data
# IMPORTANT: Only 2 objectives (Strength and Ductility) for MultiBgolearn
dataset = pd.DataFrame({
'Cu': [2.0, 3.5, 1.8, 4.2, 2.8, 3.2, 2.5, 3.8, 2.2, 3.6],
'Mg': [1.2, 0.8, 1.5, 0.9, 1.1, 1.3, 0.9, 1.0, 1.4, 0.7],
'Si': [0.5, 0.7, 0.3, 0.8, 0.6, 0.4, 0.9, 0.5, 0.7, 0.6],
'Strength': [250, 280, 240, 290, 265, 275, 255, 285, 245, 275], # Objective 1
'Ductility': [15, 12, 18, 10, 14, 13, 16, 11, 17, 12] # Objective 2
})
print("Training Dataset:")
print(dataset.head())
print(f"\nDataset shape: {dataset.shape}")
print(f"Features: {list(dataset.columns[:3])}")
print(f"Objectives (2 only): {list(dataset.columns[3:])}")
# Save dataset
dataset.to_csv('alloy_dataset.csv', index=False)
```

15.2. Create Virtual Space#

```python
# Generate candidate alloy compositions
# IMPORTANT: Create enough candidates (recommended: 500-2000)
np.random.seed(42)
n_candidates = 1000 # Sufficient number of candidates
# Method 1: Random sampling (recommended for large spaces)
virtual_space = pd.DataFrame({
'Cu': np.random.uniform(1.5, 4.5, n_candidates),
'Mg': np.random.uniform(0.5, 1.5, n_candidates),
'Si': np.random.uniform(0.3, 1.0, n_candidates)
})
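# Method 2 (alternative sketch): a regular grid instead of random sampling.
# 10 levels per element gives 10**3 = 1000 candidates; purely illustrative.
cu_g, mg_g, si_g = np.meshgrid(
    np.linspace(1.5, 4.5, 10),
    np.linspace(0.5, 1.5, 10),
    np.linspace(0.3, 1.0, 10),
)
virtual_space_grid = pd.DataFrame({
    'Cu': cu_g.ravel(), 'Mg': mg_g.ravel(), 'Si': si_g.ravel()
})  # note: the randomly sampled `virtual_space` above is the one used below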
# IMPORTANT: Verify virtual space has enough candidates
print(f"\nVirtual space: {len(virtual_space)} candidate compositions")
if len(virtual_space) < 100:
print("⚠️ WARNING: Virtual space has fewer than 100 candidates!")
print(" Consider increasing n_candidates for better optimization.")
else:
print(f"✅ Virtual space size is adequate")
print(f"Virtual space shape: {virtual_space.shape}")
print(f"Virtual space columns: {list(virtual_space.columns)}")
print(f"Composition ranges:")
print(f" Cu: {virtual_space['Cu'].min():.1f} - {virtual_space['Cu'].max():.1f}%")
print(f" Mg: {virtual_space['Mg'].min():.1f} - {virtual_space['Mg'].max():.1f}%")
print(f" Si: {virtual_space['Si'].min():.1f} - {virtual_space['Si'].max():.1f}%")
# Save virtual space (IMPORTANT: only feature columns, no objectives!)
virtual_space.to_csv('virtual_space.csv', index=False)
print("\n✅ Virtual space saved to 'virtual_space.csv'")
```

15.3. Multi-Objective Optimization with EHVI#

```python
# Expected Hypervolume Improvement optimization
print("Running EHVI optimization...")
# IMPORTANT: MultiBgolearn only supports bi-objective optimization (object_num=2)
# The fit() method uses positional arguments for the first 3 parameters
VS_recommended, improvements, index = bgo.fit(
'alloy_dataset.csv', # dataset_path (positional arg 1)
'virtual_space.csv', # VS_path (positional arg 2)
2, # object_num (positional arg 3) - MUST BE 2
max_search=True, # Maximize both objectives
method='EHVI', # Expected Hypervolume Improvement
assign_model='RandomForest', # Surrogate model (RandomForest is more stable)
bootstrap=10 # Bootstrap iterations for uncertainty
)
print(f"\nEHVI Optimization Results:")
print(f"Recommended composition: {VS_recommended}")
print(f" Cu: {VS_recommended[0]:.2f}%")
print(f" Mg: {VS_recommended[1]:.2f}%")
print(f" Si: {VS_recommended[2]:.2f}%")
print(f" Total: {sum(VS_recommended):.2f}%")
print(f"\nHypervolume improvement: {improvements[index]:.4f}")
print(f"Recommended index in virtual space: {index}")
```

15.4. Compare Different MOBO Algorithms#

```python
# Compare EHVI, PI, and UCB
algorithms = ['EHVI', 'PI', 'UCB']
results = {}
for algorithm in algorithms:
print(f"\nRunning {algorithm} optimization...")
VS_rec, imp, idx = bgo.fit(
'alloy_dataset.csv', # dataset_path (positional arg 1)
'virtual_space.csv', # VS_path (positional arg 2)
2, # object_num (positional arg 3) - MUST BE 2
max_search=True,
method=algorithm,
assign_model='RandomForest', # Use RandomForest for stability
bootstrap=8
)
results[algorithm] = {
'composition': VS_rec,
'improvements': imp,
'index': idx
}
print(f"{algorithm} recommendation: Cu={VS_rec[0]:.2f}%, Mg={VS_rec[1]:.2f}%, Si={VS_rec[2]:.2f}%")
# Compare results
print("\nAlgorithm Comparison:")
print("-" * 70)
print(f"{'Algorithm':<8} {'Cu (%)':<8} {'Mg (%)':<8} {'Si (%)':<8} {'Total (%)':<10}")
print("-" * 70)
for alg, result in results.items():
comp = result['composition']
total = sum(comp)
print(f"{alg:<8} {comp[0]:<8.2f} {comp[1]:<8.2f} {comp[2]:<8.2f} {total:<10.2f}")
```

15.5. Pareto Front Analysis#

```python
# Analyze the Pareto front from training data
def find_pareto_front(objectives):
"""Find Pareto optimal points."""
n_points = objectives.shape[0]
pareto_indices = []
for i in range(n_points):
is_pareto = True
for j in range(n_points):
if i != j:
# Check if point j dominates point i
if all(objectives[j] >= objectives[i]) and any(objectives[j] > objectives[i]):
is_pareto = False
break
if is_pareto:
pareto_indices.append(i)
return np.array(pareto_indices)
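# Vectorized equivalent (a sketch): for larger datasets, the O(n^2) Python
# loop above can be replaced by a single broadcast comparison.
def find_pareto_front_vectorized(objectives):
    """Same result as find_pareto_front, using NumPy broadcasting."""
    dominates = ((objectives[:, None, :] >= objectives[None, :, :]).all(axis=2)
                 & (objectives[:, None, :] > objectives[None, :, :]).any(axis=2))
    return np.where(~dominates.any(axis=0))[0]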
# Extract objectives from training data (bi-objective only!)
objectives = dataset[['Strength', 'Ductility']].values
pareto_indices = find_pareto_front(objectives)
pareto_front = objectives[pareto_indices]
print(f"\nPareto Front Analysis:")
print(f"Found {len(pareto_indices)} Pareto optimal solutions from training data")
print("\nPareto optimal compositions:")
for i, idx in enumerate(pareto_indices):
comp = dataset.iloc[idx]
print(f" {i+1}. Cu={comp['Cu']:.1f}%, Mg={comp['Mg']:.1f}%, Si={comp['Si']:.1f}% -> "
f"Strength={comp['Strength']:.0f}, Ductility={comp['Ductility']:.0f}")
```

15.6. Visualization#

```python
# Create comprehensive visualization for bi-objective optimization
fig = plt.figure(figsize=(15, 10))
# 1. Pareto Front (2D for bi-objective)
ax1 = fig.add_subplot(2, 3, 1)
ax1.scatter(objectives[:, 0], objectives[:, 1], alpha=0.6, s=50, c='lightblue', label='All solutions')
ax1.scatter(pareto_front[:, 0], pareto_front[:, 1], c='red', s=100, marker='*', label='Pareto front')
# Add recommended point (estimated values)
rec_objectives = [285, 13] # Estimated for recommended composition
ax1.scatter(rec_objectives[0], rec_objectives[1], c='gold', s=150, marker='D',
edgecolors='black', linewidth=2, label='EHVI recommendation')
ax1.set_xlabel('Strength (MPa)')
ax1.set_ylabel('Ductility (%)')
ax1.set_title('Bi-Objective Pareto Front')
ax1.legend()
ax1.grid(True, alpha=0.3)
# 2. Parallel coordinates plot (bi-objective)
ax2 = fig.add_subplot(2, 3, 2)
# Normalize objectives for parallel coordinates
obj_norm = (objectives - objectives.min(axis=0)) / (objectives.max(axis=0) - objectives.min(axis=0))
pareto_norm = (pareto_front - objectives.min(axis=0)) / (objectives.max(axis=0) - objectives.min(axis=0))
x_pos = [0, 1]
obj_names = ['Strength', 'Ductility']
# Plot all solutions
for i in range(len(obj_norm)):
ax2.plot(x_pos, obj_norm[i], 'b-', alpha=0.3, linewidth=1)
# Highlight Pareto front
for i in range(len(pareto_norm)):
ax2.plot(x_pos, pareto_norm[i], 'r-', alpha=0.8, linewidth=2)
ax2.set_xticks(x_pos)
ax2.set_xticklabels(obj_names)
ax2.set_ylabel('Normalized Value')
ax2.set_title('Parallel Coordinates (Bi-Objective)')
ax2.grid(True, alpha=0.3)
# 3. Algorithm comparison
ax3 = fig.add_subplot(2, 3, 3)
algorithms = list(results.keys())
compositions = [results[alg]['composition'] for alg in algorithms]
cu_values = [comp[0] for comp in compositions]
mg_values = [comp[1] for comp in compositions]
si_values = [comp[2] for comp in compositions]
x = np.arange(len(algorithms))
width = 0.25
ax3.bar(x - width, cu_values, width, label='Cu', alpha=0.8)
ax3.bar(x, mg_values, width, label='Mg', alpha=0.8)
ax3.bar(x + width, si_values, width, label='Si', alpha=0.8)
ax3.set_xlabel('Algorithm')
ax3.set_ylabel('Composition (%)')
ax3.set_title('Algorithm Recommendations')
ax3.set_xticks(x)
ax3.set_xticklabels(algorithms)
ax3.legend()
ax3.grid(True, alpha=0.3)
# 4. Objective distribution
ax4 = fig.add_subplot(2, 3, 4)
ax4.hist(objectives[:, 0], bins=15, alpha=0.6, label='Strength', color='blue')
ax4.axvline(pareto_front[:, 0].mean(), color='red', linestyle='--', linewidth=2, label='Pareto mean')
ax4.set_xlabel('Strength (MPa)')
ax4.set_ylabel('Frequency')
ax4.set_title('Strength Distribution')
ax4.legend()
ax4.grid(True, alpha=0.3)
# 5. Ductility distribution
ax5 = fig.add_subplot(2, 3, 5)
ax5.hist(objectives[:, 1], bins=15, alpha=0.6, label='Ductility', color='green')
ax5.axvline(pareto_front[:, 1].mean(), color='red', linestyle='--', linewidth=2, label='Pareto mean')
ax5.set_xlabel('Ductility (%)')
ax5.set_ylabel('Frequency')
ax5.set_title('Ductility Distribution')
ax5.legend()
ax5.grid(True, alpha=0.3)
# 6. Composition space
ax6 = fig.add_subplot(2, 3, 6)
scatter = ax6.scatter(dataset['Cu'], dataset['Mg'],
c=dataset['Strength'], s=dataset['Ductility']*10,
cmap='viridis', alpha=0.6, edgecolors='black')
ax6.set_xlabel('Cu (%)')
ax6.set_ylabel('Mg (%)')
ax6.set_title('Composition Space (color=Strength, size=Ductility)')
plt.colorbar(scatter, ax=ax6, label='Strength (MPa)')
ax6.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
```

15.7. Example 2: Bi-Objective Processing Optimization#

Optimize heat treatment parameters for hardness and toughness.

15.7.1. Problem Setup#

```python
# Heat treatment optimization for:
# 1. Maximize Hardness (HV)
# 2. Maximize Toughness (J)
# Note: MultiBgolearn only supports 2 objectives (object_num=2)
processing_dataset = pd.DataFrame({
'Temperature': [450, 500, 550, 480, 520, 470, 530, 490, 510, 460],
'Time': [2, 4, 6, 3, 5, 2.5, 4.5, 3.5, 4.2, 2.8],
'Cooling_Rate': [10, 20, 15, 25, 12, 18, 22, 14, 16, 19],
'Hardness': [180, 220, 250, 200, 235, 190, 245, 210, 230, 185], # Objective 1
'Toughness': [45, 35, 25, 40, 30, 42, 28, 38, 32, 44] # Objective 2
})
print("Processing Dataset:")
print(processing_dataset.head())
print(f"Objectives (2 only): Hardness, Toughness")
# Save dataset
processing_dataset.to_csv('processing_dataset.csv', index=False)
# Create virtual processing space
# IMPORTANT: Generate enough candidates (recommended: 500-2000)
np.random.seed(123)
n_conditions = 800 # Sufficient number of candidate conditions
virtual_processing_df = pd.DataFrame({
'Temperature': np.random.uniform(440, 560, n_conditions),
'Time': np.random.uniform(1.5, 6.5, n_conditions),
'Cooling_Rate': np.random.uniform(8, 28, n_conditions)
})
virtual_processing_df.to_csv('virtual_processing.csv', index=False)
print(f"Virtual processing space: {len(virtual_processing_df)} conditions")
print(f"✅ Virtual space size is adequate")
```

15.7.2. Bi-Objective Optimization#

```python
# Optimize for 2 objectives (Hardness and Toughness)
# IMPORTANT: object_num MUST be 2 for MultiBgolearn
VS_recommended, improvements, index = bgo.fit(
'processing_dataset.csv', # dataset_path (positional arg 1)
'virtual_processing.csv', # VS_path (positional arg 2)
2, # object_num (positional arg 3) - MUST BE 2
max_search=True, # Maximize both objectives
method='EHVI', # Expected Hypervolume Improvement
assign_model='RandomForest', # Use RandomForest for stability
bootstrap=10
)
print(f"\nBi-Objective Optimization Results:")
print(f"Recommended processing conditions:")
print(f" Temperature: {VS_recommended[0]:.0f}°C")
print(f" Time: {VS_recommended[1]:.1f} hours")
print(f" Cooling Rate: {VS_recommended[2]:.0f}°C/min")
print(f"\nExpected improvements for 2 objectives:")
print(f" Hardness improvement: {improvements[0]:.1f} HV")
print(f" Toughness improvement: {improvements[1]:.1f} J")
```

15.8. Example 3: Materials Discovery with Constraints#

Optimize a ceramic composition with constraints.

15.8.1. Problem Setup#

```python
# Ceramic optimization for:
# 1. Maximize Strength (MPa)
# 2. Maximize Thermal Conductivity (W/mK)
# Note: MultiBgolearn only supports 2 objectives (object_num=2)
ceramic_dataset = pd.DataFrame({
'Al2O3': [85, 90, 80, 95, 88, 92, 83, 89, 87, 91],
'SiO2': [10, 5, 15, 3, 8, 4, 12, 7, 9, 6],
'MgO': [5, 5, 5, 2, 4, 4, 5, 4, 4, 3],
'Strength': [300, 350, 280, 380, 320, 360, 290, 340, 330, 365], # Objective 1
'Thermal_Conductivity': [25, 30, 20, 35, 28, 32, 22, 29, 27, 33] # Objective 2
})
print("Ceramic Dataset:")
print(ceramic_dataset.head())
print(f"Objectives (2 only): Strength, Thermal_Conductivity")
# Constraint: compositions must sum to 100%
def check_ceramic_constraints(composition):
"""Check ceramic composition constraints."""
al2o3, sio2, mgo = composition
# Composition sum constraint (within tolerance)
total = al2o3 + sio2 + mgo
if not (99 <= total <= 101):
return False
# Individual component constraints
if not (75 <= al2o3 <= 98):
return False
if not (2 <= sio2 <= 20):
return False
if not (1 <= mgo <= 8):
return False
return True
# Generate constrained virtual space
# IMPORTANT: Create enough candidates (recommended: 500-2000)
np.random.seed(456)
n_ceramic_candidates = 1000
virtual_ceramics = []
for _ in range(n_ceramic_candidates * 2): # Generate extra to account for constraint filtering
al2o3 = np.random.uniform(75, 98)
sio2 = np.random.uniform(2, 20)
mgo = np.random.uniform(1, 8)
# Normalize to sum to 100%
total = al2o3 + sio2 + mgo
normalized = [al2o3 * 100/total, sio2 * 100/total, mgo * 100/total]
if check_ceramic_constraints(normalized):
virtual_ceramics.append(normalized)
if len(virtual_ceramics) >= n_ceramic_candidates:
break
virtual_ceramics_df = pd.DataFrame(virtual_ceramics, columns=['Al2O3', 'SiO2', 'MgO'])
print(f"Constrained virtual ceramic space: {len(virtual_ceramics_df)} compositions")
if len(virtual_ceramics_df) >= 500:
print(f"✅ Virtual space size is adequate")
else:
print(f"⚠️ WARNING: Only {len(virtual_ceramics_df)} candidates generated")
# Save data
ceramic_dataset.to_csv('ceramic_dataset.csv', index=False)
virtual_ceramics_df.to_csv('virtual_ceramics.csv', index=False)
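# Alternative sketch: sample directly on the composition simplex with a
# Dirichlet distribution instead of rejection sampling. The concentration
# parameters are illustrative, centering draws near (87, 8, 5)%.
dirichlet_comps = np.random.dirichlet(alpha=[87, 8, 5], size=n_ceramic_candidates) * 100
dirichlet_df = pd.DataFrame(dirichlet_comps, columns=['Al2O3', 'SiO2', 'MgO'])
feasible = dirichlet_df.apply(lambda row: check_ceramic_constraints(row.values), axis=1)
print(f"Dirichlet alternative: {feasible.sum()} of {len(dirichlet_df)} draws feasible")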
```

15.8.2. Constrained Multi-Objective Optimization#

```python
# Bi-objective ceramic optimization
# IMPORTANT: object_num MUST be 2 for MultiBgolearn
VS_recommended, improvements, index = bgo.fit(
'ceramic_dataset.csv', # dataset_path (positional arg 1)
'virtual_ceramics.csv', # VS_path (positional arg 2)
2, # object_num (positional arg 3) - MUST BE 2
max_search=True,
method='EHVI',
assign_model='RandomForest', # Use RandomForest for stability
bootstrap=8
)
print(f"Ceramic Optimization Results:")
print(f"Recommended composition:")
print(f" Al2O3: {VS_recommended[0]:.1f}%")
print(f" SiO2: {VS_recommended[1]:.1f}%")
print(f" MgO: {VS_recommended[2]:.1f}%")
print(f" Total: {sum(VS_recommended):.1f}%")
# Verify constraints
if check_ceramic_constraints(VS_recommended):
print("✓ All constraints satisfied")
else:
print("✗ Constraint violation detected")
print(f"Expected improvements:")
print(f" Strength: +{improvements[0]:.1f} MPa")
print(f" Thermal Conductivity: +{improvements[1]:.1f} W/mK")
print(f" Thermal Shock Resistance: +{improvements[2]:.1f} cycles")
```

15.9. Example 4: Sensitivity Analysis#

Analyze how sensitive the optimization is to different factors.

15.9.1. Model Sensitivity#

```python
# Compare different surrogate models for multi-objective optimization
# Note: RandomForest and GradientBoosting are generally more stable than GaussianProcess
models_to_test = ['RandomForest', 'GradientBoosting', 'LinearRegression']
model_results = {}
for model_name in models_to_test:
print(f"\nTesting {model_name} for multi-objective optimization...")
try:
VS_rec, imp, idx = bgo.fit(
'alloy_dataset.csv', # dataset_path (positional arg 1)
'virtual_space.csv', # VS_path (positional arg 2)
2, # object_num (positional arg 3) - MUST BE 2
max_search=True,
method='EHVI',
assign_model=model_name,
bootstrap=5 # Reduced for speed
)
model_results[model_name] = {
'composition': VS_rec,
'improvements': imp,
'success': True
}
print(f"Success: Cu={VS_rec[0]:.2f}%, Mg={VS_rec[1]:.2f}%, Si={VS_rec[2]:.2f}%")
except Exception as e:
print(f"Failed: {str(e)}")
model_results[model_name] = {'success': False, 'error': str(e)}
# Compare successful results
print("\nModel Comparison for Multi-Objective Optimization:")
print("-" * 80)
print(f"{'Model':<15} {'Cu (%)':<8} {'Mg (%)':<8} {'Si (%)':<8} {'Improvements':<20}")
print("-" * 80)
for model, result in model_results.items():
if result['success']:
comp = result['composition']
imp = result['improvements']
imp_str = f"[{imp[0]:.1f}, {imp[1]:.1f}, {imp[2]:.1f}]"
print(f"{model:<15} {comp[0]:<8.2f} {comp[1]:<8.2f} {comp[2]:<8.2f} {imp_str:<20}")
else:
print(f"{model:<15} {'Failed':<40}")
```

15.9.2. Bootstrap Sensitivity#

```python
# Test different bootstrap settings
bootstrap_values = [3, 5, 8, 10, 15]
bootstrap_results = {}
for bootstrap in bootstrap_values:
print(f"\nTesting bootstrap = {bootstrap}...")
VS_rec, imp, idx = bgo.fit(
'alloy_dataset.csv', # dataset_path (positional arg 1)
'virtual_space.csv', # VS_path (positional arg 2)
2, # object_num (positional arg 3) - MUST BE 2
max_search=True,
method='EHVI',
assign_model='RandomForest', # Use RandomForest for stability
bootstrap=bootstrap
)
bootstrap_results[bootstrap] = {
'composition': VS_rec,
'improvements': imp
}
# Analyze bootstrap sensitivity
print("\nBootstrap Sensitivity Analysis:")
print("-" * 70)
print(f"{'Bootstrap':<10} {'Cu (%)':<8} {'Mg (%)':<8} {'Si (%)':<8} {'Avg Improvement':<15}")
print("-" * 70)
for bootstrap, result in bootstrap_results.items():
comp = result['composition']
avg_imp = np.mean(result['improvements'])
print(f"{bootstrap:<10} {comp[0]:<8.2f} {comp[1]:<8.2f} {comp[2]:<8.2f} {avg_imp:<15.2f}")
```

15.10. Best Practices for Multi-Objective Optimization#

15.10.1. Problem Formulation#

```python
# Good practice: Clear objective definitions
objectives = {
'Strength': {'type': 'maximize', 'unit': 'MPa', 'range': [200, 350]},
'Ductility': {'type': 'maximize', 'unit': '%', 'range': [8, 20]},
'Cost': {'type': 'minimize', 'unit': '$/kg', 'range': [80, 150]}
}
# Convert minimization to maximization
for obj_name, obj_info in objectives.items():
if obj_info['type'] == 'minimize':
print(f"Converting {obj_name} to maximization (negative values)")
```

15.10.2. Data Preprocessing#

```python
# Normalize objectives if they have very different scales
from sklearn.preprocessing import StandardScaler
def preprocess_objectives(data, objective_columns):
"""Preprocess objectives for multi-objective optimization."""
scaler = StandardScaler()
normalized_data = data.copy()
normalized_data[objective_columns] = scaler.fit_transform(data[objective_columns])
return normalized_data, scaler
# Example usage
# normalized_dataset, scaler = preprocess_objectives(dataset, ['Strength', 'Ductility', 'Cost'])
```

15.10.3. Algorithm Selection Guidelines#

```python
algorithm_guidelines = {
'EHVI': {
'best_for': '2-4 objectives',
'pros': ['Theoretically sound', 'Balanced exploration'],
'cons': ['Computationally expensive for >4 objectives'],
'recommended_bootstrap': 8
},
'PI': {
'best_for': 'Conservative optimization',
'pros': ['Fast computation', 'Reliable improvements'],
'cons': ['May not explore enough'],
'recommended_bootstrap': 5
},
'UCB': {
'best_for': 'Noisy objectives, >4 objectives',
'pros': ['Uncertainty aware', 'Scalable'],
'cons': ['Requires parameter tuning'],
'recommended_bootstrap': 10
}
}
# Print guidelines
for alg, info in algorithm_guidelines.items():
print(f"\n{alg}:")
print(f" Best for: {info['best_for']}")
print(f" Recommended bootstrap: {info['recommended_bootstrap']}")
```

15.11. Troubleshooting#

15.11.1. Common Error: IndexError#

Error Message:

```
IndexError: index XXXX is out of bounds for axis 0 with size YYYY
```
Cause: This error occurs when the recommended index exceeds the size of your virtual space.
Solutions:

1. Check Virtual Space Size:

```python
import pandas as pd

vs = pd.read_csv('virtual_space.csv')
print(f"Virtual space size: {len(vs)}")
print(f"Virtual space columns: {list(vs.columns)}")
```

2. Ensure Enough Candidates:

- Recommended: at least 100-1000 candidate points
- If you have fewer than 100 points, expand your virtual space

3. Verify File Format:

```python
# Virtual space should ONLY have feature columns
# ✅ CORRECT:
virtual_space = pd.DataFrame({
    'Cu': [...],
    'Mg': [...],
    'Si': [...]
})

# ❌ WRONG (includes objectives):
virtual_space = pd.DataFrame({
    'Cu': [...],
    'Mg': [...],
    'Si': [...],
    'Strength': [...],   # ❌ Remove this!
    'Ductility': [...]   # ❌ Remove this!
})
```

4. Check for Data Corruption:

```python
import numpy as np
import pandas as pd

# Verify no NaN or infinite values
vs = pd.read_csv('virtual_space.csv')
print(f"NaN values: {vs.isna().sum().sum()}")
print(f"Infinite values: {np.isinf(vs.values).sum()}")
```
15.11.2. Common Error: TypeError with GaussianProcess#

Error Message:

```
TypeError: predict() got an unexpected keyword argument 'return_std'
```
Cause: This is a bug in older versions of MultiBgolearn.
Solution: Update to the latest version:

```
pip install --upgrade MultiBgolearn
```
Or use a different surrogate model:

```python
VS_recommended, improvements, index = bgo.fit(
    'dataset.csv',
    'virtual_space.csv',
    2,
    max_search=True,
    method='EHVI',
    assign_model='RandomForest',  # Use RandomForest instead
    bootstrap=10
)
```
15.12. Next Steps#
- Learn Pareto analysis: Pareto Optimization and Trade-off Analysis
- Understand MOBO theory: Multi-Objective Optimization Concepts