Optimization of reaction conditions is necessary in any chemical reaction to obtain high product yield, selectivity, and/or the desired properties. The optimization is generally performed anew for each reaction system because experimental results are highly sensitive to the reaction conditions. Moreover, optimization attempts require considerable time, labor, and cost, and depend on the chemist's experience and intuition. Thus, the development of rational and efficient optimization schemes is desired to determine appropriate reaction conditions rapidly.
So far, several optimization schemes have been proposed from two different perspectives. The first approach is the analysis of thousands or more experimental data points; recent studies, for example, investigated experimental data extracted from a reaction database^{1} or obtained using dedicated experimental equipment.^{2–4} The second approach is the development of algorithm-assisted optimization schemes, which suggest reaction conditions for better results from a small number of experiments.^{5–22}
An optimization scheme for a small number of experiments requires careful treatment of the different types of reaction conditions, which include continuous and discrete (discontinuous) variables. Various optimizations have been performed for reaction systems consisting of only continuous conditions^{5–10} or only discrete ones.^{11–13} For example, schemes for flow and microfluidic reactor systems have achieved automatic and effective optimization^{14–19} through real-time monitoring of the reactions and continuous changes of the reaction conditions. However, there are only a few studies on reaction conditions that include both continuous and discrete variables;^{20–22} these optimizations were performed in flow reactors with up to three continuous and discrete variables.
This study investigated virtual reaction condition optimization with a small number of batch experiments and high-dimensional conditions. We focused on the synthetic reaction of a tetrasilabicyclo[1.1.0]but-1(3)-ene (SiBBE) shown in Scheme 1.^{23} The conditions comprise five continuous and four discrete variables. First, we constructed a predictive function, which predicts the product yield from the reaction conditions using machine learning. Subsequently, we analyzed the function and discussed the important factors contributing to a high product yield. Finally, we performed virtual optimization of the reaction conditions using the proposed scheme.
Machine learning was performed using 57 experimental data points. The experiments varied nine reaction conditions: the scale of reactants, solvent, concentration of reactants, temperature, reaction time, reductant, equivalent mass of reductant, stirring method, and product extraction method. Here, the stirring method refers to the use of nichrome wires in the stirring process, which scratch the surface of the Na dispersed in the solvent. Detailed information about the reaction conditions and the corresponding product yields is provided in the Supporting Information.
Before machine learning, the data were preprocessed following standard practice in data science. The reaction conditions were divided into continuous and discrete data. The continuous data comprised the scale of reactants, concentration of reactants, temperature, reaction time, and equivalent mass of reductant. The discrete data comprised the solvent, reductant, stirring method, and product extraction method. The discrete data for each reaction condition were transformed into binary values, zero or one. For example, the solvent data were converted into four-dimensional binary values because the experiments used four solvents: tetrahydrofuran (THF), diethyl ether (Et_{2}O), toluene, and hexane. Each condition was then scaled to a mean of zero and a standard deviation of one.
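A minimal sketch of this preprocessing, assuming scikit-learn and pandas; the column names and toy values here are hypothetical, and the actual 57-point data set is given in the Supporting Information.

```python
# Toy illustration of the preprocessing: one-hot encoding of a
# discrete condition followed by standardization of all columns.
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two continuous conditions and one discrete one.
df = pd.DataFrame({
    "scale":   [0.1, 0.5, 1.0],          # continuous
    "time_h":  [12, 24, 48],             # continuous
    "solvent": ["THF", "hexane", "THF"], # discrete
})

# Discrete conditions become binary (0/1) indicator columns.
X = pd.get_dummies(df, columns=["solvent"])

# Every column is scaled to zero mean and unit standard deviation.
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.mean(axis=0))  # approximately zero for every column
```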
Using the preprocessed data, machine learning was performed to investigate whether the product yields can be predicted from the reaction conditions. Twenty regression methods were adopted to assess the accuracy. The mean absolute errors (MAEs) and correlation coefficients between the experimental and predicted product yields were used as evaluation metrics. The accuracy was estimated through double cross-validation to avoid overestimating the performance. In double cross-validation, the data are divided into three sets: test, validation, and training sets. The prediction accuracy was evaluated using only the test set. The hyperparameters of each regression method were determined from the validation and training sets so as to minimize the MAE on the validation sets. The test set was generated using the leave-one-out (LOO) method, and the validation and training sets were constructed using fivefold cross-validation within each LOO step.
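The double cross-validation can be sketched as follows, assuming scikit-learn, with KRR (linear kernel) as the inner model; the data, the hyperparameter grid, and the feature count are synthetic stand-ins for the real 57-point set.

```python
# Double cross-validation: an outer leave-one-out loop for testing,
# with an inner fivefold grid search tuning the hyperparameter on
# the training data only, so the test point never leaks into tuning.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.metrics import mean_absolute_error

X, y = make_regression(n_samples=57, n_features=16, noise=10.0,
                       random_state=0)

preds = []
for train_idx, test_idx in LeaveOneOut().split(X):
    inner = GridSearchCV(KernelRidge(kernel="linear"),
                         {"alpha": [0.01, 0.1, 1.0, 10.0]},
                         cv=5, scoring="neg_mean_absolute_error")
    inner.fit(X[train_idx], y[train_idx])
    preds.append(inner.predict(X[test_idx])[0])

mae = mean_absolute_error(y, preds)
r = np.corrcoef(y, preds)[0, 1]
print(f"MAE = {mae:.2f}, correlation = {r:.2f}")
```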
Table 1 summarizes the MAEs and correlation coefficients for each regression method. We adopted 10 linear and 10 nonlinear regression methods. The linear methods include ordinary linear, least absolute shrinkage and selection operator (LASSO), ridge, elastic net, Huber, partial least squares (PLS), Bayesian ridge, and automatic relevance determination (ARD) regressions, as well as kernel ridge regression (KRR) and support vector regression (SVR) with linear kernels. The nonlinear methods include Gaussian process regression; KRR with polynomial, radial basis function (rbf), and sigmoid kernels; SVR with polynomial, rbf, and sigmoid kernels; decision tree regression; random forest regression; and AdaBoost decision tree regression. The linear methods showed MAEs in the range 6.61% to 8.11% and correlation coefficients in the range 0.53 to 0.70, whereas the nonlinear ones showed MAEs in the range 6.52% to 9.26% and correlation coefficients in the range 0.32 to 0.67. Thus, the linear regression methods provided accuracy equal to or better than that of the nonlinear ones. This indicates that the relationships between the reaction conditions and product yields are approximately linear for the present reaction data. It also suggests that nonlinear regression with a small number of data points and a large number of variables tends to overfit. KRR with linear kernel showed the highest correlation coefficient, 0.70, with an MAE of 6.61%, indicating that the model provides qualitative predictions of the product yields.
Table 1. Statistical MAEs and correlation coefficients for 10 linear and 10 nonlinear regression methods.
Type  Method  MAE (%)  Correlation coefficient 
Linear  Linear  7.47  0.61 
LASSO  6.93  0.65 
Ridge  7.42  0.62 
Elastic net  6.88  0.66 
Huber  7.13  0.64 
PLS  7.05  0.65 
Bayesian ridge  6.87  0.66 
ARD  7.28  0.61 
KRR (linear kernel)  6.61  0.70 
SVR (linear kernel)  8.11  0.53 
Nonlinear  Gaussian process  6.85  0.67 
KRR (polynomial kernel)  7.21  0.61 
KRR (rbf kernel)  7.79  0.52 
KRR (sigmoid kernel)  7.05  0.61 
SVR (polynomial kernel)  8.83  0.32 
SVR (rbf kernel)  6.90  0.66 
SVR (sigmoid kernel)  6.52  0.67 
Decision tree  9.26  0.34 
Random forest  7.59  0.56 
AdaBoost decision tree  7.67  0.56 
Linear correlations between the reaction conditions and product yields can be analyzed using the coefficients of linear regressions. Here, we focused on LASSO, which has the advantage of forcing the coefficients of conditions with small contributions to the product yield to exactly zero. Table 2 shows the coefficient for each condition; positive and negative signs indicate positive and negative contributions to the product yield, respectively. Ten reaction conditions have nonzero values, indicating major contributions to the product yield, while the other six have zero values, indicating negligible contributions. The results suggest that a smaller scale, stirring with nichrome wire, a longer reaction time, and a lower temperature improve the product yield, as expected. Among the reductants and solvents, only the use of prepared Na and of hexane contributes positively to the product yield; the coefficients for the other reductants and solvents are negative or zero. The remaining reaction conditions, namely the concentration of reactants, equivalent mass of reductant, and extraction method, contribute only slightly. This type of analysis is applicable to a wide range of reaction conditions including discrete variables.
Table 2. Regression coefficients in LASSO. Reaction conditions are arranged in descending order of the absolute values.
Condition  Coefficient 
Scale  −0.46 
Stirring with nichrome wire  0.38 
Time  0.32 
Temperature  −0.30 
KC_{8} (Reductant)  −0.30 
NaK_{2} (Reductant)  −0.24 
Na (Reductant, prepared)  0.18 
Hexane (Solvent)  0.13 
Et_{2}O (Solvent)  −0.09 
LiNaph (Reductant)  −0.09 
Concentration of reactants  0.00 
THF (Solvent)  0.00 
Na (Reductant, commercially available)  0.00 
Hot toluene (Extraction)  0.00 
Equivalent mass of reductants  0.00 
Soxhlet (Extraction)  0.00 
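The sparsity exploited in the coefficient analysis above can be illustrated with a toy LASSO fit, assuming scikit-learn; the data below are synthetic, with only three of 16 hypothetical conditions actually contributing to the yield.

```python
# LASSO drives the coefficients of weakly contributing conditions to
# exactly zero, so the surviving coefficients identify the dominant
# conditions, as in Table 2.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.standard_normal((57, 16))  # 57 samples, 16 standardized conditions
# Only three of the 16 conditions truly matter in this toy data set.
y = (-0.46 * X[:, 0] + 0.38 * X[:, 1] + 0.32 * X[:, 2]
     + 0.05 * rng.standard_normal(57))

model = Lasso(alpha=0.1).fit(X, y)
for i, c in enumerate(model.coef_):
    if c != 0.0:
        print(f"condition {i}: coefficient {c:+.2f}")
```

The regularization strength `alpha` controls how aggressively small coefficients are zeroed; with a larger `alpha`, fewer conditions survive.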
Finally, we performed a virtual reaction condition optimization simulation. Scheme 2 shows the algorithm in detail. The algorithm aims to maximize an objective value corresponding to a given reaction condition. Here, the objective value was defined as 2log(y) + log(m), where y and m represent the yield and mass of the product, i.e. SiBBE, respectively; m is defined as the product of the reactant scale and y. If y alone is adopted as the objective value, small-scale reactions become favorable; if m alone is adopted, large-scale reactions increase the mass but decrease the yield. The experimental chemists aimed to determine a reaction condition with high yield and large mass simultaneously, and the best reaction condition they found was the 50th one.^{23} The objective function was designed to be maximized at that reaction condition. The behaviors of simulations with other objective functions are described in the Supporting Information.
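The objective value can be written out directly; note that since m = scale × y, the definition simplifies to 3log(y) + log(scale), so yield is weighted three times as strongly as scale. A minimal sketch (units of scale are arbitrary):

```python
# The objective value 2*log(y) + log(m), with m = scale * y.
import math

def objective(yield_pct, scale):
    """2 log(y) + log(m) = 3 log(y) + log(scale)."""
    m = scale * yield_pct
    return 2 * math.log(yield_pct) + math.log(m)

# Doubling the yield at fixed scale raises the objective by 3 log 2,
# whereas doubling the scale at fixed yield raises it by only log 2.
print(objective(40.0, 1.0) - objective(20.0, 1.0))  # = 3 log 2
print(objective(20.0, 2.0) - objective(20.0, 1.0))  # = log 2
```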
We performed one random simulation and three machine-learning-based optimization simulations. The random method, termed Method R, iteratively selected an experimental condition at random and evaluated the corresponding objective value. Three optimization methods were examined because the efficiency of the simulation strongly depends on the choice of initial data. The first scheme, termed Method A, adopted five initial experimental data points, the same as those used initially in the experiments. A regression function was constructed to predict the product yields from the given reaction conditions and applied to the remaining 52 reaction conditions to predict the product yields y and masses m. Subsequently, the objective values, 2log(y) + log(m), were calculated. Next, the reaction condition predicted to provide the largest objective value was chosen and added to the learning data; this step corresponds to virtually determining the next experiment to perform. The learning procedure was iterated until all the data were learned. In this procedure, KRR with linear kernel was adopted as the machine learning method, and the hyperparameter was determined through LOO at each iteration.
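The greedy learning loop can be sketched as follows, assuming scikit-learn; the conditions, yields, and scales are synthetic stand-ins for the 57 experimental points, and for brevity the KRR hyperparameter is fixed rather than re-tuned by LOO at each iteration as in the actual procedure.

```python
# Greedy optimization loop in the spirit of Method A: fit KRR on the
# known data, predict the objective for the remaining conditions, and
# "perform" the experiment with the largest predicted objective.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
X = rng.uniform(size=(57, 16))              # toy reaction conditions
y = 5.0 + 80.0 * X[:, 0] * (1 - X[:, 1])    # toy yields in %
scale = 0.1 + X[:, 2]                       # toy reactant scales

known = list(range(5))                      # five initial data points
remaining = [i for i in range(57) if i not in known]
while remaining:
    model = KernelRidge(kernel="linear", alpha=1.0)  # fixed hyperparameter
    model.fit(X[known], y[known])
    # Clip predictions to keep the logarithms defined.
    y_pred = np.clip(model.predict(X[remaining]), 1e-6, None)
    # Objective: 2 log(y) + log(m) with m = scale * y.
    obj = 2 * np.log(y_pred) + np.log(np.asarray(scale)[remaining] * y_pred)
    best = remaining.pop(int(np.argmax(obj)))
    known.append(best)                      # virtually run that experiment
```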
In the second scheme, termed Method B, the five initial data points were randomly selected from the 57 experimental data points. The third scheme, termed Method C, adopted five mutually distant data points from the 57, in terms of distance in the space of reaction conditions. The five distant points were selected as follows. First, 1,000 virtual reaction conditions were randomly generated from uniform distributions over the ranges of the experimental conditions. The virtual conditions were then classified into five clusters using the k-means algorithm, and the five cluster centroids were determined. Finally, the five experimental data points closest to the centroids were chosen as the initial data. The performances of Methods B, C, and R were estimated as averages over 1,000 virtual simulations because the results depend on the random initial selection.
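The Method C initialization can be sketched as follows, assuming scikit-learn; synthetic uniform data stand in for the real experimental conditions and their ranges.

```python
# Method C initialization: cluster 1,000 random virtual conditions
# with k-means, then seed the learning with the experimental points
# nearest the five cluster centroids.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin

rng = np.random.default_rng(0)
X_exp = rng.uniform(size=(57, 16))        # 57 experimental conditions
X_virtual = rng.uniform(size=(1000, 16))  # 1,000 virtual conditions

km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X_virtual)
# For each of the five centroids, pick the closest experimental point.
init_idx = pairwise_distances_argmin(km.cluster_centers_, X_exp)
print(sorted(init_idx))  # indices of the five initial data points
```

Clustering the virtual conditions rather than the experimental points themselves spreads the seeds over the full condition ranges, which matches the observation below that uniformly scattered initial data improve the optimization.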
Figure 1 shows the number of experiments required to obtain the target condition, i.e. the one providing the highest objective value. The results of Experiment and Method A are represented by black solid and dashed lines, respectively. The results of Methods B, C, and R are shown as histograms over the 1,000 simulations, with the vertical axis giving the frequencies; their average performances are represented by red, blue, and gray dotted lines, respectively. The numbers of experiments required were 16 for Method A and, on average, 17.1, 11.7, and 22.8 for Methods B, C, and R, respectively. Methods A, B, and C found the target condition more efficiently than Experiment, which required 50 experiments, and than Method R. Methods A and B showed similar efficiency, and Method C performed best. These results indicate that the present scheme is promising for rational optimization of reaction conditions, and that initial data scattered uniformly in the data space are important for effective optimization.
In this study, virtual reaction condition optimization based on machine learning was performed for high-dimensional continuous and discrete conditions with a small number of experiments. The virtual simulations showed reasonable performance in reducing the number of experimental trials. The results indicate that linear regression combined with a simple optimization scheme is effective for high-dimensional reaction condition optimization with both continuous and discrete data. The good performance is owed to the approximately linear relationship between the reaction conditions and product yields in the present experimental data.
To apply this scheme with high efficiency, the following points should be considered. First, chemists should set the reaction conditions properly and examine the ranges to be optimized before starting the optimization experiments. Second, experimental data of uniform quality, with little noise from equipment problems or human error, are required because the robustness of machine learning decreases considerably when outliers are included. Moreover, failed experiments should be included in the learning data. Here, failed data means results undesired by chemists, e.g. reaction conditions giving zero or low yields in this study; all experimental data are mathematically useful for machine learning, as pointed out in ref 24. If the scheme is applied appropriately to suitable systems satisfying the above requirements, rational optimization can be performed efficiently.
The author M.F. is supported by a Grant-in-Aid for Japan Society for the Promotion of Science Research Fellows. The author J.S. is grateful for support from the PRESTO Program “Advanced Materials Informatics through Comprehensive Integration among Theoretical, Experimental, Computational and Data-Centric Sciences” of the Japan Science and Technology Agency.
M. Fujinami
J. Seino
H. Nakai
T. Nukazawa
S. Ishida
T. Iwamoto
1)  H. Gao, T. J. Struble, C. W. Coley, Y. Wang, W. H. Green, K. F. Jensen, ACS Cent. Sci. 2018, 4, 1465.
2)  J. M. Granda, L. Donina, V. Dragone, D.-L. Long, L. Cronin, Nature 2018, 559, 377.
3)  D. Caramelli, D. Salley, A. Henson, G. A. Camarasa, S. Sharabi, G. Keenan, L. Cronin, Nat. Commun. 2018, 9, 3406.
4)  D. T. Ahneman, J. G. Estrada, S. Lin, S. D. Dreher, A. G. Doyle, Science 2018, 360, 186.
5)  K. W. Moore, A. Pechen, X.-J. Feng, J. Dominy, V. J. Beltrani, H. Rabitz, Phys. Chem. Chem. Phys. 2011, 13, 10048.
6)  K. M. Tibbetts, X.-J. Feng, H. Rabitz, Phys. Chem. Chem. Phys. 2017, 19, 4266.
7)  L.-H. Schilling, F. Niekiel, N. Stock, B. Hartke, ChemPlusChem 2014, 79, 863.
8)  C. Houben, N. Peremezhney, A. Zubov, J. Kosek, A. A. Lapkin, Org. Process Res. Dev. 2015, 19, 1049.
9)  Z. Zhou, X. Li, R. N. Zare, ACS Cent. Sci. 2017, 3, 1337.
10)  B. Cao, L. A. Adutwum, A. O. Oliynyk, E. J. Luber, B. C. Olsen, A. Mar, J. M. Buriak, ACS Nano 2018, 12, 7434.
11)  J. E. Kreutz, A. Shukhaev, W. Du, S. Druskin, O. Daugulis, R. F. Ismagilov, J. Am. Chem. Soc. 2010, 132, 3128.
12)  K. Omata, Ind. Eng. Chem. Res. 2011, 50, 10948.
13)  K. W. Moore Tibbetts, R. Li, I. Pelczer, H. Rabitz, Chem. Phys. Lett. 2013, 572, 1.
14)  C. Houben, A. A. Lapkin, Curr. Opin. Chem. Eng. 2015, 9, 1.
15)  S. V. Ley, D. E. Fitzpatrick, R. J. Ingham, R. M. Myers, Angew. Chem., Int. Ed. 2015, 54, 3449.
16)  V. Sans, L. Cronin, Chem. Soc. Rev. 2016, 45, 2032.
17)  A. B. Henson, P. S. Gromski, L. Cronin, ACS Cent. Sci. 2018, 4, 793.
18)  A. Echtermeyer, Y. Amar, J. Zakrzewski, A. Lapkin, Beilstein J. Org. Chem. 2017, 13, 150.
19)  A. M. Schweidtmann, A. D. Clayton, N. Holmes, E. Bradford, R. A. Bourne, A. A. Lapkin, Chem. Eng. J. 2018, 352, 277.
20)  B. J. Reizman, Y.-M. Wang, S. L. Buchwald, K. F. Jensen, React. Chem. Eng. 2016, 1, 658.
21)  H.-W. Hsieh, C. W. Coley, L. M. Baumgartner, K. F. Jensen, R. I. Robinson, Org. Process Res. Dev. 2018, 22, 542.
22)  L. M. Baumgartner, C. W. Coley, B. J. Reizman, K. W. Gao, K. F. Jensen, React. Chem. Eng. 2018, 3, 301.
23)  T. Nukazawa, T. Kosai, S. Honda, S. Ishida, T. Iwamoto, Dalton Trans. 2019, doi:10.1039/c9dt01627a.
24)  P. Raccuglia, K. C. Elbert, P. D. F. Adler, C. Falk, M. B. Wenny, A. Mollo, M. Zeller, S. A. Friedler, J. Schrier, A. J. Norquist, Nature 2016, 533, 73.