1. Introduction
In recent years, machine learning techniques have been increasingly used to address challenges in seismic and structural engineering. Machine learning has the potential to replace current empirical and semi-empirical prediction models with more accurate ones. A hybrid artificial intelligence approach, ICA–XGBoost, has been used to predict the strength of concrete containing recycled aggregates. This method combines a meta-heuristic algorithm, ICA, with the XGBoost machine learning algorithm, and the hybrid outperformed the other algorithms considered [1]. A comprehensive review examined the expanding applications of machine learning in structural engineering, systematically covering machine learning techniques and libraries as well as Python resources, code, and datasets relevant to the field [2]. Another study applied a machine learning approach to predict and optimize the modulus of elasticity of concrete containing recycled aggregates; a comparison with other algorithms showed that the ensemble model gave more precise predictions than the individual models [3]. Machine learning algorithms have also been used to predict the shear strength of recycled aggregate concrete beams with and without shear reinforcement, and the XGBoost model has been used to estimate the shear strength of reinforced concrete elements [4,5,6]. In addition, an ensemble learning method was used to predict the shear strength of reinforced concrete deep beams with and without web reinforcement, and it outperformed traditional machine learning methods [7].
Using recycled aggregates in place of natural aggregates in concrete is recognized as an effective way to promote sustainability in the construction industry. Liu et al. [8] applied machine learning models to predict the stability of concrete containing recycled aggregates and found that the artificial neural network (ANN) model achieved the highest predictive accuracy. Hu and Kwok [9] employed machine learning techniques to predict the wind pressure distribution around circular cylinders and found that the gradient boosting regression trees model gave the best predictive performance.
Pyakurel et al. [10] employed machine learning techniques to predict earthquake-triggered landslides and concluded that the trees classifier model was more effective than the other models. Feng et al. [11] investigated the uncertainty of machine learning models in assessing susceptibility to earthquake-induced landslides. Assessing the design strength of cement-stabilized soft soil (cement soil) for diverse applications typically requires numerous field and laboratory geotechnical tests. These tests are costly in resources and time and can also cause significant environmental pollution. Different machine learning methods have been used to predict the compressive strength of cement-stabilized soils, a measure of their strength and hardness, and the results show that machine learning models predict this compressive strength with high accuracy [12,13,14].
Sayed et al. [15] used machine learning models to predict the axial compressive load capacity of FRP-encased concrete columns and reported that the gradient boosting and random forest models achieved the highest prediction accuracy. Nguyen and Ly [16] performed compressive strength prediction and sensitivity analyses of fiber-reinforced self-compacting concrete (FRSCC) using machine learning models; XGBoost showed the highest predictive performance. Design codes often require estimates of the mechanical properties of concrete, and the introduction of novel concrete mixes and applications has prompted researchers to seek reliable models for predicting mechanical strength. Chaabene et al. [17] employed machine learning methods to predict the mechanical properties of concrete. Jiang and Zhao [18] applied machine learning methods to the design of stainless steel bolted connections and found that the support vector machine gave the best accuracy and performance.
In another study, Taffese and Espinosa-Leal [19] predicted the chloride diffusion coefficient of concrete using machine learning techniques, and the XGBoost model showed the best predictive performance. Mousavi et al. [20] applied machine learning methods to classify wood properties derived from ultrasonic tests. Li et al. [21] predicted the compressive strength of BFRC using a hybrid of the kernel extreme learning machine (KELM) and a genetic algorithm (GA), and found that the KELM–GA model had strong predictive capability.
Sandeep et al. [22] used machine learning techniques to predict the shear strength of reinforced concrete beams, demonstrating the capabilities of this approach. Kaveh et al. [23] employed machine learning methods to predict the shear strength of FRP-reinforced concrete girders and observed that the extreme gradient boosting model outperformed the other machine learning models. ANNs have also been applied to the shear strength of reinforced flexural members, to cold-formed steel structures, and to the complex deformation of structural elements [24,25,26].
Jiang et al. [27] predicted bridge deterioration using a hybrid of the whale optimization algorithm and other machine learning methods; the hybrid model performed better than the standalone model. Hwang et al. [28] used machine learning models to predict the seismic response and classify the collapse of ductile reinforced concrete buildings during earthquakes while accounting for the inherent uncertainty. The compressive strength of concrete varies with the composition and proportions of its constituent materials. Farooq et al. [29] employed machine learning methods to predict the compressive strength of high-performance concrete and showed that bagging and boosting improved on the individual base machine learning models. Concrete-encased steel (CES) columns, also known as steel–concrete composite columns, exhibit excellent fire resistance owing to the concrete encasement. Li et al. [30] used an artificial neural network to predict the fire resistance of such composite columns. Predicting the nominal shear capacity of reinforced concrete deep beams with openings is a complex challenge because of their highly nonlinear behavior. Li et al. [31] investigated the progressive collapse performance of planar frame structures made with normal concrete and with engineered cementitious composites (ECC) under removal of the middle column; the ECC specimen showed limited cracking and improved progressive collapse performance. Li and Song [32] used a stacking ensemble learning method to predict the compressive strength of concrete incorporating rice husk ash, and the proposed model outperformed the other algorithms.
This paper uses machine learning techniques to predict the deterioration components (DCs) of steel W-section beams. The source data come from a study by Lignos and Krawinkler [33], who used analytical relations based on experimental tests to determine the deterioration components of steel W-section beams, namely the pre-capping plastic rotation (θp), the post-capping plastic rotation (θpc), and the cumulative rotation capacity (Λ). Collapse assessment of structures requires hysteretic models that capture the deteriorating behavior of structural components, and these parameters are key inputs to such models. Backbone curves delimit the hysteretic response of these components, as shown schematically in Figure 1.
In this study, DCs are predicted with a stacking model. Three base learners, AdaBoost, Random Forest (RF), and XGBoost, serve as the primary predictors, and RF is used as the meta-learner. Hyperparameters are optimized using grid search with 5-fold cross-validation, and feature importance is assessed with SHapley Additive exPlanations (SHAP). The dataset comprises 157 laboratory tests of steel W-section beams collected by Lignos and Krawinkler [33]. The empirical relationships proposed by Lignos and Krawinkler are also used to predict the DCs, and a comparison with the machine learning models shows that the stacking model achieves higher accuracy and performance.
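The stacking arrangement described above can be sketched with scikit-learn's `StackingRegressor`. This is a minimal illustration, not the authors' implementation: the data are synthetic, the hyperparameters are placeholders, and scikit-learn's `GradientBoostingRegressor` stands in for XGBoost so that the sketch depends only on scikit-learn (`xgboost.XGBRegressor` exposes the same `fit`/`predict` interface and could be swapped in).

```python
import numpy as np
from sklearn.ensemble import (
    AdaBoostRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)


def build_stacking_model(seed: int = 0) -> StackingRegressor:
    """Stack AdaBoost, RF and a gradient-boosting learner under an RF meta-learner."""
    base_learners = [
        ("ada", AdaBoostRegressor(n_estimators=100, random_state=seed)),
        ("rf", RandomForestRegressor(n_estimators=200, random_state=seed)),
        # Stand-in for XGBoost (assumption); xgboost.XGBRegressor could be used instead.
        ("gb", GradientBoostingRegressor(n_estimators=200, max_depth=3, random_state=seed)),
    ]
    # The meta-learner is trained on 5-fold out-of-fold predictions of the base learners.
    meta_learner = RandomForestRegressor(n_estimators=200, random_state=seed)
    return StackingRegressor(estimators=base_learners, final_estimator=meta_learner, cv=5)


# Hypothetical data: 96 samples x 8 inputs, mimicking the size of the theta_p subset.
rng = np.random.default_rng(0)
X = rng.uniform(size=(96, 8))
y = 0.02 + 0.03 * X[:, 0] - 0.01 * X[:, 3] + rng.normal(scale=0.002, size=96)

model = build_stacking_model().fit(X, y)
pred = model.predict(X[:5])
```

Training the meta-learner on out-of-fold predictions, rather than on predictions for data the base learners have already seen, is what keeps the second stage from simply trusting whichever base learner overfits most.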
3. Research Significance
The primary objective of earthquake engineering has always been to understand, predict, and prevent structural collapse. From a financial perspective, collapse is a state in which a building, its contents, and its functionality are completely lost, leading to significant monetary loss. Collapse also threatens human safety, causing injuries and fatalities, so evaluating the level of life safety is a fundamental concern. Collapse assessment requires hysteretic models that capture the failure modes of structural components. Backbone curves, which bound the hysteretic response, are used to describe the deterioration components of structural members, as shown schematically in Figure 1. The deterioration components, namely θp = pre-capping plastic rotation, θpc = post-capping plastic rotation, and Λ = cumulative rotation capacity, provide key information about the deterioration characteristics of steel moment-resisting frames. Obtaining comprehensive data on these parameters requires a collection of laboratory tests. The dataset comprises 157 tests of steel W-section beams compiled by Lignos and Krawinkler [33]. From these experimental data, empirical relationships were derived for two types of beams: beams other than reduced beam section (RBS) beams, and beams with RBS [33]. The resulting relationships are as follows:
The analytical relationships are derived from geometric characteristics and material properties and apply only to W-sections. They involve the following parameters:
h/tw is the web depth-to-thickness ratio; Lb/ry is the ratio of the beam unbraced length Lb to the radius of gyration about the weak axis ry; bf/2tf is the flange width-to-thickness ratio used for compactness; L/d is the shear span-to-depth ratio of the beam; d is the depth of the beam cross section; Fy is the expected yield strength of the beam flange, normalized by 50 ksi (the typical nominal yield strength of U.S. structural steel); and C1unit and C2unit are unit-conversion coefficients. Both equal 1 when inches and ksi are used, and C1unit = 0.0254 and C2unit = 0.145 when d is in meters and Fy is in MPa.
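As a worked illustration of the Fy normalization, take an illustrative Fy of 345 MPa (an assumed value, not one drawn from the dataset). Converting with C2unit = 0.145 and dividing by the 50 ksi reference gives:

```latex
\frac{C_{2\text{unit}}\,F_y}{50\ \text{ksi}} = \frac{0.145 \times 345}{50} = \frac{50.025}{50} \approx 1.00
```

A 345 MPa steel therefore sits essentially at the 50 ksi reference strength, so this factor has almost no effect for such beams and only shifts predictions for steels noticeably weaker or stronger than the reference.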
This research employs machine learning techniques to determine the deterioration components of W-section steel beams. Lignos and Krawinkler [33] used five parameters (h/tw, bf/2tf, L/d, d, and Lb/ry), which have the greatest effect on the deterioration components. However, a number of the experimental specimens had similar values of these input parameters, which caused the machine learning models to make errors during training. Therefore, in addition to the parameters proposed by Lignos and Krawinkler [33], this study introduces three parameters: connection type, test configuration, and yield moment (My). The connection type covers 29 distinct connection types, as detailed in Table 1, and the test configuration covers 8 configurations, listed in Table 2. To incorporate connection type and test configuration into the machine learning models, each category is assigned an integer label: the 29 connection types are numbered 1 to 29, and the 8 test configurations are numbered 1 to 8 [40].
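A minimal sketch of this integer labeling follows, assuming labels are assigned in order of first appearance. The category strings are hypothetical placeholders for the entries of Tables 1 and 2, and the helper names are illustrative only.

```python
def build_label_map(categories):
    """Assign integer labels 1..N to distinct categories in order of first appearance."""
    mapping = {}
    for category in categories:
        if category not in mapping:
            mapping[category] = len(mapping) + 1
    return mapping


def encode(values, mapping):
    """Replace each categorical value with its integer label."""
    return [mapping[value] for value in values]


# Hypothetical placeholder categories (the real ones are listed in Tables 1 and 2).
connection_type = ["conn_A", "conn_B", "conn_A", "conn_C"]
test_configuration = ["config_1", "config_2", "config_1", "config_1"]

connection_map = build_label_map(connection_type)
configuration_map = build_label_map(test_configuration)

connection_codes = encode(connection_type, connection_map)           # [1, 2, 1, 3]
configuration_codes = encode(test_configuration, configuration_map)  # [1, 2, 1, 1]
```

Integer labels impose an arbitrary ordering on the categories. Tree-based learners such as RF, AdaBoost with tree base estimators, and XGBoost split on thresholds and are generally less sensitive to that ordering than distance-based models, which is one reason plain integer coding can be workable for these learners.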
4. Data Preprocessing
This investigation is based on a dataset of 157 laboratory experiments [33]. Some of these records are similar and others are incomplete (values not reported), so similar records were averaged. As a result, 96 samples are available for θp, 91 for θpc, and 96 for Λ. The collected experimental data are available in Lignos's dissertation [40]. The input data comprise eight parameters: the web depth-to-thickness ratio (h/tw), the ratio of the beam unbraced length Lb to the radius of gyration (Lb/ry), the flange width-to-thickness ratio used for compactness (bf/2tf), the shear span-to-depth ratio of the beam (L/d), the depth of the beam cross section (d), the connection type, the test configuration, and the yield moment (My). The three outputs are θp, θpc, and Λ. An overview of the features is given in Table 3.
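A minimal pandas sketch of this preprocessing is shown below. It assumes that "similar" records are those with identical values of all eight inputs and that a record with an unreported output is left out of that output's subset, which would be one way to arrive at different sample counts per output. The column names and the four records are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical column names for the eight inputs and three outputs.
INPUTS = ["h_tw", "Lb_ry", "bf_2tf", "L_d", "d", "connection", "config", "My"]
OUTPUTS = ["theta_p", "theta_pc", "Lambda"]


def build_subset(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Drop records missing `target`, then average records with identical inputs."""
    reported = df.dropna(subset=[target])
    return reported.groupby(INPUTS, as_index=False)[target].mean()


# Four hypothetical records; the first two share identical inputs.
raw = pd.DataFrame({
    "h_tw": [40.0, 40.0, 55.0, 30.0],
    "Lb_ry": [50.0, 50.0, 60.0, 45.0],
    "bf_2tf": [6.0, 6.0, 7.5, 5.0],
    "L_d": [7.0, 7.0, 6.0, 8.0],
    "d": [21.0, 21.0, 24.0, 18.0],
    "connection": [1, 1, 2, 3],
    "config": [1, 1, 1, 2],
    "My": [10000.0, 10000.0, 15000.0, 8000.0],
    "theta_p": [0.02, 0.04, 0.03, 0.05],
    "theta_pc": [0.20, 0.22, np.nan, 0.30],  # third record: theta_pc not reported
    "Lambda": [1.0, 1.2, 0.9, 1.5],
})

subsets = {target: build_subset(raw, target) for target in OUTPUTS}
sizes = {target: len(frame) for target, frame in subsets.items()}
```

Building a separate subset per output keeps every record that reports a given output, instead of discarding a record from all three subsets because one of its outputs is missing.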
5. Model Building and Evaluation
Before the model is built, the dataset is divided into two subsets: the training set, used to train the model, and the test set, used to evaluate the performance of the trained model. In this paper, 90% of the data are assigned to the training set and the remaining 10% to the test set. Hyperparameters strongly affect model performance, so an optimization method is used to select the hyperparameters of each machine learning model and thereby improve its predictive performance. Here, the hyperparameters are optimized by combining grid search with 5-fold cross-validation. Grid search evaluates every combination of hyperparameter values in a predefined grid rather than randomly sampled combinations. In K-fold cross-validation, the training data are divided into K parts and the model is trained K times; in each iteration, one part serves as the validation set and the remaining K-1 parts serve as training data. The results of the K iterations are then averaged to give the final evaluation result. In this study, K = 5. The performance evaluation criteria are the coefficient of determination (R²) and the root-mean-square error (RMSE), given by Equations (11) and (12). R² measures the agreement between the predicted and actual values; an R² of 1 indicates perfect agreement, and values closer to 1 indicate better predictions. In these equations, M denotes the total number of samples, yj is the predicted value of sample j, and the other two symbols are the real (measured) value of sample j and the average of the predicted values.
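The sketch below illustrates the 90/10 split, grid search with 5-fold cross-validation, and the two metrics using scikit-learn. The synthetic data, the parameter grid, and the choice of tuning a single `RandomForestRegressor` are assumptions for illustration. R² is implemented with the conventional definition, one minus the residual sum of squares over the total sum of squares about the mean of the measured values (which is what scikit-learn's `r2_score` computes); the text above describes the mean term as the average of the predicted values, so the paper's Equation (11) may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, KFold, train_test_split


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (mean of measured values)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(y_true, y_pred):
    """Root-mean-square error over the M samples."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Hypothetical data with the size of the theta_p subset (96 samples, 8 inputs).
rng = np.random.default_rng(1)
X = rng.uniform(size=(96, 8))
y = 0.02 + 0.03 * X[:, 0] + rng.normal(scale=0.002, size=96)

# 90% training / 10% test split.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=0)

# Grid search tries every combination in the grid; each is scored by 5-fold CV
# on the training set only, and the best one is refit on all training data.
param_grid = {"n_estimators": [100, 200], "max_depth": [None, 4]}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_root_mean_squared_error",
)
search.fit(X_tr, y_tr)

y_hat = search.best_estimator_.predict(X_te)
test_r2, test_rmse = r2(y_te, y_hat), rmse(y_te, y_hat)

# Cross-check against scikit-learn's implementations.
assert np.isclose(test_r2, r2_score(y_te, y_hat))
assert np.isclose(test_rmse, np.sqrt(mean_squared_error(y_te, y_hat)))
```

With 96 samples and `test_size=0.1`, scikit-learn rounds the test set up to 10 samples, so test-set metrics rest on very few points; the cross-validated score from the grid search is the steadier guide for choosing hyperparameters.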