Geomechanics and Engineering
Volume 42, Number 5, 2025, pages 321-332
DOI: 10.12989/gae.2025.42.5.321
Application of automated machine learning and clustering algorithm for data-driven site characterization: Predicting the soil-rock interface
Dongwoo Lim, Mijin Goo, Han-Saem Kim and Taeseo Ku
Abstract
The development of underground spaces requires detailed insight into subsurface conditions, particularly the soil-rock
interfaces, as this information is crucial for the effective design and safe construction of underground infrastructures.
Traditional geotechnical site investigations rely mainly on direct drilling and sampling; however, these methods yield data only
at specific investigation points, thus posing limitations in comprehensively capturing ground conditions across an entire area. To
address this limitation, various studies have aimed to predict unknown subsurface sections using existing borehole data.
Conventional methods use geospatial interpolation, while machine learning has emerged as a strong alternative. The selection
and proper tuning of an appropriate model are critical to achieving optimal performance. This study applies automated machine
learning, focusing on predicting soil-rock interfaces in unsampled regions using borehole data. AutoGluon is used as the
machine learning framework to automate data preprocessing, model selection, hyperparameter tuning, and model ensemble. For
this study, approximately 20,000 boreholes from the Seoul metropolitan area were collected and employed. Additionally, various
digital maps were used to extract input variables. To capture non-linearity among input variables, Uniform Manifold
Approximation and Projection (UMAP) was employed to reduce the dimensionality of the dataset, while Hierarchical Density-Based
Spatial Clustering of Applications with Noise (HDBSCAN) was implemented as the clustering algorithm. When compared to a model tuned
using Bayesian optimization, AutoGluon exhibited superior predictive performance and reduced errors. Furthermore, although
the focus of this study is on predicting the soil-rock interface, the methodology can be extended to the prediction of other
geotechnical parameters.
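As an illustration of the conventional geospatial interpolation baseline mentioned above, the following is a minimal sketch of inverse distance weighting (IDW) applied to borehole data. The abstract does not state which interpolation scheme is meant, so the choice of IDW, the function name idw_interface_depth, the power parameter, and the borehole coordinates and depths are all assumptions made for illustration only, not the authors' method or data.

```python
import math


def idw_interface_depth(boreholes, x, y, power=2.0):
    """Estimate the soil-rock interface depth at (x, y) by inverse distance weighting.

    boreholes: iterable of (x, y, depth_m) tuples (hypothetical values here).
    power: distance exponent; 2.0 is a common default, assumed for this sketch.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for bx, by, depth in boreholes:
        distance = math.hypot(x - bx, y - by)
        if distance == 0.0:
            # The query point coincides with a borehole: return the observed depth.
            return depth
        weight = 1.0 / distance ** power
        weighted_sum += weight * depth
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one borehole is required")
    return weighted_sum / weight_total


# Hypothetical boreholes: (easting_m, northing_m, depth_to_rock_m)
boreholes = [(0.0, 0.0, 10.0), (100.0, 0.0, 20.0), (0.0, 100.0, 14.0)]

# Estimate the interface depth midway between the first two boreholes.
estimate = idw_interface_depth(boreholes, 50.0, 0.0)
```

Such an estimate depends only on the distances to nearby boreholes and cannot use auxiliary inputs such as variables extracted from digital maps, which is one reason a learned model is considered as an alternative.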
Key Words
automated ML; clustering; data-driven; soil-rock interface; spatial prediction
Address
Dongwoo Lim: Department of Civil, Environmental and Plant Engineering, Konkuk University, 120 Neungdong-ro,
Gwangjin-gu, Seoul, Republic of Korea, 05029
Mijin Goo and Taeseo Ku: Department of Civil and Environmental Engineering, Konkuk University, 120 Neungdong-ro,
Gwangjin-gu, Seoul, Republic of Korea, 05029
Han-Saem Kim: Department of Civil and Environmental Engineering, Dongguk University, 30, Pildong-ro 1-gil,
Jung-gu, Seoul, Republic of Korea, 04620