Hybrid Machine Learning Approaches for Landslide Susceptibility Modeling

Nguyen, Vu Viet; Pham, Binh Thai; Vu, Ba Thao; Prakash, Indra; Jha, Sudan; Shahabi, Himan; Shirzadi, Ataollah; Ba, Dong Nguyen; Kumar, Raghvendra; Chatterjee, Jyotir Moy; Tien Bui, Dieu

doi:10.3390/f10020157


by Vu Viet Nguyen ¹, Binh Thai Pham ², Ba Thao Vu ³, Indra Prakash ⁴, Sudan Jha ⁵, Himan Shahabi ⁶, Ataollah Shirzadi ⁷, Dong Nguyen Ba ⁸, Raghvendra Kumar ⁹, Jyotir Moy Chatterjee ¹⁰ and Dieu Tien Bui ¹¹,*

¹ Vietnam Academy for Water Resources, 171 Tay Son Street, Ha Noi 100000, Viet Nam

² Institute of Research and Development, Duy Tan University, Da Nang 550000, Vietnam

³ Department of Geotechnical Engineering, Hydraulic Construction Institute, Vietnam Academy for Water Resources, 3/95 Chua Boc Street, Ha Noi 100000, Viet Nam

⁴ Department of Science & Technology, Bhaskarcharya Institute for Space Applications and Geo-Informatics (BISAG), Government of Gujarat, Gandhinagar 382007, India

⁵ School of Computer Engineering, KIIT-Deemed to be University, Odisha 751024, India

⁶ Department of Geomorphology, Faculty of Natural Resources, University of Kurdistan, Sanandaj 66177-15175, Iran

⁷ Department of Rangeland and Watershed Management, Faculty of Natural Resources, University of Kurdistan, Sanandaj 66177-15175, Iran

⁸ Department of Geotechnical Engineering, University of Transport and Communication, Ha Noi 100000, Vietnam

⁹ Computer Science and Engineering Department, LNCT College, Jabalpur-482053, India

¹⁰ Department of IT, LBEF(APUTI), Kathmandu, Nepal-44600

¹¹ Geographic Information System group, Department of Business and IT, University of South-Eastern Norway, Bø i Telemark N-3800, Norway


* Author to whom correspondence should be addressed.

Forests 2019, 10(2), 157; https://0-doi-org.brum.beds.ac.uk/10.3390/f10020157

Submission received: 13 December 2018 / Revised: 19 January 2019 / Accepted: 23 January 2019 / Published: 12 February 2019

(This article belongs to the Special Issue Watershed Scale Forest Restoration and Sustainable Development)


Abstract:

This paper presents novel hybrid machine learning models, namely Adaptive Neuro Fuzzy Inference System optimized by Particle Swarm Optimization (PSOANFIS), Artificial Neural Networks optimized by Particle Swarm Optimization (PSOANN), and Best First Decision Trees based Rotation Forest (RFBFDT), for landslide spatial prediction. Landslide modeling of the study area of Van Chan district, Yen Bai province (Vietnam) was carried out with the help of a spatial database of the area, considering past landslides and 12 landslide conditioning factors. The proposed models were validated using different methods such as Area under the Receiver Operating Characteristics (ROC) curve (AUC), Mean Square Error (MSE), Root Mean Square Error (RMSE). Results indicate that the RFBFDT (AUC = 0.826, MSE = 0.189, and RMSE = 0.434) is the best method in comparison to other hybrid models, namely PSOANFIS (AUC = 0.76, MSE = 0.225, and RMSE = 0.474) and PSOANN (AUC = 0.72, MSE = 0.312, and RMSE = 0.558). Thus, it is reasonably concluded that the RFBFDT is a promising hybrid machine learning approach for landslide susceptibility modeling.

Keywords:

GIS; hybrid models; machine learning; adaptive neuro fuzzy inference system; landslide; Vietnam

1. Introduction

Landslides are gravitational movements of slope-forming materials caused by natural and anthropogenic activities [1]. They are considered one of the major hazards affecting human life, property, infrastructure, and landscape [2]. A landslide susceptibility map is a fundamental tool for landslide hazard management and land use planning. Assessment of landslide susceptibility gauges the spatial probability of landslide occurrences considering a set of geo-environmental parameters [3]. Because a landslide is a complex process related to geology, topography, and other geo-environmental factors associated with different conditioning and triggering factors, modeling landslide susceptibility is a difficult task. In recent years, many techniques have been developed for landslide modeling; in general, these methods can be divided into three main approaches, namely expert systems, physical strategies, and information mining techniques [4]. Of these approaches, information mining strategies, which utilize machine learning and statistical methods, are considered the best for landslide hazard assessment and prediction [5].

In the last 10 years, different information mining strategies have been adopted all over the world. Bui et al. [6] applied the Adaptive Neuro-Fuzzy Inference System (ANFIS) for torrential slide mapping and modeling in the Hoa Binh area of Vietnam. Umar et al. [7] utilized an ensemble technique of frequency ratio and logistic regression for landslide susceptibility mapping. Su et al. [8] applied Support Vector Machines (SVM) for mapping precipitation-accentuated landslide susceptibility in the Wencheng territory of Chan Province, China. Chen et al. [9] applied and compared various data mining methods, namely Kernel Logistic Regression, Naive Bayes, and RBF network models. Youssef et al. [10] compared various models, namely Random Forest, Boosted Regression Tree, Classification and Regression Tree, and General Linear models, for landslide susceptibility mapping. In addition, other models have been developed and applied for the assessment of landslide susceptibility, such as Artificial Neural Networks [11], Best First Decision Tree [12], and Kernel Logistic Regression [13].

More recently, many researchers have combined different single methods and techniques to develop various hybrid models for better assessment of landslide susceptibility. Abedini et al. [14] developed a hybrid model that combines Bayesian Logistic Regression with various ensemble techniques, and stated that the hybrid models are promising techniques for the assessment of landslide susceptibility. Zhang et al. [15] enhanced the prediction performance of a landslide susceptibility model by developing a novel hybrid approach of Entropy with Logistic Regression and the SVM, and claimed that this hybrid model outperformed the single Entropy model. Chen et al. [16] developed a novel hybrid approach of Bagging Ensemble and Kernel Logistic Regression for modeling landslide susceptibility, and showed that the new model outperformed the benchmark SVM model. Even though the mentioned methods performed well for landslide susceptibility modeling in a given area, there is no conclusive information about which model is the best for other regions. Moreover, the applicability of newly developed techniques and approaches for improving the predictive capability of landslide susceptibility models needs to be further evaluated.

In this study, the main aim is to develop novel hybrid machine learning approaches, namely Adaptive Neuro Fuzzy Inference System optimized by Particle Swarm Optimization (PSOANFIS), Artificial Neural Networks optimized by Particle Swarm Optimization (PSOANN), and Best First Decision Trees based Rotation Forest (RFBFDT), for the evaluation and selection of the best landslide susceptibility model. More specifically, the PSOANFIS is a hybrid of ANFIS and Particle Swarm Optimization (PSO), the PSOANN is a hybrid of Artificial Neural Networks (ANN) and PSO, and the RFBFDT is a hybrid of Rotation Forest (RF) and Best First Decision Trees (BFDT). The Van Chan district, Yen Bai province, a landslide-prone hilly area in Vietnam, was selected for the present study. The Area under the Receiver Operating Characteristics (ROC) curve (AUC), Mean Square Error (MSE), and Root Mean Square Error (RMSE) were used for model validation.

2. Study Area

The study area is Van Chan district of Yen Bai Province, located between longitudes 104°16′02″ and 104°54′43″ and latitudes 21°19′34″ and 21°48′49″ in the northeast region of Vietnam (Figure 1). The area of the district is approximately 1207 km² and it has a population of about 144,201. The topography of the area is of mountainous and midland type, with elevation ranging from 60 m to 2542 m. High mountains, namely Tay Con Linh and Kieu Lieu Ti, are located on the western side. Bac Ha, Quan Bạ, and Dong Van are the plateaus (highlands) located on the northern side, with an average elevation of 1000–1200 m. Dong Van Plateau is the highest at 1600 m. The midlands (elevation 100–150 m) are on the southwest side. The lowest elevation in the area is in the southeast.

Hills and valleys are generally aligned in the northwest to southeast direction, parallel to the orientation of geological faults. Drainage density in the area is high and most of the drainage is structurally controlled. Hill slopes are very steep in places (up to 84°). Narrow valleys and steep hill slopes are some of the main factors causing landslides, besides heavy rains and anthropogenic activity. Changes in the land use pattern for cultivation of rice on terraces and other developmental activities increased the landslide occurrences in the area. Accumulation of irrigation water on the terraces increases effective weight and reduces the strength of the slope-forming materials, thus adversely affecting the stability of slopes.

Geologically, the study area is occupied by igneous, metamorphic, and sedimentary rocks belonging to the Tu Le–Ngoi Thia complex (21.56%), Tram Tau formation (15.42%), and Ca Vinh complex (13.17%). Rock mass in this area is highly weathered. Depth of weathering varies from 10 m to 18 m. Most of the landslides are observed in the weathered Tu Le–Ngoi Thia complex (10.78%), Tram Tau formation (10.18%), and in gabbro and diabase rocks (11.38%) (Figure 2 and Table 1). Weathered rocks have high permeability and low strength, resulting in slope failure.

3. Materials and Methods

3.1. Data Used

3.1.1. Landslide Inventory

A landslide inventory showing the location and type of landslides occurring in the area is important for the development of landslide models. In this area, 167 landslides were identified from Google Earth images and air photos, checked against the available historical record and limited field investigations. Based on these data, a landslide inventory map was constructed. Translational, rotational, mixed, and debris flow types of landslides occur in the area. Translational landslides are predominant in the study area; hence, only these landslides were taken into account for modeling. National Road No. 32 is most affected by landslide hazards (Figure 3). The size of the landslides varies from a few cubic meters to thousands of cubic meters. We selected the center of each landslide scar (polygon) as one point with a cell size of 20 m for sampling, as we considered that most of the pixels of a landslide polygon have identical conditions for landslide occurrence in similar types of slope-forming materials [17,18].

3.1.2. Landslide Influencing Parameters

In landslide modeling, it is very important to select suitable affecting factors for landslide assessment. In our study, the selection of factors is based on the analysis of the nature of landslide occurrences in relation to the characteristics of geomorphology, geology, hydrology, meteorology, and human impacts in the study area. Thus, we have selected 12 factors, namely slope, aspect, elevation, curvature, slope length, valley depth, distance to rivers, distance to roads, distance to faults, lithology, Topographic Wetness Index (TWI), and Terrain Ruggedness Index (TRI), for landslide analysis and modeling. Each factor was classified into several classes using the standard classification for lithology and aspect, the natural break method for slope, and expert knowledge for elevation, curvature, slope length, valley depth, distance to rivers, distance to roads, distance to faults, TWI, and TRI [19,20,21,22,23]. In addition, the Frequency Ratio (FR) method, which is defined as the ratio of the percentage of landslide pixels in a class to the percentage of pixels of that class in the study area, was applied to assess the spatial relationship between the landslides and the 12 conditioning factors (Table 2).
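The FR definition above can be illustrated with a minimal Python sketch. The function name `frequency_ratio` and the pixel counts are hypothetical, chosen only for illustration; they are not the values of Table 2.

```python
def frequency_ratio(landslide_pixels, class_pixels):
    """Return the FR of each class: (% of landslide pixels) / (% of class pixels)."""
    total_landslides = sum(landslide_pixels.values())
    total_pixels = sum(class_pixels.values())
    ratios = {}
    for name, n_class in class_pixels.items():
        pct_landslides = 100.0 * landslide_pixels.get(name, 0) / total_landslides
        pct_class = 100.0 * n_class / total_pixels
        ratios[name] = pct_landslides / pct_class
    return ratios


# Hypothetical counts for three classes of one factor (illustration only).
landslides = {"gentle": 10, "moderate": 60, "steep": 30}
area = {"gentle": 4000, "moderate": 4000, "steep": 2000}
fr = frequency_ratio(landslides, area)
```

An FR above 1 means that landslides are over-represented in a class relative to the area the class occupies, and an FR below 1 means that they are under-represented.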

Slope is an important factor in landslide susceptibility studies [24]. A slope angle map of the study area was generated from a Digital Elevation Model (DEM) with 20 m spatial resolution, which was derived from the 1:50,000 scale topographic map. A total of six classes (0–7.92°, 7.92–17.82°, 17.82–26.07°, 26.07–34.65°, 34.65–44.88°, and 44.88–84.16°) were obtained on the slope map using the natural break method in a GIS application (Figure 4a). According to the FR analysis, slopes between 7.92° and 34.65° had high FR values, ranging from 1.13 to 1.69, which indicates the highest susceptibility to landslide occurrences in these three classes.

Aspect is a significant factor in the development of landslide susceptibility maps [25]. A map of aspect was extracted from the DEM with nine slope aspect classes: north (0–22.5°; 337.5–360°), flat (−1°), east (67.5–112.5°), northeast (22.5–67.5°), southeast (112.5–157.5°), south (157.5–202.5°), west (247.5–292.5°), southwest (202.5–247.5°), and northwest (292.5–337.5°) (Figure 4b). The FR analysis showed that slopes facing north, northeast, east, south, and southeast are generally prone to landslides as their FR values are 1.15, 1.12, 1.41, 1.27, and 1.22, respectively.

Elevation is one of the important factors in the occurrence of landslides, as height affects the loading on the slope and thus enhances the chances of landslides when the sliding plane dips towards the open excavation [26]. The weathering profile also depends on the elevation of the area. An elevation map with ten classes (0–200, 200–400, 400–600, 600–800, 800–1000, 1000–1200, 1200–1400, 1400–1600, 1600–1800, and 1800–2542 m) was extracted from the 20 m DEM (Figure 4c). The FR analysis indicated that the class of 400–600 m above sea level is the most susceptible (FR = 1.66), whereas above an elevation of 1400 m the frequency of landslide occurrence is the lowest. This might be due to more weathering on the middle-height slopes in comparison to higher levels.

Curvature is an important landslide-affecting factor because it controls the runoff or accumulation of water on the slope, depending on the type of curvature [27]. In this study, a curvature map was extracted from the 20 m DEM and classified as concave, convex, or flat depending on whether its value is below, above, or equal to 0.05, respectively (Figure 4d). The FR analysis showed that 55.69% of landslides occurred on concave slopes, which occupy 41.71% of the area. The occurrence of more landslides on concave surfaces can be related to the accumulation of more water on such slopes.

Slope length is the distance from the origin of the landslide’s flow along its flow path to the end of its runout. The parameters that control the runout distance of a landslide are geometry, physical properties, and frictional coefficients. A slope length map was constructed from the 20 m DEM using the SAGA tool with six classes (0–20, 20–50, 50–100, 100–150, 150–200, and 200–2501 m) (Figure 4e). The FR analysis based on the slope length map showed that the highest susceptibility to landslide incidence is in the 200–500 m slope length class (Table 2). This may be due to the topography and structure of the area.

Valley depth controls the weathering process and water transportation and accumulation; thus, it affects landslide occurrences. In this area, a total valley depth map was constructed from the 20 m DEM using the SAGA tool, considering six classes of depth (0–5, 5–30, 30–60, 60–100, 100–150, and 150–656 m) (Figure 4f). The FR analysis showed that the most landslide-susceptible class is 100–150 m (FR = 1.62), whereas the lowest FR value (0.47) was obtained for valley depth >150 m.

Distance to rivers is one of the most important factors for slope stability, as distance from a river affects the degree of saturation of the slope-forming materials (Dai et al., 2001; Saha et al., 2002). A distance to rivers map was constructed by buffering the rivers extracted from the topographic map (1:50,000) with five classes (0–100, 100–200, 200–300, 300–400, and >400 m) (Figure 4g). The FR analysis indicated that the probability of landslide occurrence decreases as the distance to the rivers increases. Specifically, most of the landslides are located within the 100–200 m distance class (FR = 1.56).

Distance to roads is one of the factors that most affects landslide occurrences, as most of the landslides are observed close to roads [28]. In this study, a distance to roads map was constructed by buffering the roads extracted from the topographic map (1:50,000) and divided into five buffer classes (0–100, 100–200, 200–300, 300–400, and >400 m) (Figure 4h). The FR analysis indicated that most landslides occurred within 0–100 m of roads.

Distance to faults is one of the most important affecting factors, as a slope may fail along faults depending on their nature and orientation [29]. Faults with clay gouge that dip towards the slope face are the most unfavorable features for slope stability. In the study area, a distance to faults map was constructed by buffering the faults extracted from the geological map (1:50,000) with five buffer classes (0–250, 250–500, 500–750, 750–900, and >900 m) (Figure 4i). The FR analysis indicated that the probability of landslides decreases with increasing distance from the faults. In this area, the fault distance class between 250 m and 500 m was the most vulnerable to landslide occurrence (FR = 1.56).

Lithology plays a very important role in landslide occurrences, as soft and weathered rocks are more vulnerable than hard unjointed rocks; thus, lithological units differ in their vulnerability to landslides [30]. In the study area, a lithology map was extracted from the Geological and Mineral Resources Map at a scale of 1:50,000 with seven major lithological units (A, B, C, D, E, F, and G) (Figure 4j and Table 3). The FR analysis indicated that group A has the highest FR value (1.46), while group C has the lowest value (0.26) (Table 2).

Topographic Wetness Index (TWI) is a secondary geomorphometric parameter used to describe and quantify local relief [31] as it reveals the diversity and complexity of landslide topographic surface. As the slope-forming material moves, the TWI range increases. In this study, a TWI map was generated from the DEM 20 m using the SAGA tool with different classes (0–8, 8–9, 9–10, 10–11, and 11–24) (Figure 4k). The FR analysis indicated that the class of 9–10 of TWI is the most susceptible (FR = 0.99) (Table 2).

Terrain Ruggedness Index (TRI) proves capable of differentiating landslide population into smaller groups, consistent with their variable origin and mechanism of displacement. As the slope surface moves, the TRI range decreases. However, in the case of slump and rockslide, the calculation is different. In this study, a TRI map was generated from the DEM using the SAGA tool with different classes (0–1, 1–3, 3–5, 5–7, and >7) (Figure 4l). The FR analysis indicated that the class of 3–5 of TRI is the most susceptible class (Table 2).

3.2. Methods Used

3.2.1. Adaptive Neuro Fuzzy Inference System (ANFIS)

The ANFIS was first introduced by Roger Jang [32]. It combines two parts, an artificial neural network (ANN) and the reasoning capability of a Fuzzy Inference System (FIS), in order to enhance predictive power compared with the use of a single model [33]. In other words, the ANFIS is able to train the FIS membership function (MF) parameters on a training dataset using a combination of back-propagation gradient descent and least-squares methods [34]. The FIS is based on the concepts of fuzzy set theory, fuzzy if‒then rules, and fuzzy reasoning [35]. Among the FIS types, the Sugeno fuzzy model has been widely used due to its high interpretability, computational efficiency, and built-in optimal and adaptive techniques [36]. The ANFIS architecture is shown in Figure 5.

In this figure, a circle indicates a fixed node and a rectangle denotes an adaptive node. We assume that the FIS has two inputs, x and y, and one output, z. Using the Sugeno fuzzy model, four fuzzy “if‒then” rules can be developed:

\begin{array}{l} R_{1}: \text{If } x \text{ is } A_{1} \text{ and } y \text{ is } B_{1}, \text{ then } z_{1} = p_{1} x + q_{1} y + r_{1} \\ R_{2}: \text{If } x \text{ is } A_{1} \text{ and } y \text{ is } B_{2}, \text{ then } z_{2} = p_{2} x + q_{2} y + r_{2} \\ R_{3}: \text{If } x \text{ is } A_{2} \text{ and } y \text{ is } B_{1}, \text{ then } z_{3} = p_{3} x + q_{3} y + r_{3} \\ R_{4}: \text{If } x \text{ is } A_{2} \text{ and } y \text{ is } B_{2}, \text{ then } z_{4} = p_{4} x + q_{4} y + r_{4} \end{array}

(1)

where, A_i and B_i are the fuzzy sets, and p_i, q_i, and r_i are the parameters obtained during the training process. The ANFIS consists of five layers as follows (Figure 5):

Layer 1 (fuzzification): In this layer, the input variables are fuzzified, and each node applies a node function given by:

\begin{array}{l} O_{i}^{1} = μ_{A_{i}} (x), \quad i = 1, 2 \\ O_{i}^{1} = μ_{B_{i - 2}} (y), \quad i = 3, 4 \end{array},

(2)

where any fuzzy membership function (MF), such as Triangle, Generalized bell (Gbell), or Gaussian, can be adopted for μ_A_i(x) and μ_B_i−2(y).

Layer 2 (fuzzy AND): In this layer, each node calculates the firing strength of a rule via multiplication.

O_{k}^{2} = ω_{k} = μ_{A_{i}} (x) \, μ_{B_{j}} (y), \quad i = 1, 2; \; j = 1, 2; \; k = 2 (i - 1) + j

(3)

Layer 3 (normalization): In this layer, the firing strength of each rule is normalized by dividing it by the sum of the firing strengths of all rules.

O_{i}^{3} = \bar{ω}_{i} = \frac{ω_{i}}{ω_{1} + ω_{2} + ω_{3} + ω_{4}}, \quad i = 1, 2, 3, 4,

(4)

where \bar{ω}_{i} is the normalized firing strength.

Layer 4 (fuzzy inference): In this layer, each node has the following function:

O_{i}^{4} = \bar{ω}_{i} z_{i} = \bar{ω}_{i} (p_{i} x + q_{i} y + r_{i}), \quad i = 1, 2, 3, 4,

(5)

where \bar{ω}_{i} is the output of layer 3 and (p_i, q_i, r_i) is the consequent parameter set.

Layer 5 (defuzzification): The overall output of all the rules is obtained in this layer using the defuzzification process of the FIS, which is formulated as follows:

O^{5} = \sum_{i = 1}^{4} \bar{ω}_{i} z_{i} = \frac{ω_{1} z_{1} + ω_{2} z_{2} + ω_{3} z_{3} + ω_{4} z_{4}}{ω_{1} + ω_{2} + ω_{3} + ω_{4}}

(6)

In addition, the details of the ANFIS model can be observed in various studies including those by Chen, Panahi, and Pourghasemi [34], Jang [32], and Aghdam et al. [37].
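The five layers described in this section can be traced in a short Python sketch of one forward pass. It assumes Gaussian membership functions, and the function names, membership parameters, and consequent parameters are hypothetical values chosen for illustration, not parameters from this study. Training of the parameters is not shown.

```python
import math


def gaussian_mf(value, center, sigma):
    """Gaussian membership degree of `value` (used in Layer 1)."""
    return math.exp(-0.5 * ((value - center) / sigma) ** 2)


def anfis_forward(x, y, mf_a, mf_b, consequents):
    """Forward pass of a two-input, four-rule Sugeno ANFIS.

    mf_a, mf_b: two (center, sigma) pairs each, for A_1, A_2 and B_1, B_2.
    consequents: four (p, q, r) tuples ordered as rules R1 to R4.
    """
    # Layer 1: fuzzification of both inputs.
    mu_a = [gaussian_mf(x, c, s) for c, s in mf_a]
    mu_b = [gaussian_mf(y, c, s) for c, s in mf_b]
    # Layer 2: firing strength of each rule, ordered k = 2(i - 1) + j.
    w = [mu_a[i] * mu_b[j] for i in range(2) for j in range(2)]
    # Layer 3: normalization of the firing strengths.
    total = sum(w)
    w_bar = [wk / total for wk in w]
    # Layer 4: first-order consequents z_k = p_k x + q_k y + r_k.
    z = [p * x + q * y + r for p, q, r in consequents]
    # Layer 5: overall output as the weighted sum.
    return sum(wb * zk for wb, zk in zip(w_bar, z))


# Hypothetical parameters (illustration only).
mf_a = [(0.0, 1.0), (1.0, 1.0)]
mf_b = [(0.0, 1.0), (1.0, 1.0)]
rules = [(1.0, 2.0, 3.0)] * 4
output = anfis_forward(0.5, 1.0, mf_a, mf_b, rules)
```

Because all four rules share the same consequent here, the normalized weights cancel and the output reduces to 1.0 · 0.5 + 2.0 · 1.0 + 3.0 = 5.5, which is a convenient check that the normalization in Layer 3 sums to one.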

3.2.2. Multilayer Perceptron Neural Networks

Artificial Neural Networks (ANNs), as a branch of Artificial Intelligence (AI), are nonlinear function approximation algorithms that are well suited to classification and prediction problems such as landslides, based on the degree of membership value of each pixel over the study area [38]. The higher the membership value of a pixel, the higher the probability of landslide occurrence. Two commonly used types of ANN are the Multi-Layer Perceptron (MLP) and the Radial Basis Function (RBF) network. Some researchers who have used ANNs for landslide susceptibility mapping reported that the MLP is better than the RBF in the detection of landslide locations [27,39].

The MLP consists of an input layer, one or more hidden layers, and an output layer, and its complexity increases with the number of hidden layers [27]. In landslide susceptibility assessment using the MLP, the conditioning factors form the input layer, the result of landslide modeling (landslide or non-landslide) forms the output layer, and the classifying layers are the hidden layers [40].

This approach relies on two main datasets, the training and testing datasets. The training dataset is used for the training process, which is performed in two steps. Firstly, the input is propagated forward through the hidden layers to the output value, and the error is computed by comparing the predicted value with the target value. Secondly, the weights are adjusted to achieve the best results with the least difference [41]. In the testing phase, the validity of the obtained results is checked for future samples based on error criteria.

Consider that x = (x_i), i = 1, 2, …, n, is the vector of landslide conditioning factors and y = (y_i), i = 1, 2, indicates the landslide and non-landslide classes. The MLP neural network function in the landslide modeling can be expressed as follows:

y = f (x) + b,

(7)

where b is the bias and f(x) is an unknown function that is optimized by the adjustable network weights during the training process for a given network architecture [40].
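As a rough illustration of the forward propagation step, the sketch below computes the output of an MLP with a single hidden layer. The sigmoid activation, the layer sizes, and all weight values are assumptions made for this example; the source does not specify the network architecture.

```python
import math


def sigmoid(value):
    """Logistic activation, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


def mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass of a one-hidden-layer MLP with a single output node."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Hypothetical network: 3 input factors, 2 hidden nodes (illustration only).
factors = [0.2, 0.7, 0.1]
score = mlp_forward(
    factors,
    hidden_weights=[[0.5, -0.3, 0.8], [-0.6, 0.9, 0.1]],
    hidden_biases=[0.0, 0.1],
    output_weights=[1.2, -0.7],
    output_bias=0.05,
)
```

The returned score lies between 0 and 1 and can be read as a landslide susceptibility value for the pixel described by `factors`. During training, the weights and biases would be adjusted to reduce the difference between this score and the target label.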

3.2.3. Particle Swarm Optimization (PSO)

The PSO is one of the evolutionary (meta-heuristic) algorithms developed by Kennedy et al. (1995). The design of the PSO is inspired by the way groups of biological organisms, such as bird flocks and fish schools, move to find the nearest route to food [42]. In recent years, it has become very popular for the optimization of nonlinear problems [34]. In this algorithm, each particle of the swarm denotes a potential answer to the problem, and the swarm searches for the best position, that is, the best solution. A fitness function is used to assess the merit of the particles by calculating their fitness values. The particles in the PSO move through the feature space according to the following update equations [42]:

\begin{cases} v_{i} (t + 1) = w v_{i} (t) + c_{1} \mathrm{rand}_{1} (p_{best} - x_{i} (t)) + c_{2} \mathrm{rand}_{2} (g_{best} - x_{i} (t)) \\ x_{i} (t + 1) = x_{i} (t) + v_{i} (t + 1) \end{cases},

(8)

where x_i and v_i are the position and velocity of the i-th particle in the feature space, respectively; w is the inertia weight coefficient; c_1 and c_2 are learning factors; rand_1 and rand_2 are positive random numbers from 0 to 1; p_best is the personal best position of particle i; and g_best is the best position among all of the particles. In this study, the PSO method is used to optimize the ANFIS and ANN modeling parameters to construct the PSOANFIS and PSOANN prediction models for landslide susceptibility assessment.
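The update equations can be sketched in Python as follows. This is a minimal illustration, not the configuration used in this study: the function name `pso_minimize`, the swarm size, the coefficient values (w = 0.7, c_1 = c_2 = 1.5), the search bounds, and the simple test function are all assumptions. In the hybrid models, the fitness function would instead measure the prediction error of the ANFIS or ANN for a given parameter vector.

```python
import random


def pso_minimize(fitness, dim, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0), seed=0):
    """Minimize `fitness` using the velocity and position updates of PSO."""
    rng = random.Random(seed)
    low, high = bounds
    positions = [[rng.uniform(low, high) for _ in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    p_best = [p[:] for p in positions]
    p_best_val = [fitness(p) for p in positions]
    g_idx = min(range(n_particles), key=lambda i: p_best_val[i])
    g_best, g_best_val = p_best[g_idx][:], p_best_val[g_idx]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + personal pull + global pull.
                velocities[i][d] = (w * velocities[i][d]
                                    + c1 * r1 * (p_best[i][d] - positions[i][d])
                                    + c2 * r2 * (g_best[d] - positions[i][d]))
                # Position update.
                positions[i][d] += velocities[i][d]
            value = fitness(positions[i])
            if value < p_best_val[i]:
                p_best[i], p_best_val[i] = positions[i][:], value
                if value < g_best_val:
                    g_best, g_best_val = positions[i][:], value
    return g_best, g_best_val


def sphere(position):
    """Toy fitness function with its minimum of 0 at the origin."""
    return sum(v * v for v in position)


best_position, best_value = pso_minimize(sphere, dim=2)
```

The inertia term keeps each particle moving in its previous direction, while the two random terms pull it towards its own best position and the best position found by the swarm.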

3.2.4. Rotation Forest

Rotation Forest (RF) is a meta ensemble algorithm that was first introduced by Rodriguez et al. [43] to improve on the predictive power of a weak individual classifier used alone and to increase the diversity of the base classifiers [44]. In this approach, the feature space of the training dataset is divided into subsets, and Principal Component Analysis [45] is applied to each subset to prepare the data on which the base classifiers are trained. Meta classifiers generally achieve higher prediction accuracy than single classifiers [46].

In this study, the RF was applied as a Meta classifier to detect landslide occurrence locations. Consider that x = (x_1, x_2, …, x_12) is the vector of the 12 landslide conditioning factors, y = (y_1, y_2) is the vector of the landslide and non-landslide classes, and D indicates the training dataset. C_1, C_2, …, C_L are the L classifiers to be trained, and φ is the set of landslide conditioning factors. In the first step, φ is divided into k subsets, each containing 12/k landslide conditioning factors. Let φ_i,j be the j-th (j = 1, 2, …, k) subset of landslide conditioning factors for classifier C_i, and let P_i,j be the part of D restricted to the factors in φ_i,j. Following the bootstrap algorithm, a sample P′_i,j of 75% of the size of P_i,j is randomly selected from P_i,j.

In the next step, Principal Component Analysis is applied to P′_i,j to calculate the coefficients z_i,j^(1), z_i,j^(2), …, z_i,j^(M_j), each of size T × 1. The RF is then built from the base classifiers and the rotation matrix Z_i^a, which is obtained by rearranging the matrix Z_i given as follows [40]:

Z_{i} = \begin{bmatrix} z_{i, 1}^{(1)}, \dots, z_{i, 1}^{(M_{1})} & 0 & \cdots & 0 \\ 0 & z_{i, 2}^{(1)}, \dots, z_{i, 2}^{(M_{2})} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & z_{i, K}^{(1)}, \dots, z_{i, K}^{(M_{K})} \end{bmatrix} .

(9)

Then, the columns of Z_i are rearranged to match the order of the original feature set, which gives Z_i^a. In the next step, classifier C_i is trained on the transformed training dataset (θ Z_i^a). Finally, the outputs of all classifiers, which are trained in parallel, are combined [43].

In the classification phase, for a sample θ from the testing dataset, let d_i,j(θ Z_i^a) be the probability determined by classifier C_i for the hypothesis that θ belongs to class j. Then, the confidence for each class is obtained by the average combination method as follows:

m_{j} (θ) = \frac{1}{L} \sum_{i = 1}^{L} d_{i, j} (θ Z_{i}^{a}), \quad j = 1, \dots, c .

(10)

Lastly, θ is assigned to the class with the largest confidence.
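The construction of one rotation matrix and the averaging step can be sketched with NumPy as below. This is a simplified illustration under several assumptions: the function names are hypothetical, the data are random numbers rather than the study dataset, and the training of the base classifiers (BFDT in this study) is left out. Each PCA block is written directly at the rows and columns of its original features, which builds the block structure and restores the original feature order in one step.

```python
import numpy as np


def rotation_matrix(X, n_subsets, sample_fraction=0.75, seed=0):
    """Build one rotation matrix from PCA loadings of disjoint feature subsets."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    # Randomly split the feature indices into disjoint subsets.
    subsets = np.array_split(rng.permutation(n_features), n_subsets)
    rotation = np.zeros((n_features, n_features))
    for subset in subsets:
        # Bootstrap sample with 75% of the training set size.
        rows = rng.choice(n_samples, size=int(sample_fraction * n_samples),
                          replace=True)
        block = X[np.ix_(rows, subset)]
        # PCA loadings are the eigenvectors of the covariance matrix.
        covariance = np.atleast_2d(np.cov(block, rowvar=False))
        _, loadings = np.linalg.eigh(covariance)
        # Place the block at the positions of its original features.
        rotation[np.ix_(subset, subset)] = loadings
    return rotation


def average_confidence(probabilities):
    """Average the class probabilities of L classifiers (rows) per class."""
    return np.mean(probabilities, axis=0)


# Hypothetical data: 40 samples with 12 conditioning factors.
demo_rng = np.random.default_rng(1)
X_demo = demo_rng.normal(size=(40, 12))
R = rotation_matrix(X_demo, n_subsets=4)
X_rotated = X_demo @ R  # data on which one base classifier would be trained

# Hypothetical outputs of two classifiers for one sample.
confidence = average_confidence(np.array([[0.8, 0.2], [0.6, 0.4]]))
predicted_class = int(np.argmax(confidence))
```

In a full ensemble, a different rotation matrix would be drawn for each of the L classifiers (for example by changing `seed`), and each classifier would be trained on its own rotated copy of the training data.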

3.2.5. Best First Decision Trees

The main idea of expanding decision tree nodes in the Best First Decision Trees (BFDT) algorithm was introduced by Friedman et al. (2000). In this algorithm, nodes are expanded in best-first order, rather than in the depth-first order used by C4.5 and CART [47]. The best node to split is the one that leads to the maximum reduction of impurity, as measured by, for example, the Gini index or information gain. The BFDT creates a binary tree in which each internal node has exactly two outgoing edges.

The growth of the tree continues until the nodes reach maximum homogeneity. This means that a terminal node is not split further when it is pure, that is, when all of its cases have the same value of the dependent variable (landslide or non-landslide). Impurity can be assessed with the Gini index or with the entropy-based information gain. In this study, Information Gain (IG) is used for assessing the impurity, with the entropy specifying the purity of a sample set. Consider D as the training dataset, A as a conditioning factor such as slope angle, and i as a class label (landslide or non-landslide). The IG values of the factors (e.g., slope angle) can be obtained from the following equations:

Entropy (D) = - \sum_{i = 1} p_{i} \log_{2} p_{i},

(11)

where p_i is the proportion of D belonging to class i. The IG measures the reduction in entropy obtained by splitting the training dataset on factor A, using the following equation:

Information Gain (D, A) = Entropy (D) - \sum_{i \in values (A)} \frac{| D_{i} |}{| D |} Entropy (D_{i}),

(12)

where values (A) is the set of all possible values of the factor A (e.g., slope angle) and D_i is the subset of D for which A has value i. Tree growth in the BFDT algorithm stops when all instances at a node belong to a single class (landslide or non-landslide) or when the best IG value is not greater than zero [48].
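The entropy and information gain calculations can be illustrated with a small Python sketch. The function names and the four-sample toy dataset are hypothetical and serve only to show the arithmetic.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting `labels` on a factor's `values`."""
    n = len(labels)
    remainder = 0.0
    for value in set(values):
        subset = [label for v, label in zip(values, labels) if v == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


# Toy dataset: 1 = landslide, 0 = non-landslide (illustration only).
labels = [0, 0, 1, 1]
slope_class = ["low", "low", "high", "high"]   # separates the classes perfectly
aspect_class = ["north", "south", "north", "south"]  # carries no information

gain_slope = information_gain(slope_class, labels)
gain_aspect = information_gain(aspect_class, labels)
```

In this toy case the slope factor yields a gain of 1 bit, because each slope class is pure, while the aspect factor yields a gain of 0, because each aspect class is still an even mix of landslide and non-landslide samples. A best-first tree would therefore split on the slope factor first.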

3.2.6. Validation Assessment

In this study, the mean square error (MSE), root mean square error (RMSE), and area under the receiver operating characteristic curve (AUC) were used to validate the performance of the developed models. The MSE estimates the generalization error of the model, whereas the RMSE measures the forecasting errors of the models [49]. The MSE and RMSE can be expressed as follows:

MSE = \frac{\sum_{i = 1}^{n} {(X_{obs,i} - X_{est,i})}^{2}}{n},

(13)

RMSE = \sqrt{\frac{\sum_{i = 1}^{n} {(X_{obs,i} - X_{est,i})}^{2}}{n}},

(14)

where X_{obs,i} denotes the observed value of sample i in the training or validation dataset, X_{est,i} is the corresponding estimated (output) value from the landslide susceptibility model, and n is the total number of samples in that dataset [50]. Smaller values of MSE and RMSE indicate a better-performing model [51].
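As a minimal sketch of Equations (13) and (14), the two error measures can be computed as follows. The function names and the sample values are illustrative assumptions, not outputs of the models in this study.

```python
import math


def mse(observed, estimated):
    """Mean square error between observed and estimated values, Eq. (13)."""
    if len(observed) != len(estimated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - e) ** 2 for o, e in zip(observed, estimated)) / len(observed)


def rmse(observed, estimated):
    """Root mean square error, Eq. (14): the square root of the MSE."""
    return math.sqrt(mse(observed, estimated))


# Hypothetical targets (1 = landslide, 0 = non-landslide) and model outputs in [0, 1]
observed = [1, 0, 1, 1, 0]
estimated = [0.9, 0.2, 0.6, 0.8, 0.4]

print(mse(observed, estimated))
print(rmse(observed, estimated))
```

Because the RMSE is the square root of the MSE, either value can be recovered from the other.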

In addition, a standard technique used in almost all landslide susceptibility assessments is the area under the receiver operating characteristic (ROC) curve (AUC) [52]. The ROC curve is plotted with sensitivity on the y-axis and 1 − specificity on the x-axis [53]. The AUC summarizes the performance of a model, with a higher AUC indicating better performance [52]. It ranges between 0.5 (random model) and 1 (ideal model) [54,55]. The AUC can be formulated as follows:

AUC = \frac{\sum TP + \sum TN}{R},

(15)

where TP and TN are the numbers of correctly classified landslides (true positives) and correctly classified non-landslides (true negatives), respectively, and R is the total number of landslides and non-landslides [53].
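For illustration, the area under a ROC curve can also be computed directly from model scores by a standard rank-based equivalence: it equals the fraction of (landslide, non-landslide) pairs in which the landslide receives the higher score, with ties counted as one half. The sketch below uses that equivalence. It is an assumption made for illustration, not necessarily the procedure used by the authors, and it is distinct from the single-threshold ratio in Equation (15). The function name and the data are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for landslide, 0 for non-landslide.
    scores: model outputs, where higher means more susceptible.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical labels and susceptibility scores
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(roc_auc(labels, scores))
```

With this definition, a model that ranks every landslide above every non-landslide scores 1, and a model that assigns the same score everywhere scores 0.5.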

4. Methodology Adopted for Developing Landslide Susceptibility Maps

The methodology of the present study includes four main steps: (1) generation of the training and testing datasets, (2) building of the hybrid models, (3) validation of the hybrid models, and (4) development of the landslide susceptibility maps (Figure 6). A brief description of each step follows:

Step 1: Training and testing datasets were generated from the landslide inventory of the study area. The training dataset was generated with 70% of the landslide inventory (117 locations), whereas the testing dataset was constructed with the remaining 30% (50 locations). Non-landslide locations were also included, because landslide prediction is treated as a binary classification problem. Non-landslide locations were identified based on the study of the area. Of these, 117 were used for the training dataset and 50 for the testing dataset. For modeling, landslide instances were assigned “1” whereas non-landslide instances were assigned “0”.

Step 2: Using the training dataset, the hybrid models (RFBFDT, PSOANFIS, and PSOANN) were constructed for spatial prediction of landslides in the study area. More specifically, the RFBFDT was constructed by combining the RF ensemble and the BFDT classifier. In the RFBFDT, the RF was trained with 25 iterations and the BFDT was trained with 10 folds in internal cross-validation. The PSOANFIS was constructed by combining the PSO optimization and the ANFIS classifier, while the PSOANN was constructed by combining the PSO and the ANN classifier. The PSOANFIS was trained with 1500 iterations, an inertia weight of 0.99, and a population size of 25. In the PSOANN, the number of hidden layers was set to nine.

Step 3: The hybrid models were validated using several criteria, namely MSE, RMSE, and AUC. In this step, the goodness-of-fit of the models was assessed using the training dataset and their predictive capability using the testing dataset.

Step 4: Mapping landslide susceptibility started with generation of Landslide Susceptibility Index (LSI) values for each pixel of the study area using the hybrid models. Thereafter, the LSIs were assigned to each pixel in the GIS environment and were reclassified using the natural break classification method [19].
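The dataset construction in Step 1 can be sketched as follows. This is a minimal illustration that assumes a random 70/30 split; the text does not state how locations were allocated to the two datasets. The function name and the location identifiers are hypothetical placeholders.

```python
import random


def split_locations(locations, train_fraction=0.7, seed=0):
    """Shuffle a list of locations and split it into training and testing parts."""
    shuffled = list(locations)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder identifiers for the 167 landslide locations and the
# 167 (117 + 50) non-landslide locations described in Step 1.
landslides = [f"L{i}" for i in range(167)]
non_landslides = [f"N{i}" for i in range(167)]

ls_train, ls_test = split_locations(landslides)
nl_train, nl_test = split_locations(non_landslides, seed=1)

# Label landslide instances 1 and non-landslide instances 0
train = [(loc, 1) for loc in ls_train] + [(loc, 0) for loc in nl_train]
test = [(loc, 1) for loc in ls_test] + [(loc, 0) for loc in nl_test]

print(len(ls_train), len(ls_test), len(train), len(test))  # 117 50 234 100
```

Splitting each class separately keeps the training and testing datasets balanced between landslide and non-landslide instances, as in the counts reported above.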

5. Results and Discussion

Goodness-of-fit and prediction accuracy of the RFBFDT model are given in Figure 7. This figure has three parts: outputs and targets versus number of samples, errors versus number of samples, and frequency versus errors. The first part plots the landslide and non-landslide values predicted by the hybrid model (outputs) against the observed landslide and non-landslide locations (targets), overlaid with normalized conditioning factors.

The predicted values range between 0 and 1. The error part of this figure specifies the values of MSE and RMSE. The frequency-versus-errors part depicts the error mean and standard deviation (SD). Results indicate that in the training phase using the RFBFDT model, the values of MSE, RMSE, error mean, and error SD are 0.172, 0.414, −1.7 × 10^−8, and 0.415, respectively. In the validation phase, these values are 0.189, 0.434, 0.017, and 0.436, respectively. For the PSOANFIS model, Figure 8 shows the results of goodness-of-fit and prediction accuracy using the training and validation datasets. Using the training dataset, the values of MSE, RMSE, error mean, and error SD are 0.14, 0.374, 0.005, and 0.375, respectively. Using the validation dataset, these values are 0.225, 0.474, −0.0298, and 0.476, respectively. For the PSOANN model, the values of MSE, RMSE, error mean, and error SD using the training dataset are 0.168, 0.41, −0.0005, and 0.411, respectively. In the validation phase, the corresponding values are 0.312, 0.558, 0.0003, and 0.561 (Figure 9).

The hybrid models were then evaluated through ROC curve analysis. The results are given in Figure 10. On the training dataset, the RFBFDT model acquired the highest AUC value (0.891), followed by the PSOANFIS model (0.890) and the PSOANN model (0.850). On the validation dataset, the RFBFDT model also had the highest prediction accuracy, with an AUC value of 0.826, followed by the PSOANFIS model (AUC = 0.760) and the PSOANN model (AUC = 0.720). In the validation phase, the AUC ranking agrees with the ranking given by the MSE and RMSE values; in the training phase, the PSOANFIS model has the lowest MSE and RMSE, although its AUC is marginally below that of the RFBFDT model. Overall, the RFBFDT ensemble model is the best model for predicting landslide locations compared to the other models (PSOANFIS and PSOANN).

Landslide susceptibility was assessed based on the landslide susceptibility index (LSI), which was generated from the model construction process. The obtained LSI values were then assigned to all pixels of the study area and classified to determine the susceptibility levels. Landslide susceptibility maps of the study area were finally constructed with five susceptibility classes: very low, low, moderate, high, and very high (Figure 11). The distribution of these susceptibility classes on the maps is shown in Figure 12. The map generated by the RFBFDT model indicates that 48% of the study area falls into the low class, 42% into the moderate class, and 11% into the high class, whereas in the map generated by the PSOANFIS model, 25% of the study area is covered by the low class, 44% by the moderate class, and 31% by the high class. The map generated by the PSOANN model indicates that 25% of the study area falls into the low class, 63% into the moderate class, and 13% into the high class.

6. Conclusions

In this study, three novel hybrid machine learning approaches, namely PSOANFIS, PSOANN, and RFBFDT, were applied for the development of landslide susceptibility maps. A spatial database of 167 past landslides of Van Chan district, Yen Bai province, Vietnam was used to generate the datasets for modeling, considering 12 landslide conditioning factors. Validation of the models was done using the AUC, MSE, and RMSE methods. The results show that the RFBFDT (AUC = 0.826, MSE = 0.189, RMSE = 0.434) is the best model in comparison to other hybrid models, namely PSOANFIS (AUC = 0.76, MSE = 0.225, RMSE = 0.474) and PSOANN (AUC = 0.72, MSE = 0.312, RMSE = 0.558). Thus, it can be reasonably concluded that the RFBFDT model can be used for better landslide susceptibility assessment, land use planning, and hazard management in landslide-prone areas. However, as these proposed models were applied in only one of the areas of Vietnam, their applicability must be tested in other hilly areas of Vietnam as well as other parts of the world. Moreover, another limitation of this research is that we considered a fixed combination of conditioning factors for modeling; therefore, it would be better to test the effectiveness of the models with different combinations of conditioning factors to explore the possibility of further improvement of the models.

Author Contributions

Conceptualization, B.T.P.; Methodology, B.T.P.; Software, S.J.; Validation, H.S. and A.S.; Formal Analysis, I.P.; Investigation, V.V.N., B.T.V. and D.B.N.; Resources, B.T.P.; Data Curation, V.V.N., B.T.V. and D.B.N.; Writing – Original Draft Preparation, S.J., H.S., A.S., R.K. and J.M.C.; Writing – Review & Editing, I.P.; Visualization, I.P., R.K. and J.M.C.; Supervision, B.T.P.; Project Administration, D.T.B.; Funding Acquisition, D.T.B.

Funding

This research received no external funding.

Acknowledgments

This study was supported by a research project named “Study on assessment of causes of landslides and proposal of measures for prevention and mitigation landslide hazards in some provinces in the northern parts of Vietnam” carried out at the Vietnam Academy for Water Resources, Hanoi, Vietnam.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Cruden, D.M. A simple definition of a landslide. Bull. Eng. Geol. Environ. 1991, 43, 27–29.
2. Fell, R.; Corominas, J.; Bonnard, C.; Cascini, L.; Leroi, E.; Savage, W.Z. Guidelines for landslide susceptibility, hazard and risk zoning for land use planning. Eng. Geol. 2008, 102, 85–98.
3. Ercanoglu, M.; Gokceoglu, C. Use of fuzzy relations to produce landslide susceptibility map of a landslide prone area (West Black Sea Region, Turkey). Eng. Geol. 2004, 75, 229–250.
4. Song, Y.; Gong, J.; Gao, S.; Wang, D.; Cui, T.; Li, Y.; Wei, B. Susceptibility assessment of earthquake-induced landslides using Bayesian network: A case study in Beichuan, China. Comput. Geosci. 2012, 42, 189–199.
5. Pradhan, B. A comparative study on the predictive ability of the decision tree, support vector machine and neuro-fuzzy models in landslide susceptibility mapping using GIS. Comput. Geosci. 2013, 51, 350–365.
6. Bui, D.T.; Pradhan, B.; Lofman, O.; Revhaug, I.; Dick, O.B. Landslide susceptibility mapping at Hoa Binh province (Vietnam) using an adaptive neuro-fuzzy inference system and GIS. Comput. Geosci. 2012, 45, 199–211.
7. Umar, Z.; Pradhan, B.; Ahmad, A.; Jebur, M.N.; Tehrany, M.S. Earthquake induced landslide susceptibility mapping using an integrated ensemble frequency ratio and logistic regression models in West Sumatera Province, Indonesia. Catena 2014, 118, 124–135.
8. Su, C.; Wang, L.; Wang, X.; Huang, Z.; Zhang, X. Mapping of rainfall-induced landslide susceptibility in Wencheng, China, using support vector machine. Nat. Hazards 2015, 76, 1759–1779.
9. Chen, W.; Yan, X.; Zhao, Z.; Hong, H.; Bui, D.T.; Pradhan, B. Spatial prediction of landslide susceptibility using data mining-based kernel logistic regression, naive Bayes and RBFNetwork models for the Long County area (China). Bull. Eng. Geol. Environ. 2018, 1–20.
10. Youssef, A.M.; Pourghasemi, H.R.; Pourtaghi, Z.S.; Al-Katheeri, M.M. Landslide susceptibility mapping using random forest, boosted regression tree, classification and regression tree, and general linear models and comparison of their performance at Wadi Tayyah Basin, Asir Region, Saudi Arabia. Landslides 2016, 13, 839–856.
11. Dou, J.; Yamagishi, H.; Pourghasemi, H.R.; Yunus, A.P.; Song, X.; Xu, Y.; Zhu, Z. An integrated artificial neural network model for the landslide susceptibility assessment of Osado Island, Japan. Nat. Hazards 2015, 78, 1749–1776.
12. Chen, W.; Zhang, S.; Li, R.; Shahabi, H. Performance evaluation of the GIS-based data mining techniques of best-first decision tree, random forest, and naïve Bayes tree for landslide susceptibility modeling. Sci. Total Environ. 2018, 644, 1006–1018.
13. Pham, B.T.; Prakash, I. Machine Learning Methods of Kernel Logistic Regression and Classification and Regression Trees for Landslide Susceptibility Assessment at Part of Himalayan Area, India. Indian J. Sci. Technol. 2018, 11.
14. Abedini, M.; Ghasemian, B.; Shirzadi, A.; Shahabi, H.; Chapi, K.; Pham, B.T.; Bin Ahmad, B.; Tien Bui, D. A novel hybrid approach of Bayesian logistic regression and its ensembles for landslide susceptibility assessment. Geocarto Int. 2018, 1–31.
15. Zhang, T.; Han, L.; Chen, W.; Shahabi, H. Hybrid Integration Approach of Entropy with Logistic Regression and Support Vector Machine for Landslide Susceptibility Modeling. Entropy 2018, 20, 884.
16. Chen, W.; Shahabi, H.; Zhang, S.; Khosravi, K.; Shirzadi, A.; Chapi, K.; Pham, B.; Zhang, T.; Zhang, L.; Chai, H. Landslide susceptibility modeling based on GIS and novel bagging-based kernel logistic regression. Appl. Sci. 2018, 8, 2540.
17. Hoang, N.-D.; Tien Bui, D. A novel relevance vector machine classifier with cuckoo search optimization for spatial prediction of landslides. J. Comput. Civ. Eng. 2016, 30, 04016001.
18. Chen, W.; Pourghasemi, H.R.; Naghibi, S.A. A comparative study of landslide susceptibility maps produced using support vector machine with different kernel functions and entropy data mining models in China. Bull. Eng. Geol. Environ. 2018, 77, 647–664.
19. Pourghasemi, H.R.; Pradhan, B.; Gokceoglu, C. Application of fuzzy logic and analytical hierarchy process (AHP) to landslide susceptibility mapping at Haraz watershed, Iran. Nat. Hazards 2012, 63, 965–996.
20. Jebur, M.N.; Pradhan, B.; Tehrany, M.S. Manifestation of LiDAR-derived parameters in the spatial prediction of landslides using novel ensemble evidential belief functions and support vector machine models in GIS. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 674–690.
21. Meinhardt, M.; Fink, M.; Tünschel, H. Landslide susceptibility analysis in central Vietnam based on an incomplete landslide inventory: Comparison of a new method to calculate weighting factors by means of bivariate statistics. Geomorphology 2015, 234, 80–97.
22. Gomez, H.; Kavzoglu, T. Assessment of shallow landslide susceptibility using artificial neural networks in Jabonosa River Basin, Venezuela. Eng. Geol. 2005, 78, 11–27.
23. Bai, S.; Lü, G.; Wang, J.; Zhou, P.; Ding, L. GIS-based rare events logistic regression for landslide-susceptibility mapping of Lianyungang, China. Environ. Earth Sci. 2011, 62, 139–149.
24. Lee, S.; Min, K. Statistical analysis of landslide susceptibility at Yongin, Korea. Environ. Geol. 2001, 40, 1095–1113.
25. Galli, M.; Ardizzone, F.; Cardinali, M.; Guzzetti, F.; Reichenbach, P. Comparing landslide inventory maps. Geomorphology 2008, 94, 268–289.
26. Yilmaz, I. Comparison of landslide susceptibility mapping methodologies for Koyulhisar, Turkey: Conditional probability, logistic regression, artificial neural networks, and support vector machine. Environ. Earth Sci. 2010, 61, 821–836.
27. Ermini, L.; Catani, F.; Casagli, N. Artificial neural networks applied to landslide susceptibility assessment. Geomorphology 2005, 66, 327–343.
28. Ayalew, L.; Yamagishi, H. The application of GIS-based logistic regression for landslide susceptibility mapping in the Kakuda-Yahiko Mountains, Central Japan. Geomorphology 2005, 65, 15–31.
29. Demir, G.; Aytekin, M.; Akgün, A.; Ikizler, S.B.; Tatar, O. A comparison of landslide susceptibility mapping of the eastern part of the North Anatolian Fault Zone (Turkey) by likelihood-frequency ratio and analytic hierarchy process methods. Nat. Hazards 2013, 65, 1481–1506.
30. Nefeslioglu, H.A.; Duman, T.Y.; Durmaz, S. Landslide susceptibility mapping for a part of tectonic Kelkit Valley (Eastern Black Sea region of Turkey). Geomorphology 2008, 94, 401–418.
31. Różycka, M.; Migoń, P.; Michniewicz, A. Topographic Wetness Index and Terrain Ruggedness Index in geomorphic characterisation of landslide terrains, on examples from the Sudetes, SW Poland. Z. Für Geomorphol. Suppl. Issues 2017, 61, 61–80.
32. Jang, J.-S. ANFIS: Adaptive-network-based fuzzy inference system. IEEE Trans. Syst. Man Cybern. 1993, 23, 665–685.
33. Shirzadi, A.; Shahabi, H.; Chapi, K.; Bui, D.T.; Pham, B.T.; Shahedi, K.; Ahmad, B.B. A comparative study between popular statistical and machine learning methods for simulating volume of landslides. Catena 2017, 157, 213–226.
34. Chen, W.; Panahi, M.; Pourghasemi, H.R. Performance evaluation of GIS-based new ensemble data mining techniques of adaptive neuro-fuzzy inference system (ANFIS) with genetic algorithm (GA), differential evolution (DE), and particle swarm optimization (PSO) for landslide spatial modelling. Catena 2017, 157, 310–324.
35. Sarikaya, N.; Guney, K.; Yildiz, C. Adaptive neuro-fuzzy inference system for the computation of the characteristic impedance and the effective permittivity of the micro-coplanar strip line. Prog. Electromagn. Res. 2008, 6, 225–237.
36. Turkmen, I.; Guney, K. Genetic tracker with adaptive neuro-fuzzy inference system for multiple target tracking. Expert Syst. Appl. 2008, 35, 1657–1667.
37. Aghdam, I.N.; Varzandeh, M.H.M.; Pradhan, B. Landslide susceptibility mapping using an ensemble statistical index (Wi) and adaptive neuro-fuzzy inference system (ANFIS) model at Alborz Mountains (Iran). Environ. Earth Sci. 2016, 75, 553.
38. Haykin, S. Neural Networks and Learning Machines, 3rd ed.; Pearson Prentice Hall: Upper Saddle River, NJ, USA, 2009; ISBN-10: 0-13-147139-2.
39. Zare, M.; Pourghasemi, H.R.; Vafakhah, M.; Pradhan, B. Landslide susceptibility mapping at Vaz Watershed (Iran) using an artificial neural network model: A comparison between multilayer perceptron (MLP) and radial basic function (RBF) algorithms. Arab. J. Geosci. 2013, 6, 2873–2888.
40. Pham, B.T.; Bui, D.T.; Prakash, I.; Dholakia, M. Hybrid integration of Multilayer Perceptron Neural Networks and machine learning ensembles for landslide susceptibility assessment at Himalayan area (India) using GIS. Catena 2017, 149, 52–63.
41. Bui, D.T.; Tuan, T.A.; Klempe, H.; Pradhan, B.; Revhaug, I. Spatial prediction models for shallow landslide hazards: A comparative assessment of the efficacy of support vector machines, artificial neural networks, kernel logistic regression, and logistic model tree. Landslides 2016, 13, 361–378.
42. Eberhart, R.; Kennedy, J. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, Nagoya, Japan, 4–6 October 1995; pp. 39–43.
43. Rodriguez, J.J.; Kuncheva, L.I.; Alonso, C.J. Rotation forest: A new classifier ensemble method. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 1619–1630.
44. Rodriguez, J.J. Rotation forest and random oracles: Two classifier ensemble methods. In Proceedings of the Twentieth IEEE International Symposium on Computer-Based Medical Systems, Maribor, Slovenia, 20–22 June 2007; p. 3.
45. Wold, S.; Esbensen, K.; Geladi, P. Principal component analysis. Chemom. Intell. Lab. Syst. 1987, 2, 37–52.
46. Ozcift, A.; Gulten, A. Classifier ensemble construction with rotation forest to improve medical diagnosis performance of machine learning algorithms. Comput. Methods Programs Biomed. 2011, 104, 443–451.
47. Dufour, D. Finding Cost-Efficient Decision Trees; University of Waterloo: Waterloo, ON, Canada, 2014.
48. Kumar, N.; Reddy, G.O.; Chatterji, S. Evaluation of best first decision tree on categorical soil survey data for land capability classification. Int. J. Comput. Appl. 2013, 72, 5–8.
49. Gorum, T.; Gonencgil, B.; Gokceoglu, C.; Nefeslioglu, H. Implementation of reconstructed geomorphologic units in landslide susceptibility mapping: The Melen Gorge (NW Turkey). Nat. Hazards 2008, 46, 323–351.
50. Chen, W.; Shirzadi, A.; Shahabi, H.; Ahmad, B.B.; Zhang, S.; Hong, H.; Zhang, N. A novel hybrid artificial intelligence approach based on the rotation forest ensemble and naïve Bayes tree classifiers for a landslide susceptibility assessment in Langao County, China. Geomat. Nat. Hazards Risk 2017, 8, 1955–1977.
51. Zhou, C.; Yin, K. Landslide displacement prediction of WA-SVM coupling model based on chaotic sequence. Electr. J. Geol. Eng. 2014, 19, 2973–2987.
52. Pham, B.T.; Bui, D.; Prakash, I.; Dholakia, M. Evaluation of predictive ability of support vector machines and naive Bayes trees methods for spatial prediction of landslides in Uttarakhand state (India) using GIS. J. Geomat. 2016, 10, 71–79.
53. Shirzadi, A.; Bui, D.T.; Pham, B.T.; Solaimani, K.; Chapi, K.; Kavian, A.; Shahabi, H.; Revhaug, I. Shallow landslide susceptibility assessment using a novel hybrid intelligence approach. Environ. Earth Sci. 2017, 76, 60.
54. Shahabi, H.; Hashim, M. Landslide susceptibility mapping using GIS-based statistical models and Remote sensing data in tropical environment. Sci. Rep. 2015, 5, 9899.
55. Chen, W.; Peng, J.; Hong, H.; Shahabi, H.; Pradhan, B.; Liu, J.; Zhu, A.-X.; Pei, X.; Duan, Z. Landslide susceptibility modelling using GIS-based machine learning techniques for Chongren County, Jiangxi Province, China. Sci. Total Environ. 2018, 626, 1121–1135.

Figure 1. Location of the Van Chan district, Vietnam.

Figure 2. Geological map of the study area.

Figure 3. Photos of landslides in the Van Chan district (Photographs by Thai Minh Hai, Vo Nguyen Thien, and Nguyen Van Phu).

Figure 4. Thematic maps of the study area: (A) Slope, (B) distance to faults, (C) curvature, (D) slope aspect map, (E) slope length, (F) distance to rivers, (G) elevation, (H) distance to roads, (I) lithology, (J) valley depth, (K) TWI, and (L) TRI.

Figure 5. The architecture of ANFIS.

Figure 6. Methodology chart.

Figure 7. Analysis of errors of the RFBFDT model using (A) the training dataset and (B) the validating dataset.

Figure 8. Analysis of errors of the PSOANFIS model using (A) the training dataset and (B) the validating dataset.

Figure 9. Analysis of errors of the PSOANN model using (A) the training dataset and (B) the validating dataset.

Figure 10. ROC curves and AUC values of: (A) RFBFDT with the training dataset, (B) RFBFDT with the validating dataset, (C) PSOANFIS with the training dataset, (D) PSOANFIS with the validating dataset, (E) PSOANN with the training dataset, and (F) PSOANN with the validating dataset.

Figure 11. Landslide susceptibility maps of different models: (A) RFBFDT, (B) PSOANFIS, and (C) PSOANN.

Figure 12. Distribution of classes on the susceptibility maps.

Table 1. Geological formations and complexes and the main characteristics of the research zone.

No	Geological Formations and Complexes	Notation	Area (%)	Landslide Pixels (%)	Thickness (m)
1	Ban Cai Formation	D₃bc	0.76	1.18	810
2	Ban Nguon Formation	D₁bn	3.18	2.4	-
3	Ban Pap Formation	D_1-2bp	1.61	3.0	560
4	Bac Son Formation	C-Pbs	4.62	1.2	360–770
5	Ba Vi Complex	U/T₁bv	0.04	0	-
6	Ben Khe Formation	Є-Obk	1.23	0	300–500
7	Ca Vinh Complex	G/PP-MPcv	13.17	4.19	-
8	Cam Duong Formation	Є₁cđ	4.72	4.79	500–700
9	Nghia Lo Formation	T_1-2nl	0.22	6.59	500–550
10	Phu Sa Phin Complex	sG,Sy/Kpp	0.42	7.18	-
11	Quaternary	-	4.18	7.78	2–18
12	Song Mua Formation	D₁sm	4.01	8.98	700–800
13	Da Dinh Formation	NP-Є₁đđ	0.98	0	200–400
14	Cha Pa Formation	NPcp	3.07	5.39	500–700
15	Suoi Bang Formation	T₃n-rsb	8.40	9.58	990
16	Tu Le–Ngoi Thia Complex	tR/Ktl–R/Knt	21.56	10.78	-
17	Tram Tau Formation	J-Ktt	15.42	10.18	200–800
18	Unknown in age dykes and veins	-	0.22	11.38	-
19	Van Yen Formation	N₁²vy	0.04	0	100
20	Vien Nam Formation	T₁vn	0.45	0	800–1500
21	Xom Giau Complex	G/NPxg	0.25	0	-
22	Sinh Quyen Formation	PP-MPsq	9.89	8.38	1600–1800
23	Yen Chau Formation	K₂yc	1.58	0	300

Table 2. Analysis of frequency of landslides on the thematic maps.

No.	Parameter	Attribute	Class	Number of Pixels in Class	No. of Landslides in Class	% Class Pixels	% Landslide Pixels	FR
1	Slope (°)	1	0–7.92	515,596	0	17.18	0	0.00
		2	7.92–17.82	541,470	51	18.04	30.54	1.69
		3	17.82–26.07	711,557	57	23.71	34.13	1.44
		4	26.07–34.65	668,546	42	22.27	25.15	1.13
		5	34.65–44.88	431,726	14	14.38	8.38	0.58
		6	44.88–84.16	132,683	3	4.42	1.8	0.41
2	Aspect	1	Flat	143,317	0	4.77	0	0.00
		2	North	327,283	21	10.9	12.57	1.15
		3	Northeast	418,241	26	13.93	15.57	1.12
		4	East	395,523	31	13.18	18.56	1.41
		5	Southeast	325,218	22	10.83	13.17	1.22
		6	South	339,844	24	11.32	14.37	1.27
		7	Southwest	388,176	18	12.93	10.78	0.83
		8	West	349,264	13	11.64	7.78	0.67
		9	Northwest	314,712	12	10.48	7.19	0.69
3	Elevation (m)	1	0–200	311,586	11	10.38	6.59	0.63
		2	200–400	822,680	53	27.41	31.74	1.16
		3	400–600	583,190	54	19.43	32.34	1.66
		4	600–800	474,387	26	15.8	15.57	0.99
		5	800–1000	328,800	16	10.95	9.58	0.87
		6	1000–1200	218,799	5	7.29	2.99	0.41
		7	1200–1400	122,496	2	4.08	1.2	0.29
		8	1400–1600	65,695	0	2.19	0	0.00
		9	1600–1800	35,632	0	1.19	0	0.00
		10	1800–2542	38,313	0	1.28	0	0.00
4	Curvature	1	Concave (<−0.05)	1,251,973	93	41.71	55.69	1.34
		2	Flat (−0.05–0.05)	477,452	0	15.91	0	0.00
		3	Convex (>0.05)	1,272,153	74	42.38	44.31	1.05
5	Lithology	1	Group A	1,156,217	94	38.52	56.29	1.46
		2	Group B	253,577	17	8.45	10.18	1.20
		3	Group C	208,547	3	6.95	1.8	0.26
		4	Group D	335,011	18	11.16	10.78	0.97
		5	Group E	419,594	9	13.98	5.39	0.39
		6	Group F	124,353	4	4.14	2.4	0.58
		7	Group G	504,270	22	16.8	13.17	0.78
6	Slope length (m)	1	0–20	917,077	36	30.55	21.56	0.71
		2	20–50	440,296	20	14.67	11.98	0.82
		3	50–100	586,102	33	19.53	19.76	1.01
		4	100–150	343,241	25	11.44	14.97	1.31
		5	150–200	227,146	21	7.57	12.57	1.66
		6	200–2501	487,716	32	16.25	19.16	1.18
7	Valley depth (m)	1	0–5	1,379,429	80	45.96	47.9	1.04
		2	5–30	538,948	34	17.96	20.36	1.13
		3	30–60	320,995	16	10.69	9.58	0.90
		4	60–100	272,900	10	9.09	5.99	0.66
		5	100–150	221,974	20	7.4	11.98	1.62
		6	150–656	267,332	7	8.91	4.19	0.47
8	Distance (Roads) (m)	1	0–100	528,102	80	17.59	47.9	2.72
		2	100–200	402,641	19	13.41	11.38	0.85
		3	200–300	300,834	15	10.02	8.98	0.90
		4	300–400	235,154	10	7.83	5.99	0.76
		5	>400	1,534,838	43	51.13	25.75	0.50
9	Distance (Rivers) (m)	1	0–100	692,491	32	23.07	19.16	0.83
		2	100–200	599,333	52	19.97	31.14	1.56
		3	200–300	469,911	29	15.66	17.37	1.11
		4	300–400	342,122	19	11.4	11.38	1.00
		5	>400	897,712	35	29.91	20.96	0.70
10	Distance (Faults) (m)	1	0–250	442,100	30	14.73	17.96	1.22
		2	250–500	393,956	28	13.13	16.77	1.28
		3	500–750	342,641	21	11.42	12.57	1.10
		4	750–900	179,677	9	5.99	5.39	0.90
		5	>900	1,643,195	79	54.74	47.31	0.86
11	TWI	1	0–8	800,751	22	26.7	13.17	0.49
		2	8–9	86,528	2	2.89	1.2	0.42
		3	9–10	240,496	17	8.02	10.18	1.27
		4	10–11	360,506	23	12.02	13.77	1.15
		5	11–24	1,510,529	103	50.37	61.68	1.22
12	TRI	1	0–1	366,542	0	12.21	0	0.00
		2	1–3	274,886	12	9.16	7.19	0.78
		3	3–5	460,466	46	15.34	27.54	1.80
		4	5–7	596,576	49	19.88	29.34	1.48
		5	>7	1,303,108	60	43.41	35.93	0.83
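The FR column in Table 2 appears to be the frequency ratio, that is, the percentage of landslides in a class divided by the percentage of the study area covered by that class. This reading is an assumption, since the formula is not stated alongside the table, but it reproduces the tabulated values for the slope classes. A minimal sketch, with a hypothetical function name:

```python
def frequency_ratio(pct_landslides, pct_class_pixels):
    """Ratio of the landslide share to the area share of a class (assumed FR definition)."""
    if pct_class_pixels <= 0:
        raise ValueError("class must cover a positive share of the area")
    return pct_landslides / pct_class_pixels


# Slope rows of Table 2: (class range, % class pixels, % landslide pixels)
slope_rows = [
    ("0–7.92", 17.18, 0.0),
    ("7.92–17.82", 18.04, 30.54),
    ("17.82–26.07", 23.71, 34.13),
    ("26.07–34.65", 22.27, 25.15),
    ("34.65–44.88", 14.38, 8.38),
    ("44.88–84.16", 4.42, 1.8),
]

slope_fr = [round(frequency_ratio(ls, px), 2) for _, px, ls in slope_rows]
for (name, _, _), fr in zip(slope_rows, slope_fr):
    print(name, fr)
```

Under this reading, an FR above 1 means that landslides are over-represented in a class relative to its area, and an FR below 1 means they are under-represented.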

Table 3. Lithology groups and their characteristics.

No.	Group	Name	Characteristics of Rock Types
1	A	Acid-neutral igneous magmatic rocks	Dacite, felsite, rhyolite, and andesite rocks
2	B	Terrigenous sedimentary rocks with rich aluminosilicate components	Rhyolites, gritstone, siltstone, carbonates, claystone, alternated dacites, sandstone, and andesite sediments
3	C	Terrigenous sedimentary and transformative rocks with rich quartz segments	Quartz–mica sandstone, gritstone, sandstone, claystone, siltstone, alternated rhyolites, dacites, carbonates, quartzitic sandstone, andesite sediments, cherty shale
4	D	Carbonate rocks	Cherty limestone, clayish limestone, and dolomitized limestone
5	E	Acid-neutral intrusive magmatic rocks	Plagioclase–granite, rhyolite, felsite, dacite, andesite rocks, granophyre, granodiorite, granosyenite, diorite, and quartz-diorite
6	F	Quaternary deposits	Pluvial and alluvial sedimentary: pebbles, cobble, stone, sand, silt
7	G	Metamorphic rocks with rich aluminosilicate components	Quartz sericite–schist, quartz mica–schist, quartzite, sericite–quartzite

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

