IGSNRR OpenIR
A self-training semi-supervised machine learning method for predictive mapping of soil classes with limited sample data
Zhang, Lei (1,4); Yang, Lin (1,2); Ma, Tianwu (3,4,5); Shen, Feixue (1); Cai, Yanyan (1); Zhou, Chenghu (1,2)
2021-02-15
Source Publication: GEODERMA
ISSN: 0016-7061
Volume: 384  Pages: 10
Corresponding Author: Yang, Lin (yanglin@nju.edu.cn)
Abstract: Numerous machine learning models have been developed to model the relationship between soil classes or properties and their environmental covariates in digital soil mapping (DSM). Most of these models are trained with a supervised learning (SL) method based on training samples. In practice, however, sample data are often limited because field sampling is expensive and time-consuming, and insufficient samples may substantially limit a model's learning ability. Semi-supervised machine learning, a newer paradigm that uses both unsampled data and a small amount of sampled data during learning, is a potentially effective method for DSM. In this study, we present a self-training semi-supervised learning (SSL) method for DSM. Unlike the SL method, the SSL method utilizes not only the sampled locations but also the abundant environmental covariate information at unvisited locations. Its basic idea is to iteratively enlarge the training data set by adding unsampled points with high prediction confidence from the unvisited locations until a stopping criterion is reached. The proposed SSL method was applied with machine learning models to predict soil classes at Heshan Farm in Nenjiang County, Heilongjiang Province, China. Three machine learning models, multinomial logistic regression (MLR), k-nearest neighbor (KNN), and random forest (RF), were selected to evaluate the efficiency of the SSL method. The entropy threshold is an important parameter of the SSL method, so a sensitivity analysis was conducted using a series of entropy thresholds. The SSL and SL methods were compared for all three models, and cross-validation was used to evaluate the accuracy of the soil class maps predicted by each method.
The results showed that the prediction accuracy (the proportion of correctly predicted samples among all validation samples) of the SSL method was higher than that of the SL method by 5.9%, 12.2%, and 6.0% for MLR, KNN, and RF, respectively. RF-SSL was the most accurate model in the study area, followed by KNN-SSL, while the KNN model gained the largest improvement from the self-training SSL method among the three models. Furthermore, the soil maps predicted with the SSL method showed a more reasonable spatial pattern of soil classes. In the study area, a suitable entropy threshold was 0.8 to 1.0. We conclude that the SSL method improves soil prediction accuracy over the SL method when machine learning models are applied for DSM, and is thus a potentially efficient method for DSM with limited sample data.
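The self-training loop summarized in the abstract can be sketched as follows. This is a minimal, illustrative Python sketch, not the authors' implementation: it assumes a from-scratch KNN base classifier (the study also used MLR and RF), measures prediction confidence with Shannon entropy in natural-log units, and stops when no unvisited point falls below the entropy threshold or after a hypothetical `max_iter` rounds. The function names (`knn_proba`, `entropy`, `self_train`, `accuracy`) and the toy covariate data are hypothetical.

```python
import math
from collections import Counter

# Minimal self-training sketch. Assumptions (not from the paper): a from-scratch
# KNN base classifier, Shannon entropy in nats as the confidence measure, and
# "no new confident points or max_iter reached" as the stopping criterion.

def knn_proba(train_x, train_y, x, k, classes):
    """Class-membership probabilities for x from its k nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return [votes[c] / k for c in classes]

def entropy(probs):
    """Shannon entropy (natural log); 0 means the prediction is fully confident."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def self_train(train_x, train_y, unlabeled_x, k=3, threshold=0.8, max_iter=10):
    """Move low-entropy (high-confidence) unvisited points into the training set."""
    train_x, train_y, pool = list(train_x), list(train_y), list(unlabeled_x)
    classes = sorted(set(train_y))
    for _ in range(max_iter):
        confident, remaining = [], []
        # Every pool point is scored against the training set as it stood
        # at the start of this iteration.
        for x in pool:
            probs = knn_proba(train_x, train_y, x, k, classes)
            if entropy(probs) < threshold:
                confident.append((x, classes[probs.index(max(probs))]))
            else:
                remaining.append(x)
        if not confident:  # stopping criterion: nothing new passes the threshold
            break
        for x, label in confident:  # add pseudo-labeled points
            train_x.append(x)
            train_y.append(label)
        pool = remaining
    return train_x, train_y

def accuracy(predicted, observed):
    """Proportion of correctly predicted validation samples."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)

# Toy covariate vectors for two hypothetical soil classes "A" and "B".
labeled_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
labeled_y = ["A", "A", "A", "B", "B", "B"]
unvisited_x = [(0.05, 0.1), (0.95, 0.95), (0.5, 0.6)]
X, y = self_train(labeled_x, labeled_y, unvisited_x, k=3, threshold=0.5)
```

With `threshold=0.5`, the two unvisited points next to the class clusters are pseudo-labeled in the first iteration, while (0.5, 0.6) keeps a 2:1 vote (entropy of about 0.64) and is never added, so the loop stops after the second iteration. The toy threshold is chosen only for illustration; the 0.8 to 1.0 range reported above refers to the study's own entropy scale, which may be defined differently.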
Keywords: Digital soil sampling; Machine learning; Semi-supervised learning; Self-training; Predictive mapping
DOI: 10.1016/j.geoderma.2020.114809
WOS Keywords: SPATIAL PREDICTION; RANDOM FORESTS; REGRESSION; CLASSIFICATION; RESOLUTION; LANDSCAPE; REGION; STOCKS; MAP
Indexed By: SCI
Language: English
Funding Project: National Natural Science Foundation of China [41971054]; National Natural Science Foundation of China [41530749]; National Natural Science Foundation of China [41871300]
Funding Organization: National Natural Science Foundation of China
WOS Research Area: Agriculture
WOS Subject: Soil Science
WOS ID: WOS:000594244300014
Publisher: ELSEVIER
Citation statistics: Cited Times: 5 [WOS]
Document Type: Journal article
Identifier: http://ir.igsnrr.ac.cn/handle/311030/136525
Collection: Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences
Affiliation: 1. Nanjing Univ, Sch Geog & Ocean Sci, Nanjing 210023, Peoples R China
2. Chinese Acad Sci, Inst Geog Sci & Nat Resources Res, State Key Lab Resources & Environm Informat Syst, Beijing 100101, Peoples R China
3. Nanjing Normal Univ, Sch Geog, Nanjing 210023, Peoples R China
4. Jiangsu Ctr Collaborat Innovat Geog Informat Reso, Nanjing 210023, Peoples R China
5. Nanjing Normal Univ, Minist Educ, Key Lab Virtual Geog Environm, Nanjing 210023, Peoples R China
Recommended Citation
GB/T 7714
Zhang, Lei,Yang, Lin,Ma, Tianwu,et al. A self-training semi-supervised machine learning method for predictive mapping of soil classes with limited sample data[J]. GEODERMA,2021,384:10.
APA Zhang, Lei,Yang, Lin,Ma, Tianwu,Shen, Feixue,Cai, Yanyan,&Zhou, Chenghu.(2021).A self-training semi-supervised machine learning method for predictive mapping of soil classes with limited sample data.GEODERMA,384,10.
MLA Zhang, Lei,et al."A self-training semi-supervised machine learning method for predictive mapping of soil classes with limited sample data".GEODERMA 384(2021):10.
Files in This Item:
There are no files associated with this item.
 

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.