An ensemble semi-supervised learning method for predicting defaults in social lending

Aleum Kim, Sung-Bae Cho

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Social lending takes place between peers, and because the investor directly bears the loss when a borrower fails to repay, accurate default prediction for borrowers is important. The repayment result is known only after the repayment period ends, so labeled data are limited. However, social loans are matched online in real time, and large amounts of unlabeled data are generated. In this paper, we propose a method that combines label propagation and a transductive support vector machine (TSVM) using Dempster–Shafer theory to predict defaults in social lending accurately with unlabeled data. To learn from a large amount of data effectively, we ensemble semi-supervised learning methods with different characteristics. Label propagation assigns data with similar features to the same class, while TSVM pushes data with different features apart. The Dempster–Shafer fusion method enables accurate labeling by exploiting the merits of both methods. Experiments are performed on the open data set from Lending Club. The accuracy of the proposed method is about 10% higher than that of a model using only labeled data, and the proposed ensemble enables more accurate labeling.
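
The fusion step named in the abstract can be illustrated with Dempster's rule of combination. The following is only a minimal sketch, not the authors' implementation: the abstract does not say how the outputs of label propagation and TSVM are turned into belief masses, so the mass values, the two-hypothesis frame ("default", "repaid"), and the function name `combine_dempster` are all assumptions made for illustration.

```python
from itertools import product


def combine_dempster(m1, m2):
    """Combine two mass functions with Dempster's rule of combination.

    Each mass function maps a frozenset of hypotheses to a mass in [0, 1],
    and the masses of one function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreeing evidence supports the shared hypotheses.
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Disjoint focal sets contribute to the conflict mass.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("Sources are in total conflict; the rule is undefined.")
    normalizer = 1.0 - conflict
    return {hypothesis: mass / normalizer for hypothesis, mass in combined.items()}


# Frame of discernment for one unlabeled loan (labels are assumptions).
DEFAULT = frozenset({"default"})
REPAID = frozenset({"repaid"})
EITHER = DEFAULT | REPAID  # mass left on "unknown"

# Hypothetical masses, as if derived from each semi-supervised learner.
label_propagation_mass = {DEFAULT: 0.6, REPAID: 0.3, EITHER: 0.1}
tsvm_mass = {DEFAULT: 0.5, REPAID: 0.2, EITHER: 0.3}

fused = combine_dempster(label_propagation_mass, tsvm_mass)

# Assign the label whose singleton hypothesis has the larger fused mass.
predicted = max((DEFAULT, REPAID), key=lambda h: fused.get(h, 0.0))
print(sorted(predicted), fused)
```

In this sketch both learners lean toward "default", so the fused mass on that hypothesis ends up higher than either learner assigned alone, while the mass left on "unknown" shrinks.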

Original language: English
Pages (from-to): 193-199
Number of pages: 7
Journal: Engineering Applications of Artificial Intelligence
Volume: 81
DOI: 10.1016/j.engappai.2019.02.014
Publication status: Published - 2019 May 1

Fingerprint

Supervised learning
Labeling
Support vector machines
Labels
Fusion reactions
Experiments

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Artificial Intelligence
  • Electrical and Electronic Engineering

Cite this

@article{838db8cf928b4c66892fe8ffde9e2f88,
title = "An ensemble semi-supervised learning method for predicting defaults in social lending",
abstract = "Social lending is made between peers, and with the risk that the investor can take direct damages from the borrower's failure to repay, accurate default prediction for borrowers is important. The repayment result can be known after the end of the repayment period, and such data is limited. However, social loans are matched online in real time and large amounts of unlabeled data are being generated. In this paper, we propose a method to combine label propagation and transductive support vector machine (TSVM) with Dempster–Shafer theory for accurate default prediction of social lending using unlabeled data. In order to train a lot of data effectively, we ensemble semi-supervised learning methods with different characteristics. Label propagation is performed so that data having similar features are assigned to the same class and TSVM makes moving away data having different features. Dempster–Shafer fusion method allows accurate labeling by exploiting the merits of the two methods. Experiments are performed using the open data set from Lending Club. The accuracy of the proposed method is improved by about 10{\%} against that of the model using only labeled data, and more accurate labeling can be performed through the proposed ensemble method.",
author = "Aleum Kim and Sung-Bae Cho",
year = "2019",
month = "5",
day = "1",
doi = "10.1016/j.engappai.2019.02.014",
language = "English",
volume = "81",
pages = "193--199",
journal = "Engineering Applications of Artificial Intelligence",
issn = "0952-1976",
publisher = "Elsevier Limited",

}

An ensemble semi-supervised learning method for predicting defaults in social lending. / Kim, Aleum; Cho, Sung-Bae.

In: Engineering Applications of Artificial Intelligence, Vol. 81, 01.05.2019, p. 193-199.

TY - JOUR

T1 - An ensemble semi-supervised learning method for predicting defaults in social lending

AU - Kim, Aleum

AU - Cho, Sung-Bae

PY - 2019/5/1

Y1 - 2019/5/1

AB - Social lending is made between peers, and with the risk that the investor can take direct damages from the borrower's failure to repay, accurate default prediction for borrowers is important. The repayment result can be known after the end of the repayment period, and such data is limited. However, social loans are matched online in real time and large amounts of unlabeled data are being generated. In this paper, we propose a method to combine label propagation and transductive support vector machine (TSVM) with Dempster–Shafer theory for accurate default prediction of social lending using unlabeled data. In order to train a lot of data effectively, we ensemble semi-supervised learning methods with different characteristics. Label propagation is performed so that data having similar features are assigned to the same class and TSVM makes moving away data having different features. Dempster–Shafer fusion method allows accurate labeling by exploiting the merits of the two methods. Experiments are performed using the open data set from Lending Club. The accuracy of the proposed method is improved by about 10% against that of the model using only labeled data, and more accurate labeling can be performed through the proposed ensemble method.

UR - http://www.scopus.com/inward/record.url?scp=85062467625&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85062467625&partnerID=8YFLogxK

U2 - 10.1016/j.engappai.2019.02.014

DO - 10.1016/j.engappai.2019.02.014

M3 - Article

AN - SCOPUS:85062467625

VL - 81

SP - 193

EP - 199

JO - Engineering Applications of Artificial Intelligence

JF - Engineering Applications of Artificial Intelligence

SN - 0952-1976

ER -