Offensive Sentence Classification Using Character-Level CNN and Transfer Learning with Fake Sentences

Suin Seo, Sung-Bae Cho

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Classifying offensive sentences poses two difficulties: offensive terms are easily modified, and offensive corpora are generally class-imbalanced. To address both, we propose pre-training the convolution layers on fake sentences generated at the character level, which prevents under-fitting caused by data shortage and mitigates the data imbalance. We insert offensive words into half of the randomly generated sentences and train a convolutional neural network (CNN) on these sentences, labeled by whether an offensive word is included. We then use the trained CNN filters to train a new CNN on the original data, which effectively increases the amount of training data. The proposed method achieves a higher F1-score than training without pre-training on three datasets: Insult from Kaggle, Bullying Trace, and Formspring.
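The fake-sentence generation step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the word list `OFFENSIVE_WORDS`, the alphabet, the sentence length, and all function names are hypothetical, and the sketch overwrites a span of characters instead of lengthening the sentence so that every input keeps a fixed length. CNN pre-training and the transfer of the learned filters are not shown.

```python
import random
import string

# Hypothetical placeholders: the paper's actual word list and alphabet are not given here.
OFFENSIVE_WORDS = ["badword", "insultx"]
ALPHABET = string.ascii_lowercase + " "


def random_sentence(rng, length):
    """Return a string of `length` characters drawn uniformly from ALPHABET."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def make_fake_dataset(n_sentences, length=40, seed=0):
    """Build (sentence, label) pairs; label 1 means an offensive word was inserted."""
    rng = random.Random(seed)
    data = []
    for i in range(n_sentences):
        sentence = random_sentence(rng, length)
        label = i % 2  # every second sentence receives an offensive word
        if label == 1:
            word = rng.choice(OFFENSIVE_WORDS)
            pos = rng.randrange(length - len(word) + 1)
            # Assumption: overwrite a span so every sentence keeps the same length.
            sentence = sentence[:pos] + word + sentence[pos + len(word):]
        data.append((sentence, label))
    rng.shuffle(data)  # avoid a fixed alternating label order
    return data


def one_hot(sentence):
    """Encode a sentence as a list of one-hot rows, one row per character."""
    index = {ch: i for i, ch in enumerate(ALPHABET)}
    rows = []
    for ch in sentence:
        row = [0] * len(ALPHABET)
        row[index[ch]] = 1
        rows.append(row)
    return rows
```

Because half of the generated sentences carry label 1, the synthetic set is balanced by construction, which is the property the abstract relies on to counter class imbalance. A purely random sentence could in principle contain one of the words by chance; for words of this length the probability is negligible, but a stricter generator could reject such samples.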

Original language: English
Title of host publication: Neural Information Processing - 24th International Conference, ICONIP 2017, Proceedings
Editors: Dongbin Zhao, El-Sayed M. El-Alfy, Derong Liu, Shengli Xie, Yuanqing Li
Publisher: Springer Verlag
Pages: 532-539
Number of pages: 8
ISBN (Print): 9783319700953
DOIs: https://doi.org/10.1007/978-3-319-70096-0_55
Publication status: Published - 2017 Jan 1
Event: 24th International Conference on Neural Information Processing, ICONIP 2017 - Guangzhou, China
Duration: 2017 Nov 14 to 2017 Nov 18

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10635 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 24th International Conference on Neural Information Processing, ICONIP 2017
Country: China
City: Guangzhou
Period: 17/11/14 to 17/11/18

Fingerprint

Transfer Learning
Convolution
Neural Networks
Shortage
Labels
Trace
Filter
Training
Character
Term

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Seo, S., & Cho, S-B. (2017). Offensive Sentence Classification Using Character-Level CNN and Transfer Learning with Fake Sentences. In D. Zhao, E-S. M. El-Alfy, D. Liu, S. Xie, & Y. Li (Eds.), Neural Information Processing - 24th International Conference, ICONIP 2017, Proceedings (pp. 532-539). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10635 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-319-70096-0_55
@inproceedings{4010ff5a9f1746979766f9632da83e25,
title = "Offensive Sentence Classification Using Character-Level CNN and Transfer Learning with Fake Sentences",
abstract = "Classifying offensive sentences poses two difficulties: offensive terms are easily modified, and offensive corpora are generally class-imbalanced. To address both, we propose pre-training the convolution layers on fake sentences generated at the character level, which prevents under-fitting caused by data shortage and mitigates the data imbalance. We insert offensive words into half of the randomly generated sentences and train a convolutional neural network (CNN) on these sentences, labeled by whether an offensive word is included. We then use the trained CNN filters to train a new CNN on the original data, which effectively increases the amount of training data. The proposed method achieves a higher F1-score than training without pre-training on three datasets: Insult from Kaggle, Bullying Trace, and Formspring.",
author = "Suin Seo and Sung-Bae Cho",
year = "2017",
month = "1",
day = "1",
doi = "10.1007/978-3-319-70096-0_55",
language = "English",
isbn = "9783319700953",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "532--539",
editor = "Dongbin Zhao and El-Alfy, {El-Sayed M.} and Derong Liu and Shengli Xie and Yuanqing Li",
booktitle = "Neural Information Processing - 24th International Conference, ICONIP 2017, Proceedings",
address = "Germany",

}


TY - GEN
T1 - Offensive Sentence Classification Using Character-Level CNN and Transfer Learning with Fake Sentences
AU - Seo, Suin
AU - Cho, Sung-Bae
PY - 2017/1/1
Y1 - 2017/1/1
AB - Classifying offensive sentences poses two difficulties: offensive terms are easily modified, and offensive corpora are generally class-imbalanced. To address both, we propose pre-training the convolution layers on fake sentences generated at the character level, which prevents under-fitting caused by data shortage and mitigates the data imbalance. We insert offensive words into half of the randomly generated sentences and train a convolutional neural network (CNN) on these sentences, labeled by whether an offensive word is included. We then use the trained CNN filters to train a new CNN on the original data, which effectively increases the amount of training data. The proposed method achieves a higher F1-score than training without pre-training on three datasets: Insult from Kaggle, Bullying Trace, and Formspring.
UR - http://www.scopus.com/inward/record.url?scp=85035127324&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85035127324&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-70096-0_55
DO - 10.1007/978-3-319-70096-0_55
M3 - Conference contribution
AN - SCOPUS:85035127324
SN - 9783319700953
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 532
EP - 539
BT - Neural Information Processing - 24th International Conference, ICONIP 2017, Proceedings
A2 - Zhao, Dongbin
A2 - El-Alfy, El-Sayed M.
A2 - Liu, Derong
A2 - Xie, Shengli
A2 - Li, Yuanqing
PB - Springer Verlag
ER -
