Offensive Sentence Classification Using Character-Level CNN and Transfer Learning with Fake Sentences

Suin Seo, Sung Bea Cho

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

There are two difficulties in classifying offensive sentences: one is the modifiability of offensive terms, and the other is the class imbalance that appears in general offensive corpora. To solve these problems, we propose pre-training convolution layers on fake sentences generated at the character level, which prevents under-fitting caused by data shortage and addresses the data imbalance. We insert offensive words into half of the randomly generated sentences and train a convolutional neural network (CNN) on these sentences, with labels indicating whether an offensive word is included. We then use the trained CNN filters to train a new CNN on the original data, which effectively increases the amount of training data. The proposed method achieves a higher F1-score than training without pre-training on three datasets: Insult from Kaggle, Bullying Trace, and Formspring.
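The fake-sentence step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the word list `OFFENSIVE_WORDS`, the character set `ALPHABET`, the sentence length, and the function names are all assumptions made for the example, since the abstract does not specify them.

```python
import random
import string

# Hypothetical placeholders; the abstract does not give the actual lexicon.
OFFENSIVE_WORDS = ["badword", "insultword", "slurword"]
# Assumed character set for the random sentences.
ALPHABET = string.ascii_lowercase + " "


def random_sentence(rng, length):
    """Return a string of `length` characters drawn uniformly from ALPHABET."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def make_fake_dataset(n, length=60, seed=0):
    """Build n (sentence, label) pairs; label 1 means an offensive word was inserted."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        sentence = random_sentence(rng, length)
        label = i % 2  # every other sentence is positive, so half get an offensive word
        if label == 1:
            word = rng.choice(OFFENSIVE_WORDS)
            pos = rng.randrange(len(sentence) + 1)  # insertion point, ends included
            sentence = sentence[:pos] + word + sentence[pos:]
        data.append((sentence, label))
    rng.shuffle(data)  # avoid a fixed alternating label order
    return data


if __name__ == "__main__":
    for sentence, label in make_fake_dataset(4):
        print(label, repr(sentence))
```

Because the labels are balanced by construction, pairs like these could serve as the pre-training data for the convolution filters before those filters are reused on the original, imbalanced corpus.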

Original language: English
Title of host publication: Neural Information Processing - 24th International Conference, ICONIP 2017, Proceedings
Editors: Dongbin Zhao, El-Sayed M. El-Alfy, Derong Liu, Shengli Xie, Yuanqing Li
Publisher: Springer Verlag
Pages: 532-539
Number of pages: 8
ISBN (Print): 9783319700953
DOIs
Publication status: Published - 2017
Event: 24th International Conference on Neural Information Processing, ICONIP 2017 - Guangzhou, China
Duration: 2017 Nov 14 to 2017 Nov 18

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10635 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 24th International Conference on Neural Information Processing, ICONIP 2017
Country/Territory: China
City: Guangzhou
Period: 17/11/14 to 17/11/18

Bibliographical note

Funding Information:
Acknowledgments. This work was supported by Institute for Information & Communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (R0124-16-0002), Emotional Intelligence Technology to Infer Human Emotion and Carry on Dialogue Accordingly.

Publisher Copyright:
© 2017, Springer International Publishing AG.

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)
