Classifying offensive sentences poses two difficulties: offensive terms are easily modified, and general offensive-language corpora are class-imbalanced. To address these problems, we propose pre-training the convolution layers on fake sentences generated at the character level, which prevents under-fitting caused by data shortage and mitigates the data imbalance. We insert offensive words into half of the randomly generated sentences and train a convolutional neural network (CNN) on these sentences, labeled by whether an offensive word is included. We then reuse the trained CNN filters when training a new CNN on the original data, which effectively increases the amount of training data. The proposed method achieves a higher F1-score than training without pre-training on three datasets: the Kaggle insult dataset, Bullying trace, and Formspring.
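The fake-sentence generation step described in the abstract might be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `make_fake_sentences`, the lowercase-ASCII alphabet, the word and sentence lengths, and the placeholder vocabulary are all assumptions, since the abstract does not specify them.

```python
import random
import string


def make_fake_sentences(offensive_words, n_sentences=1000, n_words=10,
                        max_word_len=8, seed=0):
    """Return a shuffled list of (sentence, label) pairs.

    label is 1 if an offensive word was inserted, else 0.
    All sizes and the alphabet are illustrative assumptions.
    """
    rng = random.Random(seed)
    vocab = set(offensive_words)
    choices = sorted(vocab)  # deterministic order for rng.choice

    def random_word():
        # Draw random character strings, rejecting accidental matches
        # with the offensive vocabulary so negative labels stay clean.
        while True:
            length = rng.randint(1, max_word_len)
            word = "".join(rng.choice(string.ascii_lowercase)
                           for _ in range(length))
            if word not in vocab:
                return word

    data = []
    for i in range(n_sentences):
        words = [random_word() for _ in range(n_words)]
        label = i % 2  # every second sentence receives an offensive word
        if label == 1:
            words[rng.randrange(n_words)] = rng.choice(choices)
        data.append((" ".join(words), label))

    rng.shuffle(data)
    return data


if __name__ == "__main__":
    # "badword" and "insultx" are hypothetical placeholder terms.
    for sentence, label in make_fake_sentences(["badword", "insultx"],
                                               n_sentences=4):
        print(label, sentence)
```

The resulting pairs would serve as the pre-training set for the character-level CNN; the trained filters could then initialize a second CNN trained on the real corpus, as the abstract describes.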
|Title of host publication||Neural Information Processing - 24th International Conference, ICONIP 2017, Proceedings|
|Editors||Dongbin Zhao, El-Sayed M. El-Alfy, Derong Liu, Shengli Xie, Yuanqing Li|
|Number of pages||8|
|Publication status||Published - 2017|
|Event||24th International Conference on Neural Information Processing, ICONIP 2017 - Guangzhou, China|
Duration: 2017 Nov 14 → 2017 Nov 18
|Name||Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)|
|Other||24th International Conference on Neural Information Processing, ICONIP 2017|
|Period||17/11/14 → 17/11/18|
Bibliographical note
Funding Information:
Acknowledgments. This work was supported by Institute for Information & Communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (R0124-16-0002), Emotional Intelligence Technology to Infer Human Emotion and Carry on Dialogue Accordingly.
All Science Journal Classification (ASJC) codes
- Theoretical Computer Science
- Computer Science(all)