This paper studies the problem of mining named entity translations by aligning comparable corpora. Current state-of-the-art approaches mine a translation pair by aligning an entity graph in one language to another based on node similarity or propagated similarity of related entities. However, they, building on the assumption of 'symmetry', quickly deteriorate on 'weakly' comparable corpora with some asymmetry. In this paper, we pursue two directions for overcoming relation and entity asymmetry respectively. The first approach starts from weakly comparable corpora (for high recall) then ensures precision by selective propagation only to entities of symmetric relations. The second approach starts from parallel corpora (for high precision) then enhances recall by extending the translation matrix based on node similarity and contextual similarity. Our experimental results on English-Chinese corpora show that both approaches are effective and complementary. Our combined approach outperforms the best-performing baseline in terms of F1-score by up to 0.28.
|Number of pages||13|
|Journal||IEEE Transactions on Knowledge and Data Engineering|
|Publication status||Published - 2014 Dec 1|
All Science Journal Classification (ASJC) codes
- Information Systems
- Computer Science Applications
- Computational Theory and Mathematics