Training Data Optimization for Pairwise Learning to Rank

Hojae Han, Seung Won Hwang, Young In Song, Siyeon Kim

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

This paper studies data optimization for Learning to Rank (LtR): dropping training labels to increase ranking accuracy. Our work is inspired by data dropout, which shows that some training data do not positively influence learning and are better dropped, despite the common belief that a larger training dataset is beneficial. Our main contribution is to extend this intuition to noisy and semi-supervised LtR scenarios: some human annotations can be noisy or out-of-date, and so can machine-generated pseudo-labels in semi-supervised scenarios. Dropping such unreliable labels would benefit both scenarios. State-of-the-art methods use the Influence Function (IF) to estimate how each training instance affects learning, and we identify and overcome two challenges specific to LtR: 1) non-convex ranking functions violate the assumptions required for the robustness of IF estimation; 2) the pairwise learning of LtR incurs quadratic estimation overhead. Our technical contributions address these challenges. First, we revise estimation and data optimization to accommodate the reduced reliability. Second, we devise a group-wise estimation that reduces cost while keeping accuracy high. We validate the effectiveness of our approach on a wide range of ad-hoc information retrieval benchmarks and real-life search engine datasets in both noisy and semi-supervised scenarios.
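
The quadratic overhead named in the second challenge can be illustrated with a minimal sketch. This is not the authors' code: the function names `preference_pairs` and `max_pair_count` are hypothetical, and the sketch only assumes that pairwise LtR trains on ordered document pairs within a single query, so any per-instance estimate (such as an influence score) must be computed per pair rather than per document.

```python
from itertools import combinations


def preference_pairs(labels):
    """Return (winner, loser) index pairs for one query's relevance labels.

    A pair is emitted only when the two labels differ, since tied
    documents carry no pairwise preference.
    """
    pairs = []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] > labels[j]:
            pairs.append((i, j))
        elif labels[j] > labels[i]:
            pairs.append((j, i))
    return pairs


def max_pair_count(n_docs):
    """Upper bound on pairs for n_docs documents: n * (n - 1) / 2."""
    return n_docs * (n_docs - 1) // 2


if __name__ == "__main__":
    # Three documents with distinct labels yield all three possible pairs.
    print(preference_pairs([2, 1, 0]))
    # The bound grows quadratically with the number of documents per query.
    for n in (10, 100, 1000):
        print(n, max_pair_count(n))
```

Under these assumptions, a query with 100 labeled documents can produce up to 4950 training pairs, which is why estimating one score per pair becomes costly as the candidate list grows.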

Original language: English
Title of host publication: ICTIR 2020 - Proceedings of the 2020 ACM SIGIR International Conference on Theory of Information Retrieval
Publisher: Association for Computing Machinery
Pages: 13-20
Number of pages: 8
ISBN (Electronic): 9781450380676
DOIs
Publication status: Published - 2020 Sep 14
Event: 6th ACM SIGIR / 10th International Conference on the Theory of Information Retrieval, ICTIR 2020 - Virtual, Online, Norway
Duration: 2020 Sep 14 to 2020 Sep 17

Publication series

Name: ICTIR 2020 - Proceedings of the 2020 ACM SIGIR International Conference on Theory of Information Retrieval

Conference

Conference: 6th ACM SIGIR / 10th International Conference on the Theory of Information Retrieval, ICTIR 2020
Country: Norway
City: Virtual, Online
Period: 20/9/14 to 20/9/17

Bibliographical note

Funding Information:
This work is partly supported by the Artificial Intelligence Graduate School Program (2020-0-01361) and the ITRC support program (IITP-2020-2020-0-01789), both supervised by IITP.

All Science Journal Classification (ASJC) codes

  • Computer Science (miscellaneous)
  • Information Systems
