TY - JOUR
T1 - Crowdsourcing identification of license violations
AU - Lee, Sanghoon
AU - German, Daniel M.
AU - Hwang, Seung won
AU - Kim, Sunghun
N1 - Publisher Copyright:
© 2015. The Korean Institute of Information Scientists and Engineers.
Copyright:
Copyright 2017 Elsevier B.V., All rights reserved.
PY - 2015
Y1 - 2015
AB - Free and open source software (FOSS) has created a large pool of source code that can be easily copied to create new applications. However, a copy should preserve the copyright notice and license of the original file unless the license explicitly permits changing them. As software evolves, it is challenging to keep the original licenses or to choose proper ones, and as a result there are many potential license violations. Although violations can have a high impact on copyright protection, identifying them is highly complex and relies on manual inspection by experts. Such inspection cannot scale to the volume of open source software released daily worldwide. To make this process scalable, we propose two methods: using machine-based algorithms to narrow down the potential violations, and guiding non-experts to manually inspect violations. Using the first method, we found 219 projects (76.6%) with potential violations. Using the second method, we show that the accuracy of crowds is comparable to that of experts. Our techniques might help developers identify potential violations, understand their causes, and resolve them.
UR - http://www.scopus.com/inward/record.url?scp=85008255741&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85008255741&partnerID=8YFLogxK
U2 - 10.5626/JCSE.2015.9.4.190
DO - 10.5626/JCSE.2015.9.4.190
M3 - Article
AN - SCOPUS:85008255741
VL - 9
SP - 190
EP - 203
JO - Journal of Computing Science and Engineering
JF - Journal of Computing Science and Engineering
SN - 1976-4677
IS - 4
ER -