The Tool for the Automatic Analysis of Cohesion 2.0: Integrating semantic similarity and text overlap

Scott A. Crossley, Kristopher Kyle, Mihai Dascalu

Research output: Contribution to journal › Article › peer-review

43 Citations (Scopus)


This article introduces the second version of the Tool for the Automatic Analysis of Cohesion (TAACO 2.0). Like its predecessor, TAACO 2.0 is a freely available text analysis tool that works on the Windows, Mac, and Linux operating systems; is housed on a user’s hard drive; is easy to use; and allows for batch processing of text files. TAACO 2.0 includes all the original indices reported for TAACO 1.0, but it adds a number of new indices related to local and global cohesion at the semantic level, reported by latent semantic analysis, latent Dirichlet allocation, and word2vec. The tool also includes a source overlap feature, which calculates lexical and semantic overlap between a source and a response text (i.e., cohesion between the two texts based on measures of text relatedness). In the first study in this article, we examined the effects that cohesion features, prompt, essay elaboration, and enhanced cohesion had on expert ratings of text coherence, finding that global semantic similarity as reported by word2vec was an important predictor of coherence ratings. A second study was conducted to examine the source and response indices. In this study, we examined whether source overlap between the speaking samples found in the TOEFL-iBT integrated speaking tasks and the responses produced by test-takers was predictive of human ratings of speaking proficiency. The results indicated that the percentage of keywords found in both the source and response and the similarity between the source document and the response, as reported by word2vec, were significant predictors of speaking quality. Combined, these findings help validate the new indices reported for TAACO 2.0.
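The two source overlap measures named in the abstract (the share of source keywords that reappear in the response, and the embedding-based similarity between source and response) can be illustrated with a minimal Python sketch. This is not TAACO's implementation. The function names are hypothetical, "keywords" are approximated here as the most frequent source words (TAACO's own keyword definition is not given in the abstract), and the `embeddings` dictionary stands in for a trained word2vec model.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()"


def tokenize(text):
    """Lowercase, whitespace-split tokens with surrounding punctuation removed."""
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]


def keyword_overlap(source, response, top_n=5):
    """Share of the source's top_n most frequent words found in the response.

    Assumption: frequency is a stand-in for a real keyness measure.
    """
    keywords = [w for w, _ in Counter(tokenize(source)).most_common(top_n)]
    if not keywords:
        return 0.0
    response_words = set(tokenize(response))
    return sum(1 for w in keywords if w in response_words) / len(keywords)


def mean_vector(tokens, embeddings):
    """Average the vectors of all tokens that have an embedding."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_similarity(source, response, embeddings):
    """Cosine similarity between the averaged word vectors of two texts."""
    src_vec = mean_vector(tokenize(source), embeddings)
    resp_vec = mean_vector(tokenize(response), embeddings)
    if src_vec is None or resp_vec is None:
        return 0.0
    return cosine(src_vec, resp_vec)


# Toy two-dimensional vectors, invented for illustration only.
toy_embeddings = {"cats": [1.0, 0.0], "dogs": [1.0, 0.0], "mice": [0.0, 1.0]}
overlap = keyword_overlap("cats chase mice cats chase cats", "dogs chase balls", top_n=2)
similarity = semantic_similarity("cats", "dogs", toy_embeddings)
```

Averaging word vectors before taking the cosine is one common way to compare whole texts with word2vec-style embeddings; whether TAACO 2.0 aggregates vectors this way is an assumption of the sketch.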

Original language: English
Pages (from-to): 14-27
Number of pages: 14
Journal: Behavior Research Methods
Issue number: 1
Publication status: Published - 2019 Feb 15

Bibliographical note

Funding Information:
Author note: We thank Mary Jane White for early work on the data used in Study 1. Without her, this article would never have been possible. This project was supported in part by the National Science Foundation (DRL-1418378). We also thank YouJin Kim for her help collecting the data reported in Study 2. The ideas expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. The research reported in Study 2 was funded by Educational Testing Service (ETS) under a Committee of Examiners and a Test of English as a Foreign Language research grant. ETS does not discount or endorse the methodology, results, implications, or opinions presented by the researchers. TOEFL test material is reprinted by permission of Educational Testing Service, the copyright owner. This research was also partially supported by Grant 644187 EC H2020 for the Realising an Applied Gaming Eco-System (RAGE) project, as well as by the European Funds of Regional Development through the Operation Productivity Program 2014–2020 Priority Axis 1, Action 1.2.1 D-2015, “Innovative Technology Hub Based on Semantic Models and High Performance Computing,” Contract no. 6/1 09/2016.

Publisher Copyright:
© 2018, Psychonomic Society, Inc.

All Science Journal Classification (ASJC) codes

  • Experimental and Cognitive Psychology
  • Developmental and Educational Psychology
  • Arts and Humanities (miscellaneous)
  • Psychology (miscellaneous)
  • Psychology (all)

