ReaderBench – Semantic Models and Topic Mining

The ReaderBench framework introduced a generalized assessment model based on the cohesion graph, applicable both to plain essay- or story-like texts and to CSCL conversations (in particular chats, forum discussion threads, or blog communities). Text cohesion, viewed as the lexical, syntactic, and semantic relationships that link textual units together, is defined within our implemented model in terms of semantic similarity, measured through semantic distances in lexicalized ontologies (e.g., WordNet), Latent Semantic Analysis (LSA), and Latent Dirichlet Allocation (LDA). Additionally, specific Natural Language Processing techniques are applied to reduce noise and improve the system's accuracy: tokenization, sentence splitting, part-of-speech tagging, parsing, stop-word elimination, dictionary-only word selection, lemmatization, named entity recognition, and co-reference resolution. Moreover, we have developed a topic mining module that integrates the previously defined semantic models (available for English and French, and partially for Spanish, Romanian, Italian, Dutch, and Latin).
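As an illustration of the kind of similarity measure described above, the following sketch computes LSA-based cosine similarity between short texts. It uses scikit-learn rather than ReaderBench's own implementation, and the corpus and dimensionality are invented for the example:

```python
# Hedged sketch: LSA similarity between textual units, illustrating the
# general idea of cohesion as semantic similarity in a latent space.
# The corpus, n_components, and all names here are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "the student wrote a cohesive essay about climate change",
    "the essay discusses global warming and its effects",
    "the chat thread debates a programming assignment",
]

# Build a term-document matrix, then project it into a low-rank latent space.
tfidf = TfidfVectorizer().fit_transform(corpus)
lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

# Cohesion between two textual units is approximated by the cosine
# similarity of their latent vectors.
sim = cosine_similarity(lsa)
print(sim.shape)
```

In the cohesion graph, such pairwise scores would label the links between adjacent sentences, paragraphs, or conversation utterances.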

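The topic mining side can likewise be sketched with an off-the-shelf LDA implementation. Again, this uses scikit-learn rather than ReaderBench's module, with an invented corpus and topic count:

```python
# Hedged sketch: extracting a document-topic distribution with LDA,
# as a stand-in for the topic mining module described above.
# Corpus and n_components are illustrative assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "students discuss cohesion and essay structure",
    "the forum thread covers chat analysis and collaboration",
    "semantic models measure similarity between texts",
]

# LDA operates on raw term counts, not TF-IDF weights.
counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Each row is a probability distribution over topics for one document.
topic_dist = lda.transform(counts)
print(topic_dist.shape)
```

The per-document topic distributions can then feed the same similarity computations used for the other semantic models.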

Current status:

  • Training dedicated semantic models for different languages supported by the ReaderBench framework
  • Consideration of alternative semantic deep-learning models (e.g., word2vec) to represent words in vector spaces
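The word2vec idea mentioned above can be sketched as a toy skip-gram trainer with negative sampling in plain NumPy. A real system would rely on an established implementation (e.g., gensim); the sentences, vector size, and hyperparameters here are purely illustrative:

```python
# Hedged sketch: a toy skip-gram word2vec trainer with one negative sample
# per pair, showing how words end up represented in a vector space.
# All data and hyperparameters are illustrative assumptions.
import numpy as np

sentences = [["cohesion", "links", "textual", "units"],
             ["semantic", "models", "link", "textual", "units"]]
vocab = sorted({w for s in sentences for w in s})
idx = {w: i for i, w in enumerate(vocab)}

rng = np.random.default_rng(0)
dim, lr = 8, 0.05
W_in = rng.normal(scale=0.1, size=(len(vocab), dim))   # input embeddings
W_out = rng.normal(scale=0.1, size=(len(vocab), dim))  # output embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(200):
    for s in sentences:
        for i, center in enumerate(s):
            # Context window of +/- 2 words around the center word.
            for j in range(max(0, i - 2), min(len(s), i + 3)):
                if i == j:
                    continue
                c = idx[center]
                neg = int(rng.integers(len(vocab)))
                # Positive pair (label 1) and one random negative (label 0).
                for target, label in ((idx[s[j]], 1.0), (neg, 0.0)):
                    grad = sigmoid(W_in[c] @ W_out[target]) - label
                    d_out = grad * W_in[c]           # use pre-update values
                    W_in[c] -= lr * grad * W_out[target]
                    W_out[target] -= lr * d_out

vec = W_in[idx["cohesion"]]
print(vec.shape)
```

After training, W_in holds one dense vector per vocabulary word, and semantic similarity can again be estimated via cosine similarity of those vectors, in parallel with the LSA and LDA spaces.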