New Publication in Communication Methods and Measures

30.01.2024

Congratulations to Petro Tolochko, Paul Balluff, Jana Bernhard, Sebastian Galyga, Noëlle S. Lebernegg and Hajo Boomgaarden on publishing a new paper in Communication Methods and Measures!

The journal Communication Methods and Measures has recently published a paper titled "What’s in a name? The effect of named entities on topic modelling interpretability," authored by Petro Tolochko, Paul Balluff, Jana Bernhard, Sebastian Galyga, Noëlle S. Lebernegg, and Hajo G. Boomgaarden. The study examines how named entities affect the interpretability of topic models.

While Topic Modelling (TM) has gained prominence for its dual functionality in generating clusters and classifying texts, the pre-processing stage, including the handling of named entities (NEs), is often overlooked. The paper highlights that decisions regarding the retention or removal of named entities during pre-processing can significantly influence the outcomes and interpretations of topic models. Drawing on both model statistics and human validation, the authors explore the consequences of removing or retaining named entities.
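To make the pre-processing decision concrete, here is a minimal Python sketch of what retaining versus removing named entities does to the vocabulary a topic model would be trained on. It is not the authors' pipeline: the toy corpus, the hand-labelled entity flags, and the helper names `preprocess` and `vocabulary` are all assumptions made for illustration. In practice the flags would come from a named entity recognition tool.

```python
from collections import Counter

# Hypothetical toy corpus: each document is a list of (token, is_named_entity) pairs.
# The flags are hand-labelled here; a real pipeline would obtain them from an NER tool.
DOCS = [
    [("Merkel", True), ("visits", False), ("Paris", True),
     ("for", False), ("climate", False), ("talks", False)],
    [("climate", False), ("talks", False), ("stall", False),
     ("in", False), ("Brussels", True)],
]


def preprocess(doc, keep_entities):
    """Lower-case tokens, dropping those flagged as named entities unless keep_entities is True."""
    return [tok.lower() for tok, is_ne in doc if keep_entities or not is_ne]


def vocabulary(docs, keep_entities):
    """Count term frequencies over the corpus under the chosen pre-processing option."""
    counts = Counter()
    for doc in docs:
        counts.update(preprocess(doc, keep_entities))
    return counts


vocab_with = vocabulary(DOCS, keep_entities=True)
vocab_without = vocabulary(DOCS, keep_entities=False)

# Terms that are only available to the topic model when entities are retained.
entity_only_terms = sorted(set(vocab_with) - set(vocab_without))
print(entity_only_terms)
```

Either vocabulary could then be passed to a topic model of choice. The two models would be trained on different term sets, which is the kind of difference the paper evaluates.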

The findings reveal notable differences in the structural characteristics and human perception of topic models trained on corpora with and without named entities. Not only do topic models exhibit distinct features when NEs are removed, but they are also perceived differently by human coders. The paper concludes by offering recommendations for the pre-processing of named entities in the context of topic modelling applications.

For those interested in exploring the full details of this research, the open-access paper can be accessed at: https://www.tandfonline.com/doi/full/10.1080/19312458.2024.2302120

Cite article:

Tolochko, P., Balluff, P., Bernhard, J., Galyga, S., Lebernegg, N. S., & Boomgaarden, H. G. (2024). What’s in a name? The effect of named entities on topic modelling interpretability. Communication Methods and Measures. DOI: 10.1080/19312458.2024.2302120