Towards a temporospatial framework for measurements of disorganization in speech using semantic vectors

Terje B. Holmlund, Department of Clinical Medicine, University of Tromsø - the Arctic University of Norway, Tromsø, Norway. Electronic address: terje.holmlund@uit.no.
Chelsea Chandler, Institute of Cognitive Science, University of Colorado Boulder, United States of America.
Peter W. Foltz, Institute of Cognitive Science, University of Colorado Boulder, United States of America.
Catherine Diaz-Asper, Department of Psychology, Marymount University, United States of America.
Alex S. Cohen, Department of Psychology, Louisiana State University, United States of America; Center for Computation and Technology, Louisiana State University, United States of America.
Zachary Rodriguez, Department of Psychology, Louisiana State University, United States of America; Center for Computation and Technology, Louisiana State University, United States of America.
Brita Elvevåg, Department of Clinical Medicine, University of Tromsø - the Arctic University of Norway, Tromsø, Norway; Norwegian Center for eHealth Research, University Hospital of North Norway, Tromsø, Norway.

Abstract

Incoherent speech in schizophrenia has long been described as the mind making "leaps" of large distances between thoughts and ideas. Such a view seems intuitive, and for almost two decades, attempts to operationalize these conceptual "leaps" in spoken word meanings have used language-based embedding spaces. An embedding space represents the meanings of words as numerical vectors, where greater proximity between word vectors represents more shared meaning. However, word vector-based operationalizations of coherence have limitations that can reduce their appeal and utility in clinical practice. First, the use of esoteric word embeddings can be conceptually hard to grasp, a difficulty compounded by the existence of several different operationalizations of incoherent speech. This problem can be overcome by better visualization of the methods. Second, temporal information from the act of speaking has been largely neglected, since models have been built using written text, yet speech unfolds in real time. This issue can be resolved by leveraging time-stamped transcripts of speech. Third, contextual information - namely, the situation in which something is spoken - has often only been inferred and never explicitly modeled. Addressing this situational issue opens up new possibilities for models with increased temporal resolution and contextual relevance. In this paper, direct visualizations of semantic distances are used to enable the inspection of examples of incoherent speech. Some common operationalizations of incoherence are illustrated, and suggestions are made for how temporal and spatial contextual information can be integrated in future implementations of measures of incoherence.
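To make the vector-proximity idea concrete, the following is a minimal sketch of one common operationalization mentioned above: mean cosine similarity between consecutive word vectors, where lower values correspond to larger semantic "leaps". The three-dimensional vectors and the function names are illustrative assumptions only; real embedding spaces have hundreds of dimensions and are learned from large text corpora.

```python
import numpy as np

def cosine_similarity(u, v):
    # Cosine of the angle between two word vectors; 1.0 = identical direction.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def first_order_coherence(vectors):
    # Mean cosine similarity between consecutive word vectors.
    # Lower values correspond to larger semantic "leaps" between
    # successive words - one common operationalization of incoherence.
    sims = [cosine_similarity(vectors[i], vectors[i + 1])
            for i in range(len(vectors) - 1)]
    return float(np.mean(sims))

# Toy vectors (hypothetical; stand-ins for learned embeddings).
words = {
    "dog": np.array([0.9, 0.1, 0.0]),
    "cat": np.array([0.8, 0.2, 0.1]),
    "car": np.array([0.1, 0.9, 0.2]),
}

related_pair = [words["dog"], words["cat"]]    # semantically close words
distant_pair = [words["dog"], words["car"]]    # a semantic "leap"

print(first_order_coherence(related_pair) > first_order_coherence(distant_pair))  # True
```

Time-stamped transcripts, as proposed in the paper, would additionally attach an onset time to each word, allowing such similarities to be weighted or windowed by real elapsed speaking time rather than by word position alone.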