Detecting Cognitive Distortions from Patient-Therapist Interactions

Sagarika Shreevastava
Department of Computer Science, and Department of Linguistics,
University of Colorado, Boulder
sagarika.shreevastava@colorado.edu

Peter W. Foltz
Institute of Cognitive Science
University of Colorado, Boulder
peter.foltz@colorado.edu
Proceedings of the Seventh Workshop on Computational Linguistics and Clinical Psychology, pages 151–158
June 11, 2021. ©2021 Association for Computational Linguistics

Abstract

An important part of Cognitive Behavioral Therapy (CBT) is to recognize and restructure certain negative thinking patterns that are also known as cognitive distortions. This project aims to detect these distortions using natural language processing. We compare and contrast different types of linguistic features as well as different classification algorithms and explore the limitations of applying these techniques on a small dataset. We find that using pre-trained Sentence-BERT embeddings to train an SVM classifier yields the best results, with an F1-score of 0.79. Lastly, we discuss how this work provides insights into the types of linguistic features that are inherent in cognitive distortions.

1 Introduction

Cognitive Behavioral Therapy (CBT) is one of the most common methods of psycho-therapeutic intervention to treat depression or anxiety. Due to the COVID-19 pandemic, mental health issues are on the rise. At the same time, more and more interactions are now held virtually. Furthermore, mental health issues are not limited to the one-hour-per-week window that patients usually get with their therapists. This has led to a growth in the demand for digitally accessible therapy sessions. As mental health care is often inaccessible to people, there is a need for innovative ways to make it more widely available and affordable (Holmlund et al., 2019).

One possible solution is to develop an automated system that could serve by performing some ancillary tasks more efficiently. Towards that, Natural Language Processing (NLP) and Machine Learning (ML) algorithms are now gaining widespread popularity and are being implemented in many fields where language is used. While we are far from a chatbot replacing a therapist’s nuanced skillset, having easy access to an intelligent support system can help fill in these gaps.

One of the major aspects of CBT is to recognize and restructure certain types of negative thinking patterns. Some established negative thinking patterns are commonly observed in patients dealing with anxiety or depression. These cognitive distortions arise due to errors in reasoning (Beck, 1963).

The aim of educating the patient about these distortions during CBT is to equip the patient with the right tools to detect errors in their own thought processes. Once the patient is aware of the error in their reasoning, they can start to work on restructuring how to perceive the same situations in a healthier way.

1.1 Cognitive Distortions

The concept of cognitive distortions was first introduced by Beck (1963). There is no definitive number of types of distortions, and the number varies widely in existing literature depending on the level of detail in reasoning considered by the author. For example, the Cognitive Distortion Scale developed by Briere (2000) consists of only five types. In this work, we consider a total of ten types of cognitive distortions that are described below:

1. Emotional Reasoning: Believing “I feel that way, so it must be true”.
2. Overgeneralization: Drawing conclusions with limited and often negative experience.
3. Mental Filter: Focusing only on limited negative aspects and not the excessive positive ones.
4. Should Statements: Expecting things or personal behavior should be a certain way.
5. All or Nothing: Binary thought pattern. Considering anything short of perfection as a failure.
6. Mind Reading: Concluding that others are reacting negatively to you, without any basis in fact.
7. Fortune Telling: Predicting that an event will always result in the worst possible outcome.
8. Magnification: Exaggerating or Catastrophizing the outcome of certain events or behavior.
9. Personalization: Holding oneself personally responsible for events beyond one’s control.
10. Labeling: Attaching labels to oneself or others (ex: “loser”, “perfect”).

These distortions are based on the 10 types of cognitive distortion defined by Burns and Beck (1999). Some of these distortions are either combined into a super-category, or further divided into sub-categories, and hence the varying number of types of distortions. For example, mind reading and fortune telling are sometimes grouped and considered as a single distortion called Jumping to conclusions.

1.2 Problem statement

The first goal of this research project is to detect cognitive distortions from natural language text. This can be done by implementing and comparing different methodologies for binary classification of annotated data, obtained from mental health patients, into Distorted and Non-Distorted thinking. The second goal is to analyze the linguistic implications of classification tasks of different types of distortions.

In particular, this research aims to answer the following questions:

1. Which type of NLP features is more suitable for cognitive distortion detection: semantic or syntactic? Simply put, to compare what is said with how it is said in the context of this task. And, how important is word order in this context?
2. How well do these NLP features and ML classification algorithms perform this task with a limited-sized dataset?

1.3 Related work

Previous work done in this field includes the Stanford Woebot, which is a therapy chatbot (Fitzpatrick et al., 2017). The dialogue decision in Woebot is primarily implemented using decision trees. It functions on concepts based on CBT including the concept of cognitive distortions. However, it only outlines several types of distortions for the user and leaves the user to identify which one applies to their case.

Another study established a mental health ontology based on the principles of CBT using a gated-CNN mechanism (Rojas-Barahona et al., 2018). The model associated certain thinking errors (cognitive distortions) with specific emotions and situations. Their study uses a dataset consisting of about 500k posts taken from a platform that is used for peer-to-peer therapy. The distribution of types of distortion is very similar to our results. These tasks come with annotator agreement issues: their inter-annotator agreement rate was 61%. One possible reason for the low agreement rate given by the authors is the presence of multiple distortions in a single data point.

As there is a lack of publicly available structured data that was curated specifically for the detection of cognitive distortions, datasets from other domains, such as social media data or personal blogs, are used instead. One such study was conducted on Tumblr data collected by using selected keywords (Simms et al., 2017). By using the LIWC features (Section 3.3) to train a Decision Tree model to detect the presence of cognitive distortions, they were able to lower the false-positive rate to 24% and the false-negative rate to 30.4%.

A similar study was conducted by Shickel et al. (2020) on a crowdsourced dataset and some mental health therapy logs. Their approach was to divide the task into two sub-tasks: first, to detect if an entry has a distortion (F1-score of 0.88), and second, to classify the type of distortion (F1-score of 0.68). For this study, 15 different classes are considered for the types of distortion. For both of the tasks, logistic regression outperformed more complex deep learning algorithms such as Bi-LSTMs or GRUs. On applying this model to smaller counseling datasets, however, the F1-score dropped down to 0.45.

2 Methods and Dataset

One of the most common roadblocks in using Artificial Intelligence for Clinical Psychology is the lack of available data. Most of the datasets that have patients interacting with licensed professionals are confidential and therefore not publicly available.

Here, we use a dataset, named Therapist Q&A, obtained from the crowd-sourced data science repository, Kaggle¹. The dataset follows a Question and Answer format and the identity of each patient is anonymized, to maintain their privacy.

¹ https://www.kaggle.com/arnmaud/therapist-qa

Each patient entry usually consists of a brief description of their circumstance, symptoms, and
their thoughts. Each of these concerns is then answered by a licensed therapist addressing their issues followed by a suggestion. Since the patient entry is not just a vague request but provides some insight into the situation as well as their reaction to it, it can be used to detect if they were engaging in any negative thinking patterns.

2.1 Annotation of dataset

For the annotation task, we have just focused on the patient’s input. One of the key factors in detecting cognitive distortions is context. While the data does give some insight into the situation a patient is in, it should be noted that the description itself is given by the patient themselves. As a result, their version of the situation itself may be distorted.

In this task, we focus on detecting cues in language that would indicate any type of distortion, and there was no way to verify the veracity of their statements. Thus, each entry is perceived as a viable candidate for cognitive distortion and given one out of 11 labels ('No distortion' and 10 different types of distortions as listed in Section 1.1). It is noted that an entry can have multiple types of distortions. However, for this project, the annotators were asked to determine a dominant distortion for each of the entries, and an optional secondary distortion if it is too hard to determine a dominant distortion. The decision between dominant and secondary distortion was made based on the severity of each distortion. Since the project aims to detect the presence of these distortions, the severity of distortions was not marked by any quantitative value. They were also asked to flag the sentences that led them to conclude that the reasoning was distorted.

The annotators coded 3000 samples, out of which 39.2% were marked as not distorted, while the remaining were identified to have some type of distortion. The highly subjective nature of this task makes it very hard to achieve a high agreement rate between the annotators. On comparing the dominant distortion of about 730 data points encoded by two annotators, the Inter-Annotator Agreement (IAA) for the specific type of distortion was 33.7%. Considering the secondary distortion labels as well and computing a more relaxed agreement rate raised the agreement to ∼40%. On the other hand, the agreement rate increased to 61% when we focus on distorted versus non-distorted thinking only. The IAA metric used here is the Joint Probability of Agreement. These disagreements were resolved by enabling the annotators to discuss their reasoning and come to a consensus. The types of distortion were found to be evenly distributed across the 10 classes of distortions mentioned earlier (Figure 1). The annotated dataset will be made available to the public to encourage similar work in this domain.

Figure 1: Distribution of the types of Cognitive Distortions in the Kaggle dataset

2.2 Experiments

Due to the limited size of the annotated dataset, several machine learning algorithms such as complex deep learning methods were eliminated from the experiments. Finally, the four types of features (Table 1) were tested using the following classification algorithms:

1. Logistic regression
2. Support vector machines
3. Decision trees
4. K-Nearest Neighbors (k = 15)
5. Multi-Layer Perceptron (with a single hidden layer having 100 units)

All of these classification algorithms were implemented with the default hyper-parameter settings using the Python package commonly used for ML
algorithms, scikit-learn².

3 Feature Selection

To address the different aspects of language, feature selection was divided into two categories: Semantic and Syntactic features. Two different training approaches were implemented for each of these categories. A brief description of each training method is given below.

            Bag-of-words approach    Sequential approach
Semantic    SIF                      S-BERT
Syntactic   LIWC                     POS

Table 1: Types of linguistic features. Note that LIWC features are not limited to the Syntactic category.

3.1 Smooth Inverse Frequency (SIF)

There are multiple ways of encoding sentence embeddings where the word order does not matter. One of the most common methods is simply using the mean value of all the word embeddings.

Another common approach is to treat these sentences as documents and use TF-IDF (Term Frequency - Inverse Document Frequency) vectors. However, the issue with treating sentences as documents is that sentences usually do not have multiple words repeated.

To address this, smooth inverse frequency (SIF) can be used instead. The SIF method for sentence embeddings improves the performance for textual similarity tasks, beating sequential deep learning models such as RNNs or LSTMs (Arora et al., 2016).

Here, the sentence embeddings are generated using the SIF method on pre-trained GloVe embeddings (Pennington et al., 2014) for each word in the sentence.

3.2 Sentence-BERT (Bidirectional Encoder Representations from Transformers)

For the sequential semantic representation of these entries, a pre-trained Sentence-BERT model was used (Reimers and Gurevych, 2019). To ensure that in this vector space, semantically similar sentences are closer, the authors have used the Triplet Objective Function as the loss function. This triplet objective function minimizes the distance between the anchor sentence and a positive sample while maximizing the distance between the anchor sentence and a negative sample.

3.3 Linguistic Inquiry and Word Count (LIWC) Features

The Linguistic Inquiry and Word Count (LIWC) is a tool used to analyze textual data (Pennebaker et al., 2001). The LIWC program generates about 80 features based on the words used in the text. While we categorize the LIWC features as syntactic in Table 1, these features reflect the percentage of words in different categories. A lot of these features are syntactic, such as the count of pronouns, proper nouns, etc. Other categories are psychological, linguistic, cognitive, or other (Tausczik and Pennebaker, 2010).

LIWC features are widely used for conducting linguistic analysis in almost any domain. Specific to mental illness, these features were used to detect the linguistic indicators of Schizophrenia (Zomick et al., 2019), Depression (Jones et al., 2020) and even Cognitive Distortions (Simms et al., 2017).

3.4 Parts of Speech (POS) tag embeddings

The main motivation behind using Parts of Speech tags was to eliminate any specific Noun or Verb from heavily dominating the classification process. Two entries having the same context can have different distortions. Using POS tags as features has proved to be useful for similar applications, such as detecting depression from text (Morales and Levitan, 2016).

Syntactic features generally do not consider word order as an important aspect. To maintain the impact of word order, each word is replaced with its Part-Of-Speech (POS) tag³ using the pre-trained spaCy language model⁴.

These POS tags are then converted to embeddings by training them in the same way as word embeddings, using the Skip-gram word2vec model (Mikolov et al., 2013). This is done to encode POS tag order in the embeddings. Once each tag has an embedding, these vectors are padded with zeros for normalization.

² https://scikit-learn.org
³ https://universaldependencies.org/docs/u/pos
⁴ https://spacy.io/usage/linguistic-features#pos-tagging
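The Joint Probability of Agreement used as the IAA metric in Section 2.1 is the fraction of items on which two annotators chose the same label. The following is a minimal sketch of that computation, not the authors' code. The label lists are invented for illustration, and the helper names `joint_probability_of_agreement` and `to_binary` are hypothetical.

```python
def joint_probability_of_agreement(labels_a, labels_b):
    """Return the fraction of items that two annotators labelled identically."""
    if len(labels_a) != len(labels_b):
        raise ValueError("Both annotators must label the same items.")
    if not labels_a:
        raise ValueError("At least one labelled item is required.")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def to_binary(labels, negative_label="No distortion"):
    """Collapse distortion types into distorted (True) vs. not distorted (False)."""
    return [label != negative_label for label in labels]


# Hypothetical dominant-distortion labels from two annotators (not real data).
annotator_a = ["Mind Reading", "No distortion", "Labeling", "Overgeneralization"]
annotator_b = ["Fortune Telling", "No distortion", "Labeling", "Mental Filter"]

type_agreement = joint_probability_of_agreement(annotator_a, annotator_b)
binary_agreement = joint_probability_of_agreement(
    to_binary(annotator_a), to_binary(annotator_b)
)
print(f"type-level agreement: {type_agreement:.2f}")
print(f"distorted vs. non-distorted agreement: {binary_agreement:.2f}")
```

On this toy input the annotators disagree on the distortion type for two of the four entries but agree on whether each entry is distorted, which mirrors the gap between the type-level and binary agreement rates reported above. Note that this metric does not correct for agreement expected by chance.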
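The weighting step of the SIF method from Section 3.1 can be sketched as follows. Each word vector is scaled by a / (a + p(w)), where p(w) is the word's estimated frequency, and the scaled vectors are averaged. This sketch rests on several assumptions: the value a = 1e-3, the toy two-dimensional embeddings, and the estimation of p(w) from the same small corpus are all illustrative choices, and the function names are hypothetical. The second step of Arora et al., removing the projection on the first principal component of the sentence vectors, is omitted here.

```python
from collections import Counter


def sif_weights(tokenized_sentences, a=1e-3):
    """Map each word to a / (a + p(w)), with p(w) its relative corpus frequency."""
    counts = Counter(tok for sent in tokenized_sentences for tok in sent)
    total = sum(counts.values())
    return {tok: a / (a + count / total) for tok, count in counts.items()}


def sif_weighted_average(tokens, embeddings, weights):
    """Average the word vectors of a sentence, each scaled by its SIF weight."""
    known = [t for t in tokens if t in embeddings and t in weights]
    if not known:
        raise ValueError("No token of the sentence has an embedding and a weight.")
    dim = len(embeddings[known[0]])
    sentence_vector = [0.0] * dim
    for tok in known:
        for i, value in enumerate(embeddings[tok]):
            sentence_vector[i] += weights[tok] * value
    return [value / len(known) for value in sentence_vector]


# Toy corpus and toy 2-d embeddings (assumptions; the paper uses GloVe vectors).
corpus = [["i", "always", "fail"], ["i", "feel", "like", "a", "failure"]]
toy_embeddings = {"i": [1.0, 0.0], "fail": [0.0, 1.0]}

weights = sif_weights(corpus)
vector = sif_weighted_average(["i", "fail"], toy_embeddings, weights)
print(vector)
```

Because "i" occurs more often than "fail" in the toy corpus, it receives a smaller weight, so frequent words contribute less to the sentence vector than they would in a plain mean of word embeddings.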
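The triplet objective described in Section 3.2 can be written as max(d(a, p) - d(a, n) + margin, 0), where d is a distance between sentence embeddings. The sketch below follows that form with Euclidean distance and a margin of 1, as reported by Reimers and Gurevych (2019). The two-dimensional vectors are made up for illustration, and the function names are hypothetical, not part of any library.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther from the anchor
    than the positive is; positive otherwise."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


# Toy 2-d "sentence embeddings" (assumptions for illustration only).
anchor = [0.0, 0.0]
similar = [0.0, 1.0]      # distance 1 from the anchor
dissimilar = [3.0, 4.0]   # distance 5 from the anchor

well_separated = triplet_loss(anchor, similar, dissimilar)
badly_separated = triplet_loss(anchor, dissimilar, similar)
print(well_separated, badly_separated)
```

The loss is zero when the triplet is already well separated and grows when the negative sample sits closer to the anchor than the positive one, which is what pushes semantically similar sentences together in the embedding space.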