Detecting Cognitive Distortions from Patient-Therapist Interactions

Sagarika Shreevastava
Department of Computer Science, and Department of Linguistics, University of Colorado, Boulder
sagarika.shreevastava@colorado.edu

Peter W. Foltz
Institute of Cognitive Science, University of Colorado, Boulder
peter.foltz@colorado.edu

Proceedings of the Seventh Workshop on Computational Linguistics and Clinical Psychology, pages 151-158, June 11, 2021. ©2021 Association for Computational Linguistics

Abstract

An important part of Cognitive Behavioral Therapy (CBT) is to recognize and restructure certain negative thinking patterns that are also known as cognitive distortions. This project aims to detect these distortions using natural language processing. We compare and contrast different types of linguistic features as well as different classification algorithms and explore the limitations of applying these techniques on a small dataset. We find that using pre-trained Sentence-BERT embeddings to train an SVM classifier yields the best results, with an F1-score of 0.79. Lastly, we discuss how this work provides insights into the types of linguistic features that are inherent in cognitive distortions.

1 Introduction

Cognitive Behavioral Therapy (CBT) is one of the most common methods of psychotherapeutic intervention to treat depression or anxiety. Due to the COVID-19 pandemic, mental health issues are on the rise. At the same time, more and more interactions are now held virtually. Furthermore, mental health issues are not limited to the one-hour-per-week window that patients usually get with their therapists. This has led to a growth in the demand for digitally accessible therapy sessions. As mental health care is often inaccessible to people, there is a need for innovative ways to make it more widely available and affordable (Holmlund et al., 2019).

One possible solution is to develop an automated system that could serve by performing some ancillary tasks more efficiently. Towards that, Natural Language Processing (NLP) and Machine Learning (ML) algorithms are now gaining widespread popularity and are being implemented in many fields where language is used. While we are far from a chatbot replacing a therapist's nuanced skillset, having easy access to an intelligent support system can help fill in these gaps.

One of the major aspects of CBT is to recognize and restructure certain types of negative thinking patterns. Some established negative thinking patterns are commonly observed in patients dealing with anxiety or depression. These cognitive distortions arise due to errors in reasoning (Beck, 1963). The aim of educating the patient about these distortions during CBT is to equip the patient with the right tools to detect errors in their own thought processes. Once the patient is aware of the error in their reasoning, they can start to work on restructuring how they perceive the same situations in a healthier way.

1.1 Cognitive Distortions

The concept of cognitive distortions was first introduced by Beck (1963). There is no definitive number of types of distortions, and the number varies widely in existing literature depending on the level of detail in reasoning considered by the author. For example, the Cognitive Distortion Scale developed by Briere (2000) consists of only five types. In this work, we consider a total of ten types of cognitive distortions, described below:

1. Emotional Reasoning: Believing "I feel that way, so it must be true."
2. Overgeneralization: Drawing conclusions from limited and often unrelated negative experience.
3. Mental Filter: Focusing only on limited negative aspects and not the excessive positive ones.
4. Should Statements: Expecting things or personal behavior should be a certain way.
5. All or Nothing: Binary thought pattern; considering anything short of perfection as a failure.
6. Mind Reading: Concluding that others are reacting negatively to you, without any basis in fact.
7. Fortune Telling: Predicting that an event will always result in the worst possible outcome.
8. Magnification: Exaggerating or catastrophizing the outcome of certain events or behavior.
9. Personalization: Holding oneself personally responsible for events beyond one's control.
10. Labeling: Attaching labels to oneself or others (e.g., "loser", "perfect").

These distortions are based on the 10 types of cognitive distortion defined by Burns and Beck (1999). Some of these distortions are either combined into a super-category or further divided into sub-categories, hence the varying number of types of distortions. For example, mind reading and fortune telling are sometimes grouped and considered as a single distortion called Jumping to Conclusions.

1.2 Problem statement

The first goal of this research project is to detect cognitive distortions from natural language text. This can be done by implementing and comparing different methodologies for binary classification of annotated data, obtained from mental health patients, into Distorted and Non-Distorted thinking. The second goal is to analyze the linguistic implications of classification tasks for different types of distortions.

In particular, this research aims to answer the following questions:

1. Which type of NLP features is more suitable for cognitive distortion detection: semantic or syntactic? Simply put, to compare what is said and how it is said in the context of this task. And how important is word order in this context?
2. How well do these NLP features and ML classification algorithms perform this task with a limited-sized dataset?

1.3 Related work

Previous work done in this field includes the Stanford Woebot, which is a therapy chatbot (Fitzpatrick et al., 2017). The dialogue decision in Woebot is primarily implemented using decision trees. It functions on concepts based on CBT, including the concept of cognitive distortions. However, it only outlines several types of distortions for the user and leaves the user to identify which one applies to their case.

Another study established a mental health ontology based on the principles of CBT using a gated CNN mechanism (Rojas-Barahona et al., 2018). The model associated certain thinking errors (cognitive distortions) with specific emotions and situations. Their study uses a dataset consisting of about 500k posts taken from a platform that is used for peer-to-peer therapy. The distribution of types of distortion is very similar to our results. These tasks come with annotator agreement issues; their inter-annotator agreement rate was 61%. One possible reason for the low agreement rate given by the authors is the presence of multiple distortions in a single data point.

As there is a lack of publicly available structured data curated specifically for the detection of cognitive distortions, datasets from other domains, such as social media data or personal blogs, are used instead. One such study was conducted on Tumblr data collected by using selected keywords (Simms et al., 2017). By using the LIWC features (Section 3.3) to train a Decision Tree model to detect the presence of cognitive distortions, they were able to lower the false positive rate to 24% and the false-negative rate to 30.4%.

A similar study was conducted by Shickel et al. (2020) on a crowdsourced dataset and some mental health therapy logs. Their approach was to divide the task into two sub-tasks: first, to detect if an entry has a distortion (F1-score of 0.88), and second, to classify the type of distortion (F1-score of 0.68). For this study, 15 different classes are considered for the types of distortion. For both of the tasks, logistic regression outperformed more complex deep learning algorithms such as Bi-LSTMs or GRUs. On applying this model to smaller counseling datasets, however, the F1-score dropped down to 0.45.

2 Methods and Dataset

One of the most common roadblocks in using Artificial Intelligence for Clinical Psychology is the lack of available data. Most of the datasets that have patients interacting with licensed professionals are confidential and therefore not publicly available. Here, we use a dataset named Therapist Q&A, obtained from the crowd-sourced data science repository Kaggle[1]. The dataset follows a Question and Answer format, and the identity of each patient is anonymized to maintain their privacy.

Each patient entry usually consists of a brief description of their circumstance, symptoms, and their thoughts. Each of these concerns is then answered by a licensed therapist addressing their issues, followed by a suggestion. Since the patient entry is not just a vague request and it provides some insight into the situation as well as their reaction to it, it can be used to detect if they were engaging in any negative thinking patterns.

[1] https://www.kaggle.com/arnmaud/therapist-qa

2.1 Annotation of dataset

For the annotation task, we have focused only on the patient's input. One of the key factors in detecting cognitive distortions is context. While the data does give some insight into the situation a patient is in, it should be noted that the description itself is given by the patient themselves. As a result, their version of the situation itself may be distorted. In this task, we focus on detecting cues in language that would indicate any type of distortion, and there was no way to verify the veracity of their statements. Thus each entry is perceived as a viable candidate for cognitive distortion and given one out of 11 labels ('No distortion' and the 10 different types of distortions listed in Section 1.1). It is noted that an entry can have multiple types of distortions. However, for this project, the annotators were asked to determine a dominant distortion for each of the entries, and an optional secondary distortion if it is too hard to determine a dominant distortion. The decision between dominant or secondary distortion was made based on the severity of each distortion. Since the project aims to detect the presence of these distortions, the severity of distortions was not marked by any quantitative value. The annotators were also asked to flag the sentences that led them to conclude that the reasoning was distorted.

The annotators coded 3000 samples, out of which 39.2% were marked as not distorted, while the remaining were identified to have some type of distortion. The highly subjective nature of this task makes it very hard to achieve a high agreement rate between the annotators. On comparing the dominant distortion of about 730 data points coded by two annotators, the Inter-Annotator Agreement (IAA) for the specific type of distortion was 33.7%. Considering the secondary distortion labels as well and computing a more relaxed agreement rate bumped the agreement to ~40%. On the other hand, the agreement rate increased to 61% when we focus on distorted versus non-distorted thinking only. The IAA metric used here is the Joint Probability of Agreement. These disagreements were resolved by enabling the annotators to discuss their reasoning and come to a consensus. The types of distortion were found to be evenly distributed across the 10 classes of distortions mentioned earlier (Figure 1). The annotated dataset will be made available to the public to encourage similar work in this domain.

[Figure 1: Distribution of the types of Cognitive Distortions in the Kaggle dataset]

2.2 Experiments

Due to the limited size of the annotated dataset, several machine learning algorithms, such as complex deep learning methods, were eliminated from the experiments. Finally, the four types of features (Table 1) were tested using the following classification algorithms:

1. Logistic regression
2. Support vector machines
3. Decision trees
4. K-Nearest Neighbors (k = 15)
5. Multi-Layer Perceptron (with a single hidden layer having 100 units)

All of these classification algorithms were implemented with the default hyper-parameter settings using the Python package commonly used for ML algorithms, scikit-learn[2].

[2] https://scikit-learn.org

3 Feature Selection

To address the different aspects of language, feature selection was divided into two categories: semantic and syntactic features. Two different training approaches were implemented for each of these categories. A brief description of each training method is given below.

              Bag-of-words approach    Sequential approach
  Semantic    SIF                      S-BERT
  Syntactic   LIWC                     POS

Table 1: Types of linguistic features. Note that LIWC features are not limited to the Syntactic category.

3.1 Smooth Inverse Frequency (SIF)

There are multiple ways of encoding sentence embeddings where the word order does not matter. One of the most common methods is simply using the mean value of all the word embeddings. Another common approach is to treat these sentences as documents and use TF-IDF (Term Frequency - Inverse Document Frequency) vectors. However, the issue with treating sentences as documents is that sentences usually do not have multiple words repeated.

To address this, smooth inverse frequency (SIF) can be used instead. The SIF method for sentence embeddings improves the performance on textual similarity tasks, beating sequential deep learning models such as RNNs or LSTMs (Arora et al., 2016). Here, the sentence embeddings are generated using the SIF method on pre-trained GloVe embeddings (Pennington et al., 2014) for each word in the sentence.

3.2 Sentence-BERT (Bidirectional Encoder Representations from Transformers)

For the sequential semantic representation of these entries, a pre-trained Sentence-BERT model was used (Reimers and Gurevych, 2019). To ensure that, in this vector space, semantically similar sentences are closer, the authors have used the Triplet Objective Function as the loss function. This triplet objective function minimizes the distance between the anchor sentence and a positive sample while maximizing the distance between the anchor sentence and a negative sample.

3.3 Linguistic Inquiry and Word Count (LIWC) Features

The Linguistic Inquiry and Word Count (LIWC) is a tool used to analyze textual data (Pennebaker et al., 2001). The LIWC program generates about 80 features based on the words used in the text. While we categorize the LIWC features as syntactic in Table 1, these features reflect the percentage of words in different categories. A lot of these features are syntactic, such as the count of pronouns, proper nouns, etc. Other categories are psychological, linguistic, cognitive, or other (Tausczik and Pennebaker, 2010).

LIWC features are widely used for conducting linguistic analysis in almost any domain. Specific to mental illness, these features were used to detect the linguistic indicators of Schizophrenia (Zomick et al., 2019), Depression (Jones et al., 2020), and even Cognitive Distortions (Simms et al., 2017).

3.4 Parts of Speech (POS) tag embeddings

The main motivation behind using Parts-of-Speech tags was to eliminate any specific noun or verb from heavily dominating the classification process. Two entries having the same context can have different distortions. Using POS tags as features has proved to be useful for similar applications, such as detecting depression from text (Morales and Levitan, 2016).

Syntactic features generally do not consider word order as an important aspect. To maintain the impact of word order, each word is replaced with its Part-Of-Speech (POS) tag[3] using the pre-trained spaCy language model[4]. These POS tags are then converted to embeddings by training them in the same way as word embeddings, using the Skip-gram word2vec model (Mikolov et al., 2013). This is done to encode POS tag order in the embeddings. Once each tag has an embedding, these vectors are padded with zeros for normalization.

[3] https://universaldependencies.org/docs/u/pos
[4] https://spacy.io/usage/linguistic-features#pos-tagging
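As a rough illustration of the SIF weighting described in Section 3.1, the sketch below computes a weighted-average sentence embedding in pure Python. The word vectors and unigram probabilities here are toy stand-ins (not the pre-trained GloVe vectors or corpus statistics used in the paper), and the common-component removal step of Arora et al. (2016) is omitted for brevity.

```python
def sif_embedding(sentence, vectors, word_probs, a=1e-3):
    """SIF-style sentence embedding: average of word vectors, each
    weighted by a / (a + p(w)), so frequent words contribute less."""
    words = [w for w in sentence.lower().split() if w in vectors]
    if not words:
        return None
    dim = len(next(iter(vectors.values())))
    emb = [0.0] * dim
    for w in words:
        # unseen words get a small default probability (toy assumption)
        weight = a / (a + word_probs.get(w, 1e-5))
        for i, v in enumerate(vectors[w]):
            emb[i] += weight * v
    return [x / len(words) for x in emb]

# Toy 3-dimensional vectors and unigram probabilities for illustration only
vectors = {"i": [0.1, 0.2, 0.0], "always": [0.5, -0.3, 0.2], "fail": [-0.4, 0.6, 0.1]}
word_probs = {"i": 0.05, "always": 0.001, "fail": 0.0002}
emb = sif_embedding("I always fail", vectors, word_probs)
```

Note how the rare word "fail" (p = 0.0002) receives a far larger weight than the frequent pronoun "i" (p = 0.05), which is the intuition behind SIF outperforming a plain mean of word embeddings.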