Language-oriented Sentiment Analysis based on the Grammar Structure and Improved Self-attention Network

Hien D. Nguyen (1,2,*,a), Tai Huynh (3,4,*), Suong N. Hoang (4,b), Vuong T. Pham (5) and Ivan Zelinka (6,c)

1 Faculty of Computer Science, University of Information Technology, Ho Chi Minh City, Vietnam
2 Vietnam National University, Ho Chi Minh City, Vietnam
3 Ton Duc Thang University, Ho Chi Minh City, Vietnam
4 Kyanon Digital, Vietnam
5 Faculty of Information Technology, Sai Gon University, Ho Chi Minh City, Vietnam
6 Technical University of Ostrava (VŠB-TU), Czech Republic
ivan.zelinka@vsb.cz
* Equal contribution by Hien D. Nguyen and Tai Huynh
a https://orcid.org/0000-0002-8527-0602
b https://orcid.org/0000-0002-3354-013X
c https://orcid.org/0000-0002-3858-7340

Nguyen, H., Huynh, T., Hoang, S., Pham, V. and Zelinka, I. Language-oriented Sentiment Analysis based on the Grammar Structure and Improved Self-attention Network. DOI: 10.5220/0009358803390346. In Proceedings of the 15th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE 2020), pages 339-346. ISBN: 978-989-758-421-3. Copyright © 2020 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.

Keywords: Sentiment Analysis, Sentiment Classification, Vietnamese, Self-attention, Transformer, Natural Language Processing.

Abstract: In business, sentiment analysis helps brands understand the sentiment of their customers: what people are saying, how they are saying it, and what they mean. There are many methods for sentiment analysis; however, they are not effective when applied to the Vietnamese language. In this paper, a method for Vietnamese sentiment analysis is studied that combines the structure of the Vietnamese language with a natural language processing technique, self-attention with the Transformer architecture. Based on an analysis of the structure of a sentence, the Transformer is used to process the word positions to determine the meaning of that sentence. The experimental results show that our method is more effective than others for Vietnamese sentiment analysis. Its accuracy and F-measure are more than 91%, and its results are suitable to apply in practice for business intelligence.

1 INTRODUCTION

Sentiment analysis (SA) is one of the subfields of Computational Linguistics and Natural Language Processing (NLP) (Gamal et al., 2019). In business intelligence, sentiment analysis helps brands understand the sentiment of their customers (Rokade and Kumari, 2019). They can know what people are saying, how they are saying it, and what they mean. Customer sentiment can be found in tweets, comments, reviews, or other places where people mention the brands.

In the current era, social networks are a popular platform for communication and interaction (Beigi, 2016). Many people find new information on social networks, which makes them an important data source. SA is also used to detect influencers on social networks for influencer marketing (Huynh et al., 2019).

Vietnamese is an isolating language (Nguyen et al., 2006). The meaning of a sentence depends on the way its predicates are organized (Clark, 1974). In other words, information about word positions contributes to both the sentence meaning and the grammatical meaning. The analysis of a Vietnamese sentence therefore has to incorporate the study of its grammar structure.

Some machine learning-based approaches have been studied to analyse the sentiment of a Vietnamese sentence. CountVectorizer (Irfan et al., 2015) and Term Frequency–Inverse Document Frequency (Tf-idf) (Aggarwal, 2011) are used for word representations. However, they cannot capture the positions of words in a sentence, so their results are not accurate. Support Vector Machine (Joachims, 1998) and Naïve Bayes (Irfan et al., 2015) are used as classifiers. However, those methods do not consider the structure of a sentence, so their results are not suitable in practice.

In (Krouska et al., 2017, Troussas et al., 2016), the authors present five well-known learning-based classifiers (Naïve Bayes, Support Vector Machine, k-Nearest Neighbor, Logistic Regression and C4.5) and a lexicon-based approach (SentiStrength) to analyse sentiment on Twitter. However, these works only study English.

Besides, some types of recurrent neural networks (RNNs), such as long short-term memory (LSTM) (Hochreiter, 1997, Cheng et al., 2016), Bi-Directional LSTM (biLSTM) (Schuster and Paliwal, 1997) or gated recurrent unit (GRU) (Chung et al., 2014), are very complex and take a long time to solve the problem of sentiment analysis on Vietnamese.

Sentiment analysis for Vietnamese was researched in (Nguyen et al., 2014). That study investigated the task with respect to both the Support Vector Machine (SVM) model and linguistic features, using an annotated corpus for sentiment classification extracted from hotel reviews in Vietnamese. However, this method is not designed based on the grammar structure, so the sentiment of some sentences cannot be determined accurately.

Self-attention has been used successfully in a variety of tasks including reading comprehension, abstractive summarization, textual entailment and learning task-independent sentence representations (Zhou et al., 2018). The Transformer (Vaswani et al., 2017) is a transduction model based on self-attention that computes representations of its input and output without using sequence-aligned RNNs or convolution. In (Hoang et al., 2019), the authors study sentiment analysis of product reviews in Vietnamese by using self-attention neural networks. However, that study does not consider the structure of Vietnamese sentences in the analysis, so its results are not accurate enough to meet practical requirements.

In this paper, a method for Vietnamese sentiment analysis is proposed. This method determines whether the sentiment of a sentence is positive, negative or neutral. The structures of a Vietnamese sentence are studied. Based on those structures, the meaning of the sentence is analysed by using the self-attention neural network architecture Transformer. Besides, the Squeeze-and-Excitation layer (Hu et al., 2018) is also used to recalibrate features in the process. The sentences will be analysed to determine whether they are positive, negative or neutral.

The experimental results show that our method is more effective than others in Vietnamese sentiment analysis. Its accuracy and F-measure are more than 91%, and its results are suitable to apply in practice for business intelligence.

The next section presents some techniques of the Transformer. Section 3 presents the method for Vietnamese sentiment analysis. That method applies the improved self-attention architecture with the Transformer to the structure of Vietnamese sentences to determine their meaning. Section 4 describes the experimental results. The last section concludes the main results of this paper.

2 SELF-ATTENTION NETWORK

Scaled Dot-Product Attention: Let $s_{i-1}$ be a query vector $q$, and let $h_j$ be duplicated, with one copy being the key vector $k_j$ and the other the value vector $v_j$ (in current NLP work, the key and value vectors are frequently the same, therefore $h_j$ can be considered as $k_j$ or $v_j$).

$$c = \sum_{j=1}^{n} a_j v_j \tag{1}$$

$$\text{where } a_j = \frac{\exp(e_j)}{\sum_{k=1}^{n} \exp(e_k)}, \text{ and } e_j = \frac{q^T \cdot k_j}{\sqrt{d_{model}}} \quad (1 \le j \le n) \tag{2}$$

$d_{model}$ is the dimension of the input vectors or of the k vector (q, k, v have the same dimension as the input embedding vector).

Self-attention is a mechanism that applies Scaled Dot-Product Attention from every token of the sentence to all others. For every token in the sentence, three vectors Query, Key, Value are created by using a linear feed-forward layer as a transformation, then the attention mechanism is applied to get the context matrix. However, computing this token by token is very slow, so we consider three matrices Q, K, V:

Q is a matrix containing all the query vectors, $Q = [q_1, q_2, \ldots, q_n]$, where $q_i$ is a query vector.
K is a matrix containing all the key vectors, $K = [k_1, k_2, \ldots, k_n]$, where $k_i$ is a key vector.
V is a matrix containing all the value vectors, $V = [v_1, v_2, \ldots, v_n]$, where $v_i$ is a value vector.

Thus, we have:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q \cdot K^T}{\sqrt{d_{model}}}\right) \cdot V \tag{3}$$

Multi-head Attention performs the attention h times with (Q, K, V) matrices of the dimension $d_{model}/h$. Each head is one application of Attention. For each head, the (Q, K, V) matrices are uniquely projected to the dimension $d_{model}/h$. The self-attention mechanism is performed to yield an output of the same dimension $d_{model}/h$. Finally, the outputs of the h heads are concatenated, and a linear projection layer is applied once again. The formula for this process is as follows:

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \mathrm{head}_2, \ldots, \mathrm{head}_h) \cdot W^O, \quad \text{where } \mathrm{head}_i = \mathrm{Attention}(Q \cdot W_i^Q, K \cdot W_i^K, V \cdot W_i^V) \tag{4}$$

3 METHOD FOR VIETNAMESE SENTIMENT ANALYSIS

In this section, the method for analysing the sentiment of a Vietnamese sentence is proposed. The sentences will be analysed to determine whether they are positive, negative or neutral.

Firstly, the structures of a Vietnamese sentence are studied. Because the scope of this study is the evaluation comments for a product on the social network, two kinds of declarative sentence are considered: positive and negative sentences.

Secondly, based on those structures, the meaning of the sentence is analysed by using the self-attention neural network architecture Transformer. Because the meaning of a Vietnamese sentence depends on the [...]

[...] indicates the speaker’s desire to influence future events. In the problem of sentiment analysis, we only need to determine whether a sentence is positive, negative or neutral; thus, in the scope of this paper, we only consider the declarative sentence type. The structure of a single declarative sentence in Vietnamese is shown in Fig. 1:

Figure 1: Structure of a single declarative sentence in Vietnamese.

Definition 1: Kinds of the structure of a positive sentence
A single positive declarative sentence in Vietnamese has the foundation structure: [...] = [...]. It is classified as in Table 1.

Table 1: Kinds of the structure of a positive sentence.
Kinds    Variants
P is [...]: “là”    [...] = [...]
P is [...]:
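As an illustration of the scaled dot-product attention and multi-head attention formulas of Section 2 (Eqs. 3 and 4), the following is a minimal NumPy sketch. It is not the authors' implementation: the function names, the random weight matrices and the toy sizes (5 tokens, d_model = 8, h = 2) are assumptions made for this example, and the Squeeze-and-Excitation layer and the rest of the Transformer are omitted. Tokens are stored as rows, so Q, K and V have shape (n, d_model). The scores are divided by the square root of the key dimension actually used, as in Vaswani et al. (2017); this equals d_model in the single-head case of Eq. (3) and d_model/h inside each head.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d)) V, where d is the size of the last axis of Q."""
    d = Q.shape[-1]
    scores = Q @ np.swapaxes(K, -1, -2) / np.sqrt(d)  # (..., n, n)
    weights = softmax(scores, axis=-1)                # each row sums to 1
    return weights @ V, weights


def multi_head_attention(X, W_q, W_k, W_v, W_o, h):
    """Self-attention over X of shape (n, d_model) with h heads of width d_model // h."""
    n, d_model = X.shape
    assert d_model % h == 0, "d_model must be divisible by h"
    d_head = d_model // h

    def split_heads(M):
        # (n, d_model) -> (h, n, d_head): slicing the columns of one large
        # projection is equivalent to h separate per-head projections.
        return M.reshape(n, h, d_head).transpose(1, 0, 2)

    Q, K, V = split_heads(X @ W_q), split_heads(X @ W_k), split_heads(X @ W_v)
    heads, _ = scaled_dot_product_attention(Q, K, V)       # (h, n, d_head)
    concat = heads.transpose(1, 0, 2).reshape(n, d_model)  # Concat(head_1, ..., head_h)
    return concat @ W_o                                    # final linear projection


# Toy usage with arbitrary illustrative sizes and random (untrained) weights.
rng = np.random.default_rng(0)
n, d_model, h = 5, 8, 2
X = rng.normal(size=(n, d_model))
W_q, W_k, W_v, W_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))

context, weights = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
output = multi_head_attention(X, W_q, W_k, W_v, W_o, h)
print(context.shape, weights.shape, output.shape)
```

Each row of `weights` holds the coefficients a_j of Eq. (2) for one query token, and the matching row of `context` is the vector c of Eq. (1). Computing all rows with one matrix product is the speed-up that motivates the matrix form of Eq. (3).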