Language-oriented Sentiment Analysis based on the Grammar Structure and Improved Self-attention Network

Hien D. Nguyen (1,2,*,a), Tai Huynh (3,4,*), Suong N. Hoang (4,b), Vuong T. Pham (5) and Ivan Zelinka (6,c)

1 Faculty of Computer Science, University of Information Technology, Ho Chi Minh City, Vietnam
2 Vietnam National University, Ho Chi Minh City, Vietnam
3 Ton Duc Thang University, Ho Chi Minh City, Vietnam
4 Kyanon Digital, Vietnam
5 Faculty of Information Technology, Sai Gon University, Ho Chi Minh City, Vietnam
6 Technical University of Ostrava (VŠB-TU), Czech Republic
ivan.zelinka@vsb.cz
* Equal contribution by Hien D. Nguyen and Tai Huynh
                           Keywords:           Sentiment Analysis, Sentiment Classification, Vietnamese, Self-attention, Transformer, Natural Language 
                                               Processing. 
Abstract: In business, sentiment analysis helps brands understand the sentiment of their customers: what people are saying, how they are saying it, and what they mean. There are many methods for sentiment analysis; however, they are not effective when applied to the Vietnamese language. In this paper, a method for Vietnamese sentiment analysis is studied that combines the structure of the Vietnamese language with a natural language processing technique, self-attention with the Transformer architecture. Based on an analysis of the structure of a sentence, the Transformer is used to process the word positions to determine the meaning of that sentence. The experimental results show that our method is more effective than others for Vietnamese sentiment analysis. Its accuracy and F-measure are both above 91%, and its results are suitable for practical use in business intelligence.
a https://orcid.org/0000-0002-8527-0602
b https://orcid.org/0000-0002-3354-013X
c https://orcid.org/0000-0002-3858-7340

Nguyen, H., Huynh, T., Hoang, S., Pham, V. and Zelinka, I.
Language-oriented Sentiment Analysis based on the Grammar Structure and Improved Self-attention Network.
DOI: 10.5220/0009358803390346
In Proceedings of the 15th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE 2020), pages 339-346
ISBN: 978-989-758-421-3
Copyright © 2020 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved

1 INTRODUCTION

Sentiment analysis (SA) is one of the subfields of Computational Linguistics and Natural Language Processing (NLP) (Gamal et al., 2019). In business intelligence, sentiment analysis helps brands understand the sentiment of their customers (Rokade and Kumari, 2019). They can know what people are saying, how they are saying it, and what they mean. Customer sentiment can be found in tweets, comments, reviews, or other places where people mention the brands.

In the current era, social networks are a popular platform for communication and interaction (Beigi, 2016). Many people find innovative information on social networks, which makes them an important data source. SA is also used to detect influencers on social networks for influencer marketing (Huynh et al., 2019).

Vietnamese is an isolating language (Nguyen et al., 2006). The meaning of a sentence depends on the way its predicates are organized (Clark, 1974). In other words, information about word positions contributes to both the sentence meaning and the grammatical meaning. Analysing a Vietnamese sentence therefore has to take its grammatical structure into account.

Some machine learning-based approaches have been studied to analyse the sentiment of a Vietnamese sentence. CountVectorizer (Irfan et al., 2015) and Term Frequency–Inverse Document Frequency (Tf-idf) (Aggarwal, 2011) are used for word representations. However, they cannot capture the positions of words in a sentence, so their results are not accurate. Support Vector Machine (Joachims, 1998) and Naïve Bayes (Irfan et al., 2015) are used as classifiers. However, those methods do not consider the structure of a sentence, so their results are not suitable in practice.

In (Krouska et al., 2017, Troussas et al., 2016), the authors present five well-known learning-based classifiers (Naïve Bayes, Support Vector Machine, k-Nearest Neighbor, Logistic Regression and C4.5) and a lexicon-based approach (SentiStrength) to analyse sentiment on Twitter. However, those studies only consider English.

Besides, some types of recurrent neural networks (RNNs), such as long short-term memory (LSTM) (Hochreiter, 1997, Cheng et al., 2016), Bi-Directional LSTM (biLSTM) (Schuster and Paliwal, 1997) or gated recurrent unit (GRU) (Chung et al., 2014), are very complex and take a long time to solve the problem of sentiment analysis on Vietnamese.

Sentiment analysis for Vietnamese was researched in (Nguyen et al., 2014). That study investigated the task with respect to both the Support Vector Machine (SVM) model and linguistic features, using an annotated corpus for sentiment classification extracted from hotel reviews in Vietnamese. However, this method is not designed around the grammar structure, so some sentences cannot be classified accurately.

Self-attention has been used successfully in a variety of tasks including reading comprehension, abstractive summarization, textual entailment and learning task-independent sentence representations (Zhou et al., 2018). The Transformer (Vaswani et al., 2017) is a transduction model based on self-attention that computes representations of its input and output without using sequence-aligned RNNs or convolution. In (Hoang et al., 2019), the authors study sentiment analysis of product reviews in Vietnamese by using self-attention neural networks. However, that study does not consider the structure of Vietnamese sentences in the analysis, so its results are not accurate enough to meet practical requirements.

In this paper, a method for Vietnamese sentiment analysis is proposed. This method determines whether the sentiment of a sentence is positive, negative or neutral. The structures of a Vietnamese sentence are studied. Based on those structures, the meaning of the sentence is analysed by using the self-attention neural network architecture Transformer. Besides, a Squeeze-and-Excitation layer (Hu et al., 2018) is also used to recalibrate features in the process.

The experimental results show that our method is more effective than others in Vietnamese sentiment analysis. Its accuracy and F-measure are more than 91% and its results are suitable to apply in practice for business intelligence.

The next section presents some techniques of the Transformer. Section 3 presents the method for Vietnamese sentiment analysis. That method applies the improved architecture of self-attention with the Transformer to the structure of Vietnamese sentences to determine their meaning. Section 4 describes the experimental results. The last section concludes the main results of this paper.

2 SELF-ATTENTION NETWORK

Scaled Dot-Product Attention: Let $s_{i-1}$ be a query vector $q$, and let $h_j$ be duplicated so that one copy is the key vector $k_j$ and the other is the value vector $v_j$ (in current NLP work, the key and value vectors are frequently the same, therefore $h_j$ can be considered as $k_j$ or $v_j$).

$$c = \sum_{j=1}^{n} a_j v_j \tag{1}$$

where

$$a_j = \frac{\exp(e_j)}{\sum_{k=1}^{n} \exp(e_k)}, \quad \text{and} \quad e_j = \frac{q \cdot k_j^{T}}{\sqrt{d_{model}}} \quad (1 \le j \le n) \tag{2}$$

$d_{model}$ is the dimension of the input vectors or of the k vector (q, k, v have the same dimension as the input embedding vector).

Self-attention is a mechanism that applies Scaled Dot-Product Attention from every token of the sentence to all the others.

For every token in the sentence, three vectors Query, Key and Value are created by using a linear feed-forward layer as a transformation, then the attention mechanism is applied to get the context matrix. However, this process is very slow, so we consider three matrices Q, K, V:

● Q is a matrix containing all the query vectors, $Q = [q_1, q_2, \ldots, q_n]$, where $q_i$ is a query vector.
● K is a matrix containing all the key vectors, $K = [k_1, k_2, \ldots, k_n]$, where $k_i$ is a key vector.
● V is a matrix containing all the value vectors, $V = [v_1, v_2, \ldots, v_n]$, where $v_i$ is a value vector.

Thus, we have:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q \cdot K^{T}}{\sqrt{d_{model}}}\right) \cdot V \tag{3}$$

Multi-head Attention performs the attention h times with (Q, K, V) matrices of dimension $d_{model}/h$. Each head is one application of Attention. For each head, the (Q, K, V) matrices are uniquely projected to dimension $d_{model}/h$. The self-attention mechanism is then performed to yield an output of the same dimension $d_{model}/h$. Finally, the outputs of the h heads are concatenated and a linear projection layer is applied once again. The formula for this process is as follows:

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(head_1, head_2, \ldots, head_h) \cdot W^{O}, \quad \text{where } head_i = \mathrm{Attention}(Q \cdot W_i^{Q}, K \cdot W_i^{K}, V \cdot W_i^{V}) \tag{4}$$

3 METHOD FOR VIETNAMESE SENTIMENT ANALYSIS

In this section, the method for analysing the sentiment of a Vietnamese sentence is proposed. The sentences are analysed to determine whether they are positive, negative or neutral.

Firstly, the structures of a Vietnamese sentence are studied. Because the scope of this study is evaluation comments for a product on social networks, two kinds of declarative sentence are considered: positive and negative sentences.

Secondly, based on those structures, the meaning of the sentence is analysed by using the self-attention neural network architecture Transformer. Because the meaning of a Vietnamese sentence depends on the positions of words, our method adds a layer that determines the word positions to the processing of the Transformer. Besides, a Squeeze-and-Excitation layer (Hu et al., 2018) is also used to recalibrate features in the process.

3.1 Structure of a Vietnamese Sentence

Vietnamese is an isolating language. The structure of a normal Vietnamese sentence includes a subjectum (or thema) and a praedicatum (or rhema). The subjectum is the direct constituent of a sentence that describes the scope of the thing mentioned in the second direct constituent, the praedicatum (Cao, 2017).

There are three frequent sentence types: declarative, interrogative, and imperative. The declarative is subject to judgments of truth and falsehood (Cao, 2017). The interrogative elicits a verbal response from the addressee. The imperative indicates the speaker’s desire to influence future events. In sentiment analysis, we only need to determine whether a sentence is positive, negative or neutral; thus, in the scope of this paper, we only consider the declarative sentence type.

The structure of a single declarative sentence in Vietnamese is shown in Fig. 1:

Figure 1: Structure of a single declarative sentence in Vietnamese.

Definition 1: Kinds of the structure of a positive sentence

A single positive declarative sentence in Vietnamese has the foundation structure:

[...] = [...]

It is classified as in Table 1.

Table 1: Kinds of the structure of a positive sentence.

Kinds                                       Variants
P is [...]: “là”                            [...] = [...]
P is [...]:                                 [...] = [...]
P is [...]: “thì”                           [...] = [...]
P is [...] with [...] belongs to [...]      [...] = [...]

Definition 2: Kinds of the structure of a negative sentence

A single negative declarative sentence in Vietnamese has the foundation structure:

[...] = [...]

It is classified as in Table 2.

Table 2: Kinds of the structure of a negative sentence.

Kinds                                       Variants
P is [...]: “là”                            [...] = [... word]
P is [...]:                                 [...] = [... word]
P is [...]:                                 [...] = [...]
P is [...] with [...] belongs to [...]      [...] = [...]

In a Vietnamese declarative sentence, the words have to appear in order. Although two sentences may have the same referent (“same referent” means they both describe the same objective fact), they are not identical in meaning. The meaning of a sentence depends on the way its predicates are organized. In other words, information about word positions contributes to the sentence meaning and the grammatical meaning.

Some characteristics of an isolating language, especially Vietnamese, that matter for learning context are as follows:

● In linguistic activities, words do not change their form (they are not inflected). Grammatical meanings are not included in words.
● Formal (function) words, word position and word order clarify the grammatical relationship as well as the grammatical meaning of words and sentences. Example: adding the formal words “sẽ” (will) or “đang” (-ing) before “học” (study) changes the tense of the action. Reversing words also changes the meaning under the same grammar, for example "chân bàn" (leg of a table) and "bàn chân" (foot).
● The boundaries between syllables, morphemes and words are not clear. Example: in Vietnamese, "nhà" is a morpheme and also a word.

The main point of this research is the importance of word position information in contributing to sentence meanings and grammatical meanings.

3.2 Pre-processing Method

Datasets go through a pre-processing pipeline for text documents. Existing techniques, such as sentence segmentation, text normalization, word segmentation and noise cleaning, were mainly used to run this pipeline automatically. Sentence segmentation is a procedure to split a text [...] normalized.

In text normalization, the input is lowercased. Next, all links, phone numbers and email addresses are replaced by “urlObj”, “phonenumObj” and “mailObj”, respectively. Finally, the word tokenizer from Underthesea (2019) for Vietnamese is applied. The input text is split into words, phrases, or other meaningful parts, namely tokens.

3.3 Word Embedding

fastText (2019) is used for word embedding. In many cases, users may type a wrong word accidentally or intentionally. fastText deals with this problem very well by encoding at the character level. For a misspelled word, a very rare word or an out-of-vocabulary word, fastText can still produce an embedding vector that is most similar to the words met in the training sentences.

There had been no pre-trained fastText model for spoken Vietnamese. Therefore, we trained fastText for Vietnamese vocabulary as pre-trained embedding weights from an unlabeled corpus of over 70,000 multi-product review documents crawled from the e-commerce sites mentioned above. Rare words that occur fewer than 5 times in the vocabulary were removed. The embedding size is 384. After training, the vocabulary contains 5,534 words in total.

3.4 Sentiment Analysis in Vietnamese

In the original architecture of the Transformer, the position encoding for a word is summed with the context encoding from the pre-trained fastText model (with the same feature dimensions). After this process, a linear projection is applied to the outputs to create three vectors Q (query), K (key), V (value) as input for the Multi-head Attention layer:

$$A = W_A \cdot (C + P) \tag{5}$$

where A is one of the three vectors Q, K, or V, as inputs of Multi-head Attention, which were mentioned in Section 2. C is the context encoding with $d_{model}$ dimension, and P is the position encoding
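As a rough illustration, the projection in Eq. (5) followed by the scaled dot-product attention in Eq. (3) can be sketched in NumPy as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the random matrices stand in for the fastText context encoding C, the position encoding P and the learned weights, and the names `project`, `attention`, `W_Q`, `W_K`, `W_V` and the toy sizes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum before exponentiating for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def project(C, P, W):
    # Eq. (5): A = W_A . (C + P), written row-wise as (C + P) @ W.
    return (C + P) @ W

def attention(Q, K, V):
    # Eq. (3): softmax(Q K^T / sqrt(d_model)) V
    d_model = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_model))
    return weights @ V, weights

rng = np.random.default_rng(0)
n_tokens, d_model = 4, 8  # toy sizes; the embedding size in the text is 384

# Assumed stand-ins for the context and position encodings of one sentence.
C = rng.normal(size=(n_tokens, d_model))
P = rng.normal(size=(n_tokens, d_model))

# Assumed stand-ins for the learned projection weights.
W_Q, W_K, W_V = (rng.normal(size=(d_model, d_model)) for _ in range(3))

Q, K, V = (project(C, P, W) for W in (W_Q, W_K, W_V))
context, weights = attention(Q, K, V)

print(context.shape)         # one context vector per token
print(weights.sum(axis=1))   # each row of attention weights sums to 1
```

Because P is added before the projection, swapping two tokens changes Q, K and V even when the context encodings are the same, which is how word order can influence the attention output.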
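The text normalization step described in Section 3.2 (lowercasing, then replacing links, phone numbers and email addresses with “urlObj”, “phonenumObj” and “mailObj”) can be sketched with regular expressions. The patterns below are simplified assumptions chosen for illustration, since the paper does not give the exact rules; the function name `normalize` is hypothetical, and word segmentation with Underthesea is left out.

```python
import re

# Simplified, assumed patterns; real review data would need more careful rules.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
MAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"(?<!\w)(?:\+84|0)\d{8,10}(?!\w)")

def normalize(text: str) -> str:
    # Lowercase first, so the mixed-case placeholders are not lowercased too.
    text = text.lower()
    text = URL_RE.sub("urlObj", text)
    text = MAIL_RE.sub("mailObj", text)
    text = PHONE_RE.sub("phonenumObj", text)
    return text

sample = "Call 0912345678 or Shop@Example.com, see https://example.com/sp"
print(normalize(sample))  # call phonenumObj or mailObj, see urlObj
```

Replacing these spans with fixed placeholder tokens keeps the vocabulary small, since every distinct link or phone number would otherwise become its own rare word.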