Source: pages.stern.nyu.edu
                                  Data Mining for Business Analytics 
                                  Lecture 9: Representing and Mining Text 
                                   
                                  Stern School of Business 
                                  New York University 
                                  Spring 2014 
        P. Adamopoulos                                                                  New York University 
       Dealing with Text 
        •    Data are represented in ways natural to the problems from which 
             they were derived 
        •    Vast amounts of text are available 
        •    If we want to apply the many data mining tools that we have at our 
             disposal, we must  
               •  either engineer the data representation to match the tools 
                  (representation engineering), or  
               •  build new tools to match the data 
       Why Text is Difficult 
        •    Text is “unstructured” 
               •  Linguistic structure is intended for human communication, not for 
                  computers 
        •    Word order matters sometimes 
        •    Text can be dirty 
               •  People write ungrammatically, misspell words, abbreviate unpredictably, 
                  and punctuate randomly 
               •  Synonyms, homographs, abbreviations, etc. 
        •    Context matters 
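
One small part of the "dirty text" problem, inconsistent case and punctuation, can be illustrated with a minimal sketch. The function name `normalize` and the sample string are hypothetical and not taken from the lecture. The sketch does nothing about misspellings, synonyms, homographs, or context, which need more than simple cleanup.

```python
import re

def normalize(text):
    """Lowercase the text and keep only runs of letters, digits, and apostrophes.

    Hypothetical helper for illustration; real pipelines usually do more.
    """
    return re.findall(r"[a-z0-9']+", text.lower())

# Three surface variants of the same two words (made-up example text).
messy = "GREAT phone!!!  great Phone... Great, phone?"
tokens = normalize(messy)

# After normalization only two distinct tokens remain.
distinct = sorted(set(tokens))
print(tokens)
print(distinct)
```

Without this step, "GREAT", "great", and "Great," would be treated as three unrelated strings.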
       Text Representation 
        •    Goal: Take a set of documents, each of which is a relatively 
             free-form sequence of words, and turn it into our familiar 
             feature-vector form 
        •    A collection of documents is called a corpus 
        •    A document is composed of individual tokens or terms 
        •    Each document is one instance  
               •  but we don’t know in advance what the features will be 
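
The goal above can be sketched in a few lines, under stated assumptions. The slides at this point only say that documents become feature vectors; using raw term counts as the feature values, splitting on whitespace, and the names `tokenize`, `build_vocabulary`, and `to_feature_vector` are all illustrative choices, not the lecture's method. The toy corpus is made up.

```python
from collections import Counter

def tokenize(document):
    """Split a document into lowercase tokens on whitespace (a simplification)."""
    return document.lower().split()

def build_vocabulary(corpus):
    """Collect every distinct token in the corpus; these become the features."""
    return sorted({token for doc in corpus for token in tokenize(doc)})

def to_feature_vector(document, vocabulary):
    """Represent one document as a list of term counts, one per vocabulary term."""
    counts = Counter(tokenize(document))
    return [counts[term] for term in vocabulary]

# A corpus is a collection of documents; each document is one instance.
corpus = [
    "the movie was great",
    "the plot was thin",
    "great plot great cast",
]

# The features are not known in advance: they are discovered from the corpus.
vocabulary = build_vocabulary(corpus)
vectors = [to_feature_vector(doc, vocabulary) for doc in corpus]

print(vocabulary)
for vector in vectors:
    print(vector)
```

Note that the vocabulary has to be built from the whole corpus before any single document can be vectorized, which is what "we don't know in advance what the features will be" amounts to in practice.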
              