The STUDIA UNIVERSITATIS BABEŞ-BOLYAI issue article summary

STUDIA INFORMATICA - Issue no. 2 / 2013

Article: TEXT REPRESENTATION AND GENERAL TOPIC ANNOTATION BASED ON LATENT DIRICHLET ALLOCATION.

Authors: DIANA INKPEN.

Abstract:

We propose a low-dimensional text representation method for topic classification. A Latent Dirichlet Allocation (LDA) model is built on a large amount of unlabelled data in order to extract potential topic clusters. Each document is represented as a distribution over these clusters. We experiment with two datasets. We collected the first dataset from the FriendFeed social network and manually annotated part of it with 10 general classes. The second dataset is a standard text classification benchmark, the R8 subset of Reuters-21578 (annotated with 8 classes). We show that classification based on the LDA representation leads to acceptable results, while combining a bag-of-words representation with the LDA representation leads to further improvements. We also propose a multi-level LDA representation that captures topic cluster distributions ranging from generic to more specific ones.

2010 Mathematics Subject Classification. 62Fxx Parametric inference, 62Pxx Applications.

1998 CR Categories and Descriptors. I.2.7 [Natural Language Processing]: Subtopic - Text analysis; H.3.1 [Content Analysis and Indexing]: Subtopic - Linguistic processing.

Key words and phrases. Automatic text classification, topic detection, latent Dirichlet allocation.

This paper was presented at the International Conference KEPT2013: Knowledge Engineering Principles and Techniques, organized by Babeș-Bolyai University, Cluj-Napoca, July 5-7, 2013.
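The abstract describes the representation only at a high level. Below is a minimal pure-Python sketch of the idea, under stated assumptions. It assumes collapsed Gibbs sampling for LDA, because the abstract does not name the inference method or toolkit used. The function names `train_lda`, `infer_theta` and `combined_features`, the hyperparameters (alpha = 0.1, beta = 0.01, iteration counts) and the toy corpus are all hypothetical.

The sketch covers three steps:

1. Fitting LDA on unlabelled text.
2. Representing a document as its topic distribution.
3. Concatenating bag-of-words counts with the topic distributions from several LDA models of different sizes.

Step 3 is one plausible reading of the combined and multi-level representations, not necessarily the author's construction.

```python
# Hypothetical sketch, not the paper's implementation: LDA via collapsed Gibbs
# sampling (an assumed inference method), per-document topic distributions, and
# bag-of-words + multi-level LDA feature concatenation.
import random
from collections import Counter


def train_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA on tokenized, unlabelled docs; return (word index, topic-word counts, topic totals)."""
    rng = random.Random(seed)
    w_id = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    V = len(w_id)
    n_dk = [[0] * num_topics for _ in docs]       # topic counts per document
    n_kw = [[0] * V for _ in range(num_topics)]   # word counts per topic
    n_k = [0] * num_topics                         # total tokens per topic
    z = []                                         # topic assignment of every token
    for d, doc in enumerate(docs):                 # random initial assignments
        z.append([rng.randrange(num_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            n_dk[d][k] += 1
            n_kw[k][w_id[w]] += 1
            n_k[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w_id[w], z[d][i]
                # Remove the token's current assignment, then resample it with
                # p(z=t | rest) proportional to (n_dt + alpha) * (n_tv + beta) / (n_t + V*beta).
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                           for t in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    return w_id, n_kw, n_k


def infer_theta(doc, model, alpha=0.1, beta=0.01, iterations=50, seed=0):
    """Represent a document as a distribution over the model's topics (fold-in Gibbs sampling)."""
    w_id, n_kw, n_k = model
    K, V = len(n_k), len(w_id)
    rng = random.Random(seed)
    # Topic-word probabilities are frozen; only this document's assignments are sampled.
    phi = [[(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)] for k in range(K)]
    tokens = [w_id[w] for w in doc if w in w_id]   # words unseen in training are ignored
    z = [rng.randrange(K) for _ in tokens]
    n_dk = Counter(z)
    for _ in range(iterations):
        for i, v in enumerate(tokens):
            n_dk[z[i]] -= 1
            weights = [(n_dk[k] + alpha) * phi[k][v] for k in range(K)]
            z[i] = rng.choices(range(K), weights=weights)[0]
            n_dk[z[i]] += 1
    total = len(tokens) + K * alpha
    return [(n_dk[k] + alpha) / total for k in range(K)]


def combined_features(doc, bow_vocab, lda_models):
    """Bag-of-words counts followed by one topic distribution per LDA model.
    Passing models with increasing numbers of topics gives a generic-to-specific
    (multi-level) representation."""
    counts = Counter(doc)
    features = [counts[w] for w in bow_vocab]
    for model in lda_models:
        features += infer_theta(doc, model)
    return features


if __name__ == "__main__":
    unlabelled = [
        "stock market shares trade price".split(),
        "market price oil trade export".split(),
        "football match goal team score".split(),
        "team player goal league match".split(),
    ] * 5
    coarse = train_lda(unlabelled, num_topics=2)   # generic level
    fine = train_lda(unlabelled, num_topics=4)     # more specific level
    doc = "goal match team".split()
    print([round(p, 2) for p in infer_theta(doc, coarse)])
    vocab = sorted({w for d in unlabelled for w in d})
    print(len(combined_features(doc, vocab, [coarse, fine])))  # 20 = 14 words + 2 + 4 topics
```

Raw counts and topic probabilities are on different scales. The abstract does not say how the two parts are weighted or normalized before classification. Scaling or TF-IDF weighting the bag-of-words part is a common choice, but here it remains an assumption.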

 
         
     
         
         