Enhancing document clustering by integrating semantic background knowledge and syntactic features into the bag of words representation
The basic Bag of Words (BOW) representation generally used in text document clustering or categorization loses important syntactic and semantic information contained in the documents. This may be particularly problematic when the texts contain many stop words or are short....
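As an illustration only, and not the authors' method, the short sketch below shows the information loss the abstract describes. A naive bag of words built with Python's `collections.Counter` maps two sentences with opposite meanings to the same representation because word order is discarded. In a short text, a small stop-word list already accounts for a large share of the tokens. The tokenizer (lowercase plus whitespace split) and the `STOP_WORDS` set are assumptions made for this example.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for illustration).
STOP_WORDS = {"the", "a", "of", "and"}

def bag_of_words(text):
    """Naive BOW: lowercase, split on whitespace, count each token."""
    return Counter(text.lower().split())

def stop_word_share(text):
    """Fraction of tokens in the text that are in STOP_WORDS."""
    tokens = text.lower().split()
    return sum(t in STOP_WORDS for t in tokens) / len(tokens)

a = "the dog bit the man"
b = "the man bit the dog"

# Opposite meanings, identical bags: the syntactic relation is lost.
print(bag_of_words(a) == bag_of_words(b))
# In this 5-token text, 2 tokens are stop words.
print(stop_word_share(a))
```

Both sentences yield the same counts, so any clustering based only on these vectors cannot tell them apart. That is the gap the report proposes to close by adding syntactic features and semantic background knowledge.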
Main Authors: | Rayner Alfred, Suraya Alias, Asni Tahir |
---|---|
Format: | Research Report |
Language: | English |
Published: | Universiti Malaysia Sabah, 2011 |
Online access: | https://eprints.ums.edu.my/id/eprint/22890/1/Enhancing%20document%20clustering%20by%20integrating%20semantic%20background%20knowledge%20and%20syntactic%20features%20into%20the%20bag%20of%20words%20representation.pdf |
Similar titles

- Fuzzy bag-of-words model for document representation
  by: Zhao, Rui, et al.
  Published: (2020)
- Development of a text analyzer for automatic categorization of texts documents based on interactive visualization approach
  by: Mohd Norhisham Razali, et al.
  Published: (2011)
- Human Document Classification Using Bags of Words
  by: Wolf, Florian, et al.
  Published: (2006)
- Syntactic and semantic image representations for computer vision
  by: Horowitz, Bradley Joseph
  Published: (2011)
- Clustering Syntactic Positions with Similar Semantic Requirements
  by: Pablo Gamallo, et al.
  Published: (2021-03-01)