The textcat Package for n-Gram Based Text Categorization in R
Identifying the language used will typically be the first step in most natural language processing tasks. Among the wide variety of language identification methods discussed in the literature, the ones employing the Cavnar and Trenkle (1994) approach to text categorization based on character n-gram...
Main Authors: | , , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | Foundation for Open Access Statistics, 2013-01-01 |
Series: | Journal of Statistical Software |
Subjects: | |
Online Access: | http://www.jstatsoft.org/v52/i06/paper |