The textcat Package for n-Gram Based Text Categorization in R

Identifying the language used will typically be the first step in most natural language processing tasks. Among the wide variety of language identification methods discussed in the literature, the ones employing the Cavnar and Trenkle (1994) approach to text categorization based on character n-gram...

Bibliographic Details
Main Authors:	Kurt Hornik, Patrick Mair, Johannes Rauch, Wilhelm Geiger, Christian Buchta, Ingo Feinerer
Format:	Article
Language:	English
Published:	Foundation for Open Access Statistics 2013-01-01
Series:	Journal of Statistical Software
Subjects:	text mining; text categorization; language identification; n-grams; textcat; R
Online Access:	http://www.jstatsoft.org/v52/i06/paper
