“The Naming of Cats”: Automated Genre Classification

This paper builds on the work presented at the ECDL 2006 in automated genre classification as a step toward automating metadata extraction from digital documents for ingest into digital repositories such as those run by archives, libraries and eprint services (Kim & Ross, 2006b). We have previou...


Bibliographic Details
Main Authors: Yunhyong Kim, Seamus Ross
Format: Article
Language: English
Published: University of Edinburgh 2008-12-01
Series: International Journal of Digital Curation
Online Access: http://129.215.67.233/ijdc/article/view/13
collection DOAJ
description This paper builds on the work presented at the ECDL 2006 in automated genre classification as a step toward automating metadata extraction from digital documents for ingest into digital repositories such as those run by archives, libraries and eprint services (Kim & Ross, 2006b). We have previously proposed dividing features of a document into five types (features for visual layout, language model features, stylometric features, features for semantic structure, and contextual features as an object linked to previously classified objects and other external sources) and have examined visual and language model features. The current paper compares results from testing classifiers based on image and stylometric features in a binary classification to show that certain genres have strong image features which enable effective separation of documents belonging to the genre from a large pool of other documents.
first_indexed 2024-03-09T10:41:08Z
id doaj.art-f104b101e07746a4818bdacea4ca3bdd
institution Directory Open Access Journal
issn 1746-8256
last_indexed 2024-03-09T10:41:08Z