mbonsai: Application Package for Sequence Classification by Tree Methodology

Bibliographic Details
Main Authors: Yukinobu Hamuro, Masakazu Nakamoto, Stephane Cheung, Edward H. Ip
Format: Article
Language: English
Published: Foundation for Open Access Statistics 2018-09-01
Series: Journal of Statistical Software
Online Access: https://www.jstatsoft.org/index.php/jss/article/view/2603
collection DOAJ
description In many applications such as transaction data analysis, the classification of long chains of sequences is required. For example, brand purchase history in customer transaction data is in a form like AABCABAA, where A, B, and C are brands of a consumer product. The decision tree-based package mbonsai is designed to handle sequence data of varying lengths using one or multiple variables of interest as predictor variables. This software package uses tree growing and pruning strategies adopted from the C4.5 and CART algorithms, and includes new features for handling sequence data and indexing for classification purposes. The software uses a simple command line program for the learning and predicting processes, and can generate user-friendly graphics depicting decision trees. The underlying C++ code is designed to efficiently process large data sets in ASCII files. Two examples from transaction data sets are used to illustrate the application of mbonsai.
id doaj.art-3dbeb6c5beb54d069c04ae3ce9950bab
institution Directory of Open Access Journals
issn 1548-7660
doi 10.18637/jss.v086.i06
topic decision tree
sequence
classification
alphabet indexing