spa: Semi-Supervised Semi-Parametric Graph-Based Estimation in R
In this paper, we present an R package that combines feature-based (X) data and graph-based (G) data for prediction of the response Y. In this particular case, Y is observed for a subset of the observations (labeled) and missing for the remainder (unlabeled). We examine an approach for fitting Y =...
Main Author: | Mark Culp |
---|---|
Format: | Article |
Language: | English |
Published: | Foundation for Open Access Statistics, 2011-04-01 |
Series: | Journal of Statistical Software |
Subjects: | semi-supervised learning; graph-based classification; semi-parametric models; R |
Online Access: | http://www.jstatsoft.org/v40/i10/paper |
_version_ | 1828804700701458432 |
---|---|
author | Mark Culp |
collection | DOAJ |
description | In this paper, we present an R package that combines feature-based (X) data and graph-based (G) data for prediction of the response Y. In this particular case, Y is observed for a subset of the observations (labeled) and missing for the remainder (unlabeled). We examine an approach for fitting Y = Xβ + f(G), where β is a coefficient vector and f is a function over the vertices of the graph. The procedure is semi-supervised in nature (trained on both the labeled and unlabeled sets) and requires iterative algorithms to fit this estimate. The package provides several key functions for fitting and evaluating an estimator of this type. The package is illustrated on a text analysis data set, where the observations are text documents (papers), the response is the category of paper (either applied or theoretical statistics), the X information is the name of the journal in which the paper appears, and the graph is a co-citation network, with each vertex an observation and each edge weighted by the number of times that the two papers cite a common paper. An application involving classification of protein location using a protein interaction graph and an application involving classification on a manifold with part of the feature data converted to a graph are also presented. |
first_indexed | 2024-12-12T07:44:20Z |
format | Article |
id | doaj.art-1bce8b4649d34b459443c01e7f3a80ba |
institution | Directory of Open Access Journals |
issn | 1548-7660 |
language | English |
last_indexed | 2024-12-12T07:44:20Z |
publishDate | 2011-04-01 |
publisher | Foundation for Open Access Statistics |
record_format | Article |
series | Journal of Statistical Software |
title | spa: Semi-Supervised Semi-Parametric Graph-Based Estimation in R |
topic | semi-supervised learning graph-based classification semi-parametric models R |
url | http://www.jstatsoft.org/v40/i10/paper |
work_keys_str_mv | AT markculp spasemisupervisedsemiparametricgraphbasedestimationinr |
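
The model named in the description, Y = Xβ + f(G), can be illustrated with a small sketch. This is not the spa package's algorithm or API: the record only states that the package uses iterative fitting, whereas the sketch below swaps in a generic Laplacian-penalized least-squares fit solved directly through its normal equations. The function name `fit_graph_semiparametric`, the penalty weight `lam`, and the small ridge term `eps` are all assumptions made for illustration.

```python
import numpy as np


def fit_graph_semiparametric(X, W, y, labeled, lam=1.0, eps=1e-6):
    """Illustrative fit of Y = X beta + f(G); NOT the spa package's method.

    Minimizes, over beta and f,
        sum_{i labeled} (y_i - x_i' beta - f_i)^2 + lam * f' L f + eps * ||f||^2,
    where L = D - W is the Laplacian of the symmetric weight matrix W.
    """
    X = np.asarray(X, dtype=float)
    W = np.asarray(W, dtype=float)
    labeled = np.asarray(labeled, dtype=bool)
    n, p = X.shape

    # Unlabeled responses may be NaN; they never enter the loss, so zero them out.
    y0 = np.where(labeled, np.asarray(y, dtype=float), 0.0)

    J = np.diag(labeled.astype(float))   # selects the labeled observations
    L = np.diag(W.sum(axis=1)) - W       # graph Laplacian
    S = J + lam * L + eps * np.eye(n)    # penalized block for f (assumed form)

    # Normal equations of the joint quadratic objective in (beta, f).
    A = np.block([[X.T @ J @ X, X.T @ J],
                  [J @ X, S]])
    b = np.concatenate([X.T @ J @ y0, J @ y0])
    sol = np.linalg.solve(A, b)

    beta, f = sol[:p], sol[p:]
    return beta, f, X @ beta + f


if __name__ == "__main__":
    # Toy graph: two triangles joined by one weak edge; vertices 2 and 5 are unlabeled.
    W = np.zeros((6, 6))
    edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
             (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 5, 0.1)]
    for i, j, w in edges:
        W[i, j] = W[j, i] = w
    X = np.array([[0.5], [-0.5], [0.2], [0.5], [-0.5], [-0.3]])
    y = np.array([1.0, 1.0, np.nan, -1.0, -1.0, np.nan])
    labeled = ~np.isnan(y)

    beta, f, fitted = fit_graph_semiparametric(X, W, y, labeled)
    print("beta:", beta)
    print("fitted values at unlabeled vertices:", fitted[~labeled])
```

The unlabeled vertices influence the fit only through the graph penalty, which is what makes the estimate semi-supervised in this sketch. The ridge term `eps` is included only so the linear system stays well posed when the columns of X overlap with constant functions on the graph.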