spa: Semi-Supervised Semi-Parametric Graph-Based Estimation in R

In this paper, we present an R package that combines feature-based (X) data and graph-based (G) data for prediction of the response Y. In this particular case, Y is observed for a subset of the observations (labeled) and missing for the remainder (unlabeled). We examine an approach for fitting Y =...

Bibliographic Details
Main Author: Mark Culp
Format: Article
Language: English
Published: Foundation for Open Access Statistics 2011-04-01
Series: Journal of Statistical Software
Subjects:
Online Access: http://www.jstatsoft.org/v40/i10/paper
collection DOAJ
description In this paper, we present an R package that combines feature-based (X) data and graph-based (G) data to predict a response Y. Here Y is observed for a subset of the observations (labeled) and missing for the remainder (unlabeled). We examine an approach for fitting Y = Xβ + f(G), where β is a coefficient vector and f is a function over the vertices of the graph. The procedure is semi-supervised (trained on both the labeled and unlabeled sets) and requires iterative algorithms to fit the estimate. The package provides several key functions for fitting and evaluating an estimator of this type. It is illustrated on a text analysis data set in which the observations are text documents (papers), the response is the category of the paper (applied or theoretical statistics), the X information is the name of the journal in which the paper appears, and the graph is a co-citation network in which each vertex is an observation and each edge is weighted by the number of times the two papers cite a common paper. Two further applications are presented: classification of protein location using a protein interaction graph, and classification on a manifold with part of the feature data converted to a graph.
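The co-citation graph described in the abstract can be sketched in a few lines. The following is a minimal illustration only, not code from the spa package: the function name `cocitation_weights`, the paper identifiers, and the reference lists are all hypothetical, and the sketch reads "the number of times that the two papers cite a common paper" as the count of cited works the two papers share.

```python
from itertools import combinations


def cocitation_weights(references):
    """Return {(a, b): weight} for each pair of papers with a shared citation.

    `references` maps a paper id to the set of works that paper cites.
    The weight of an edge is the number of cited works the two papers share;
    pairs with no shared citation get no edge.
    """
    weights = {}
    # Sort the ids so each unordered pair appears once, as (smaller, larger).
    for a, b in combinations(sorted(references), 2):
        shared = len(references[a] & references[b])
        if shared > 0:
            weights[(a, b)] = shared
    return weights


# Hypothetical toy data: four papers and the works each one cites.
references = {
    "paper1": {"refA", "refB", "refC"},
    "paper2": {"refB", "refC", "refD"},
    "paper3": {"refD"},
    "paper4": {"refE"},
}

weights = cocitation_weights(references)
for (a, b), w in sorted(weights.items()):
    print(a, b, w)
```

In this toy example every paper is a vertex, whether or not its category is labeled; a paper with no shared citations, such as the hypothetical `paper4`, is an isolated vertex.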
id doaj.art-1bce8b4649d34b459443c01e7f3a80ba
institution Directory Open Access Journal
issn 1548-7660
topic semi-supervised learning
graph-based classification
semi-parametric models
R