Exploration of the (non-)asymptotic bias and variance of stochastic gradient Langevin dynamics

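Comparing strings and assessing their similarity is a basic operation in many application domains of machine learning, such as in information retrieval, natural language processing and bioinformatics. The practitioner can choose from a large variety of available similarity measures for this task, each emphasizing different aspects of the string data. In this article, we present Harry, a small tool specifically designed for measuring the similarity of strings. Harry implements over 20 similarity measures, including common string distances and string kernels, such as the Levenshtein distance and the Subsequence kernel. The tool has been designed with efficiency in mind and allows for multi-threaded as well as distributed computing, enabling the analysis of large data sets of strings. Harry supports common data formats and thus can interface with analysis environments, such as Matlab, Pylab and Weka.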

Full description

Bibliographic details
Main Authors: Vollmer, S, Zygalakis, K, Teh, Y
Format: Journal article
Published: Journal of Machine Learning Research 2016
collection OXFORD
description Comparing strings and assessing their similarity is a basic operation in many application domains of machine learning, such as in information retrieval, natural language processing and bioinformatics. The practitioner can choose from a large variety of available similarity measures for this task, each emphasizing different aspects of the string data. In this article, we present Harry, a small tool specifically designed for measuring the similarity of strings. Harry implements over 20 similarity measures, including common string distances and string kernels, such as the Levenshtein distance and the Subsequence kernel. The tool has been designed with efficiency in mind and allows for multi-threaded as well as distributed computing, enabling the analysis of large data sets of strings. Harry supports common data formats and thus can interface with analysis environments, such as Matlab, Pylab and Weka.
first_indexed 2024-03-07T06:51:36Z
id oxford-uuid:fcbd2a3c-16ed-4306-a024-e11c41c0da17
institution University of Oxford
last_indexed 2024-03-07T06:51:36Z
record_format dspace