Exploration of the (non-)asymptotic bias and variance of stochastic gradient Langevin dynamics
Main Authors: | Vollmer, S, Zygalakis, K, Teh, Y |
---|---|
Format: | Journal article |
Published: | Journal of Machine Learning Research 2016 |
_version_ | 1826306677977120768 |
---|---|
author | Vollmer, S Zygalakis, K Teh, Y |
author_facet | Vollmer, S Zygalakis, K Teh, Y |
author_sort | Vollmer, S |
collection | OXFORD |
description | Comparing strings and assessing their similarity is a basic operation in many application domains of machine learning, such as in information retrieval, natural language processing and bioinformatics. The practitioner can choose from a large variety of available similarity measures for this task, each emphasizing different aspects of the string data. In this article, we present Harry, a small tool specifically designed for measuring the similarity of strings. Harry implements over 20 similarity measures, including common string distances and string kernels, such as the Levenshtein distance and the Subsequence kernel. The tool has been designed with efficiency in mind and allows for multi-threaded as well as distributed computing, enabling the analysis of large data sets of strings. Harry supports common data formats and thus can interface with analysis environments, such as Matlab, Pylab and Weka. |
first_indexed | 2024-03-07T06:51:36Z |
format | Journal article |
id | oxford-uuid:fcbd2a3c-16ed-4306-a024-e11c41c0da17 |
institution | University of Oxford |
last_indexed | 2024-03-07T06:51:36Z |
publishDate | 2016 |
publisher | Journal of Machine Learning Research |
record_format | dspace |
spelling | oxford-uuid:fcbd2a3c-16ed-4306-a024-e11c41c0da17 2022-03-27T13:23:15Z Exploration of the (non-)asymptotic bias and variance of stochastic gradient Langevin dynamics Journal article http://purl.org/coar/resource_type/c_dcae04bc uuid:fcbd2a3c-16ed-4306-a024-e11c41c0da17 Symplectic Elements at Oxford Journal of Machine Learning Research 2016 Vollmer, S Zygalakis, K Teh, Y Comparing strings and assessing their similarity is a basic operation in many application domains of machine learning, such as in information retrieval, natural language processing and bioinformatics. The practitioner can choose from a large variety of available similarity measures for this task, each emphasizing different aspects of the string data. In this article, we present Harry, a small tool specifically designed for measuring the similarity of strings. Harry implements over 20 similarity measures, including common string distances and string kernels, such as the Levenshtein distance and the Subsequence kernel. The tool has been designed with efficiency in mind and allows for multi-threaded as well as distributed computing, enabling the analysis of large data sets of strings. Harry supports common data formats and thus can interface with analysis environments, such as Matlab, Pylab and Weka. |
spellingShingle | Vollmer, S Zygalakis, K Teh, Y Exploration of the (non-)asymptotic bias and variance of stochastic gradient Langevin dynamics |
title | Exploration of the (non-)asymptotic bias and variance of stochastic gradient Langevin dynamics |
title_full | Exploration of the (non-)asymptotic bias and variance of stochastic gradient Langevin dynamics |
title_fullStr | Exploration of the (non-)asymptotic bias and variance of stochastic gradient Langevin dynamics |
title_full_unstemmed | Exploration of the (non-)asymptotic bias and variance of stochastic gradient Langevin dynamics |
title_short | Exploration of the (non-)asymptotic bias and variance of stochastic gradient Langevin dynamics |
title_sort | exploration of the non asymptotic bias and variance of stochastic gradient langevin dynamics |
work_keys_str_mv | AT vollmers explorationofthenonasymptoticbiasandvarianceofstochasticgradientlangevindynamics AT zygalakisk explorationofthenonasymptoticbiasandvarianceofstochasticgradientlangevindynamics AT tehy explorationofthenonasymptoticbiasandvarianceofstochasticgradientlangevindynamics |