Tweet2Vec: Learning Tweet Embeddings Using Character-level CNN-LSTM Encoder-Decoder

We present Tweet2Vec, a novel method for generating general-purpose vector representation of tweets. The model learns tweet embeddings using character-level CNN-LSTM encoder-decoder. We trained our model on 3 million randomly selected English-language tweets. The model was evaluated using two meth...

Bibliographic Details
Main Authors:	Vosoughi, Soroush, Vijayaraghavan, Prashanth, Roy, Deb K
Other Authors:	Massachusetts Institute of Technology. Media Laboratory
Format:	Article
Language:	en_US
Published:	Association for Computing Machinery (ACM) 2016
Online Access:	http://hdl.handle.net/1721.1/104352 https://orcid.org/0000-0002-2564-8909 https://orcid.org/0000-0002-5826-1591 https://orcid.org/0000-0002-4333-7194

author	Vosoughi, Soroush Vijayaraghavan, Prashanth Roy, Deb K
author2	Massachusetts Institute of Technology. Media Laboratory
collection	MIT
description	We present Tweet2Vec, a novel method for generating general-purpose vector representation of tweets. The model learns tweet embeddings using character-level CNN-LSTM encoder-decoder. We trained our model on 3 million randomly selected English-language tweets. The model was evaluated using two methods: tweet semantic similarity and tweet sentiment categorization, outperforming the previous state-of-the-art in both tasks. The evaluations demonstrate the power of the tweet embeddings generated by our model for various tweet categorization tasks. The vector representations generated by our model are generic, and hence can be applied to a variety of tasks. Though the model presented in this paper is trained on English-language tweets, the method presented can be used to learn tweet embeddings for different languages.
first_indexed	2024-09-23T12:48:47Z
format	Article
id	mit-1721.1/104352
institution	Massachusetts Institute of Technology
language	en_US
last_indexed	2024-09-23T12:48:47Z
publishDate	2016
publisher	Association for Computing Machinery (ACM)
record_format	dspace
spelling	mit-1721.1/104352 2022-10-01T11:15:07Z
department	Program in Media Arts and Sciences (Massachusetts Institute of Technology)
dates	2016-09-20T14:25:48Z 2016-09-20T14:25:48Z 2016-07
type	Article http://purl.org/eprint/type/ConferencePaper
isbn	9781450340694
citation	Vosoughi, Soroush, Prashanth Vijayaraghavan, and Deb Roy. "Tweet2Vec: Learning Tweet Embeddings Using Character-level CNN-LSTM Encoder-Decoder." Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR '16, July 17-21, 2016, Pisa, Italy.
doi	http://dx.doi.org/10.1145/2911451.2914762
license	Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/
file_format	application/pdf
title	Tweet2Vec: Learning Tweet Embeddings Using Character-level CNN-LSTM Encoder-Decoder
url	http://hdl.handle.net/1721.1/104352 https://orcid.org/0000-0002-2564-8909 https://orcid.org/0000-0002-5826-1591 https://orcid.org/0000-0002-4333-7194

Similar Items