Technology Forecasting Using Data Mining and Semantics: First Annual Report
The planning and management of research and development is a challenging process, compounded by the large amount of information available. The goal of this project is to mine science and technology databases for patterns and trends which facilitate the formation of research strateg...
Main Authors: | Woon, Wei Lee; Madnick, Stuart E.; Firat, Ayse; Ziegler, Blaine; Seshasai, Satwik |
---|---|
Format: | Working Paper |
Language: | en_US |
Published: | Massachusetts Institute of Technology. Engineering Systems Division, 2016 |
Online Access: | http://hdl.handle.net/1721.1/102838 |
_version_ | 1826191540820639744 |
---|---|
author | Woon, Wei Lee; Madnick, Stuart E.; Firat, Ayse; Ziegler, Blaine; Seshasai, Satwik |
author_sort | Woon, Wei Lee |
collection | MIT |
description | The planning and management of research and development is a challenging process, compounded by the large amount of information available. The goal of this project is to mine science and technology databases for patterns and trends which facilitate the formation of research strategies. The information sources which we exploit are diverse and include academic journals, patents, blogs and news stories. The intended outputs of the project include growth forecasts for various technological sectors (with an emphasis on sustainable energy), an improved understanding of the underlying research landscape, and the identification of influential researchers or research groups.
This paper focuses on the development of techniques to both organize and visualize the data in a way which reflects the semantic relationships between keywords. We studied the use of the joint term frequencies of pairs of keywords as a means of characterizing this semantic relationship; this is based on the intuition that terms which frequently appear together are more likely to be closely related. Some of the results reported herein are: (1) using appropriate tools and methods, exploitable patterns and information can be extracted from publicly available databases; (2) an adaptation of the Normalized Google Distance (NGD) formalism can provide measures of keyword distance that facilitate keyword clustering and hierarchical visualization; (3) a further adaptation of the NGD formalism can provide an asymmetric measure of keyword distance that allows the automatic creation of a keyword taxonomy; and (4) an adaptation of the Latent Semantic Approach (LSA) can be used to identify concepts underlying collections of keywords. |
first_indexed | 2024-09-23T08:57:46Z |
format | Working Paper |
id | mit-1721.1/102838 |
institution | Massachusetts Institute of Technology |
language | en_US |
last_indexed | 2024-09-23T08:57:46Z |
publishDate | 2016 |
publisher | Massachusetts Institute of Technology. Engineering Systems Division |
record_format | dspace |
spelling | mit-1721.1/102838 2019-04-10T19:58:41Z 2016-06-02T15:43:28Z 2016-06-02T15:43:28Z 2009-04 Working Paper http://hdl.handle.net/1721.1/102838 en_US ESD Working Papers;ESD-WP-2009-04 application/pdf Massachusetts Institute of Technology. Engineering Systems Division |
title | Technology Forecasting Using Data Mining and Semantics: First Annual Report |
url | http://hdl.handle.net/1721.1/102838 |
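
The description above refers to keyword distances derived from joint term frequencies via the Normalized Google Distance (NGD). As a rough illustration only, the sketch below computes the standard, symmetric NGD from hit counts. It is not the adapted or asymmetric variant developed in the report, whose details this record does not give. The function name, its parameters, and the sample counts are assumptions made for this example.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Standard NGD from hit counts (illustrative; not the report's adaptation).

    fx, fy : number of documents containing keyword x, keyword y
    fxy    : number of documents containing both keywords
    n      : total number of documents indexed (assumed > max(fx, fy))
    """
    if fx <= 0 or fy <= 0:
        raise ValueError("each keyword must occur in at least one document")
    if n <= max(fx, fy):
        raise ValueError("n must exceed the individual hit counts")
    if fxy <= 0:
        # Keywords never co-occur: treat them as infinitely far apart.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# Hypothetical counts, chosen only to exercise the formula.
distance = normalized_google_distance(fx=1000, fy=100, fxy=10, n=100000)
print(round(distance, 4))
```

Smaller values indicate keywords that co-occur more often relative to their individual frequencies, which is the intuition the description states; a matrix of such pairwise distances could then feed a clustering step.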