Technology Forecasting Using Data Mining and Semantics: First Annual Report


Bibliographic Details
Main Authors: Woon, Wei Lee, Madnick, Stuart E., Firat, Ayse, Ziegler, Blaine, Seshasai, Satwik
Format: Working Paper
Language: en_US
Published: Massachusetts Institute of Technology. Engineering Systems Division 2016
Online Access: http://hdl.handle.net/1721.1/102838
_version_ 1826191540820639744
author Woon, Wei Lee
Madnick, Stuart E.
Firat, Ayse
Ziegler, Blaine
Seshasai, Satwik
author_facet Woon, Wei Lee
Madnick, Stuart E.
Firat, Ayse
Ziegler, Blaine
Seshasai, Satwik
author_sort Woon, Wei Lee
collection MIT
description The planning and management of research and development is a challenging process, made harder by the large amount of information available. The goal of this project is to mine science and technology databases for patterns and trends that facilitate the formation of research strategies. The information sources we exploit are diverse and include academic journals, patents, blogs and news stories. The intended outputs of the project include growth forecasts for various technological sectors (with an emphasis on sustainable energy), an improved understanding of the underlying research landscape, and the identification of influential researchers or research groups. This paper focuses on the development of techniques to organize and visualize the data in a way that reflects the semantic relationships between keywords. We studied the use of the joint term frequencies of pairs of keywords as a means of characterizing this semantic relationship; this is based on the intuition that terms which frequently appear together are more likely to be closely related. Some of the results reported herein indicate that: (1) using appropriate tools and methods, exploitable patterns and information can be extracted from publicly available databases; (2) an adaptation of the Normalized Google Distance (NGD) formalism can provide measures of keyword distance that facilitate keyword clustering and hierarchical visualization; (3) a further adaptation of the NGD formalism can provide an asymmetric measure of keyword distance that allows the automatic creation of a keyword taxonomy; and (4) an adaptation of the Latent Semantic Approach (LSA) can be used to identify concepts underlying collections of keywords.
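As a rough illustration of the keyword-distance idea in the description above, the sketch below computes the standard (symmetric) Normalized Google Distance from term frequencies. It is not the authors' adapted formulation, which this record does not spell out; the function name `ngd` and the hit counts in the usage lines are hypothetical values chosen for illustration.

```python
import math


def ngd(fx, fy, fxy, n):
    """Standard Normalized Google Distance between two terms.

    fx, fy -- number of documents containing each term on its own
    fxy    -- number of documents containing both terms (joint frequency)
    n      -- total number of documents indexed
    All argument names are assumptions made for this sketch.
    """
    if fx <= 0 or fy <= 0:
        raise ValueError("each term must occur in at least one document")
    if n <= min(fx, fy):
        raise ValueError("n must exceed the smaller term frequency")
    if fxy <= 0:
        # Terms that never co-occur are treated as infinitely far apart.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    numerator = max(log_fx, log_fy) - log_fxy
    denominator = math.log(n) - min(log_fx, log_fy)
    return numerator / denominator


# Hypothetical hit counts, not taken from the report.
same = ngd(100, 100, 100, 10_000)        # terms that always co-occur
apart = ngd(1_000, 100, 10, 1_000_000)   # terms that rarely co-occur
```

Smaller values indicate terms that tend to appear together, which matches the intuition stated in the description; a matrix of such pairwise distances could then be passed to any standard clustering routine.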
first_indexed 2024-09-23T08:57:46Z
format Working Paper
id mit-1721.1/102838
institution Massachusetts Institute of Technology
language en_US
last_indexed 2024-09-23T08:57:46Z
publishDate 2016
publisher Massachusetts Institute of Technology. Engineering Systems Division
record_format dspace
spelling mit-1721.1/102838 2019-04-10T19:58:41Z
2016-06-02T15:43:28Z 2016-06-02T15:43:28Z 2009-04 Working Paper http://hdl.handle.net/1721.1/102838 en_US ESD Working Papers;ESD-WP-2009-04 application/pdf Massachusetts Institute of Technology. Engineering Systems Division
title Technology Forecasting Using Data Mining and Semantics: First Annual Report
title_sort technology forecasting using data mining and semantics first annual report
url http://hdl.handle.net/1721.1/102838
work_keys_str_mv AT woonweilee technologyforecastingusingdataminingandsemanticsfirstannualreport
AT madnickstuarte technologyforecastingusingdataminingandsemanticsfirstannualreport
AT firatayse technologyforecastingusingdataminingandsemanticsfirstannualreport
AT zieglerblaine technologyforecastingusingdataminingandsemanticsfirstannualreport
AT seshasaisatwik technologyforecastingusingdataminingandsemanticsfirstannualreport