A commentary of Multi-skilled AI in MIT Technology Review 2021


Bibliographic Details
Main Author: Rongrong Ji
Format: Article
Language: English
Published: KeAi Communications Co. Ltd. 2021-11-01
Series: Fundamental Research
Online Access: http://www.sciencedirect.com/science/article/pii/S2667325821002223
_version_ 1828085989490294784
author Rongrong Ji
author_facet Rongrong Ji
author_sort Rongrong Ji
collection DOAJ
description Towards the end of 2012, artificial intelligence (AI) scientists first figured out how to impart “vision” to neural networks. Later, they also mastered how to enable neural networks to mimic human reasoning, hearing, speaking, and writing. Although AI has become comparable or even superior to humans at accomplishing specific tasks, it still does not possess the “flexibility” of the human brain, which can apply skills learned in one situation to another. Taking cues from the way children grow, we consider the following question: if senses and language can be combined, so that AI collects and processes information at a level closer to that of humans, will it be able to develop an understanding of the world? The answer is yes. “Multi-modal” systems, which can simultaneously acquire human senses and language, yield significantly stronger AI and make it easier for AI to adapt to new situations and solve new problems. Hence, such algorithms can be used to solve more complex problems, or be embedded in robots that communicate and collaborate with humans in daily life. In September 2020, researchers from the Allen Institute for AI (AI2) created a model that could generate images from captions, demonstrating the algorithm's ability to associate words with visual information. In November, scientists from the University of North Carolina at Chapel Hill developed a method of incorporating images into existing language models, which significantly enhanced the models' ability to comprehend text. Early in 2021, OpenAI extended GPT-3 and released two visual language models: one associates the objects in an image with the words in its description, and the other generates a digital image from a combination of concepts it has learned. In the long run, the progress made by “multi-modal” systems will help break through the limits of AI.
It will not only unlock new AI applications, but also make these applications safer and more reliable. More sophisticated multi-modal systems will also aid the development of more advanced robot assistants. Ultimately, multi-modal systems may prove to be the first AI that we can trust.① (① Original source in Chinese: R. Ji, Multi-skilled AI, Bulletin of National Natural Science Foundation of China 35 (3) (2021) 413-415.)
first_indexed 2024-04-11T04:48:38Z
format Article
id doaj.art-3b6a645b9e1a49a5a2aafeb3abf6fb99
institution Directory Open Access Journal
issn 2667-3258
language English
last_indexed 2024-04-11T04:48:38Z
publishDate 2021-11-01
publisher KeAi Communications Co. Ltd.
record_format Article
series Fundamental Research
spelling doaj.art-3b6a645b9e1a49a5a2aafeb3abf6fb99 Rongrong Ji, School of Information Science and Engineering, Xiamen University, Xiamen 361005, China
spellingShingle Rongrong Ji
A commentary of Multi-skilled AI in MIT Technology Review 2021
Fundamental Research
title A commentary of Multi-skilled AI in MIT Technology Review 2021
title_full A commentary of Multi-skilled AI in MIT Technology Review 2021
title_fullStr A commentary of Multi-skilled AI in MIT Technology Review 2021
title_full_unstemmed A commentary of Multi-skilled AI in MIT Technology Review 2021
title_short A commentary of Multi-skilled AI in MIT Technology Review 2021
title_sort commentary of multi skilled ai in mit technology review 2021
url http://www.sciencedirect.com/science/article/pii/S2667325821002223
work_keys_str_mv AT rongrongji acommentaryofmultiskilledaiinmittechnologyreview2021
AT rongrongji commentaryofmultiskilledaiinmittechnologyreview2021