Multimodal image translation via deep learning inference model trained in video domain


Bibliographic Details
Main Authors: Jiawei Fan, Zhiqiang Liu, Dong Yang, Jian Qiao, Jun Zhao, Jiazhou Wang, Weigang Hu
Format: Article
Language: English
Published: BMC 2022-07-01
Series: BMC Medical Imaging
Subjects: Video domain, Deep learning, Medical image translation, GAN
Online Access: https://doi.org/10.1186/s12880-022-00854-x
_version_ 1818510518809264128
author Jiawei Fan
Zhiqiang Liu
Dong Yang
Jian Qiao
Jun Zhao
Jiazhou Wang
Weigang Hu
author_facet Jiawei Fan
Zhiqiang Liu
Dong Yang
Jian Qiao
Jun Zhao
Jiazhou Wang
Weigang Hu
author_sort Jiawei Fan
collection DOAJ
description Abstract Background Current medical image translation is implemented in the image domain. Considering that medical image acquisition is essentially a temporally continuous process, we attempt to develop a novel image translation framework, trained via deep learning in the video domain, for generating synthesized computed tomography (CT) images from cone-beam computed tomography (CBCT) images. Methods For a proof-of-concept demonstration, CBCT and CT images from 100 patients were collected to demonstrate the feasibility and reliability of the proposed framework. The CBCT and CT images were registered as paired samples and used as input data for supervised model training. A vid2vid framework based on a conditional GAN network, with carefully designed generators, discriminators, and a new spatio-temporal learning objective, was applied to realize CBCT–CT image translation in the video domain. Four evaluation metrics, mean absolute error (MAE), peak signal-to-noise ratio (PSNR), normalized cross-correlation (NCC), and structural similarity (SSIM), were calculated on all real and synthetic CT images from 10 new testing patients to assess model performance. Results The average values of MAE, PSNR, NCC, and SSIM are 23.27 ± 5.53, 32.67 ± 1.98, 0.99 ± 0.0059, and 0.97 ± 0.028, respectively. Most of the pixel-wise Hounsfield unit (HU) differences between real and synthetic CT images are within 50. The synthetic CT images show strong agreement with the real CT images, and image quality is improved, with lower noise and fewer artifacts than the CBCT images. Conclusions We developed a deep-learning-based approach that addresses the medical image translation problem in the video domain. Although the feasibility and reliability of the proposed framework were demonstrated on CBCT–CT image translation, it can be easily extended to other types of medical images.
The current results illustrate that this is a very promising method that may pave a new path for medical image translation research.
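The four metrics named in the abstract are standard image-similarity measures. As an illustration only (this is a minimal NumPy sketch, not the authors' evaluation code), three of them can be written in a few lines; SSIM involves local windowed statistics and is usually taken from a library implementation such as scikit-image's `skimage.metrics.structural_similarity`.

```python
import numpy as np

def mae(real, synth):
    """Mean absolute error between two images (in HU for CT)."""
    return np.mean(np.abs(real - synth))

def psnr(real, synth, data_range=None):
    """Peak signal-to-noise ratio in dB; data_range defaults to the
    dynamic range of the reference image."""
    if data_range is None:
        data_range = real.max() - real.min()
    mse = np.mean((real - synth) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

def ncc(real, synth):
    """Normalized cross-correlation (Pearson form): 1.0 means the two
    images are related by a positive linear transform."""
    a = real - real.mean()
    b = synth - synth.mean()
    return np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
```

In the reported results these would be averaged over all real/synthetic CT slice pairs from the testing patients.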
first_indexed 2024-12-10T23:21:05Z
format Article
id doaj.art-91cc80a685e441dfa172273e3d628927
institution Directory Open Access Journal
issn 1471-2342
language English
last_indexed 2024-12-10T23:21:05Z
publishDate 2022-07-01
publisher BMC
record_format Article
series BMC Medical Imaging
spelling doaj.art-91cc80a685e441dfa172273e3d628927 2022-12-22T01:29:43Z
eng BMC BMC Medical Imaging 1471-2342 2022-07-01 22 1 1 9 10.1186/s12880-022-00854-x
Multimodal image translation via deep learning inference model trained in video domain
Jiawei Fan (Department of Radiation Oncology, Fudan University Shanghai Cancer Center)
Zhiqiang Liu (National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College)
Dong Yang (Department of Radiation Oncology, Fudan University Shanghai Cancer Center)
Jian Qiao (Department of Radiation Oncology, Fudan University Shanghai Cancer Center)
Jun Zhao (Department of Radiation Oncology, Fudan University Shanghai Cancer Center)
Jiazhou Wang (Department of Radiation Oncology, Fudan University Shanghai Cancer Center)
Weigang Hu (Department of Radiation Oncology, Fudan University Shanghai Cancer Center)
https://doi.org/10.1186/s12880-022-00854-x
Video domain; Deep learning; Medical image translation; GAN
spellingShingle Jiawei Fan
Zhiqiang Liu
Dong Yang
Jian Qiao
Jun Zhao
Jiazhou Wang
Weigang Hu
Multimodal image translation via deep learning inference model trained in video domain
BMC Medical Imaging
Video domain
Deep learning
Medical image translation
GAN
title Multimodal image translation via deep learning inference model trained in video domain
title_full Multimodal image translation via deep learning inference model trained in video domain
title_fullStr Multimodal image translation via deep learning inference model trained in video domain
title_full_unstemmed Multimodal image translation via deep learning inference model trained in video domain
title_short Multimodal image translation via deep learning inference model trained in video domain
title_sort multimodal image translation via deep learning inference model trained in video domain
topic Video domain
Deep learning
Medical image translation
GAN
url https://doi.org/10.1186/s12880-022-00854-x
work_keys_str_mv AT jiaweifan multimodalimagetranslationviadeeplearninginferencemodeltrainedinvideodomain
AT zhiqiangliu multimodalimagetranslationviadeeplearninginferencemodeltrainedinvideodomain
AT dongyang multimodalimagetranslationviadeeplearninginferencemodeltrainedinvideodomain
AT jianqiao multimodalimagetranslationviadeeplearninginferencemodeltrainedinvideodomain
AT junzhao multimodalimagetranslationviadeeplearninginferencemodeltrainedinvideodomain
AT jiazhouwang multimodalimagetranslationviadeeplearninginferencemodeltrainedinvideodomain
AT weiganghu multimodalimagetranslationviadeeplearninginferencemodeltrainedinvideodomain