Crack45K: Integration of Vision Transformer with Tubularity Flow Field (TuFF) and Sliding-Window Approach for Crack-Segmentation in Pavement Structures

Recently, deep-learning (DL)-based crack-detection systems have proven to be the method of choice for image processing-based inspection systems. However, human-like generalization remains challenging, owing to a wide variety of factors such as crack type and size. Additionally, because of their loca...

Bibliographic Details
Main Authors:	Luqman Ali, Hamad Al Jassmi, Wasif Khan, Fady Alnajjar
Format:	Article
Language:	English
Published:	MDPI AG 2022-12-01
Series:	Buildings
Subjects:	crack-detection; structural-health monitoring; ViT transformer; deep learning; machine learning; pavement cracks
Online Access:	https://www.mdpi.com/2075-5309/13/1/55

author	Luqman Ali; Hamad Al Jassmi; Wasif Khan; Fady Alnajjar
author_sort	Luqman Ali
collection	DOAJ
description	Recently, deep-learning (DL)-based crack-detection systems have proven to be the method of choice for image processing-based inspection systems. However, human-like generalization remains challenging, owing to a wide variety of factors such as crack type and size. Additionally, because of their localized receptive fields, CNNs have a high false-detection rate and perform poorly when attempting to capture the relevant areas of an image. This study aims to propose a vision-transformer-based crack-detection framework that treats image data as a succession of small patches, to retrieve global contextual information (GCI) through self-attention (SA) methods, and which addresses the CNNs’ problem of inductive biases, including the locally constrained receptive-fields and translation-invariance. The vision-transformer (ViT) classifier was tested to enhance crack classification, localization, and segmentation performance by blending with a sliding-window and tubularity-flow-field (TuFF) algorithm. Firstly, the ViT framework was trained on a custom dataset consisting of 45K images with 224 × 224 pixels resolution, and achieved accuracy, precision, recall, and F1 scores of 0.960, 0.971, 0.950, and 0.960, respectively. Secondly, the trained ViT was integrated with the sliding-window (SW) approach, to obtain a crack-localization map from large images. The SW-based ViT classifier was then merged with the TuFF algorithm, to acquire efficient crack-mapping by suppressing the unwanted regions in the last step. The robustness and adaptability of the proposed integrated-architecture were tested on new data acquired under different conditions and which were not utilized during the training and validation of the model. The proposed ViT-architecture performance was evaluated and compared with that of various state-of-the-art (SOTA) deep-learning approaches. The experimental results show that ViT equipped with a sliding-window and the TuFF algorithm can enhance real-world crack classification, localization, and segmentation performance.
first_indexed	2024-03-09T13:21:31Z
format	Article
id	doaj.art-8e796bbe1e1b43b5934d68b3c1038176
institution	Directory of Open Access Journals
issn	2075-5309
language	English
last_indexed	2024-03-09T13:21:31Z
publishDate	2022-12-01
publisher	MDPI AG
record_format	Article
series	Buildings
spelling	doaj.art-8e796bbe1e1b43b5934d68b3c1038176; 2023-11-30T21:29:15Z; eng; MDPI AG; Buildings; ISSN 2075-5309; 2022-12-01; vol. 13, no. 1, art. 55; doi:10.3390/buildings13010055; Crack45K: Integration of Vision Transformer with Tubularity Flow Field (TuFF) and Sliding-Window Approach for Crack-Segmentation in Pavement Structures; Luqman Ali (Department of Computer Science and Software Eng., College of Information Technology, UAEU, Al Ain 15551, United Arab Emirates); Hamad Al Jassmi (Emirates Center for Mobility Research, UAEU, Al Ain 15551, United Arab Emirates); Wasif Khan (Department of Computer Science and Software Eng., College of Information Technology, UAEU, Al Ain 15551, United Arab Emirates); Fady Alnajjar (Department of Computer Science and Software Eng., College of Information Technology, UAEU, Al Ain 15551, United Arab Emirates); https://www.mdpi.com/2075-5309/13/1/55
title	Crack45K: Integration of Vision Transformer with Tubularity Flow Field (TuFF) and Sliding-Window Approach for Crack-Segmentation in Pavement Structures
topic	crack-detection; structural-health monitoring; ViT transformer; deep learning; machine learning; pavement cracks
url	https://www.mdpi.com/2075-5309/13/1/55
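
Illustrative note on the sliding-window step named in the description: the record says only that a trained ViT classifier is applied over large images to obtain a crack-localization map, so the following is a minimal sketch of that general idea, not the authors' implementation. The function names, the stride, and the brightness-threshold stand-in for the trained classifier are all assumptions made for illustration; only the 224 × 224 window size comes from the record.

```python
import numpy as np


def sliding_window_map(image, classify_patch, patch=224, stride=224):
    """Score every patch x patch window and return a grid of scores.

    `classify_patch` is a hypothetical callable standing in for a trained
    classifier; it takes one window and returns a crack score in [0, 1].
    """
    h, w = image.shape[:2]
    if h < patch or w < patch:
        raise ValueError("image is smaller than one window")
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    heat = np.zeros((rows, cols), dtype=float)
    for i in range(rows):
        for j in range(cols):
            y, x = i * stride, j * stride
            heat[i, j] = classify_patch(image[y:y + patch, x:x + patch])
    return heat


def dummy_classifier(tile):
    # Assumption: a brightness threshold replaces the trained ViT here,
    # so that the sketch runs without any model weights.
    return float(tile.mean() < 128)


# Synthetic 448 x 672 grayscale image with one dark 224 x 224 region.
image = np.full((448, 672), 255, dtype=np.uint8)
image[0:224, 224:448] = 0

heat = sliding_window_map(image, dummy_classifier)
print(heat.shape)  # grid of window scores: 2 rows by 3 columns
```

With a stride smaller than the window, neighbouring windows overlap and the grid becomes denser at the cost of more classifier calls; the record does not state which stride was used.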

Similar Items