A Framework for Understanding Unstructured Financial Documents Using RPA and Multimodal Approach
The financial business process worldwide suffers from huge dependencies upon labor and written documents, thus making it tedious and time-consuming. In order to solve this problem, traditional robotic process automation (RPA) has recently been developed into a hyper-automation solution by combining...
Main Authors: | Seongkuk Cho, Jihoon Moon, Junhyeok Bae, Jiwon Kang, Sangwook Lee |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-02-01 |
Series: | Electronics |
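Subjects: | intelligent document processing, visual-rich document understanding, optical character recognition, financial document analysis, key information extraction, image classification |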
Online Access: | https://www.mdpi.com/2079-9292/12/4/939 |
_version_ | 1797621300777189376 |
---|---|
author | Seongkuk Cho Jihoon Moon Junhyeok Bae Jiwon Kang Sangwook Lee |
author_facet | Seongkuk Cho Jihoon Moon Junhyeok Bae Jiwon Kang Sangwook Lee |
author_sort | Seongkuk Cho |
collection | DOAJ |
description | The financial business process worldwide suffers from huge dependencies upon labor and written documents, thus making it tedious and time-consuming. In order to solve this problem, traditional robotic process automation (RPA) has recently been developed into a hyper-automation solution by combining computer vision (CV) and natural language processing (NLP) methods. These solutions are capable of image analysis, such as key information extraction and document classification. However, they could improve on text-rich document images and require much training data for processing multilingual documents. This study proposes a multimodal approach-based intelligent document processing framework that combines a pre-trained deep learning model with traditional RPA used in banks to automate business processes from real-world financial document images. The proposed framework can perform classification and key information extraction on a small amount of training data and analyze multilingual documents. In order to evaluate the effectiveness of the proposed framework, extensive experiments were conducted using Korean financial document images. The experimental results show the superiority of the multimodal approach for understanding financial documents and demonstrate that adequate labeling can improve performance by up to about 15%. |
first_indexed | 2024-03-11T08:55:00Z |
format | Article |
id | doaj.art-7831891b36514f118134c695997909ad |
institution | Directory Open Access Journal |
issn | 2079-9292 |
language | English |
last_indexed | 2024-03-11T08:55:00Z |
publishDate | 2023-02-01 |
publisher | MDPI AG |
record_format | Article |
series | Electronics |
spelling | doaj.art-7831891b36514f118134c695997909ad; 2023-11-16T20:12:23Z; eng; MDPI AG; Electronics; 2079-9292; 2023-02-01; 12(4), 939; 10.3390/electronics12040939; A Framework for Understanding Unstructured Financial Documents Using RPA and Multimodal Approach; Seongkuk Cho [0]; Jihoon Moon [1]; Junhyeok Bae [2]; Jiwon Kang [3]; Sangwook Lee [4]; [0] AI Unit, Shinhan Bank, Seoul 01056, Republic of Korea; [1] Department of AI and Big Data, Soonchunhyang University, Asan 31538, Republic of Korea; [2] AI Unit, Shinhan Bank, Seoul 01056, Republic of Korea; [3] AI Unit, Shinhan Bank, Seoul 01056, Republic of Korea; [4] AI Unit, Shinhan Bank, Seoul 01056, Republic of Korea; The financial business process worldwide suffers from huge dependencies upon labor and written documents, thus making it tedious and time-consuming. In order to solve this problem, traditional robotic process automation (RPA) has recently been developed into a hyper-automation solution by combining computer vision (CV) and natural language processing (NLP) methods. These solutions are capable of image analysis, such as key information extraction and document classification. However, they could improve on text-rich document images and require much training data for processing multilingual documents. This study proposes a multimodal approach-based intelligent document processing framework that combines a pre-trained deep learning model with traditional RPA used in banks to automate business processes from real-world financial document images. The proposed framework can perform classification and key information extraction on a small amount of training data and analyze multilingual documents. In order to evaluate the effectiveness of the proposed framework, extensive experiments were conducted using Korean financial document images. The experimental results show the superiority of the multimodal approach for understanding financial documents and demonstrate that adequate labeling can improve performance by up to about 15%.; https://www.mdpi.com/2079-9292/12/4/939; intelligent document processing; visual-rich document understanding; optical character recognition; financial document analysis; key information extraction; image classification |
spellingShingle | Seongkuk Cho Jihoon Moon Junhyeok Bae Jiwon Kang Sangwook Lee A Framework for Understanding Unstructured Financial Documents Using RPA and Multimodal Approach Electronics intelligent document processing visual-rich document understanding optical character recognition financial document analysis key information extraction image classification |
title | A Framework for Understanding Unstructured Financial Documents Using RPA and Multimodal Approach |
title_full | A Framework for Understanding Unstructured Financial Documents Using RPA and Multimodal Approach |
title_fullStr | A Framework for Understanding Unstructured Financial Documents Using RPA and Multimodal Approach |
title_full_unstemmed | A Framework for Understanding Unstructured Financial Documents Using RPA and Multimodal Approach |
title_short | A Framework for Understanding Unstructured Financial Documents Using RPA and Multimodal Approach |
title_sort | framework for understanding unstructured financial documents using rpa and multimodal approach |
topic | intelligent document processing visual-rich document understanding optical character recognition financial document analysis key information extraction image classification |
url | https://www.mdpi.com/2079-9292/12/4/939 |
work_keys_str_mv | AT seongkukcho aframeworkforunderstandingunstructuredfinancialdocumentsusingrpaandmultimodalapproach AT jihoonmoon aframeworkforunderstandingunstructuredfinancialdocumentsusingrpaandmultimodalapproach AT junhyeokbae aframeworkforunderstandingunstructuredfinancialdocumentsusingrpaandmultimodalapproach AT jiwonkang aframeworkforunderstandingunstructuredfinancialdocumentsusingrpaandmultimodalapproach AT sangwooklee aframeworkforunderstandingunstructuredfinancialdocumentsusingrpaandmultimodalapproach AT seongkukcho frameworkforunderstandingunstructuredfinancialdocumentsusingrpaandmultimodalapproach AT jihoonmoon frameworkforunderstandingunstructuredfinancialdocumentsusingrpaandmultimodalapproach AT junhyeokbae frameworkforunderstandingunstructuredfinancialdocumentsusingrpaandmultimodalapproach AT jiwonkang frameworkforunderstandingunstructuredfinancialdocumentsusingrpaandmultimodalapproach AT sangwooklee frameworkforunderstandingunstructuredfinancialdocumentsusingrpaandmultimodalapproach |