A Framework for Understanding Unstructured Financial Documents Using RPA and Multimodal Approach

Financial business processes worldwide depend heavily on manual labor and written documents, which makes them tedious and time-consuming. To solve this problem, traditional robotic process automation (RPA) has recently been developed into a hyper-automation solution by combining...

Bibliographic Details
Main Authors: Seongkuk Cho, Jihoon Moon, Junhyeok Bae, Jiwon Kang, Sangwook Lee
Format: Article
Language: English
Published: MDPI AG 2023-02-01
Series: Electronics
Subjects:
Online Access: https://www.mdpi.com/2079-9292/12/4/939
collection DOAJ
description Financial business processes worldwide depend heavily on manual labor and written documents, which makes them tedious and time-consuming. To solve this problem, traditional robotic process automation (RPA) has recently been developed into a hyper-automation solution by combining computer vision (CV) and natural language processing (NLP) methods. These solutions are capable of image analysis tasks such as key information extraction and document classification. However, they leave room for improvement on text-rich document images and require large amounts of training data to process multilingual documents. This study proposes a multimodal intelligent document processing framework that combines a pre-trained deep learning model with the traditional RPA used in banks to automate business processes from real-world financial document images. The proposed framework can perform classification and key information extraction with a small amount of training data and can analyze multilingual documents. To evaluate the effectiveness of the proposed framework, extensive experiments were conducted using Korean financial document images. The experimental results show the superiority of the multimodal approach for understanding financial documents and demonstrate that adequate labeling can improve performance by up to about 15%.
first_indexed 2024-03-11T08:55:00Z
format Article
id doaj.art-7831891b36514f118134c695997909ad
institution Directory Open Access Journal
issn 2079-9292
language English
last_indexed 2024-03-11T08:55:00Z
spelling doaj.art-7831891b36514f118134c695997909ad (2023-11-16T20:12:23Z, eng)
Electronics (MDPI AG, ISSN 2079-9292), vol. 12, no. 4, article 939, 2023-02-01
DOI: 10.3390/electronics12040939
Seongkuk Cho: AI Unit, Shinhan Bank, Seoul 01056, Republic of Korea
Jihoon Moon: Department of AI and Big Data, Soonchunhyang University, Asan 31538, Republic of Korea
Junhyeok Bae: AI Unit, Shinhan Bank, Seoul 01056, Republic of Korea
Jiwon Kang: AI Unit, Shinhan Bank, Seoul 01056, Republic of Korea
Sangwook Lee: AI Unit, Shinhan Bank, Seoul 01056, Republic of Korea
title A Framework for Understanding Unstructured Financial Documents Using RPA and Multimodal Approach
topic intelligent document processing
visual-rich document understanding
optical character recognition
financial document analysis
key information extraction
image classification
url https://www.mdpi.com/2079-9292/12/4/939