An End-to-End Named Entity Recognition Platform for Vietnamese Real Estate Advertisement Posts and Analytical Applications

The volume and complexity of publicly available real estate data have been snowballing. As a result, information extraction and processing have become increasingly challenging and essential for many PropTech (Property Technology) companies worldwide. The challenges are even more pronounced with lang...

Bibliographic Details
Main Authors: Binh T. Nguyen, Tung Tran Nguyen Doan, Son Thanh Huynh, Khanh Quoc Tran, An Trong Nguyen, An Tran-Hoai Le, Anh Minh Tran, Nhi Ho, Trung T. Nguyen, Dang T. Huynh
Format: Article
Language: English
Published: IEEE 2022-01-01
Series: IEEE Access
Subjects: Information extraction; information retrieval and text mining; NLP applications
Online Access: https://ieeexplore.ieee.org/document/9846984/
collection DOAJ
description The volume and complexity of publicly available real estate data have been snowballing. As a result, information extraction and processing have become increasingly challenging and essential for many PropTech (Property Technology) companies worldwide. The challenges are even more pronounced with languages other than English, such as Vietnamese, where few studies in this field have taken place. This paper presents an end-to-end framework for automatically collecting Vietnamese real estate advertisement posts from different data sources, extracting useful information, and storing the computed data in appropriate data warehouses and data marts. The aggregated data can then be served for other descriptive and predictive analytics. We combine two models to construct the extraction step: Noise Filtering and Named Entity Recognition (NER). These models help process the initial input data and extract the helpful information. The experimental results show that $\text{PhoBERT}_{large}$ achieves the best performance compared to other approaches, with F1 scores of 0.8697 for the Noise Filtering module and 0.8996 for the NER module. Finally, we use Superset to implement analytic dashboards that visualize the predicted results and support further analysis and management processes.
first_indexed 2024-04-11T09:41:03Z
format Article
id doaj.art-7de9011ea16144db9e0f9a4038993f16
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-04-11T09:41:03Z
publishDate 2022-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-7de9011ea16144db9e0f9a4038993f16 2022-12-22T04:31:11Z eng IEEE IEEE Access 2169-3536 2022-01-01
Volume 10, pages 87681-87697, DOI 10.1109/ACCESS.2022.3195496, IEEE document 9846984
An End-to-End Named Entity Recognition Platform for Vietnamese Real Estate Advertisement Posts and Analytical Applications
Authors and affiliations (indexed as in the record):
0. Binh T. Nguyen (https://orcid.org/0000-0001-5249-9702): Department of Computer Science, Faculty of Mathematics and Computer Science, Vietnam National University Ho Chi Minh City (VNUHCM)-University of Science, Ho Chi Minh City, Vietnam
1. Tung Tran Nguyen Doan (https://orcid.org/0000-0003-4659-6164): AISIA Research Laboratory, Ho Chi Minh City, Vietnam
2. Son Thanh Huynh: Department of Computer Science, Faculty of Mathematics and Computer Science, Vietnam National University Ho Chi Minh City (VNUHCM)-University of Science, Ho Chi Minh City, Vietnam
3. Khanh Quoc Tran (https://orcid.org/0000-0003-1288-8003): Vietnam National University Ho Chi Minh City (VNUHCM), Ho Chi Minh City, Vietnam
4. An Trong Nguyen (https://orcid.org/0000-0002-7782-8389): Vietnam National University Ho Chi Minh City (VNUHCM), Ho Chi Minh City, Vietnam
5. An Tran-Hoai Le (https://orcid.org/0000-0002-0521-963X): Vietnam National University Ho Chi Minh City (VNUHCM), Ho Chi Minh City, Vietnam
6. Anh Minh Tran: Department of Computer Science, Faculty of Mathematics and Computer Science, Vietnam National University Ho Chi Minh City (VNUHCM)-University of Science, Ho Chi Minh City, Vietnam
7. Nhi Ho: Hung Thinh Corporation, Ho Chi Minh City, Vietnam
8. Trung T. Nguyen: Hung Thinh Corporation, Ho Chi Minh City, Vietnam
9. Dang T. Huynh: Department of Computer Science, Faculty of Mathematics and Computer Science, Vietnam National University Ho Chi Minh City (VNUHCM)-University of Science, Ho Chi Minh City, Vietnam
https://ieeexplore.ieee.org/document/9846984/
Information extraction
information retrieval and text mining
NLP applications
title An End-to-End Named Entity Recognition Platform for Vietnamese Real Estate Advertisement Posts and Analytical Applications
topic Information extraction
information retrieval and text mining
NLP applications
url https://ieeexplore.ieee.org/document/9846984/