Summary: | Patent documents are very long, and their content differs from paragraph to paragraph. Moreover, patent documents are classified hierarchically with multiple labels. Many works have employed deep neural architectures to classify patent documents. However, traditional document classification methods do not represent the characteristics of the entire patent document well, because they usually require a fixed input length. To address this issue, we propose a neural network-based classification method for patent documents built on a novel multi-stage feature extraction network (MEXN), which comprises a paragraph encoder and a summarizer over all paragraphs. MEXN analyzes the whole document hierarchically and provides multi-label outputs. Furthermore, MEXN does so with only a marginal increase in computational cost. We demonstrate in multi-label classification experiments on USPD datasets that the proposed method outperforms current state-of-the-art models in patent document classification tasks.
|