The observed preprocessing strategies for doing automatic text summarizing

Muhammad Farhan Juna, Mardhiya Hayaty

Abstract


The rapid growth of digital information makes it difficult for people to keep up. Automatic text summarization addresses this by extracting the meaningful information from a written document. This study examines how much preprocessing affects the quality of the summaries that automatic text summarization produces. We propose 16 experimental settings, each applying an IndoBERT model under a different combination of preprocessing techniques: data cleaning, stopword removal, stemming, and case folding. The results are measured with the recall-oriented understudy for gisting evaluation (ROUGE). The best performance is obtained by combining data cleaning and case folding, with scores of 0.78, 0.60, and 0.68 for ROUGE-1, ROUGE-2, and ROUGE-L, respectively.
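The experimental design can be illustrated with a minimal Python sketch. It assumes that the 16 settings correspond to every on/off combination of the four preprocessing techniques (2^4 = 16), which the abstract suggests but does not state outright. The stopword list, the suffix-stripping "stemmer", the order in which steps are applied, and all function names are hypothetical stand-ins, not the authors' implementation. The `rouge_n` helper is a simplified n-gram overlap measure written for illustration; the abstract does not say whether the reported scores are precision, recall, or F1, so the helper returns all three.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical toy stopword list; a real experiment would use a full Indonesian list.
STOPWORDS = {"yang", "dan", "di", "ke", "dari"}


def clean(text):
    """Data cleaning: drop URLs and non-alphanumeric characters, collapse whitespace."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def case_fold(text):
    """Case folding: lowercase the whole text."""
    return text.lower()


def remove_stopwords(text):
    """Stopword removal against the toy list above."""
    return " ".join(w for w in text.split() if w.lower() not in STOPWORDS)


def stem(text):
    """Toy suffix stripper standing in for a real Indonesian stemmer (assumption)."""
    return " ".join(
        re.sub(r"(nya|kan|an)$", "", w) if len(w) > 5 else w for w in text.split()
    )


# Assumed application order; the abstract does not specify one.
STEPS = [
    ("cleaning", clean),
    ("case_folding", case_fold),
    ("stopwords", remove_stopwords),
    ("stemming", stem),
]


def settings():
    """Yield every on/off combination of the four steps (2**4 = 16 settings)."""
    for flags in product([False, True], repeat=len(STEPS)):
        yield tuple(name for (name, _), on in zip(STEPS, flags) if on)


def preprocess(text, enabled):
    """Apply only the enabled steps, in the fixed order defined by STEPS."""
    for name, fn in STEPS:
        if name in enabled:
            text = fn(text)
    return text


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N: clipped n-gram overlap as (precision, recall, F1)."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand = ngrams(candidate.split())
    ref = ngrams(reference.split())
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Under these assumptions, the best-performing setting reported above would be the one returned as `("cleaning", "case_folding")`. Each setting's preprocessed text would then be passed to the summarization model and its output scored against a reference summary.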

Keywords


Automatic text summarization; Data cleaning; ROUGE; Text preprocessing; Text summarizes


DOI: https://doi.org/10.11591/csit.v4i2.p119-126


Computer Science and Information Technologies
ISSN: 2722-323X, e-ISSN: 2722-3221
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.