The observed preprocessing strategies for doing automatic text summarizing

Muhammad Farhan Juna; Mardhiya Hayaty

doi:10.11591/csit.v4i2.p119-126

Abstract

Digital information is now created faster than humans can keep up with. Automatic text summarization can analyze a written document and extract the meaningful information from it. This research asks how much of an impact preprocessing has on the quality of the summaries that automatic text summarization produces. To answer that question, it proposes 16 experimental settings in which the IndoBERT model is applied, each testing a different combination of preprocessing techniques: data cleaning, stopword removal, stemming, and case folding. The results are measured with the Recall-Oriented Understudy for Gisting Evaluation (ROUGE). According to the findings, the best performance is achieved by combining data cleaning and case folding, with scores of 0.78, 0.60, and 0.68 for ROUGE-1, ROUGE-2, and ROUGE-L, respectively.
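To make the experimental design concrete, the sketch below enumerates every on/off combination of the four preprocessing techniques and scores a summary with a simple unigram-overlap ROUGE-1. It rests on several assumptions that the abstract does not confirm: that the 16 settings correspond to the 2^4 combinations of the four techniques, that the steps run in the order shown, and that the reported scores are F-measures. The stopword list, the suffix-stripping `toy_stem`, and all function names are hypothetical stand-ins for illustration, not the authors' implementation.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical, tiny stopword list for illustration only (not from the paper).
STOPWORDS = {"yang", "dan", "di", "ke", "dari"}


def clean(text):
    """Data cleaning: drop URLs and non-alphanumeric symbols, collapse spaces."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def case_fold(text):
    """Case folding: lowercase the whole text."""
    return text.lower()


def remove_stopwords(text):
    """Stopword removal against the illustrative list above."""
    return " ".join(w for w in text.split() if w.lower() not in STOPWORDS)


def toy_stem(text):
    """Placeholder stemmer: strips a few common Indonesian suffixes.
    A real experiment would use a proper Indonesian stemmer instead."""
    def stem(word):
        for suffix in ("nya", "kan", "an", "i"):
            if word.lower().endswith(suffix) and len(word) > len(suffix) + 3:
                return word[: -len(suffix)]
        return word
    return " ".join(stem(w) for w in text.split())


# Assumed application order; the abstract does not specify one.
STEPS = [
    ("cleaning", clean),
    ("case_folding", case_fold),
    ("stopwords", remove_stopwords),
    ("stemming", toy_stem),
]


def all_settings():
    """Yield every on/off combination of the four steps (2**4 = 16)."""
    for flags in product([False, True], repeat=len(STEPS)):
        yield tuple(name for (name, _), on in zip(STEPS, flags) if on)


def preprocess(text, enabled):
    """Apply only the enabled steps, in the fixed order of STEPS."""
    for name, fn in STEPS:
        if name in enabled:
            text = fn(text)
    return text


def rouge_1(candidate, reference):
    """Unigram-overlap ROUGE-1 as (precision, recall, F1) on whitespace tokens."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return precision, recall, 2 * precision * recall / (precision + recall)
```

In an experiment of this shape, each setting from `all_settings()` would be passed to `preprocess` before the text reaches the summarizer, and the resulting summaries would be compared against reference summaries with ROUGE. The empty setting (no steps enabled) serves as the no-preprocessing baseline.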

Keywords

Automatic text summarization; Data cleaning; ROUGE; Text preprocessing; Text summarizes

Computer Science and Information Technologies
p-ISSN: 2722-323X, e-ISSN: 2722-3221
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Universitas Ahmad Dahlan (UAD).

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
