Cosine Similarity-based Plagiarism Detection on Electronic Documents

Authors

  • Lidia Permata Sari, Department of Information System

DOI:

https://doi.org/10.70356/josapen.v1i2.14

Keywords:

Plagiarism Detection, Cosine Similarity, Electronic Documents, Similarity Thresholds, Academic Theses

Abstract

This study addresses the prevalent problem of plagiarism in academic thesis documents, where similarities scattered across different sections of a document can go undetected and escape supervisor oversight. To reduce such occurrences, it proposes a detection approach based on cosine similarity, a widely used technique in natural language processing and document analysis. The method's benefits, such as independence from document length and high accuracy, support its adoption for plagiarism detection. The study describes the Waterfall model used for systematic development and notes that the model is structured but inflexible in accommodating evolving software requirements. It also explains the mechanics of cosine similarity and its central role in quantifying textual resemblance between documents. Practical demonstrations using TF-IDF vectorization and cosine similarity computation give a step-by-step view of the method's implementation. The system design, illustrated through UML diagrams and depictions of the system interface, reflects the comprehensive approach taken in building the plagiarism detection application. Finally, Black Box testing confirms that the application meets its functional criteria, validating its effectiveness in identifying potential instances of plagiarism. The study thereby contributes a robust detection mechanism for addressing plagiarism concerns.
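
The TF-IDF vectorization and cosine similarity computation mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the article's implementation: the function names (tokenize, tfidf_vectors, cosine_similarity), the whitespace tokenizer, and the smoothed IDF formula log((1 + N) / (1 + df)) + 1 are all assumptions chosen for illustration. The smoothing is used so that two identical documents still receive non-zero weights and score a similarity of 1.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (assumption: a real system
    would also strip punctuation, remove stop words, and stem)."""
    return text.lower().split()


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document.

    Weight = term frequency * smoothed inverse document frequency,
    where idf = log((1 + N) / (1 + df)) + 1 (an assumed variant).
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(vec_a, vec_b):
    """Dot product of two sparse vectors divided by the product of
    their Euclidean norms; returns 0.0 if either vector is empty."""
    dot = sum(w * vec_b.get(term, 0.0) for term, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

In use, a pair of documents would be vectorized together with tfidf_vectors and the resulting score compared against a similarity threshold. The threshold value is a design choice that this sketch does not fix. Because the score depends on the angle between the vectors rather than their magnitudes, it is largely insensitive to document length, which is the property the abstract highlights.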

Published

2023-07-31

How to Cite

Permata Sari, L. (2023). Cosine Similarity-based Plagiarism Detection on Electronic Documents. Journal of Computer Science Application and Engineering (JOSAPEN), 1(2), 44–48. https://doi.org/10.70356/josapen.v1i2.14