Novel Code Plagiarism Detection Based on Abstract Syntax Tree and Fuzzy Petri Nets

Victor R. L. Shen


DOI: https://doi.org/10.14710/ijee.1.1.46-56

Abstract


Students majoring in computer science or engineering are required to write programs in a variety of programming languages. However, many students submit source code obtained from the Internet or from friends with few or no modifications. Detecting such plagiarism by hand is very time-consuming, and undetected cases lead to unfair evaluation of learning performance. This paper proposes a novel method for detecting source code plagiarism using a high-level fuzzy Petri net (HLFPN) based on the abstract syntax tree (AST). First, the AST of each source program is generated after lexical and syntactic analysis. Second, a token sequence is generated from the AST. Working from the AST makes it possible to detect plagiarism that has been disguised by renaming identifiers or reordering program statements. Finally, the generated token sequences are compared with one another using an HLFPN to determine whether plagiarism has occurred. The experimental results indicate that the proposed method improves the determination of code plagiarism.
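The first two steps can be illustrated with a minimal sketch. It assumes the submissions are Python programs and uses Python's standard `ast` module; the abstract does not name a target language or a token vocabulary, so `token_sequence`, `similarity`, and the choice of AST node-type names as tokens are illustrative assumptions. The final comparison uses `difflib.SequenceMatcher` as a simple stand-in and is not the HLFPN described in the paper.

```python
import ast
from difflib import SequenceMatcher


def token_sequence(source: str) -> list[str]:
    """Return AST node-type names in pre-order (hypothetical token scheme).

    Identifier names and literal values are left out, so renaming
    variables or functions does not change the sequence.
    """
    tokens: list[str] = []

    def visit(node: ast.AST) -> None:
        # Skip Load/Store/Del context markers; they add no structure.
        if not isinstance(node, ast.expr_context):
            tokens.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


def similarity(a: str, b: str) -> float:
    """Score two programs in [0, 1] by matching their token sequences.

    Stand-in for the paper's HLFPN comparison, not a reimplementation.
    """
    return SequenceMatcher(None, token_sequence(a), token_sequence(b)).ratio()


original = (
    "def total(xs):\n"
    "    s = 0\n"
    "    for x in xs:\n"
    "        s += x\n"
    "    return s\n"
)
# Same program with every identifier renamed.
renamed = (
    "def acc(values):\n"
    "    r = 0\n"
    "    for v in values:\n"
    "        r += v\n"
    "    return r\n"
)
unrelated = "print('hi')\n"

print(similarity(original, renamed))
print(similarity(original, unrelated))
```

Because the tokens carry only node types, `original` and `renamed` produce identical sequences and score 1.0, while the unrelated program scores lower. Handling reordered statements would need an extra normalization step that this sketch does not attempt.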


Keywords


Computer Science Education; Source Code Plagiarism; Lexical Analysis; Syntactic Analysis; Abstract Syntax Tree; Petri Net.



Published by Faculty of Engineering in collaboration with Vocational School, Diponegoro University - Indonesia.