Publication Date
2023
Document Type
Thesis
Committee Members
Lingwei Chen, Ph.D. (Advisor); Meilin Liu, Ph.D. (Committee Member); Junjie Zhang, Ph.D. (Committee Member)
Degree Name
Master of Science in Cyber Security (M.S.C.S.)
Abstract
Identifying the version of the Solidity compiler used to create an Ethereum contract is a challenging task, especially when the contract bytecode is obfuscated and lacks explicit metadata. Ethereum bytecode is highly complex because the Solidity compiler translates high-level programming constructs into low-level, stack-based code. In addition, the Solidity compiler is updated frequently, so the bytecode patterns it produces evolve continuously. To address this challenge, we propose using deep learning models to analyze Ethereum bytecode and infer the compiler version that produced it. These models are trained on a large dataset of Ethereum contracts labeled with their corresponding compiler versions, covering contracts compiled with various versions of the Solidity compiler. We preprocess the dataset to extract opcode sequences from the bytecode, which serve as inputs to the deep learning models. We use advanced sequence learning methods, namely bidirectional long short-term memory (Bi-LSTM), convolutional neural network (CNN), CNN+Bi-LSTM, Transformer, and Sentence-BERT (SBERT), to capture the semantics of the opcode sequences. We evaluate each model’s performance using accuracy, precision, recall, and F1-score. Our results demonstrate that the developed models identify the Solidity compiler version used in smart contracts with high accuracy. We also compare our methods with non-sequence learning models and show that our models outperform them in most cases. This highlights the advantages of the proposed approaches for identifying Solidity compiler versions from Ethereum bytecode.
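As an illustration of the preprocessing step described in the abstract, the following minimal Python sketch turns a hex-encoded bytecode string into an opcode sequence. It is not the thesis's implementation: the function names `opcode_name` and `extract_opcodes` are hypothetical, the opcode table is deliberately partial, and bytes outside the table are emitted as `UNKNOWN_0x..` placeholders. The one detail that matters for correctness is that PUSH1 through PUSH32 carry 1 to 32 bytes of immediate data, which must be skipped rather than decoded as opcodes.

```python
# Partial EVM opcode table (assumption: only a small subset is listed here;
# a real pipeline would use a complete table).
OPCODES = {
    0x00: "STOP", 0x01: "ADD", 0x02: "MUL", 0x03: "SUB",
    0x10: "LT", 0x11: "GT", 0x14: "EQ", 0x15: "ISZERO",
    0x34: "CALLVALUE", 0x35: "CALLDATALOAD", 0x36: "CALLDATASIZE",
    0x50: "POP", 0x51: "MLOAD", 0x52: "MSTORE",
    0x54: "SLOAD", 0x55: "SSTORE", 0x56: "JUMP", 0x57: "JUMPI",
    0x5B: "JUMPDEST", 0x5F: "PUSH0",
    0xF3: "RETURN", 0xFD: "REVERT", 0xFE: "INVALID",
}


def opcode_name(byte):
    """Map a single byte to an opcode mnemonic (hypothetical helper)."""
    if 0x60 <= byte <= 0x7F:          # PUSH1 .. PUSH32
        return f"PUSH{byte - 0x5F}"
    if 0x80 <= byte <= 0x8F:          # DUP1 .. DUP16
        return f"DUP{byte - 0x7F}"
    if 0x90 <= byte <= 0x9F:          # SWAP1 .. SWAP16
        return f"SWAP{byte - 0x8F}"
    return OPCODES.get(byte, f"UNKNOWN_0x{byte:02x}")


def extract_opcodes(bytecode_hex):
    """Return the list of opcode mnemonics in a hex-encoded bytecode string."""
    hex_str = bytecode_hex[2:] if bytecode_hex.startswith("0x") else bytecode_hex
    code = bytes.fromhex(hex_str)
    ops = []
    i = 0
    while i < len(code):
        byte = code[i]
        ops.append(opcode_name(byte))
        if 0x60 <= byte <= 0x7F:
            # Skip the immediate operand bytes that follow a PUSHn opcode.
            i += byte - 0x5F
        i += 1
    return ops


if __name__ == "__main__":
    # 0x60 0x80 0x60 0x40 0x52 decodes to PUSH1 0x80, PUSH1 0x40, MSTORE.
    print(extract_opcodes("0x6080604052"))
```

Dropping the PUSH operands, as done here, is one possible design choice; it keeps the vocabulary small for a sequence model, at the cost of discarding constants that might also carry version-specific signal.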
Page Count
98
Department or Program
Department of Computer Science and Engineering
Year Degree Awarded
2023
Copyright
Copyright 2023, all rights reserved. My ETD will be available under the "Fair Use" terms of copyright law.