Publication Date


Document Type


Committee Members

Nikolaos Bourbakis (Advisor), Soon Chung (Committee Member), Sukarno Mertoguno (Committee Member), Bin Wang (Committee Member)

Degree Name

Doctor of Philosophy (PhD)


Systems reverse engineering has gained considerable attention over time and is associated with numerous research areas. Its importance derives from several technological necessities; security analysis and learning are two that can benefit greatly from it. In particular, reverse engineering of technical documents for deeper automatic understanding is an area where it can contribute substantially. In this PhD dissertation we develop a novel reverse engineering methodology for deep understanding of the architectural descriptions of digital hardware systems that appear in technical documents. We first offer a survey of reverse engineering of electronic or digital systems. We also provide a classification of the research methods within this field and present a maturity metric that highlights the strengths and weaknesses of the methodologies and systems currently available. A technical document (TD) is typically composed of several modalities, such as natural language (NL) text, system diagrams, tables, mathematical formulas, graphics, and pictures. Automatic deep understanding of technical documents therefore requires a synergistic collaboration among these modalities. Here we address the synergistic collaboration between NL text and system diagrams for a better and deeper understanding of a TD. In particular, a technical document is decomposed into two modalities: NL text and figures of system diagrams. The NL text is processed with a Natural Language text Understanding (NLU) method, and a Convolutional Neural Network classifies the text sentences into five categories. A Diagram-Image-Modeling (DIM) method, in turn, processes the figures by extracting the system diagrams.
More specifically, NLU processes the text of the document and determines the associations among the nouns and their interactions by creating their stochastic Petri-net (SPN) graph model. DIM processes and analyzes the figures to transform each diagram into a graph model that holds all relevant information appearing in the diagram. We then combine (associate) these models in a synergistic way to create a synergistic SPN graph. From this SPN graph we obtain the functional specifications that form the behavior of the system, expressed as pseudocode. In parallel, we extract a flowchart to enhance the reader's understanding of the pseudocode and of the hardware system as a whole.
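
As a rough illustration of the association step described above, the following Python sketch merges a graph derived from text with a graph derived from a diagram by matching component names. It is not the dissertation's method: the `GraphModel` class, the `merge_models` and `shared_components` functions, and the CPU/memory/cache example are all hypothetical, and the graphs here are plain labelled directed graphs rather than stochastic Petri nets (there are no places, transitions, or firing rates).

```python
from dataclasses import dataclass, field


@dataclass
class GraphModel:
    """Hypothetical labelled directed graph: nodes are component names,
    edges are (source, label, target) triples."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)

    def add_edge(self, source: str, label: str, target: str) -> None:
        # Normalize names so "CPU" (diagram) and "cpu" (text) match.
        s, t = source.strip().lower(), target.strip().lower()
        self.nodes.update((s, t))
        self.edges.add((s, label, t))


def shared_components(a: GraphModel, b: GraphModel) -> set:
    """Components mentioned in both models; these anchor the association."""
    return a.nodes & b.nodes


def merge_models(a: GraphModel, b: GraphModel) -> GraphModel:
    """Union of nodes and edges from both models (assumed merge rule)."""
    return GraphModel(nodes=a.nodes | b.nodes, edges=a.edges | b.edges)


# Toy model extracted from text: "The CPU sends an address to the memory."
text_model = GraphModel()
text_model.add_edge("CPU", "sends address to", "memory")

# Toy model extracted from a block diagram with three connected boxes.
diagram_model = GraphModel()
diagram_model.add_edge("CPU", "connected to", "Memory")
diagram_model.add_edge("Memory", "connected to", "Cache")

anchors = shared_components(text_model, diagram_model)
merged = merge_models(text_model, diagram_model)
```

In this toy case the text contributes the behavior (what the CPU sends) while the diagram contributes structure the text never mentions (the cache), which is the kind of complementarity the synergistic model is meant to capture.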

Page Count


Department or Program

Department of Computer Science and Engineering

Year Degree Awarded