Publication Date

2007

Document Type

Thesis

Committee Members

Prabhaker Mateti (Advisor)

Degree Name

Master of Science in Computer Engineering (MSCE)

Abstract

Preparing academic papers involves not only creative work but also more mechanical tasks, such as adjusting form and style to suit the demands of the publishing journal or conference. Among the several packages that help with these rather tedious mechanical tasks, the TeX + LaTeX + BibTeX combination is extremely popular. This thesis is about tools that help in the necessary task of citing related work accurately. It focuses on three aspects of this larger bibliography framework: (i) a survey of existing bibliography formats and tools, (ii) a database view of BibTeX files and the functionality that ensues, and (iii) processing references given as free-style pieces of text.

Numerous tools that ease the citation task have been developed in the last five years. The thesis reviews thoroughly the 65 open-source and freeware tools, and somewhat less thoroughly the 18 commercial tools because of the limitations of trialware. These tools range from small stand-alone utilities of a couple of thousand lines of code to large suites of tools that evolved out of the research work of teams over a few years. Their functionality includes collecting references, searching the various on-line bibliographies for full details, and preparing the references for inclusion in the references section typically found at the end of papers. We identify a few voids in functionality, especially in dealing with free-style references, and contribute new tools.

The second focus of the thesis is on the maintenance of bibliographies by individuals. In this context, we contribute several new tools: (i) LoadBibTeX stores bibliographic entries in a MySQL database, with BibTeX fields represented as tables, as opposed to storing them as plain-text .bib files. (ii) BibSearch allows authors to search the database of BibTeX entries using multiple keywords that can be matched in multiple fields; the resulting output may be saved as a standard .bib file. (iii) Normalization is a feature incorporated into the above tools to normalize equivalent BibTeX entries. (iv) Duplicate discovery, a feature of LoadBibTeX, reliably detects duplicates in a bibliography database.

The third focus of the thesis is on the extraction and conversion of references from free-style plain text into bibliographic entries expressed in the formal syntax of BibTeX. Often an author collects references as a file of copied-and-pasted pieces of text. We developed a tool that converts such free-style clippings into bibliographic entries in BibTeX format. Because the text is free style, author names, titles of papers, names of journals and conferences, page numbers, etc. may not appear in a guaranteed order. Recognition of these fields is driven by heuristics. Our tool provides feedback to the authors with (i) a confidence number indicating the correctness of the recognition of a field, and (ii) a colorized HTML version of the input free-style text indicating the results of the translation. An extension of this tool extracts the references section of papers published as PDF and translates it into BibTeX entries. We developed an API as a Java package to allow other developers to incorporate the free-style-to-BibTeX conversion functionality into their applications. As an example, we integrate into Aigaion, a highly effective web-based bibliographic tool, both the translation of free-style references and the extraction of references from PDF files.
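
To make the idea of heuristic field recognition with a confidence number concrete, here is a minimal sketch in Python. It is not the thesis's tool or its Java API: the function names `parse_reference` and `to_bibtex`, the regular expressions, the confidence values, and the sample reference are all assumptions made up for illustration.

```python
import re


def parse_reference(text):
    """Guess BibTeX fields from one free-style reference string.

    Returns a dict mapping field name -> (value, confidence in [0, 1]).
    The patterns and confidence values are illustrative assumptions,
    not the heuristics used in the thesis.
    """
    fields = {}

    # A four-digit year between 1900 and 2099 is a fairly strong signal.
    year = re.search(r"\b(?:19|20)\d{2}\b", text)
    if year:
        fields["year"] = (year.group(0), 0.9)

    # A page range such as "pp. 12-34" or "pages 12--34".
    pages = re.search(r"\b(?:pp?\.|pages?)\s*(\d+)\s*-{1,2}\s*(\d+)", text)
    if pages:
        fields["pages"] = (f"{pages.group(1)}--{pages.group(2)}", 0.8)

    # Text inside double quotes is often the title.
    title = re.search(r'"([^"]+)"', text)
    if title:
        fields["title"] = (title.group(1).rstrip(".,"), 0.7)
        # Weak guess: whatever precedes the quoted title is the author list.
        head = text[:title.start()].strip(" ,.")
        if head:
            fields["author"] = (head, 0.5)

    return fields


def to_bibtex(key, fields):
    """Render the recognized fields as a BibTeX entry (confidences dropped)."""
    lines = [f"@misc{{{key},"]
    for name, (value, _confidence) in fields.items():
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample reference, not a real publication.
    sample = 'A. Author and B. Writer, "A sample title," Journal of Examples, pp. 12-34, 2005.'
    recognized = parse_reference(sample)
    for name, (value, confidence) in recognized.items():
        print(f"{name}: {value} (confidence {confidence})")
    print(to_bibtex("sample2005", recognized))
```

Each field carries its own confidence, so a caller could flag low-confidence guesses for manual review, in the spirit of the feedback the abstract describes.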

Page Count

249

Department or Program

Department of Computer Science and Engineering

Year Degree Awarded

2008

