Publication Date

2016

Document Type

Thesis

Committee Members

Tanvi Banerjee (Committee Member), Derek Doran (Committee Chair), John Gallagher (Committee Member)

Degree Name

Master of Science (MS)

Abstract

With an ever-increasing amount of data shared and posted on the Web, the desire and necessity to automatically glean this information has led to an increase in the sophistication and volume of software agents called web robots or crawlers. Recent measurements, including our own across the entire logs of Wright State University Web servers over the past two years, suggest that at least 60% of all requests originate from robots rather than humans. Web robots display different statistical and behavioral patterns in their traffic compared to humans, yet present Web server optimizations presume that traffic exhibits predominantly human-like characteristics. Robots may thus be silently degrading the performance and scalability of our web systems. This thesis investigates a new take on a classic performance tool, namely web caches, to mitigate the impact of robot traffic on web server operations. It proposes a cache system architecture that: (i) services robot and human traffic in separate physical memory stores, with separate policies; (ii) uses an adaptable policy for admitting robot-related resources; (iii) combines a deep neural network with Bayesian models to improve request prediction. Experiments with real data demonstrate (i) a significant reduction in bandwidth usage for prefetching and (ii) improvements in hit rate for human-driven traffic compared to a number of baselines, especially in configurations where web caches have limited size.

Page Count

60

Department or Program

Department of Computer Science and Engineering

Year Degree Awarded

2016

