Computer Science and Engineering Faculty Publications

Web Robot Detection Techniques: Overview and Limitations

Derek Doran, Wright State University - Main Campus
Swapna S. Gokhale

Document Type

Article

Publication Date

1-2011

Abstract

Most modern Web robots that crawl the Internet to support value-added services and technologies possess sophisticated data collection and analysis capabilities. Some of these robots, however, may be ill-behaved or malicious, and hence, may impose a significant strain on a Web server. It is thus necessary to detect Web robots in order to block undesirable ones from accessing the server. Such detection is also essential to ensure that the robot traffic is considered appropriately in the performance and capacity planning of Web servers. Despite a variety of Web robot detection techniques, there is no consensus regarding a single technique, or even a specific “type” of technique, that performs well in practice. Therefore, to aid in the development of a practically applicable robot detection technique, this survey presents a critical analysis and comparison of the prevalent detection approaches. We propose a framework to classify the existing detection techniques into four categories based on their underlying detection philosophy. We compare the different classes to gain insights into those characteristics that make up an effective robot detection scheme. Finally, we discuss why the contemporary techniques fail to offer a general solution to the robot detection problem and propose a set of key ingredients necessary for strong Web robot detection.

Repository Citation

Doran, D., & Gokhale, S. S. (2011). Web Robot Detection Techniques: Overview and Limitations. Data Mining and Knowledge Discovery, 22 (1-2), 183-210.
https://corescholar.libraries.wright.edu/cse/251

DOI

10.1007/s10618-010-0180-z
