Classifying Web Robots by K-Means Clustering

Document Type

Conference Proceeding

Publication Date



Sophisticated Web robots, sporting a variety of functionality and unique traffic characteristics, constitute a significant percentage of request and bandwidth volume serviced by a Web server. To adequately prepare Web servers for this continuous rise in Web robots, it is necessary to gain deeper insights into their traffic properties. In this paper, we propose to classify Web robots according to their workload characteristics, using K-means clustering as the underlying partitioning technique. We demonstrate how our approach can allow an examination of Web robot traffic from new perspectives by applying it to classify Web robots extracted from a year-long server log collected from the Univ. of Connecticut School of Engineering domain.


Presented at the Twenty-First International Conference on Software Engineering and Knowledge Engineering, Boston, MA, July 1-3, 2009.