Identifying spammers by their resource usage patterns
Kevin S. Xu, Mark Kliger, and Alfred O. Hero III
Abstract
Most studies on spam thus far have focused on its content or source. These types of studies, however, reveal little about the behavioral characteristics of spammers. In addition, privacy issues may prevent wide access to email content. In this paper, we try to identify spammers by investigating their resource usage patterns. Specifically, we look at usage patterns of harvesters, the bots that are used to acquire email addresses, and spam servers, the email servers being used to send the spam emails. We perform spectral biclustering on both harvesters and servers to reveal groups of resources that are used together, which we believe correspond to individual spammers or groups of spammers. We make several interesting discoveries, including a division into phishing and non-phishing spammers and a group of harvesters with highly correlated behavior whose IP addresses belong to a rogue Internet service provider.
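The biclustering step can be illustrated with a minimal sketch of one common form of spectral biclustering, bipartite spectral co-clustering in the style of Dhillon, restricted to a two-way split. It assumes a nonnegative matrix `A` whose rows are harvesters and whose columns are servers, where `A[i][j]` is a hypothetical count of spam emails sent by server `j` to addresses collected by harvester `i`. The function name `spectral_cobipartition` is hypothetical, and the paper's actual matrix construction and number of biclusters may differ.

```python
import numpy as np


def spectral_cobipartition(A):
    """Split the rows (harvesters) and columns (servers) of a nonnegative
    association matrix into two co-clusters, using the second singular
    vectors of the degree-normalized matrix D1^{-1/2} A D2^{-1/2}."""
    A = np.asarray(A, dtype=float)
    d1 = A.sum(axis=1)  # row degrees: total activity of each harvester
    d2 = A.sum(axis=0)  # column degrees: total activity of each server
    if np.any(d1 == 0) or np.any(d2 == 0):
        raise ValueError("every row and column needs at least one nonzero entry")
    d1_is = 1.0 / np.sqrt(d1)
    d2_is = 1.0 / np.sqrt(d2)
    # Degree-normalized matrix D1^{-1/2} A D2^{-1/2}.
    An = d1_is[:, None] * A * d2_is[None, :]
    U, s, Vt = np.linalg.svd(An)
    # The leading singular pair is trivial (proportional to sqrt of degrees);
    # the second pair carries the bipartition.
    z_rows = d1_is * U[:, 1]
    z_cols = d2_is * Vt[1, :]
    # Splitting at zero keeps row and column labels consistent, because
    # flipping the sign of U[:, 1] also flips Vt[1, :].
    return (z_rows > 0).astype(int), (z_cols > 0).astype(int)
```

For example, with two groups of harvesters that mostly use disjoint groups of servers, `spectral_cobipartition([[9, 8, 1, 0], [8, 9, 0, 1], [1, 0, 9, 8], [0, 1, 8, 9]])` places harvesters 0 and 1 with servers 0 and 1, and harvesters 2 and 3 with servers 2 and 3. A k-way version would stack several singular vectors and cluster them (for instance with k-means) instead of splitting at zero.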
Paper
K. S. Xu, M. Kliger, A. O. Hero III, “Identifying spammers by their resource usage patterns,” Collaboration, Electronic Messaging, Anti-Abuse and Spam Conf. (CEAS), 2010.
Biclustering results
Each Cytoscape visualization shows the bicluster interaction network for one month. Each bicluster is represented by two vertices: a circular vertex for its cluster of harvesters and a triangular vertex for its cluster of servers. The size of a vertex reflects the number of harvesters or servers in that cluster, and its color indicates the average phishing level of those harvesters or servers. Only biclusters in the giant connected component (GCC) are displayed. In some months, such as March 2006, all of the phishing biclusters were disconnected from the GCC and are therefore absent from the visualization. Biclusters in the smaller connected components are, however, still counted in the contingency tables.
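The display rules above (vertex size, vertex color, and restriction to the GCC) can be sketched in plain Python. This sketch assumes the interaction network is already available as a list of undirected edges between cluster vertices, with hypothetical vertex names such as `H1` for a harvester cluster and `S1` for a server cluster, and a hypothetical per-resource phishing level in [0, 1]. How the actual edges and phishing levels are computed is not described on this page, so both are simply taken as inputs.

```python
from collections import defaultdict, deque


def giant_component(edges):
    """Return the vertex set of the largest connected component of an
    undirected graph given as an iterable of (u, v) edges (BFS search)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        comp = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    comp.add(nb)
                    queue.append(nb)
        if len(comp) > len(best):
            best = comp
    return best


def vertex_attributes(members, phishing_level):
    """Map each cluster vertex to (size, mean phishing level).

    members: cluster vertex -> list of resource ids (hypothetical input)
    phishing_level: resource id -> phishing level in [0, 1] (hypothetical)
    """
    return {
        vertex: (len(ids), sum(phishing_level[i] for i in ids) / len(ids))
        for vertex, ids in members.items()
    }
```

With edges `[("H1", "S1"), ("H2", "S2"), ("H1", "S2"), ("H3", "S3")]`, the GCC is `{"H1", "S1", "H2", "S2"}`; the bicluster `H3`/`S3` would be left out of the drawing but still counted in the contingency tables.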
The contingency tables show, for each month, how many phishing and non-phishing harvesters and servers fall into phishing and non-phishing biclusters.
Month | Visualization | Bicluster | Phishing harvesters | Non-phishing harvesters | Phishing servers | Non-phishing servers
---|---|---|---|---|---|---
2006-01 | clu_2006_01.png | Phishing | 120 | 4 | 155 | 8
2006-01 | | Non-phishing | 16 | 466 | 21 | 3077
2006-02 | clu_2006_02.png | Phishing | 127 | 1 | 202 | 3
2006-02 | | Non-phishing | 14 | 557 | 25 | 3745
2006-03 | clu_2006_03.png | Phishing | 209 | 25 | 299 | 51
2006-03 | | Non-phishing | 18 | 639 | 28 | 5290
2006-04 | clu_2006_04.png | Phishing | 244 | 26 | 331 | 34
2006-04 | | Non-phishing | 19 | 586 | 22 | 4389
2006-05 | clu_2006_05.png | Phishing | 222 | 21 | 255 | 24
2006-05 | | Non-phishing | 24 | 697 | 34 | 4723
2006-06 | clu_2006_06.png | Phishing | 187 | 11 | 232 | 20
2006-06 | | Non-phishing | 20 | 788 | 52 | 17639
2006-07 | clu_2006_07.png | Phishing | 188 | 10 | 238 | 18
2006-07 | | Non-phishing | 33 | 864 | 43 | 22431
2006-08 | clu_2006_08.png | Phishing | 193 | 14 | 214 | 24
2006-08 | | Non-phishing | 19 | 879 | 71 | 29738
2006-09 | clu_2006_09.png | Phishing | 165 | 9 | 215 | 19
2006-09 | | Non-phishing | 25 | 965 | 108 | 37554
2006-10 | clu_2006_10.png | Phishing | 172 | 7 | 231 | 11
2006-10 | | Non-phishing | 20 | 1333 | 1465 | 73748
2006-11 | clu_2006_11.png | Phishing | 147 | 10 | 192 | 16
2006-11 | | Non-phishing | 44 | 1287 | 832 | 61970
2006-12 | clu_2006_12.png | Phishing | 135 | 13 | 180 | 11
2006-12 | | Non-phishing | 18 | 1316 | 1832 | 59957
2007-01 | clu_2007_01.png | Phishing | 132 | 7 | 171 | 18
2007-01 | | Non-phishing | 23 | 1245 | 1502 | 59982
2007-02 | clu_2007_02.png | Phishing | 130 | 6 | 198 | 23
2007-02 | | Non-phishing | 37 | 1228 | 1085 | 65320
2007-03 | clu_2007_03.png | Phishing | 110 | 5 | 146 | 24
2007-03 | | Non-phishing | 17 | 1131 | 609 | 67203
2007-04 | clu_2007_04.png | Phishing | 117 | 8 | 293 | 21
2007-04 | | Non-phishing | 26 | 1760 | 1057 | 79528
2007-05 | clu_2007_05.png | Phishing | 115 | 13 | 258 | 37
2007-05 | | Non-phishing | 100 | 1965 | 1380 | 74307
2007-06 | clu_2007_06.png | Phishing | 163 | 18 | 396 | 59
2007-06 | | Non-phishing | 103 | 2347 | 1807 | 88705
2007-07 | clu_2007_07.png | Phishing | 168 | 32 | 448 | 105
2007-07 | | Non-phishing | 110 | 2700 | 2229 | 147110
2007-08 | clu_2007_08.png | Phishing | 256 | 58 | 891 | 164
2007-08 | | Non-phishing | 87 | 2918 | 2667 | 195619
2007-09 | clu_2007_09.png | Phishing | 410 | 73 | 1248 | 372
2007-09 | | Non-phishing | 86 | 3938 | 3205 | 236327
2007-10 | clu_2007_10.png | Phishing | 295 | 71 | 990 | 2416
2007-10 | | Non-phishing | 122 | 5390 | 258 | 452859
2007-11 | clu_2007_11.png | Phishing | 323 | 84 | 1000 | 297
2007-11 | | Non-phishing | 127 | 5983 | 5773 | 740193
2007-12 | clu_2007_12.png | Phishing | 242 | 49 | 698 | 200
2007-12 | | Non-phishing | 124 | 7137 | 4232 | 992336
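Each monthly 2×2 table can be read as a measure of how well the phishing biclusters isolate phishing resources. The sketch below builds such a table from per-resource labels and computes two summary ratios. The label inputs and the names `purity` and `coverage` are hypothetical illustrations, not quantities reported on this page.

```python
from collections import Counter


def contingency(labels):
    """Count (bicluster type, resource type) pairs.

    labels: iterable of (resource_is_phishing, bicluster_is_phishing) booleans.
    Keys use "P" for phishing and "N" for non-phishing.
    """
    table = Counter()
    for resource_p, bicluster_p in labels:
        table[("P" if bicluster_p else "N", "P" if resource_p else "N")] += 1
    return table


def phishing_summary(table):
    """Return (purity, coverage) of the phishing biclusters."""
    pp = table[("P", "P")]   # phishing resources in phishing biclusters
    pn = table[("P", "N")]   # non-phishing resources in phishing biclusters
    np_ = table[("N", "P")]  # phishing resources in non-phishing biclusters
    purity = pp / (pp + pn)      # share of phishing-bicluster members that phish
    coverage = pp / (pp + np_)   # share of phishing resources that were captured
    return purity, coverage


# January 2006 harvester counts from the table above.
jan_2006 = Counter({("P", "P"): 120, ("P", "N"): 4, ("N", "P"): 16, ("N", "N"): 466})
purity, coverage = phishing_summary(jan_2006)
print(round(purity, 3), round(coverage, 3))  # 0.968 0.882
```

For the January 2006 harvesters, 120 of the 124 harvesters in phishing biclusters are phishing harvesters, and 120 of the 136 phishing harvesters were placed in phishing biclusters.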
Code and Data
To request access to the code and data used in the analysis, please contact Kevin Xu at the address below.