Spammer detection using honeypots and digital forensics
Abstract— At present it is very difficult to trace the identity of spammers who use identity concealment techniques. It is difficult to determine the identity of the spammer by just analysing the electronic trail. This paper proposes the design and implementation of a spammer detection system that uses honeypot techniques to detect abnormal behaviour on a network so as to identify potential spammers.

Manuscript received June 7, 2009. Ickin Vural is with the Information and Security Architectures Research Group, Department of Computer Science, University of Pretoria, Pretoria, South Africa (e-mail: [email protected]). Prof Hein Venter is the leader of the Information and Security Architectures Research Group, Department of Computer Science, University of Pretoria, Pretoria, South Africa.

Unsolicited bulk communication, also known as spam, is the practice of sending unwanted email messages, frequently with commercial content, in large quantities to an indiscriminate set of recipients.

The sending of unsolicited bulk communications with the intention to advertise products and generate sales is economically viable because senders have no operating costs beyond the management of their mailing lists. Because the cost of setting up a spamming operation is low, spammers are numerous. Thus the volume of unsolicited bulk communications has increased dramatically over the [...]

The costs of spam, which include lost productivity and fraud, are borne by the general public, by institutions that store and retrieve email for their employees and by Internet service providers (ISPs). Institutions and ISPs have been forced to add extra capacity to cope with the high volumes of unsolicited bulk communications.

Anti-spamming legislation has been introduced in many jurisdictions. The problem faced by law enforcement is that spammers move their operations to jurisdictions that have no or weak anti-spamming laws. At present it is very difficult to trace the identity of spammers who use identity concealment techniques. It is difficult to determine the identity of the spammer by just analysing the electronic trail using standard email tracing techniques. This paper proposes the design and implementation of a spammer detection system that uses artificial intelligence techniques to detect abnormal behaviour on a network.

The remainder of the paper is structured as follows. The background section defines spam in more detail and also [...] The next two sections are devoted to the state of the art of spamming techniques and how to trace spammers. More specifically, these two sections contrast each other in the sense that the third section looks at how spammers use bot-networks to conceal their identities, whereas the fourth section looks at techniques for tracing the identity of spammers using bot-networks. The paper then proposes the design of a spammer detection system to detect bot-networks in section V, followed by the conclusion.

Unsolicited bulk email, otherwise known as spam, is an email sent to a large number of email addresses, where the owners of those addresses have not asked for or consented to receive the mail. Spam is used to advertise a service or a product. An example of spam is an unsolicited email message from an unknown or forged address advertising [...]

Spam is one of the most significant threats to the Internet, accounting for around 60% of all email traffic. Spam costs consumers and ISPs vast amounts of money in [...]

Spammers generally do not pay much for the sending of spam. They accomplish this by exploiting open mail servers to do their task for them. The spammer need only send one email message to an incorrectly configured mail server to reach a vast number of email addresses. Recipients in turn need to pay access costs or telephone costs in order to [...]

ISPs have to bear the bulk of the cost for bandwidth overuse by spammers. This cost is often passed on to the consumer through increased Internet access fees or a [...]

The following section describes how spammers use bot-nets to send spam and how they conceal their identities from persons who would attempt to identify the source of [...]

III. HOW SPAMMERS CONCEAL THEIR IDENTITIES USING BOT-NETWORKS

A bot-network consists of a set of machines that have been taken over by a spammer using bot software sent over the internet. This bot software hides itself on its host machine and periodically checks for instructions from its human bot-network administrator. Bot-nets today are often controlled using Internet Relay Chat (IRC). The owner of the computer usually has no idea that his machine has been compromised until its internet connection is shut down by an ISP. As most ISPs block bulk mail if they suspect it is spam, the spammers who control these bot-networks typically send low volumes of mail at any one time so as not to arouse suspicion. Thus the spam mail can be traced to an innocent individual's network address and not the [...]
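To make concrete what the electronic trail of a message contains, the following is a minimal sketch of conventional header-based tracing, assuming that each relaying server records the connecting address in square brackets in a `Received` header. The function names, the sample message and its addresses (taken from reserved documentation ranges) are hypothetical and are not part of the system proposed in this paper.

```python
import re
from email import message_from_string

# IPv4 address in square brackets, as many mail servers write it in "Received" headers.
BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def received_hops(raw_message):
    """Return the bracketed addresses found in the Received headers.

    Servers prepend their Received header, so the list runs from the
    most recent hop to the earliest one.
    """
    msg = message_from_string(raw_message)
    hops = []
    for header in msg.get_all("Received", []):
        match = BRACKETED_IPV4.search(header)
        if match:
            hops.append(match.group(1))
    return hops


def apparent_origin(raw_message):
    """Return the earliest recorded address, or None if no hop was recorded."""
    hops = received_hops(raw_message)
    return hops[-1] if hops else None


# Hypothetical message: a home machine hands mail to mx.example.net,
# which relays it to the recipient's server.
SAMPLE = (
    "Received: from mx.example.net (mx.example.net [198.51.100.7])\n"
    "\tby inbox.example.org with ESMTP; Mon, 1 Jun 2009 10:00:02 +0200\n"
    "Received: from home-pc (unknown [203.0.113.45])\n"
    "\tby mx.example.net with SMTP; Mon, 1 Jun 2009 10:00:01 +0200\n"
    "Subject: Special offer\n"
    "\n"
    "Body text.\n"
)

if __name__ == "__main__":
    print(received_hops(SAMPLE))
    print(apparent_origin(SAMPLE))
```

In this sketch the earliest address is that of the machine which handed the message to the first mail server. If that machine is a bot, the address belongs to its unsuspecting owner, and any `Received` lines added before the first trustworthy server may themselves be forged, which is why header analysis alone rarely reaches the spammer.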
While the number of bot-nets appears to be increasing, the number of bots in each bot-net is actually dropping. In the past, bot-nets with over 80 000 machines were common. Currently bot-nets with a few hundred to a few thousand infected machines are common. One reason for this is that smaller bot-nets are more difficult to detect.

IV. IDENTIFYING THE IDENTITY OF SPAMMERS BY USING [...]

A honeypot is a closely monitored computing resource that is intended to be compromised. A honeypot computer can be applied to bot-networks and open proxies. Thus by setting up a computer to imitate a bot-network, investigators can attempt to trap the spammers [...]

One way of identifying spammers is to set up a computer to pretend that it is part of a bot-network. By allowing the honeypot computer to become part of the bot-network we can obtain the bot-network software used by the spammer. Once this has been done the honeypot waits for the spammer to send new instructions and then identifies the network address of the sender. The problem with this approach is that spammers send the instructions over open relays and open proxies, thus it may be impossible to discover the spammer's network address in [...]

An open proxy is a machine that allows computers to connect through it to other computers on the internet. Open proxies exist because they enable unhindered internet usage in countries that restrict access to certain sites for political or social reasons. An internet user in a country that restricts internet access can access blocked sites by using an open proxy in a country that does not restrict internet access.

Spammers use open proxies to hide their network addresses. The recipient of a spammer's email will not see the spammer's network address on the email but the open proxy's network address. It is estimated that sixty percent of all spam is sent using an open proxy. Thus the spammer will use an open proxy to send instructions to the machines on their bot-network to avoid detection.

A. Is Spammer identification possible?

This paper outlines the challenges facing digital forensic investigators when attempting to identify spammers using bot-networks in conjunction with open proxies. The use of bot-networks means that even if the machine sending the spam is identified, the person owning the machine may not be the one responsible for sending spam.

The use of untraceable internet connections and open proxies to communicate instructions to bot-networks makes the use of honeypots unlikely to succeed.

Thus any success in tracing spammers will be matched by spammers using increasingly sophisticated techniques to evade detection. Greater responsibility will have to shift to ISPs in monitoring connections to open proxies as well as attempting to shut down open relays. Nevertheless an arms race between spammers and digital forensic investigators will continue for the foreseeable future.

This paper proposes the design and implementation of a system to detect spammers by analysing network traffic for abnormal behaviour. The implementation would have to take into account spam email sending patterns to effectively identify spammers. The implementation could make use of artificial intelligence to learn behaviour and thus detect [...]

The proposal would be to model a network as a graph and then train an artificial intelligence agent to learn expected and unexpected behaviour so as to detect a machine that could possibly have been taken over by a bot-[...]

This paper outlines the challenges facing digital forensic investigators when attempting to identify spammers. The paper promotes the idea of a spammer identification system, as opposed to a spam identification system, to halt spam.

The Authors would like to thank the ICSA research [...]

Spamhaus. 2009. The Definition of spam. Available: http://www.spamhaus.org/definition.html [April 2009].
Email Metrics Program. 2007. 'The Network operators perspective', Messaging Anti-Abuse Working Group. [...]
Europa. 2009. Data protection: "Junk" e-mail costs internet users 10 billion a year worldwide. Available: [...]
[...] Available: http://www.ispa.org.za/spam/whatisspam.shtml. [April [...]
Evan Cooke, Farnam Jahanian, Danny McPherson. 2005. The advanced computing systems association. [Online] The Zombie Roundup Understanding, [...]
Niels Provos. 2004. The advanced computing systems association. [Online]. A Virtual Honeypot Framework.
Boneh, Dan. 2004. The Difficulties of Tracing Spam Email. Department of Computer Science Stanford [...]
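The proposal to model a network as a graph and to look for unexpected behaviour can be illustrated with a small sketch. It is only an illustration under stated assumptions: flow records are taken to be hypothetical `(source, destination, destination_port)` tuples, a fixed threshold stands in for the learned model the paper envisages, and the names `build_mail_graph`, `suspicious_hosts` and `max_destinations` are invented for this example.

```python
from collections import defaultdict

SMTP_PORT = 25  # port conventionally used for server-to-server mail transfer


def build_mail_graph(flows):
    """Build a directed graph: source host -> set of hosts it contacted on the SMTP port."""
    graph = defaultdict(set)
    for src, dst, dst_port in flows:
        if dst_port == SMTP_PORT:
            graph[src].add(dst)
    return graph


def suspicious_hosts(graph, known_mail_servers, max_destinations=3):
    """Flag hosts that are not known mail servers yet contact many distinct mail hosts.

    max_destinations is an assumed fixed threshold standing in for a learned
    model of expected behaviour.
    """
    return sorted(
        host
        for host, destinations in graph.items()
        if host not in known_mail_servers and len(destinations) > max_destinations
    )


# Hypothetical flow records (addresses from private and documentation ranges).
FLOWS = (
    [("10.0.0.5", "10.0.0.2", 25), ("10.0.0.5", "192.0.2.80", 80)]  # ordinary workstation
    + [("10.0.0.2", f"198.51.100.{i}", 25) for i in range(1, 5)]    # the site's mail relay
    + [("10.0.0.9", f"203.0.113.{i}", 25) for i in range(1, 6)]     # possible bot
)

if __name__ == "__main__":
    graph = build_mail_graph(FLOWS)
    print(suspicious_hosts(graph, known_mail_servers={"10.0.0.2"}))
```

The sketch counts distinct mail destinations per host rather than message volume, because bots are described above as sending low volumes at any one time; an ordinary workstation normally talks only to its own relay, so a wide fan-out on the SMTP port is the kind of unexpected edge pattern a trained agent might be expected to pick up.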