Details
Original language | English |
---|---|
Title of host publication | Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08 |
Publisher | Association for Computing Machinery (ACM) |
Pages | 497-506 |
Number of pages | 10 |
ISBN (Print) | 9781605580852 |
Publication status | Published - 21 Apr 2008 |
Event | 17th International Conference on World Wide Web 2008, WWW'08 - Beijing, China. Duration: 21 Apr 2008 → 25 Apr 2008 |
Publication series
Name | Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08 |
---|
Abstract
Email spam is a much-studied topic. Although current spam detection software has been gaining a competitive edge against text-based email spam, new advances in spam generation have posed a new challenge: image-based spam. Image-based spam is email that includes embedded images containing the spam message in binary format rather than as text. In this paper, we study the characteristics of image spam and propose two solutions for detecting it, comparing them with existing techniques. The first solution, which uses visual features for classification, offers an accuracy of about 98%, i.e. an improvement of at least 6% compared to existing solutions. SVMs (Support Vector Machines) are used to train classifiers on carefully chosen color, texture and shape features. The second solution offers a novel approach to near-duplicate detection in images. It involves clustering of image GMMs (Gaussian Mixture Models) based on the Agglomerative Information Bottleneck (AIB) principle, using Jensen-Shannon divergence (JS) as the distance measure.
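The clustering idea in the second solution can be illustrated with a minimal sketch. It is not the paper's implementation: the paper clusters image GMMs, for which the Jensen-Shannon divergence has no closed form, so this sketch substitutes small discrete histograms as a stand-in. The merge cost uses the usual AIB formulation (combined cluster weight times the weighted JS divergence). The function names, the sample histograms, and the `max_cost` stopping threshold are all assumptions made for illustration.

```python
import math
from itertools import combinations


def kl_divergence(p, q):
    """KL(p || q) in bits; terms where p[i] == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q, w_p=0.5, w_q=0.5):
    """Weighted Jensen-Shannon divergence between discrete distributions."""
    mix = [w_p * pi + w_q * qi for pi, qi in zip(p, q)]
    return w_p * kl_divergence(p, mix) + w_q * kl_divergence(q, mix)


def merge_cost(a, b):
    """AIB-style cost of merging clusters a and b, each a (weight, dist) pair."""
    (wa, pa), (wb, pb) = a, b
    total = wa + wb
    return total * js_divergence(pa, pb, wa / total, wb / total)


def merge(a, b):
    """Merged cluster: summed weight and weight-averaged distribution."""
    (wa, pa), (wb, pb) = a, b
    total = wa + wb
    return total, [(wa * x + wb * y) / total for x, y in zip(pa, pb)]


def agglomerate(distributions, max_cost):
    """Greedily merge the cheapest pair until the cheapest merge exceeds max_cost.

    max_cost is a hypothetical stopping rule, not taken from the paper.
    Returns groups of input indices; each group is a set of near duplicates.
    """
    n = len(distributions)
    clusters = [((1.0 / n, list(d)), [i]) for i, d in enumerate(distributions)]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: merge_cost(clusters[ij[0]][0], clusters[ij[1]][0]),
        )
        if merge_cost(clusters[i][0], clusters[j][0]) > max_cost:
            break
        merged = (merge(clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(members) for _, members in clusters)


# Made-up 3-bin histograms: the first two are near duplicates of each other.
histograms = [
    [0.70, 0.20, 0.10],
    [0.68, 0.22, 0.10],
    [0.10, 0.10, 0.80],
]
groups = agglomerate(histograms, max_cost=0.01)
print(groups)
```

With these sample values the first two histograms end up in one group and the third stays on its own. A real system would replace the histograms with per-image mixture models and an approximation of the JS divergence between them.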
ASJC Scopus subject areas
- General Computer Science
- Computer Networks and Communications
Cite this
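Detecting image spam using visual features and near duplicate detection. / Mehta, Bhaskar; Nangia, Saurabh; Gupta, Manish; Nejdl, Wolfgang. Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08. Association for Computing Machinery (ACM), 2008. p. 497-506 (Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08).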
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Detecting image spam using visual features and near duplicate detection
AU - Mehta, Bhaskar
AU - Nangia, Saurabh
AU - Gupta, Manish
AU - Nejdl, Wolfgang
PY - 2008/4/21
Y1 - 2008/4/21
AB - Email spam is a much studied topic, but even though current email spam detecting software has been gaining a competitive edge against text based email spam, new advances in spam generation have posed a new challenge: image-based spam. Image based spam is email which includes embedded images containing the spam messages, but in binary format. In this paper, we study the characteristics of image spam to propose two solutions for detecting image-based spam, while drawing a comparison with the existing techniques. The first solution, which uses the visual features for classification, offers an accuracy of about 98%, i.e. an improvement of at least 6% compared to existing solutions. SVMs (Support Vector Machines) are used to train classifiers using judiciously decided color, texture and shape features. The second solution offers a novel approach for near duplication detection in images. It involves clustering of image GMMs (Gaussian Mixture Models) based on the Agglomerative Information Bottleneck (AIB) principle, using Jensen-Shannon divergence (JS) as the distance measure.
KW - Email spam
KW - Image analysis
KW - Machine learning
UR - http://www.scopus.com/inward/record.url?scp=57349173529&partnerID=8YFLogxK
U2 - 10.1145/1367497.1367565
DO - 10.1145/1367497.1367565
M3 - Conference contribution
AN - SCOPUS:57349173529
SN - 9781605580852
T3 - Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08
SP - 497
EP - 506
BT - Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08
PB - Association for Computing Machinery (ACM)
T2 - 17th International Conference on World Wide Web 2008, WWW'08
Y2 - 21 April 2008 through 25 April 2008
ER -