Detecting image spam using visual features and near duplicate detection

Bhaskar Mehta; Saurabh Nangia; Manish Gupta; Wolfgang Nejdl

doi:10.1145/1367497.1367565

Details

Original language	English
Title of host publication	Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08
Publisher	Association for Computing Machinery (ACM)
Pages	497-506
Number of pages	10
ISBN (print)	9781605580852
Publication status	Published - 21 Apr 2008
Event	17th International Conference on World Wide Web 2008, WWW'08 - Beijing, China. Duration: 21 Apr 2008 → 25 Apr 2008

Publication series

Name	Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08

Abstract

Email spam is a much-studied topic, but even though current email spam detection software has been gaining a competitive edge against text-based email spam, new advances in spam generation have posed a new challenge: image-based spam. Image-based spam is email which includes embedded images containing the spam messages, but in binary format. In this paper, we study the characteristics of image spam to propose two solutions for detecting image-based spam, while drawing a comparison with the existing techniques. The first solution, which uses visual features for classification, offers an accuracy of about 98%, i.e. an improvement of at least 6% compared to existing solutions. SVMs (Support Vector Machines) are used to train classifiers using judiciously chosen color, texture and shape features. The second solution offers a novel approach for near-duplicate detection in images. It involves clustering of image GMMs (Gaussian Mixture Models) based on the Agglomerative Information Bottleneck (AIB) principle, using Jensen-Shannon divergence (JS) as the distance measure.
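The abstract names Jensen-Shannon divergence as the distance measure for near-duplicate detection. The following is a minimal sketch of that measure only, under simplifying assumptions: it compares discrete histograms (for example, normalized color histograms) rather than the image GMMs used in the paper, for which the divergence generally has no closed form. The helper names `normalize` and `is_near_duplicate` and the threshold value `0.05` are hypothetical and do not come from the paper.

```python
import math
from typing import Sequence, List


def normalize(counts: Sequence[float]) -> List[float]:
    """Turn non-negative bin counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram must have positive total mass")
    return [c / total for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in bits; bins with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence between two discrete distributions, in bits.

    It is symmetric and, with base-2 logarithms, bounded between 0 and 1.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    # Mixture distribution; it is positive wherever p or q is positive,
    # so the KL terms below never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def is_near_duplicate(p: Sequence[float], q: Sequence[float],
                      threshold: float = 0.05) -> bool:
    """Hypothetical decision rule: small divergence means near duplicate.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return js_divergence(p, q) < threshold


if __name__ == "__main__":
    a = normalize([8, 1, 1])
    b = normalize([80, 11, 9])   # almost the same shape as a
    c = normalize([1, 1, 8])     # very different shape
    print(is_near_duplicate(a, b), is_near_duplicate(a, c))
```

In an agglomerative setting such as the one the abstract describes, a pairwise distance of this kind would drive which clusters are merged next; the sketch stops at the pairwise comparison.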

ASJC Scopus subject areas

General Computer Science
Computer Networks and Communications

Cite this

Detecting image spam using visual features and near duplicate detection. / Mehta, Bhaskar; Nangia, Saurabh; Gupta, Manish et al.
Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08. Association for Computing Machinery (ACM), 2008. pp. 497-506 (Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08).

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › Peer-reviewed

Mehta, B, Nangia, S, Gupta, M & Nejdl, W 2008, Detecting image spam using visual features and near duplicate detection. in Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08. Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08, Association for Computing Machinery (ACM), pp. 497-506, 17th International Conference on World Wide Web 2008, WWW'08, Beijing, China, 21 Apr 2008. https://doi.org/10.1145/1367497.1367565

Mehta, B., Nangia, S., Gupta, M., & Nejdl, W. (2008). Detecting image spam using visual features and near duplicate detection. In Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08 (pp. 497-506). (Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08). Association for Computing Machinery (ACM). https://doi.org/10.1145/1367497.1367565

Mehta B, Nangia S, Gupta M, Nejdl W. Detecting image spam using visual features and near duplicate detection. in Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08. Association for Computing Machinery (ACM). 2008. pp. 497-506. (Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08). doi: 10.1145/1367497.1367565

Mehta, Bhaskar ; Nangia, Saurabh ; Gupta, Manish et al. / Detecting image spam using visual features and near duplicate detection. Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08. Association for Computing Machinery (ACM), 2008. pp. 497-506 (Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08).

@inproceedings{a1bcc9b69ecb44cabf630f2b1f071c02,
title = "Detecting image spam using visual features and near duplicate detection",
abstract = "Email spam is a much studied topic, but even though current email spam detecting software has been gaining a competitive edge against text based email spam, new advances in spam generation have posed a new challenge: image-based spam. Image based spam is email which includes embedded images containing the spam messages, but in binary format. In this paper, we study the characteristics of image spam to propose two solutions for detecting image-based spam, while drawing a comparison with the existing techniques. The first solution, which uses the visual features for classification, offers an accuracy of about 98%, i.e. an improvement of at least 6% compared to existing solutions. SVMs (Support Vector Machines) are used to train classifiers using judiciously decided color, texture and shape features. The second solution offers a novel approach for near duplication detection in images. It involves clustering of image GMMs (Gaussian Mixture Models) based on the Agglomerative Information Bottleneck (AIB) principle, using Jensen-Shannon divergence (JS) as the distance measure.",
keywords = "Email spam, Image analysis, Machine learning",
author = "Bhaskar Mehta and Saurabh Nangia and Manish Gupta and Wolfgang Nejdl",
year = "2008",
month = apr,
day = "21",
doi = "10.1145/1367497.1367565",
language = "English",
isbn = "9781605580852",
series = "Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08",
publisher = "Association for Computing Machinery (ACM)",
pages = "497--506",
booktitle = "Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08",
address = "United States",
note = "17th International Conference on World Wide Web 2008, WWW'08 ; Conference date: 21-04-2008 Through 25-04-2008",
}

TY - GEN
T1 - Detecting image spam using visual features and near duplicate detection
AU - Mehta, Bhaskar
AU - Nangia, Saurabh
AU - Gupta, Manish
AU - Nejdl, Wolfgang
PY - 2008/4/21
Y1 - 2008/4/21
AB - Email spam is a much studied topic, but even though current email spam detecting software has been gaining a competitive edge against text based email spam, new advances in spam generation have posed a new challenge: image-based spam. Image based spam is email which includes embedded images containing the spam messages, but in binary format. In this paper, we study the characteristics of image spam to propose two solutions for detecting image-based spam, while drawing a comparison with the existing techniques. The first solution, which uses the visual features for classification, offers an accuracy of about 98%, i.e. an improvement of at least 6% compared to existing solutions. SVMs (Support Vector Machines) are used to train classifiers using judiciously decided color, texture and shape features. The second solution offers a novel approach for near duplication detection in images. It involves clustering of image GMMs (Gaussian Mixture Models) based on the Agglomerative Information Bottleneck (AIB) principle, using Jensen-Shannon divergence (JS) as the distance measure.
KW - Email spam
KW - Image analysis
KW - Machine learning
UR - http://www.scopus.com/inward/record.url?scp=57349173529&partnerID=8YFLogxK
U2 - 10.1145/1367497.1367565
DO - 10.1145/1367497.1367565
M3 - Conference contribution
AN - SCOPUS:57349173529
SN - 9781605580852
T3 - Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08
SP - 497
EP - 506
BT - Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08
PB - Association for Computing Machinery (ACM)
T2 - 17th International Conference on World Wide Web 2008, WWW'08
Y2 - 21 April 2008 through 25 April 2008
ER -

