How to exploit twitter for public health monitoring?

Research output: Contribution to journal › Article › Research › peer review

Authors

  • Kerstin Denecke
  • M. Krieck
  • L. Otrusina
  • P. Smrz
  • P. Dolog
  • W. Nejdl
  • E. Velasco

Research Organisations

External Research Organisations

  • Innovation Center Computer Assisted Surgery (ICCAS)
  • Niedersächsisches Landesgesundheitsamt
  • Brno University of Technology
  • Aalborg University
  • Robert Koch Institute (RKI)

Details

Original language: English
Pages (from-to): 326-339
Number of pages: 14
Journal: Methods of information in medicine
Volume: 52
Issue number: 4
Publication status: Published - 13 Aug 2013

Abstract

Objectives: Detecting hints to public health threats as early as possible is crucial to prevent harm to the population. However, many disease surveillance strategies rely upon data whose collection requires explicit reporting (data transmitted from hospitals, laboratories or physicians). Collecting reports takes time, which lengthens the reaction time. Moreover, context information on individual cases is often lost in the collection process. This paper describes a system that tries to address these limitations by processing social media to identify information on public health threats. The primary objective is to study the usefulness of the approach for supporting the monitoring of a population's health status.

Methods: The developed system works in three main steps: Data from Twitter, blogs, and forums as well as from TV and radio channels are continuously collected and filtered by means of keyword lists. Sentences of relevant texts are classified as relevant or irrelevant using a binary classifier based on support vector machines. By means of statistical methods known from biosurveillance, the relevant sentences are further analyzed, and signals are generated automatically when unexpected behavior is detected. From the generated signals, a subset is selected for presentation to a user by matching with user queries or profiles. In a set of evaluation experiments, public health experts assessed the generated signals with respect to correctness and relevancy. In particular, it was assessed how many relevant and irrelevant signals are generated during a specific time period.

Results: The experiments show that the system provides information on health events identified in social media. Signals are mainly generated from Twitter messages posted by news agencies. Personal tweets, i.e. tweets from persons observing some symptoms, play only a minor role in signal generation, given the limited volume of relevant messages. Relevant signals referring to real-world outbreaks were generated by the system and monitored by epidemiologists, for example, during the European football championship. However, the number of relevant signals among generated signals is still very small: the different experiments yielded a proportion between 5 and 20% of signals regarded as "relevant" by the users. Vaccination or education campaigns communicated via Twitter, as well as the use of medical terms in contexts other than outbreak reporting, led to the generation of irrelevant signals.

Conclusions: The aggregation of information into signals results in a reduction of monitoring effort compared to other existing systems. Against expectations, only a few messages are of a personal nature, reporting on personal symptoms. Instead, media reports are distributed over social media channels. Despite the high percentage of irrelevant signals generated by the system, the users reported that the effort of monitoring aggregated information in the form of signals is less demanding than monitoring huge social-media data streams manually. It remains for the future to develop strategies for reducing false alarms.
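
The collect-filter-detect flow described under Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the keyword list, the function names, and the detection rule (flagging a day whose message count exceeds the mean plus k standard deviations of the preceding days) are all assumptions chosen for illustration, since the abstract does not name the specific biosurveillance statistics used. The support vector machine classification step is omitted here.

```python
from statistics import mean, stdev

# Hypothetical keyword list; the abstract does not give the actual lists.
KEYWORDS = {"measles", "outbreak", "norovirus"}


def keyword_filter(messages, keywords=KEYWORDS):
    """Keep messages containing at least one keyword (case-insensitive)."""
    kept = []
    for text in messages:
        # Split on whitespace and strip common trailing punctuation.
        tokens = {tok.strip(".,!?;:").lower() for tok in text.split()}
        if tokens & keywords:
            kept.append(text)
    return kept


def detect_signals(daily_counts, baseline_days=7, k=3.0):
    """Return indices of days whose count exceeds mean + k * sd
    of the preceding `baseline_days` days (an assumed, simple rule)."""
    signals = []
    for day in range(baseline_days, len(daily_counts)):
        window = daily_counts[day - baseline_days:day]
        threshold = mean(window) + k * stdev(window)
        if daily_counts[day] > threshold:
            signals.append(day)
    return signals


# Made-up example data, for illustration only.
messages = [
    "Measles outbreak reported in Berlin.",
    "Great football match tonight!",
    "I have a fever",
]
relevant = keyword_filter(messages)

counts = [2, 3, 2, 4, 3, 2, 3, 15, 3]  # relevant messages per day
signals = detect_signals(counts)
```

With the made-up counts above, only the spike on the eighth day (index 7) crosses the threshold. A larger `k` trades sensitivity for fewer false alarms, which is the tension the Conclusions describe.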

Keywords

    Epidemic intelligence, Medicine 2.0, Population surveillance, Public health, Textmining, Web science

Cite this

How to exploit twitter for public health monitoring? / Denecke, Kerstin; Krieck, M.; Otrusina, L. et al.
In: Methods of information in medicine, Vol. 52, No. 4, 13.08.2013, p. 326-339.

Denecke, K, Krieck, M, Otrusina, L, Smrz, P, Dolog, P, Nejdl, W & Velasco, E 2013, 'How to exploit twitter for public health monitoring?', Methods of information in medicine, vol. 52, no. 4, pp. 326-339. https://doi.org/10.3414/ME12-02-0010
Denecke, K., Krieck, M., Otrusina, L., Smrz, P., Dolog, P., Nejdl, W., & Velasco, E. (2013). How to exploit twitter for public health monitoring? Methods of information in medicine, 52(4), 326-339. https://doi.org/10.3414/ME12-02-0010
Denecke K, Krieck M, Otrusina L, Smrz P, Dolog P, Nejdl W et al. How to exploit twitter for public health monitoring? Methods of information in medicine. 2013 Aug 13;52(4):326-339. doi: 10.3414/ME12-02-0010
Denecke, Kerstin ; Krieck, M. ; Otrusina, L. et al. / How to exploit twitter for public health monitoring?. In: Methods of information in medicine. 2013 ; Vol. 52, No. 4. pp. 326-339.
@article{dee9f84fc39c43c8a5a4f086515d186d,
title = "How to exploit twitter for public health monitoring?",
keywords = "Epidemic intelligence, Medicine 2.0, Population surveillance, Public health, Textmining, Web science",
author = "Kerstin Denecke and M. Krieck and L. Otrusina and P. Smrz and P. Dolog and W. Nejdl and E. Velasco",
year = "2013",
month = aug,
day = "13",
doi = "10.3414/ME12-02-0010",
language = "English",
volume = "52",
pages = "326--339",
journal = "Methods of information in medicine",
issn = "0026-1270",
publisher = "Schattauer GmbH",
number = "4",

}

TY - JOUR

T1 - How to exploit twitter for public health monitoring?

AU - Denecke, Kerstin

AU - Krieck, M.

AU - Otrusina, L.

AU - Smrz, P.

AU - Dolog, P.

AU - Nejdl, W.

AU - Velasco, E.

PY - 2013/8/13

Y1 - 2013/8/13

KW - Epidemic intelligence

KW - Medicine 2.0

KW - Population surveillance

KW - Public health

KW - Textmining

KW - Web science

UR - http://www.scopus.com/inward/record.url?scp=84881244211&partnerID=8YFLogxK

U2 - 10.3414/ME12-02-0010

DO - 10.3414/ME12-02-0010

M3 - Article

C2 - 23877537

AN - SCOPUS:84881244211

VL - 52

SP - 326

EP - 339

JO - Methods of information in medicine

JF - Methods of information in medicine

SN - 0026-1270

IS - 4

ER -
