A comparison study on modeling of clustered and overdispersed count data for multiple comparisons

Research output: Contribution to journal, Review article, Research, peer review

Authors

  • Jochen Kruppa
  • Ludwig Hothorn

Research Organisations

External Research Organisations

  • Charité - Universitätsmedizin Berlin

Details

Original language: English
Pages (from-to): 1-13
Number of pages: 13
Journal: Journal of applied statistics
Volume: 47
Issue number: 16
Publication status: Published - 3 Jul 2020

Abstract

Data collected in various scientific fields are count data. One way to analyze such data is to compare the individual levels of the treatment factor using multiple comparisons. However, the measured individuals are often clustered, e.g. according to litter or rearing. This clustering must be taken into account when estimating the parameters with a repeated measurement model. In addition, ignoring the overdispersion to which count data are prone increases the type I error rate. We carry out simulation studies using several different data settings and compare different multiple contrast tests with parameter estimates from generalized estimation equations and generalized linear mixed models in order to observe coverage and rejection probabilities. We generate overdispersed, clustered count data in small samples, as can be observed in many biological settings. We find that generalized estimation equations outperform generalized linear mixed models if the variance-sandwich estimator is correctly specified. Furthermore, generalized linear mixed models show convergence problems under certain data settings, although some model implementations are less affected. Finally, we use an example of genetic data to demonstrate the application of the multiple contrast test and the problems of ignoring strong overdispersion.
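
The overdispersion described in the abstract can be illustrated with a minimal sketch. It is not the authors' simulation code: the gamma-Poisson mixture, the parameter values, and the function names (`poisson`, `simulate_clustered_counts`, `dispersion_index`) are assumptions chosen for illustration only. Each cluster receives a shared gamma-distributed random effect with mean 1, so counts within a cluster are correlated and the marginal variance exceeds the mean.

```python
import math
import random
import statistics


def poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_clustered_counts(rng, n_clusters=8, cluster_size=5,
                              mean=4.0, shape=2.0):
    """Simulate counts that share a gamma random effect within each cluster.

    Hypothetical setup: the cluster effect is Gamma(shape, scale=1/shape),
    which has mean 1 and variance 1/shape, so the marginal variance of a
    count is approximately mean + mean**2 / shape.
    """
    counts = []
    for _ in range(n_clusters):
        effect = rng.gammavariate(shape, 1.0 / shape)  # one draw per cluster
        counts.extend(poisson(rng, mean * effect) for _ in range(cluster_size))
    return counts


def dispersion_index(counts):
    """Sample variance divided by sample mean (about 1 for Poisson data)."""
    return statistics.variance(counts) / statistics.mean(counts)


if __name__ == "__main__":
    rng = random.Random(2020)
    clustered = simulate_clustered_counts(rng, n_clusters=2000)
    plain = [poisson(rng, 4.0) for _ in range(len(clustered))]
    print("dispersion index, clustered:", round(dispersion_index(clustered), 2))
    print("dispersion index, Poisson:  ", round(dispersion_index(plain), 2))
```

A dispersion index well above 1 for the clustered sample indicates that a plain Poisson model would understate the variance, which is the mechanism behind the inflated type I error rate mentioned in the abstract.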

Keywords

    Generalized estimation equations, generalized linear mixed models, overdispersion, repeated measurements, simultaneous contrast tests

Cite this

A comparison study on modeling of clustered and overdispersed count data for multiple comparisons. / Kruppa, Jochen; Hothorn, Ludwig.
In: Journal of applied statistics, Vol. 47, No. 16, 03.07.2020, p. 1-13.

@article{2cead18def2c4cfeb5668a2c72688e1e,
title = "A comparison study on modeling of clustered and overdispersed count data for multiple comparisons",
abstract = "Data collected in various scientific fields are count data. One way to analyze such data is to compare the individual levels of the treatment factor using multiple comparisons. However, the measured individuals are often clustered, e.g. according to litter or rearing. This clustering must be taken into account when estimating the parameters with a repeated measurement model. In addition, ignoring the overdispersion to which count data are prone increases the type I error rate. We carry out simulation studies using several different data settings and compare different multiple contrast tests with parameter estimates from generalized estimation equations and generalized linear mixed models in order to observe coverage and rejection probabilities. We generate overdispersed, clustered count data in small samples, as can be observed in many biological settings. We find that generalized estimation equations outperform generalized linear mixed models if the variance-sandwich estimator is correctly specified. Furthermore, generalized linear mixed models show convergence problems under certain data settings, although some model implementations are less affected. Finally, we use an example of genetic data to demonstrate the application of the multiple contrast test and the problems of ignoring strong overdispersion.",
keywords = "Generalized estimation equations, generalized linear mixed models, overdispersion, repeated measurements, simultaneous contrast tests",
author = "Jochen Kruppa and Ludwig Hothorn",
note = "Funding Information: We would like to thank Prof. Dr Thomas Debener (Institute for Plant Genetics, Leibniz Universit{\"a}t Hannover, Hannover, Germany) for the provision of the sample genetic data set. ",
year = "2020",
month = jul,
day = "3",
doi = "10.1080/02664763.2020.1788518",
language = "English",
volume = "47",
pages = "1--13",
journal = "Journal of applied statistics",
issn = "0266-4763",
publisher = "Routledge",
number = "16",

}

TY - JOUR

T1 - A comparison study on modeling of clustered and overdispersed count data for multiple comparisons

AU - Kruppa, Jochen

AU - Hothorn, Ludwig

N1 - Funding Information: We would like to thank Prof. Dr Thomas Debener (Institute for Plant Genetics, Leibniz Universität Hannover, Hannover, Germany) for the provision of the sample genetic data set.

PY - 2020/7/3

Y1 - 2020/7/3

AB - Data collected in various scientific fields are count data. One way to analyze such data is to compare the individual levels of the treatment factor using multiple comparisons. However, the measured individuals are often clustered, e.g. according to litter or rearing. This clustering must be taken into account when estimating the parameters with a repeated measurement model. In addition, ignoring the overdispersion to which count data are prone increases the type I error rate. We carry out simulation studies using several different data settings and compare different multiple contrast tests with parameter estimates from generalized estimation equations and generalized linear mixed models in order to observe coverage and rejection probabilities. We generate overdispersed, clustered count data in small samples, as can be observed in many biological settings. We find that generalized estimation equations outperform generalized linear mixed models if the variance-sandwich estimator is correctly specified. Furthermore, generalized linear mixed models show convergence problems under certain data settings, although some model implementations are less affected. Finally, we use an example of genetic data to demonstrate the application of the multiple contrast test and the problems of ignoring strong overdispersion.

KW - Generalized estimation equations

KW - generalized linear mixed models

KW - overdispersion

KW - repeated measurements

KW - simultaneous contrast tests

UR - http://www.scopus.com/inward/record.url?scp=85087635451&partnerID=8YFLogxK

U2 - 10.1080/02664763.2020.1788518

DO - 10.1080/02664763.2020.1788518

M3 - Review article

AN - SCOPUS:85087635451

VL - 47

SP - 1

EP - 13

JO - Journal of applied statistics

JF - Journal of applied statistics

SN - 0266-4763

IS - 16

ER -