A comparison study on modeling of clustered and overdispersed count data for multiple comparisons

Jochen Kruppa; Ludwig Hothorn

doi:10.1080/02664763.2020.1788518

Details

Original language	English
Pages (from-to)	1-13
Number of pages	13
Journal	Journal of Applied Statistics
Volume	47
Issue number	16
Publication status	Published - 3 Jul 2020

Abstract

Data collected in many scientific fields are count data. One way to analyze such data is to compare the individual levels of the treatment factor using multiple comparisons. However, the measured individuals are often clustered, e.g. according to litter or rearing. This clustering must be taken into account when estimating the parameters with a repeated measurement model. In addition, ignoring the overdispersion to which count data are prone increases the type I error rate. We carry out simulation studies under several data settings and compare different multiple contrast tests with parameter estimates from generalized estimation equations and generalized linear mixed models in order to observe coverage and rejection probabilities. We generate overdispersed, clustered count data in small samples, as can be observed in many biological settings. We found that generalized estimation equations outperform generalized linear mixed models if the variance-sandwich estimator is correctly specified. Furthermore, generalized linear mixed models show convergence problems under certain data settings, although some model implementations are less affected. Finally, we use an example of genetic data to demonstrate the application of the multiple contrast test and the problems of ignoring strong overdispersion.

Keywords

Generalized estimation equations, generalized linear mixed models, overdispersion, repeated measurements, simultaneous contrast tests

ASJC Scopus subject areas

Mathematics(all)
Statistics and Probability
Decision Sciences(all)
Statistics, Probability and Uncertainty

Cite this

A comparison study on modeling of clustered and overdispersed count data for multiple comparisons. / Kruppa, Jochen; Hothorn, Ludwig.
In: Journal of Applied Statistics, Vol. 47, No. 16, 03.07.2020, p. 1-13.

Research output: Contribution to journal › Review article › Research › peer review

Kruppa, J & Hothorn, L 2020, 'A comparison study on modeling of clustered and overdispersed count data for multiple comparisons', Journal of Applied Statistics, vol. 47, no. 16, pp. 1-13. https://doi.org/10.1080/02664763.2020.1788518

Kruppa, J., & Hothorn, L. (2020). A comparison study on modeling of clustered and overdispersed count data for multiple comparisons. Journal of Applied Statistics, 47(16), 1-13. https://doi.org/10.1080/02664763.2020.1788518

Kruppa J, Hothorn L. A comparison study on modeling of clustered and overdispersed count data for multiple comparisons. Journal of Applied Statistics. 2020 Jul 3;47(16):1-13. doi: 10.1080/02664763.2020.1788518

Kruppa, Jochen ; Hothorn, Ludwig. / A comparison study on modeling of clustered and overdispersed count data for multiple comparisons. In: Journal of Applied Statistics. 2020 ; Vol. 47, No. 16. pp. 1-13.

@article{2cead18def2c4cfeb5668a2c72688e1e,
  title = "A comparison study on modeling of clustered and overdispersed count data for multiple comparisons",
  abstract = "Data collected in many scientific fields are count data. One way to analyze such data is to compare the individual levels of the treatment factor using multiple comparisons. However, the measured individuals are often clustered, e.g. according to litter or rearing. This clustering must be taken into account when estimating the parameters with a repeated measurement model. In addition, ignoring the overdispersion to which count data are prone increases the type I error rate. We carry out simulation studies under several data settings and compare different multiple contrast tests with parameter estimates from generalized estimation equations and generalized linear mixed models in order to observe coverage and rejection probabilities. We generate overdispersed, clustered count data in small samples, as can be observed in many biological settings. We found that generalized estimation equations outperform generalized linear mixed models if the variance-sandwich estimator is correctly specified. Furthermore, generalized linear mixed models show convergence problems under certain data settings, although some model implementations are less affected. Finally, we use an example of genetic data to demonstrate the application of the multiple contrast test and the problems of ignoring strong overdispersion.",
  keywords = "Generalized estimation equations, generalized linear mixed models, overdispersion, repeated measurements, simultaneous contrast tests",
  author = "Jochen Kruppa and Ludwig Hothorn",
  note = "Funding Information: We would like to thank Prof. Dr Thomas Debener (Institute for Plant Genetics, Leibniz Universit{\"a}t Hannover, Hannover, Germany) for the provision of the sample genetic data set.",
  year = "2020",
  month = jul,
  day = "3",
  doi = "10.1080/02664763.2020.1788518",
  language = "English",
  volume = "47",
  pages = "1--13",
  journal = "Journal of Applied Statistics",
  issn = "0266-4763",
  publisher = "Routledge",
  number = "16",
}

TY  - JOUR
T1  - A comparison study on modeling of clustered and overdispersed count data for multiple comparisons
AU  - Kruppa, Jochen
AU  - Hothorn, Ludwig
N1  - Funding Information: We would like to thank Prof. Dr Thomas Debener (Institute for Plant Genetics, Leibniz Universität Hannover, Hannover, Germany) for the provision of the sample genetic data set.
PY  - 2020/7/3
Y1  - 2020/7/3
N2  - Data collected in many scientific fields are count data. One way to analyze such data is to compare the individual levels of the treatment factor using multiple comparisons. However, the measured individuals are often clustered, e.g. according to litter or rearing. This clustering must be taken into account when estimating the parameters with a repeated measurement model. In addition, ignoring the overdispersion to which count data are prone increases the type I error rate. We carry out simulation studies under several data settings and compare different multiple contrast tests with parameter estimates from generalized estimation equations and generalized linear mixed models in order to observe coverage and rejection probabilities. We generate overdispersed, clustered count data in small samples, as can be observed in many biological settings. We found that generalized estimation equations outperform generalized linear mixed models if the variance-sandwich estimator is correctly specified. Furthermore, generalized linear mixed models show convergence problems under certain data settings, although some model implementations are less affected. Finally, we use an example of genetic data to demonstrate the application of the multiple contrast test and the problems of ignoring strong overdispersion.
AB  - Data collected in many scientific fields are count data. One way to analyze such data is to compare the individual levels of the treatment factor using multiple comparisons. However, the measured individuals are often clustered, e.g. according to litter or rearing. This clustering must be taken into account when estimating the parameters with a repeated measurement model. In addition, ignoring the overdispersion to which count data are prone increases the type I error rate. We carry out simulation studies under several data settings and compare different multiple contrast tests with parameter estimates from generalized estimation equations and generalized linear mixed models in order to observe coverage and rejection probabilities. We generate overdispersed, clustered count data in small samples, as can be observed in many biological settings. We found that generalized estimation equations outperform generalized linear mixed models if the variance-sandwich estimator is correctly specified. Furthermore, generalized linear mixed models show convergence problems under certain data settings, although some model implementations are less affected. Finally, we use an example of genetic data to demonstrate the application of the multiple contrast test and the problems of ignoring strong overdispersion.
KW  - Generalized estimation equations
KW  - generalized linear mixed models
KW  - overdispersion
KW  - repeated measurements
KW  - simultaneous contrast tests
UR  - http://www.scopus.com/inward/record.url?scp=85087635451&partnerID=8YFLogxK
U2  - 10.1080/02664763.2020.1788518
DO  - 10.1080/02664763.2020.1788518
M3  - Review article
AN  - SCOPUS:85087635451
VL  - 47
SP  - 1
EP  - 13
JO  - Journal of Applied Statistics
JF  - Journal of Applied Statistics
SN  - 0266-4763
IS  - 16
ER  -
