Details
Original language | English |
---|---|
Pages (from-to) | 1-13 |
Number of pages | 13 |
Journal | Journal of Applied Statistics |
Volume | 47 |
Issue number | 16 |
Publication status | Published - 3 Jul 2020 |
Abstract
Data collected in many scientific fields are count data. One way to analyze such data is to compare the individual levels of the treatment factor using multiple comparisons. However, the measured individuals are often clustered, e.g. according to litter or rearing. This clustering must be accounted for by estimating the parameters with a repeated measurement model. In addition, ignoring the overdispersion to which count data are prone inflates the type I error rate. We carry out simulation studies under several different data settings and compare different multiple contrast tests, using parameter estimates from generalized estimation equations and from generalized linear mixed models, with respect to coverage and rejection probabilities. We generate overdispersed, clustered count data in small samples, as observed in many biological settings. We found that generalized estimation equations outperform generalized linear mixed models if the sandwich variance estimator is correctly specified. Furthermore, generalized linear mixed models show convergence problems under certain data settings, although some model implementations are less affected. Finally, we use an example of genetic data to demonstrate the application of the multiple contrast test and the problems of ignoring strong overdispersion.
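To make the data setting in the abstract concrete, the following is a minimal sketch of how overdispersed, clustered counts in a small sample might be generated. It is not the authors' simulation design: the function name `simulate_clustered_counts`, all parameter values, and the choice of a gamma-Poisson mixture with a normal random intercept per cluster are assumptions made purely for illustration.

```python
import numpy as np


def simulate_clustered_counts(n_groups=3, clusters_per_group=4, cluster_size=5,
                              base_mean=10.0, cluster_sd=0.3, nb_size=2.0, seed=1):
    """Return (group, cluster, count) arrays of overdispersed, clustered counts.

    All defaults are hypothetical; every treatment group shares the same mean,
    i.e. the data are generated under the null hypothesis of no treatment effect.
    """
    rng = np.random.default_rng(seed)
    groups, clusters, counts = [], [], []
    cluster_id = 0
    for g in range(n_groups):
        for _ in range(clusters_per_group):
            # A shared random intercept on the log scale makes observations
            # within one cluster (e.g. one litter) correlated.
            b = rng.normal(0.0, cluster_sd)
            mu = base_mean * np.exp(b)
            # Gamma-Poisson mixture: negative binomial counts with mean mu
            # and variance mu + mu**2 / nb_size, so variance exceeds the mean.
            lam = rng.gamma(shape=nb_size, scale=mu / nb_size, size=cluster_size)
            y = rng.poisson(lam)
            groups.extend([g] * cluster_size)
            clusters.extend([cluster_id] * cluster_size)
            counts.extend(y.tolist())
            cluster_id += 1
    return np.array(groups), np.array(clusters), np.array(counts)


groups, clusters, counts = simulate_clustered_counts()
# A variance-to-mean ratio well above 1 indicates overdispersion
# relative to a Poisson model.
dispersion = counts.var(ddof=1) / counts.mean()
print(f"n={counts.size}, mean={counts.mean():.2f}, variance/mean={dispersion:.2f}")
```

Data of this shape could then be passed to a generalized estimation equation or generalized linear mixed model fit, with `clusters` as the grouping variable and `groups` as the treatment factor.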
Keywords
- Generalized estimation equations, generalized linear mixed models, overdispersion, repeated measurements, simultaneous contrast tests
ASJC Scopus subject areas
- Mathematics(all)
- Statistics and Probability
- Decision Sciences(all)
- Statistics, Probability and Uncertainty
Cite this
In: Journal of Applied Statistics, Vol. 47, No. 16, 03.07.2020, p. 1-13.
Research output: Contribution to journal › Review article › Research › peer review
TY - JOUR
T1 - A comparison study on modeling of clustered and overdispersed count data for multiple comparisons
AU - Kruppa, Jochen
AU - Hothorn, Ludwig
N1 - Funding Information: We would like to thank Prof. Dr Thomas Debener (Institute for Plant Genetics, Leibniz Universität Hannover, Hannover, Germany) for the provision of the sample genetic data set.
PY - 2020/7/3
Y1 - 2020/7/3
KW - Generalized estimation equations
KW - generalized linear mixed models
KW - overdispersion
KW - repeated measurements
KW - simultaneous contrast tests
UR - http://www.scopus.com/inward/record.url?scp=85087635451&partnerID=8YFLogxK
U2 - 10.1080/02664763.2020.1788518
DO - 10.1080/02664763.2020.1788518
M3 - Review article
AN - SCOPUS:85087635451
VL - 47
SP - 1
EP - 13
JO - Journal of Applied Statistics
JF - Journal of Applied Statistics
SN - 0266-4763
IS - 16
ER -