Details
Original language | English |
---|---|
Pages (from - to) | 1-13 |
Number of pages | 13 |
Journal | Journal of applied statistics |
Volume | 47 |
Issue number | 16 |
Publication status | Published - 3 July 2020 |
Abstract
Data collected in various scientific fields are count data. One way to analyze such data is to compare the individual levels of the treatment factor using multiple comparisons. However, the measured individuals are often clustered, e.g. according to litter or rearing. This must be taken into account when estimating the parameters with a repeated-measurement model. In addition, ignoring the overdispersion to which count data are prone increases the type I error rate. We carry out simulation studies under several data settings and compare different multiple contrast tests, with parameter estimates from generalized estimation equations and generalized linear mixed models, with respect to coverage and rejection probabilities. We generate overdispersed, clustered count data in small samples, as observed in many biological settings. We found that generalized estimation equations outperform generalized linear mixed models if the variance-sandwich estimator is correctly specified. Furthermore, generalized linear mixed models show convergence problems under certain data settings, although some model implementations are less affected. Finally, we use an example of genetic data to demonstrate the application of the multiple contrast test and the problems of ignoring strong overdispersion.
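The kind of data setting the abstract describes (overdispersed, clustered counts in small samples) could be sketched roughly as follows. This is a minimal illustration, not the authors' simulation code: the gamma-Poisson mixture, the normal cluster effect on the log scale, the function name `simulate_clustered_counts`, and all parameter values are assumptions made for the example.

```python
import numpy as np

def simulate_clustered_counts(group_means=(5.0, 5.0, 10.0), n_clusters=4,
                              n_per_cluster=5, cluster_sd=0.3,
                              dispersion=2.0, seed=0):
    """Return (group, cluster, count) rows of overdispersed, clustered counts.

    Hypothetical setup: each treatment group has its own mean, each cluster
    (e.g. a litter) shifts that mean by a random intercept on the log scale,
    and counts follow a gamma-Poisson (negative binomial) distribution.
    """
    rng = np.random.default_rng(seed)
    rows = []
    for g, group_mean in enumerate(group_means):
        for c in range(n_clusters):
            # Cluster-level random intercept on the log scale
            b = rng.normal(0.0, cluster_sd)
            mu = group_mean * np.exp(b)
            # Gamma-distributed Poisson rates with mean mu give
            # Var(y) = mu + mu**2 / dispersion, i.e. more than Poisson
            lam = rng.gamma(shape=dispersion, scale=mu / dispersion,
                            size=n_per_cluster)
            for y in rng.poisson(lam):
                rows.append((g, c, int(y)))
    return rows

def variance_to_mean_ratio(rows, group):
    """Crude overdispersion check: sample variance / mean within one group."""
    y = np.array([count for g, _, count in rows if g == group], dtype=float)
    return y.var(ddof=1) / y.mean()

if __name__ == "__main__":
    data = simulate_clustered_counts()
    print(len(data), "observations")
    print("variance/mean in group 0:", variance_to_mean_ratio(data, 0))
```

A variance-to-mean ratio clearly above 1 indicates overdispersion relative to a Poisson model. In small samples the ratio is noisy, so a single simulated data set should not be over-interpreted.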
ASJC Scopus subject areas
- Mathematics (all)
- Statistics and Probability
- Decision Sciences (all)
- Statistics, Probability and Uncertainty
Cite
In: Journal of applied statistics, Vol. 47, No. 16, 03.07.2020, pp. 1-13.
Publication: Contribution to journal › Review article › Research › Peer-reviewed
TY - JOUR
T1 - A comparison study on modeling of clustered and overdispersed count data for multiple comparisons
AU - Kruppa, Jochen
AU - Hothorn, Ludwig
N1 - Funding Information: We would like to thank Prof. Dr Thomas Debener (Institute for Plant Genetics, Leibniz Universität Hannover, Hannover, Germany) for the provision of the sample genetic data set.
PY - 2020/7/3
Y1 - 2020/7/3
AB - Data collected in various scientific fields are count data. One way to analyze such data is to compare the individual levels of the treatment factor using multiple comparisons. However, the measured individuals are often clustered, e.g. according to litter or rearing. This must be taken into account when estimating the parameters with a repeated-measurement model. In addition, ignoring the overdispersion to which count data are prone increases the type I error rate. We carry out simulation studies under several data settings and compare different multiple contrast tests, with parameter estimates from generalized estimation equations and generalized linear mixed models, with respect to coverage and rejection probabilities. We generate overdispersed, clustered count data in small samples, as observed in many biological settings. We found that generalized estimation equations outperform generalized linear mixed models if the variance-sandwich estimator is correctly specified. Furthermore, generalized linear mixed models show convergence problems under certain data settings, although some model implementations are less affected. Finally, we use an example of genetic data to demonstrate the application of the multiple contrast test and the problems of ignoring strong overdispersion.
KW - Generalized estimation equations
KW - generalized linear mixed models
KW - overdispersion
KW - repeated measurements
KW - simultaneous contrast tests
UR - http://www.scopus.com/inward/record.url?scp=85087635451&partnerID=8YFLogxK
U2 - 10.1080/02664763.2020.1788518
DO - 10.1080/02664763.2020.1788518
M3 - Review article
AN - SCOPUS:85087635451
VL - 47
SP - 1
EP - 13
JO - Journal of applied statistics
JF - Journal of applied statistics
SN - 0266-4763
IS - 16
ER -