Details
Field | Value |
---|---|
Original language | English |
Article number | 178149 |
Journal | Science of the Total Environment |
Volume | 959 |
Early online date | 24 Dec 2024 |
Publication status | Published - 10 Jan 2025 |
Abstract
With the onset of the COVID-19 pandemic, wastewater-based epidemiology (WBE), which, according to Larsen et al. (2021), describes the science of linking pathogens and chemicals found in wastewater to population-level health, received an enormous boost worldwide. The basic procedure in WBE is to analyse pathogen concentrations and to relate these measurements to cases from clinical data. This prediction of cases is subject to large errors due to factors such as dilution effects, decay, the wastewater matrix and inhibitors. In this study, we used different models to identify the most important of what we call wastewater-based epidemiologically relevant parameters (WBERP) for describing these errors. We used linear regression and random forest regression as base models for predicting cases, and additionally used random forest regression to analyse the importance of the different WBERP. Two catchments, one with a large proportion of combined sewers and one with separate sewers, served as study areas. Our results show that the most important information to include in any model is the variants of concern (VOCs), a time-variable parameter. For both catchments, performance improves by ~30 % in terms of root mean square error when the VOCs are used as additional information. For practical applications, this is a real drawback: every time a new pathogen variant becomes dominant, we need to know the specific behaviour of that variant in the wastewater and its detection in order to interpret the WBE data correctly. This limits the predictive capabilities of such systems, perhaps not in terms of dynamics but for quantitative statements. The addition of other physicochemical parameters and faecal markers improved the results only marginally. Furthermore, the importance of the parameters differed between the catchments, which limits the generalisability of the conclusions.
The results show that more complex wastewater matrices (high proportion of combined sewer system) influence the relationship between pathogen concentration and medical cases more than less complex wastewater matrices (separate sewer system) do.
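The comparison described in the abstract (error in predicted cases with and without variant information, expressed as a relative change in root mean square error) can be illustrated with a minimal sketch. This is not the authors' pipeline: the study used linear regression and random forest regression on real catchment data, whereas the sketch below fits simple one-predictor least-squares lines, uses invented numbers, and evaluates in-sample only. The names `fit_line`, `rmse`, and `records`, and the variant labels "A" and "B", are hypothetical.

```python
import math
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def rmse(observed, predicted):
    """Root mean square error between two equally long sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Synthetic, illustrative records (NOT study data):
# (pathogen load in wastewater, dominant variant label, reported cases)
records = [
    (1.0, "A", 12.0), (2.0, "A", 19.0), (3.0, "A", 31.0), (4.0, "A", 41.0),
    (1.0, "B", 22.0), (2.0, "B", 41.0), (3.0, "B", 58.0), (4.0, "B", 81.0),
]
loads = [r[0] for r in records]
cases = [r[2] for r in records]

# Baseline: one load-to-cases line for all samples, ignoring the variant.
a, b = fit_line(loads, cases)
baseline_pred = [a + b * x for x in loads]

# Variant-aware: a separate line per dominant variant.
by_variant = defaultdict(list)
for load, variant, n_cases in records:
    by_variant[variant].append((load, n_cases))
coeffs = {
    v: fit_line([p[0] for p in pts], [p[1] for p in pts])
    for v, pts in by_variant.items()
}
variant_pred = [coeffs[v][0] + coeffs[v][1] * x for x, v, _ in records]

rmse_baseline = rmse(cases, baseline_pred)
rmse_variant = rmse(cases, variant_pred)

# Relative RMSE improvement, the kind of quantity the abstract reports as ~30 %.
improvement = (rmse_baseline - rmse_variant) / rmse_baseline
print(f"baseline RMSE: {rmse_baseline:.2f}")
print(f"variant-aware RMSE: {rmse_variant:.2f}")
print(f"relative improvement: {improvement:.0%}")
```

Because the synthetic data were constructed with a different load-to-cases slope per variant, the variant-aware fit necessarily has the lower in-sample error here; the size of the improvement in this toy example says nothing about the study's results.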
Keywords
- Machine learning, Random forest, SARS-CoV-2, Wastewater-based epidemiology
ASJC Scopus subject areas
- Environmental Science(all)
- Environmental Engineering
- Environmental Chemistry
- Waste Management and Disposal
- Pollution
Cite this
In: Science of the Total Environment, Vol. 959, 178149, 10.01.2025.
Research output: Contribution to journal › Article › Research › peer review
TY - JOUR
T1 - A multivariate analysis to explain residue errors in pathogen concentration in wastewater-based epidemiology
AU - Wallner, Markus
AU - Müller, Omar V.
AU - Goméz, Andrea A.
AU - Joost, Ingeborg
AU - Düker, Urda
AU - Klawonn, Frank
AU - Nogueira, Regina
N1 - Publisher Copyright: © 2024
PY - 2025/1/10
Y1 - 2025/1/10
KW - Machine learning
KW - Random forest
KW - SARS-CoV-2
KW - Wastewater-based epidemiology
UR - http://www.scopus.com/inward/record.url?scp=85212832034&partnerID=8YFLogxK
U2 - 10.1016/j.scitotenv.2024.178149
DO - 10.1016/j.scitotenv.2024.178149
M3 - Article
AN - SCOPUS:85212832034
VL - 959
JO - Science of the Total Environment
JF - Science of the Total Environment
SN - 0048-9697
M1 - 178149
ER -