A multivariate analysis to explain residue errors in pathogen concentration in wastewater-based epidemiology

Research output: Contribution to journal › Article › Research › peer review

Authors

  • Markus Wallner
  • Omar V. Müller
  • Andrea A. Goméz
  • Ingeborg Joost
  • Urda Düker
  • Frank Klawonn
  • Regina Nogueira

External Research Organisations

  • Ostfalia University of Applied Sciences
  • Universidad Nacional del Litoral

Details

Original language: English
Article number: 178149
Journal: Science of the Total Environment
Volume: 959
Early online date: 24 Dec 2024
Publication status: Published - 10 Jan 2025

Abstract

With the beginning of the COVID-19 pandemic, wastewater-based epidemiology (WBE), which, according to Larsen et al. (2021), is the science of linking pathogens and chemicals found in wastewater to population-level health, received an enormous boost worldwide. The basic procedure in WBE is to analyse pathogen concentrations in wastewater and to relate these measurements to case numbers from clinical data. Predicting cases in this way is subject to large errors due to factors such as dilution effects, decay, the wastewater matrix and inhibitors. In this study we used different models to identify the most important of what we call wastewater-based epidemiologically relevant parameters (WBERP) for explaining these errors. We used linear regression and random forest regression as base models for predicting cases, and random forest regression also to analyse the importance of the individual WBERP. Two catchments, one with a large proportion of combined sewers and one with separate sewers, served as study areas. Our results show that the most important information to include in any model concerns the variants of concern (VOCs), a time-variable parameter. Using the VOCs as additional information improves performance in both catchments by ~30 % in terms of root mean square error. For practical applications, this is a real drawback: every time a new pathogen variant becomes dominant, its specific behaviour in wastewater and its detection must be known before WBE data can be interpreted correctly. This limits the predictive capabilities of such systems, perhaps not for the dynamics but for quantitative statements. Adding other physicochemical parameters and faecal markers improved the results only marginally. Furthermore, the importance of the parameters differed between the catchments, which limits the generalisability of the conclusions.
The results show that more complex wastewater matrices (high proportion of combined sewers) influence the relationship between pathogen concentration and clinical cases more than less complex wastewater matrices (separate sewers).
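
The core comparison described in the abstract (predicting cases from pathogen concentration with and without VOC information, scoring by root mean square error, then ranking parameters by importance) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code. The data are synthetic. The VOC effect is modelled as a hypothetical per-variant offset on log cases. Ordinary least squares stands in for the base models, and a hand-written, model-agnostic permutation importance replaces the random forest importance used in the study. The relative improvement is assumed to be 1 - RMSE_with / RMSE_without. The helper names `design`, `fit_ols` and `perm_importance` are hypothetical.

```python
import numpy as np

# Synthetic, illustrative data only (NOT the study's measurements).
rng = np.random.default_rng(0)
n = 300
voc = rng.integers(0, 3, size=n)            # hypothetical dominant-VOC label per sample (0, 1, 2)
log_conc = rng.normal(5.0, 1.0, size=n)     # hypothetical log10 pathogen concentration
voc_shift = np.array([0.0, -0.8, 0.6])      # assumed per-variant offset on log cases
log_cases = 1.0 + 0.9 * log_conc + voc_shift[voc] + rng.normal(0.0, 0.3, size=n)


def design(conc, variant=None):
    """Design matrix: intercept, log concentration and, optionally, one-hot VOC dummies."""
    cols = [np.ones_like(conc), conc]
    if variant is not None:
        for level in (1, 2):                # level 0 is the reference category
            cols.append((variant == level).astype(float))
    return np.column_stack(cols)


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def fit_ols(X, y):
    """Ordinary least squares coefficients (stand-in for the base models)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def perm_importance(X, y, beta, groups, n_repeats=20, seed=1):
    """Mean RMSE increase when a group of columns is shuffled across rows."""
    r = np.random.default_rng(seed)
    reference = rmse(y, X @ beta)
    scores = {}
    for name, cols in groups.items():
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle the group's columns jointly so one-hot dummies stay consistent.
            X_perm[:, cols] = X[r.permutation(len(X))][:, cols]
            increases.append(rmse(y, X_perm @ beta) - reference)
        scores[name] = float(np.mean(increases))
    return scores


# Random train/test split (see the note below on chronological splits).
idx = rng.permutation(n)
train, test = idx[:200], idx[200:]

X_base, X_voc = design(log_conc), design(log_conc, voc)
beta_base = fit_ols(X_base[train], log_cases[train])
beta_voc = fit_ols(X_voc[train], log_cases[train])

rmse_base = rmse(log_cases[test], X_base[test] @ beta_base)
rmse_voc = rmse(log_cases[test], X_voc[test] @ beta_voc)
improvement = 1.0 - rmse_voc / rmse_base

importance = perm_importance(
    X_voc[test], log_cases[test], beta_voc, {"log_conc": [1], "voc": [2, 3]}
)

print(f"RMSE without VOC: {rmse_base:.3f}, with VOC: {rmse_voc:.3f}, "
      f"relative improvement: {improvement:.0%}")
print("permutation importance (RMSE increase):", importance)
```

For brevity, the split here is random rather than chronological. Real WBE time series are better served by a chronological split, which also exposes the drawback the abstract describes: a variant that first appears after the training period has no fitted offset. With scikit-learn installed, `RandomForestRegressor` and `sklearn.inspection.permutation_importance` could replace the least-squares fit and the hand-written importance loop.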

Keywords

    Machine learning, Random forest, SARS-CoV-2, Wastewater-based epidemiology

Cite this

A multivariate analysis to explain residue errors in pathogen concentration in wastewater-based epidemiology. / Wallner, Markus; Müller, Omar V.; Goméz, Andrea A. et al.
In: Science of the Total Environment, Vol. 959, 178149, 10.01.2025.

Wallner M, Müller OV, Goméz AA, Joost I, Düker U, Klawonn F et al. A multivariate analysis to explain residue errors in pathogen concentration in wastewater-based epidemiology. Science of the Total Environment. 2025 Jan 10;959:178149. Epub 2024 Dec 24. doi: 10.1016/j.scitotenv.2024.178149
@article{78f1a4d39e30457f95da2f94f3294d88,
title = "A multivariate analysis to explain residue errors in pathogen concentration in wastewater-based epidemiology",
keywords = "Machine learning, Random forest, SARS-CoV-2, Wastewater-based epidemiology",
author = "Markus Wallner and M{\"u}ller, {Omar V.} and Gom{\'e}z, {Andrea A.} and Ingeborg Joost and Urda D{\"u}ker and Frank Klawonn and Regina Nogueira",
note = "Publisher Copyright: {\textcopyright} 2024",
year = "2025",
month = jan,
day = "10",
doi = "10.1016/j.scitotenv.2024.178149",
language = "English",
volume = "959",
journal = "Science of the Total Environment",
issn = "0048-9697",
publisher = "Elsevier",

}

TY - JOUR

T1 - A multivariate analysis to explain residue errors in pathogen concentration in wastewater-based epidemiology

AU - Wallner, Markus

AU - Müller, Omar V.

AU - Goméz, Andrea A.

AU - Joost, Ingeborg

AU - Düker, Urda

AU - Klawonn, Frank

AU - Nogueira, Regina

N1 - Publisher Copyright: © 2024

PY - 2025/1/10

Y1 - 2025/1/10

AB - With the beginning of the COVID-19 pandemic, wastewater-based epidemiology (WBE), which according to Larsen et al. (2021), describes the science of linking pathogens and chemicals found in wastewater to population-level health, received an enormous boost worldwide. The basic procedure in WBE is to analyse pathogen concentrations and to relate these measurements to cases from clinical data. This prediction of cases is subject to large errors, due to various factors such as dilution effects, decay or wastewater matrix and inhibitors. In this study we used different models to identify the most important, what we call, wastewater-based epidemiologically relevant parameters (WBERP) to describe these errors. We used linear regression and random forest regression as base models for predicting cases and random forest regression also to analyse the importance of different WBERP. Two catchments, one with a large proportion of combined sewers and one with separate sewers, served as study areas. Our results show that the most important information to be included in any model are the variants of concern (VOCs), a time-variable parameter. The performance for both catchments is improved by ~30 % in terms of root mean square error when the VOCs are used as additional information. For practical applications, this is a real drawback as it means that every time a new pathogen variant becomes dominant, we need to know the specific behaviour of the variant in the wastewater and its detection in order to interpret the WBE data correctly. This limits the predictive capabilities of such systems, perhaps not in terms of dynamics but for quantitative statements. The addition of other physicochemical parameters and faecal markers only marginally improved the results. Furthermore, there were differences in the importance of the parameters between the catchments, which limits the generalisability of the conclusions. The results show that more complex wastewater matrices (high proportion of combined sewer system) influence the relationship between pathogen concentration and medical cases more than those of less complex wastewater matrices (separate sewer system).

KW - Machine learning

KW - Random forest

KW - SARS-CoV-2

KW - Wastewater-based epidemiology

UR - http://www.scopus.com/inward/record.url?scp=85212832034&partnerID=8YFLogxK

U2 - 10.1016/j.scitotenv.2024.178149

DO - 10.1016/j.scitotenv.2024.178149

M3 - Article

AN - SCOPUS:85212832034

VL - 959

JO - Science of the Total Environment

JF - Science of the Total Environment

SN - 0048-9697

M1 - 178149

ER -
