Details
| Original language | English |
|---|---|
| Pages (from-to) | 96821-96847 |
| Number of pages | 27 |
| Journal | IEEE ACCESS |
| Volume | 12 |
| Publication status | Published - 12 Jul 2024 |
Abstract
Artificial Intelligence (AI) systems can introduce biases that lead to unreliable outcomes and, in the worst-case scenarios, perpetuate systemic and discriminatory results when deployed in the real world. While significant efforts have been made to create bias detection methods, developing reliable and comprehensive documentation artifacts also makes for valuable resources that address bias and aid in minimizing the harms associated with AI systems. Based on compositional design patterns, this paper introduces a documentation approach using a hybrid AI system to prompt the identification and traceability of bias in datasets and predictive AI models. To demonstrate the effectiveness of our approach, we instantiate our pattern in two implementations of a hybrid AI system. One follows an integrated approach and performs fine-grained tracing and documentation of the AI model. In contrast, the other hybrid system follows a principled approach and enables the documentation and comparison of bias in the input data and the predictions generated by the model. Through a use case based on Fake News detection and an empirical evaluation, we show how biases detected during data ingestion steps (e.g., label, over-representation, activity bias) affect the training and predictions of the classification models. Concretely, we report a stark skewness in the distribution of input variables towards the Fake News label, we uncover how a predictive variable leads to more constraints in the learning process, and highlight open challenges of training models with unbalanced datasets. A video summarizing this work is available online (https://youtu.be/v2GfIQPAy_4?si=BXtWOf97cLiZavyu), and the implementation is publicly available on GitHub (https://github.com/SDM-TIB/DocBiasKG).
Keywords
- Bias, hybrid AI systems, knowledge graphs, tracing
ASJC Scopus subject areas
- General Computer Science
- General Materials Science
- General Engineering
Cite this
In: IEEE ACCESS, Vol. 12, 12.07.2024, p. 96821-96847.
Research output: Contribution to journal › Article › Research › peer review
TY - JOUR
T1 - Employing Hybrid AI Systems to Trace and Document Bias in ML Pipelines
AU - Russo, Mayra
AU - Chudasama, Yasharajsinh
AU - Purohit, Disha
AU - Sawischa, Sammy
AU - Vidal, Maria Esther
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2024/7/12
Y1 - 2024/7/12
N2 - Artificial Intelligence (AI) systems can introduce biases that lead to unreliable outcomes and, in the worst-case scenarios, perpetuate systemic and discriminatory results when deployed in the real world. While significant efforts have been made to create bias detection methods, developing reliable and comprehensive documentation artifacts also makes for valuable resources that address bias and aid in minimizing the harms associated with AI systems. Based on compositional design patterns, this paper introduces a documentation approach using a hybrid AI system to prompt the identification and traceability of bias in datasets and predictive AI models. To demonstrate the effectiveness of our approach, we instantiate our pattern in two implementations of a hybrid AI system. One follows an integrated approach and performs fine-grained tracing and documentation of the AI model. In contrast, the other hybrid system follows a principled approach and enables the documentation and comparison of bias in the input data and the predictions generated by the model. Through a use case based on Fake News detection and an empirical evaluation, we show how biases detected during data ingestion steps (e.g., label, over-representation, activity bias) affect the training and predictions of the classification models. Concretely, we report a stark skewness in the distribution of input variables towards the Fake News label, we uncover how a predictive variable leads to more constraints in the learning process, and highlight open challenges of training models with unbalanced datasets. A video summarizing this work is available online (https://youtu.be/v2GfIQPAy_4?si=BXtWOf97cLiZavyu), and the implementation is publicly available on GitHub (https://github.com/SDM-TIB/DocBiasKG).
AB - Artificial Intelligence (AI) systems can introduce biases that lead to unreliable outcomes and, in the worst-case scenarios, perpetuate systemic and discriminatory results when deployed in the real world. While significant efforts have been made to create bias detection methods, developing reliable and comprehensive documentation artifacts also makes for valuable resources that address bias and aid in minimizing the harms associated with AI systems. Based on compositional design patterns, this paper introduces a documentation approach using a hybrid AI system to prompt the identification and traceability of bias in datasets and predictive AI models. To demonstrate the effectiveness of our approach, we instantiate our pattern in two implementations of a hybrid AI system. One follows an integrated approach and performs fine-grained tracing and documentation of the AI model. In contrast, the other hybrid system follows a principled approach and enables the documentation and comparison of bias in the input data and the predictions generated by the model. Through a use case based on Fake News detection and an empirical evaluation, we show how biases detected during data ingestion steps (e.g., label, over-representation, activity bias) affect the training and predictions of the classification models. Concretely, we report a stark skewness in the distribution of input variables towards the Fake News label, we uncover how a predictive variable leads to more constraints in the learning process, and highlight open challenges of training models with unbalanced datasets. A video summarizing this work is available online (https://youtu.be/v2GfIQPAy_4?si=BXtWOf97cLiZavyu), and the implementation is publicly available on GitHub (https://github.com/SDM-TIB/DocBiasKG).
KW - Bias
KW - hybrid AI systems
KW - knowledge graphs
KW - tracing
UR - http://www.scopus.com/inward/record.url?scp=85199265384&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2024.3427388
DO - 10.1109/ACCESS.2024.3427388
M3 - Article
AN - SCOPUS:85199265384
VL - 12
SP - 96821
EP - 96847
JO - IEEE ACCESS
JF - IEEE ACCESS
SN - 2169-3536
ER -