MULTI-AGENT APPROACH TO DATA PIPELINE STABILITY USING DISTRIBUTED STORAGE COORDINATION

Authors

Loboda, Y., Trofymenko, O., Mykheliev, I., Haidaienko, O., Vorona, M.

DOI:

https://doi.org/10.28925/2663-4023.2026.32.1104

Keywords:

analytical systems; data processing pipelines; stability; multi-agent systems; distributed storage; Big Data; NoSQL; machine learning.

Abstract

The article analyzes approaches to ensuring the stability of data processing pipelines in analytical systems, a critical aspect of modern Big Data and machine learning solutions. The study is motivated by the rapid growth in data volume and heterogeneity, the increasing architectural complexity of analytical platforms, and rising requirements for reliability and reproducibility under dynamic workloads and distributed computing. Pipeline instability manifests as cascading failures, performance degradation, data inconsistency, and variability of results, all of which are particularly damaging in Big Data environments. The paper examines the limitations of centralized orchestration, notably its reduced adaptability and single points of failure, and substantiates the transition to decentralized multi-agent control mechanisms that provide component autonomy, fault localization, and improved resilience. Special attention is given to distributed storage systems, which serve not only as repositories but also as coordination environments for agent interaction. Document-oriented, graph, object, and streaming storage systems enable consistent state management, reproducibility of control actions, and asynchronous coordination of processing workflows. The study summarizes approaches to organizing analytical and predictive mechanisms for stability support, including pipeline monitoring, forecasting of workload and failure risk, and adaptive adjustment of processing parameters based on accumulated experience. A continuous evaluation–forecasting–adaptation cycle is proposed that enables proactive stability management. An e-commerce case study confirms the effectiveness of the approach, demonstrating reduced recovery time, greater automation of interventions, and improved adaptability of pipelines to changes in workload and data quality. The results can be applied in the design and modernization of Big Data and machine learning systems to enhance the stability, reliability, and reproducibility of analytical processes.
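
As an illustration only, the evaluation–forecasting–adaptation cycle and the use of storage as a coordination environment might be sketched as follows. This is not the authors' implementation: the `SharedStore` class is an in-memory stand-in for a distributed document store, and the `StageAgent` class, the latency limit, the linear-trend forecast, and the batch-halving rule are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStore:
    """In-memory stand-in for a distributed document store (assumption)."""
    metrics: dict = field(default_factory=dict)  # stage -> latency samples (ms)
    params: dict = field(default_factory=dict)   # stage -> current batch size
    log: list = field(default_factory=list)      # append-only control actions


class StageAgent:
    """Hypothetical agent for one pipeline stage; all state lives in the store."""

    def __init__(self, stage, store, latency_limit_ms=900, window=3):
        self.stage = stage
        self.store = store
        self.latency_limit_ms = latency_limit_ms
        self.window = window
        store.metrics.setdefault(stage, [])
        store.params.setdefault(stage, 1000)

    def evaluate(self, latency_ms):
        """Evaluation: record the latest observation in shared state."""
        self.store.metrics[self.stage].append(latency_ms)

    def forecast(self):
        """Forecasting: naive linear extrapolation over the recent window."""
        recent = self.store.metrics[self.stage][-self.window:]
        if len(recent) < 2:
            return float(recent[-1]) if recent else 0.0
        trend = (recent[-1] - recent[0]) / (len(recent) - 1)
        return recent[-1] + trend

    def adapt(self):
        """Adaptation: halve the batch size if the forecast exceeds the limit."""
        predicted = self.forecast()
        if predicted > self.latency_limit_ms:
            new_size = max(1, self.store.params[self.stage] // 2)
            self.store.params[self.stage] = new_size
            # Logging the action in the store keeps control decisions replayable.
            self.store.log.append((self.stage, "reduce_batch", new_size))
        return predicted


def run_cycle(agent, observations_ms):
    """Run one evaluate -> forecast -> adapt pass per observation."""
    for latency_ms in observations_ms:
        agent.evaluate(latency_ms)
        agent.adapt()


store = SharedStore()
agent = StageAgent("ingest", store, latency_limit_ms=900)
run_cycle(agent, [200, 400, 600, 800])
print(store.params["ingest"], store.log)
```

In this run the last observed latency (800 ms) is still below the assumed 900 ms limit, but the extrapolated value (1000 ms) exceeds it, so the agent reduces the batch size before the limit is actually breached. Because agents communicate only through the store, another agent could read the same log and parameters without a central orchestrator.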

Published

2026-03-26

How to Cite

Loboda, Y., Trofymenko, O., Mykheliev, I., Haidaienko, O., & Vorona, M. (2026). MULTI-AGENT APPROACH TO DATA PIPELINE STABILITY USING DISTRIBUTED STORAGE COORDINATION. Electronic Professional Scientific Journal «Cybersecurity: Education, Science, Technique», 4(32), 198–213. https://doi.org/10.28925/2663-4023.2026.32.1104