ADAPTIVE REAL-TIME CLEANING OF HETEROGENEOUS SENSOR DATA IN SMART HOME SYSTEMS BASED ON NOISE CLASSIFICATION
DOI:
https://doi.org/10.28925/2663-4023.2025.28.844
Keywords:
data cleaning; smart home; Internet of Things; adaptive anomaly removal; noise classification; machine learning.
Abstract
Modern smart home systems generate substantial volumes of data whose quality is critical for effective management, analysis, and forecasting. However, raw data streams often contain complex combinations of noise and anomalies, including outliers, concept drift, and periods of stagnant values. Such artifacts significantly reduce data reliability and can lead to incorrect operation of intelligent systems and flawed decision-making. Existing data cleaning methods often show limited effectiveness against heterogeneous types of noise in real time, particularly when processing data from sensors with different physical characteristics.
This paper presents a novel method, ACRA (Adaptive Classification-based Real-time Anomaly cleaning), designed for adaptive cleaning of heterogeneous sensor time series in smart homes. ACRA uses a classifier based on an ensemble of decision trees (Random Forest) to identify the specific type of noise within a sliding window of incoming data. The classification result is supplemented by a heuristic rule that analyzes the window variance to detect low-noise periods more accurately. Based on the combination of the classifier’s output and the rule, an adaptive strategy module dynamically selects the most appropriate operator for correcting the current data value. To evaluate the proposed method, an experimental study was conducted on real-world time series of temperature, humidity, and total energy consumption collected in a residential environment, with controlled noise types injected synthetically.
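The windowed classify-then-correct loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the label set, the `placeholder_classifier` function (a simple spike rule standing in for the trained Random Forest), the three correction operators, and all thresholds are assumptions chosen only to make the control flow concrete.

```python
from collections import deque
from statistics import fmean, median, pvariance
from typing import Callable, List, Sequence

# Hypothetical label set; the abstract does not list the classifier's classes.
CLEAN, OUTLIER, NOISY = "clean", "outlier", "noisy"


def placeholder_classifier(window: Sequence[float]) -> str:
    """Stand-in for a trained Random Forest: flags a spike in the newest sample."""
    history = list(window[:-1])
    center = median(history)
    spread = median(abs(v - center) for v in history)
    # Floor the spread so a flat history does not flag every tiny change.
    return OUTLIER if abs(window[-1] - center) > 5 * max(spread, 0.1) else NOISY


def correct(label: str, window: Sequence[float], low_var: float) -> float:
    """Choose a correction operator from the label and the window-variance rule."""
    if label == OUTLIER:
        return median(window[:-1])   # replace the spike with the recent median
    if label == CLEAN or pvariance(window) < low_var:
        return window[-1]            # low-noise period: keep the value unchanged
    return fmean(window)             # otherwise smooth with the window mean


def clean_stream(
    values: Sequence[float],
    classify: Callable[[Sequence[float]], str] = placeholder_classifier,
    size: int = 5,
    low_var: float = 0.05,
) -> List[float]:
    """Clean a stream one sample at a time using a sliding window."""
    window: deque = deque(maxlen=size)
    cleaned: List[float] = []
    for value in values:
        window.append(value)
        if len(window) < size:
            cleaned.append(value)    # warm-up: not enough context yet
            continue
        snapshot = list(window)
        fixed = correct(classify(snapshot), snapshot, low_var)
        window[-1] = fixed           # keep corrected values in the history
        cleaned.append(fixed)
    return cleaned
```

Writing the corrected value back into the window is one possible design choice: it keeps a detected spike from contaminating later windows. A trained classifier could be passed through the `classify` argument in place of the placeholder.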
The experimental results show that ACRA achieves significantly better cleaning quality than common baseline methods, such as the moving median and the Kalman filter, across all three types of sensor data studied. The proposed method is a robust and flexible tool for improving the quality of sensor data in smart home systems and lays the foundation for more accurate and efficient intelligent applications.
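For orientation, the two baselines named above can be sketched in a few lines. The abstract does not give the configurations used in the study, so the window size, the constant-level state model, and the noise variances below are assumptions for illustration only.

```python
from statistics import median
from typing import List, Sequence


def moving_median(values: Sequence[float], size: int = 5) -> List[float]:
    """Causal moving median over the last `size` samples (shorter at the start)."""
    return [
        median(values[max(0, i - size + 1): i + 1])
        for i in range(len(values))
    ]


def kalman_1d(
    values: Sequence[float],
    process_var: float = 1e-3,      # assumed process noise variance
    measurement_var: float = 0.5,   # assumed measurement noise variance
) -> List[float]:
    """Scalar Kalman filter with a constant-level (random-walk) state model."""
    if not values:
        return []
    estimate, error = values[0], 1.0
    out = [estimate]
    for z in values[1:]:
        error += process_var                       # predict: uncertainty grows
        gain = error / (error + measurement_var)   # Kalman gain
        estimate += gain * (z - estimate)          # update with the measurement
        error *= 1.0 - gain                        # posterior uncertainty
        out.append(estimate)
    return out
```

Both filters apply one fixed operator to every sample regardless of the kind of noise present, which is the property an adaptive, classification-driven approach is meant to improve on.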
License
Copyright (c) 2025 Дмитро Ніщеменко, Олена Волощук

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.