ADAPTIVE REAL-TIME CLEANING OF HETEROGENEOUS SENSOR DATA IN SMART HOME SYSTEMS BASED ON NOISE CLASSIFICATION

Authors

DOI:

https://doi.org/10.28925/2663-4023.2025.28.844

Keywords:

data cleaning; smart home; Internet of Things; adaptive anomaly removal; noise classification; machine learning.

Abstract

Modern smart home systems generate substantial volumes of data whose quality is critical for effective management, analysis, and forecasting. However, raw data streams often contain complex combinations of noise and anomalies, including outliers, concept drift, and periods of stagnant values. Such artifacts significantly reduce data reliability and can lead to incorrect operation of intelligent systems and flawed decision-making. Existing data cleaning methods often show limited effectiveness when dealing with heterogeneous types of noise in real time, particularly when processing sensor data with varying physical characteristics.

This paper presents ACRA (Adaptive Classification-based Real-time Anomaly cleaning), a novel method for adaptive cleaning of heterogeneous sensor time series in smart homes. ACRA uses a Random Forest classifier (an ensemble of decision trees) to identify the specific type of noise within a sliding window of incoming data. The classification result is supplemented by a heuristic rule that analyzes window variance to detect low-noise periods more accurately. Based on the combined output of the classifier and the rule, an adaptive strategy module dynamically selects the most appropriate operator for correcting the current data value. The method was evaluated experimentally on real-world time series of temperature, humidity, and total energy consumption collected in a residential environment, with controlled noise types injected synthetically.
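The pipeline described above (classify the window, apply the variance rule, then pick a correction operator) might be sketched as follows. This is an illustration under stated assumptions and not the authors' implementation: the abstract does not specify the features, thresholds, noise labels, or operators, so `stand_in_classifier`, `LOW_VARIANCE`, `OUTLIER_DEV`, the label names, and the `OPERATORS` table are all hypothetical. In particular, a fixed-threshold function stands in for the trained Random Forest so that the sketch needs only the Python standard library.

```python
from collections import deque
from statistics import mean, median, pvariance

LOW_VARIANCE = 0.01  # assumed threshold for the low-noise variance rule
OUTLIER_DEV = 3.0    # assumed threshold used by the stand-in classifier


def stand_in_classifier(window):
    """Placeholder for the trained Random Forest; returns a noise label.

    Hypothetical rule: the newest value is an 'outlier' when it lies far
    from the median of the earlier values; otherwise the label is 'jitter'.
    """
    *history, latest = window
    if abs(latest - median(history)) > OUTLIER_DEV:
        return "outlier"
    return "jitter"


# Hypothetical operator table: noise label -> corrected newest value.
OPERATORS = {
    "clean": lambda w: w[-1],             # keep the raw reading
    "outlier": lambda w: median(w[:-1]),  # replace with median of history
    "jitter": lambda w: mean(w),          # smooth with the window mean
}


def clean_stream(values, window_size=5, classify=stand_in_classifier):
    """Clean a stream one value at a time using a sliding window."""
    window = deque(maxlen=window_size)
    cleaned = []
    for value in values:
        window.append(value)
        w = list(window)
        if len(w) < window_size:
            cleaned.append(value)  # warm-up: not enough context yet
            continue
        label = classify(w)
        # The variance rule overrides the classifier in quiet periods.
        if pvariance(w) < LOW_VARIANCE:
            label = "clean"
        corrected = OPERATORS[label](w)
        window[-1] = corrected  # later windows see the corrected value
        cleaned.append(corrected)
    return cleaned


readings = [20.0, 20.05, 19.95, 20.0, 20.05, 35.0, 20.0]
print(clean_stream(readings))
```

Passing a different `classify` callable, for example a wrapper around a trained model's prediction method, would replace the stand-in without changing the dispatch logic. Writing the corrected value back into the window is a design choice made here so that one spike does not distort the next few decisions.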

The experimental results showed that ACRA provides significantly better cleaning quality than common baseline methods, such as the moving median and the Kalman filter, across all three types of sensor data studied. ACRA is a robust and flexible tool for enhancing the quality of sensor data in smart home systems, laying the foundation for more accurate and efficient intelligent applications.
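For orientation, the two baselines named above can be sketched in a few lines. This is a minimal illustration, not the configuration used in the study: the window size, the random-walk state model, the variances `process_var` and `measurement_var`, and the use of RMSE as the quality metric are all assumptions, since the abstract does not state them.

```python
from math import sqrt
from statistics import median


def moving_median(values, window_size=3):
    """Causal moving median over the current and preceding readings."""
    out = []
    for i in range(len(values)):
        start = max(0, i - window_size + 1)
        out.append(median(values[start:i + 1]))
    return out


def kalman_1d(values, process_var=1e-3, measurement_var=0.5):
    """Scalar Kalman filter with an assumed random-walk state model.

    Expects a non-empty sequence; the first reading seeds the estimate.
    """
    estimate, error = values[0], 1.0
    out = [estimate]
    for z in values[1:]:
        error += process_var                      # predict step
        gain = error / (error + measurement_var)  # Kalman gain
        estimate += gain * (z - estimate)         # update with measurement
        error *= 1.0 - gain
        out.append(estimate)
    return out


def rmse(cleaned, truth):
    """Root-mean-square error between cleaned and reference series."""
    return sqrt(sum((c - t) ** 2 for c, t in zip(cleaned, truth)) / len(truth))


truth = [20.0] * 5
noisy = [20.0, 20.0, 35.0, 20.0, 20.0]  # one synthetically injected spike
print(rmse(moving_median(noisy), truth))
print(rmse(kalman_1d(noisy), truth))
```

On this toy series the moving median removes the isolated spike entirely, while the Kalman filter is pulled toward it and recovers gradually. That contrast illustrates why a single fixed filter may suit one noise type better than another.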




Published

2025-06-26

How to Cite

Nishchemenko, D., & Volochchuk, O. (2025). ADAPTIVE REAL-TIME CLEANING OF HETEROGENEOUS SENSOR DATA IN SMART HOME SYSTEMS BASED ON NOISE CLASSIFICATION. Electronic Professional Scientific Journal «Cybersecurity: Education, Science, Technique», 4(28), 740–750. https://doi.org/10.28925/2663-4023.2025.28.844