Article Type: Research Paper
Date of acceptance: May 2025
Date of publication: June 2025
DOI: 10.5772/acrt.20250009
Copyright: ©2025 The Author(s), Licensee IntechOpen, License: CC BY 4.0
Cardiovascular disease (CVD) diagnosis often faces challenges due to poor data quality, noisy signals, and ineffective feature extraction from clinical and electrocardiogram (ECG) data. The processing methods currently used for diagnosing time-series data containing outliers produce imprecise results because they do not handle high dimensions effectively. This study proposes a modern CVD diagnostic system built on advanced pre-processing methods, feature extraction algorithms, and model optimization tools. To guarantee data quality and consistency, clinical data and ECG signals were first pre-processed using missing value imputation, one-hot encoding, and z-score normalization. ECG signals were further denoised using a finite impulse response (FIR) filter to eliminate unwanted noise and artifacts. Feature extraction from the pre-processed clinical data was performed using a dense-assisted parallel attention-based MobileNet (DPAM) model, which efficiently captured intricate patterns in the data. For ECG data, features were derived using a Legendre multi-wavelet transform-based feature decomposition (LMT-FD) model, enabling effective decomposition of the signal into meaningful components. A complete dataset was produced by fusing the features extracted from the clinical and ECG data. An optimized progressive attention-based bidirectional encoder enclosed transformer network (OPA-BETN) was used for CVD diagnosis, and the leaf in wind optimization (LWO) approach was used for hyperparameter tuning to maximize its performance. The PTB-XL dataset, which included clinical and ECG signal data, was used in this study. Until feature extraction, the two data types were processed individually; after fusion, the features from both were integrated. For improved analysis, two characteristics, heart rate and caloric intake, were added to the clinical data from the Apple Watch and Fitbit dataset. The experimental results showed that on the PTB-XL dataset the proposed model attained accuracy, precision, recall, F1-score, specificity, TPR, and TNR values of 99.0%, 97.2%, 97.2%, 97.2%, 99.3%, 97.2%, and 99.3%, respectively, with FPR and FNR values as low as 0.006 and 0.027.
Keywords: dense parallel attention, internet of things, missing value imputation, multi-wavelet surveillance, remote ECG monitoring
The Internet of Things (IoT) facilitates global connectivity by connecting gadgets in healthcare, industry, transportation, and intelligent cities. IoT health applications analyze vital signals such as heart rate, oxygen levels, and ECG to detect chronic diseases early on [1]. However, many existing techniques rely heavily on cloud-based processing. Early detection and treatment of cardiac problems are difficult in developing nations due to insufficient diagnostic centers, experienced doctors, and relevant resources [2]. Although the healthcare industry integrates IoT technology, IoT sensor data often has more noise and missing values compared to traditional datasets [3].
Technological assistance or clinical professionals are needed to remotely monitor and detect cardiovascular diseases (CVD) in patients. A resilient CVD monitoring system is a critical IoT application that employs expert systems and large healthcare data [4]. Machine learning (ML) algorithms for predicting CVD struggle with high-dimensional data due to a lack of a unified smart framework. Diagnosis accuracy relies heavily on doctors’ experiences, and the numerous factors associated with CVD increase their complexity [5]. Researchers have used a small dataset of 13 features from a larger set of 74, but it was inadequate. Various strategies have supported resilient systems, but they still struggle to attain high performance in disease forecasting and evaluation [6].
Maintaining massive data becomes a cost center for hospital units. Remote ECG monitoring has been made possible through an IoT-enabled, cloud-centric system with the goal of saving lives in rural regions [7]. Patients’ health scores are determined using data mining techniques that analyze biological data from smart medical IoT devices. Considering the limited resources in IoT platforms, it is important to protect sensitive patient data by using a lightweight, secure block encryption technique [8]. CVDs are on the rise primarily due to rapid health transitions in developing nations. People have strived to prolong their lives throughout history, but technology has not yet succeeded in lowering death rates [9].
Identifying CVDs is challenging due to varying symptoms, and manually processing substantial patient data in a digital healthcare environment is a challenge for physicians. IoT effectively manages data and communication, integrating with the cloud to enhance the quality of life [10]. Intelligent health surveillance and diagnosis systems are being created for patients in critical care, particularly those suffering from serious heart diseases. Hospitals around the world use advanced measurement technologies to detect significant heart problems [11]. To diagnose CVDs, physicians typically utilize a mix of medical history, physical examination, and diagnostic procedures such as ECG, echocardiograms, and stress tests [12].
Early diagnosis and treatment improve the prognosis of CVDs. IoT simplifies energy generation, wealth creation, and time-saving through smart environments, utilizing ML or deep learning (DL) techniques to enhance performance and development [13]. Thus, researchers have developed software applications to aid doctors in predicting and diagnosing CVDs. DL techniques and algorithms are applied across numerous medical datasets to automate the analysis of large and complex data [14]. As a result, IoT-based models have proven to be inexpensive, lightweight, and accessible in remote places without doctors, allowing for real-time heart sound identification and evaluation [15].
Current diagnostic practices face multiple difficulties because they must contend with data quality issues, unreliable time-series patterns, and inadequate feature extraction techniques, leading to subpar diagnosis. The clinical and ECG data used for diagnosis often contain disturbances and missing information that interfere with accurate analysis.
Complex data patterns cannot be extracted by existing models because of inconsistent time-series data points, outlying elements, and high-dimensional data. The fundamental features present in ECG signals often remain irreparably altered by traditional signal denoising approaches that use bandpass filters. This study presents an enhanced approach to CVD diagnosis by combining hospital reports with electrocardiogram information. The application of high-end data pre-processing methods, such as missing value imputation, one-hot encoding, and z-score normalization, leads to consistent, high-quality data. Signal noise necessitates the application of a finite impulse response (FIR) filter so that essential features remain visible. The performance of the system improves through the combination of a dense-assisted parallel attention-based MobileNet (DPAM) model for clinical data and a Legendre multi-wavelet transform-based feature decomposition (LMT-FD) model for ECG signals to detect significant physiological patterns. The optimized progressive attention-based bidirectional encoder enclosed transformer network (OPA-BETN) serves as the diagnostic tool, integrating both datasets in a model that delivers robust predictions efficiently. The proposed model introduces new data processing features by combining embedding encoding with adaptive FIR filtering, and utilizes advanced feature extraction through the DPAM and LMT-FD models to analyze clinical and ECG data. OPA-BETN acts as an accuracy-enhancing diagnostic system in combination with the efficient leaf in wind optimization (LWO) approach for hyperparameter optimization. The highlights of this study are as follows:
To ensure data quality and consistency, the clinical data and ECG data underwent pre-processing using missing value imputation, one-hot encoding, and z-score normalization.
To remove any unwanted noise from the signals, FIR filter was employed for signal denoising.
The DPAM model was employed to extract features from the pre-processed clinical data, effectively identifying complex patterns within the data.
LMT-FD was used to break down the ECG signals into meaningful components, successfully finding pertinent features.
Using sophisticated attention processes to concatenate and evaluate the aspects of both clinical and ECG data, OPA-BETN was utilized to diagnose cardiovascular illness.
To tune the proposed model, LWO technique ensured high accuracy and reliability in the identification of CVD.
Section 2 presents relevant current studies and in Section 3, the proposed method and foundations are described. Section 4 discusses the experimental results and Section 5 concludes the study.
Yenurkar et al. [16] proposed an IoT-based prototype that uses an ML-based prediction model to offer preventative measures for heart-related issues. It included an IoT device that recorded real-time data from the user’s body, such as heart rate and ECG. This information was used to develop an ML-based anticipated model for heart-related issues, which alerts the user to potential risks. The prototype attempted to provide a realistic solution for early identification of heart-related disorders, and it achieved a high accuracy of 98.3% in reducing death rates from CVD when compared to alternative methods. The results showed that the IoT-based prototype was more effective than other ML methods at identifying various CVDs and forecasting future mortality rates.
Mishra et al. [17] utilized DL to investigate and forecast heart disease likelihood using ECG signals. Initially, IoT-based ECG readings from healthy and cardiac disease patients were gathered and pre-processed using an FIR algorithm. P-wave, ST-segment, R-peak locations, heart rate variability, PQ-segment, T-wave, QRS complex duration, continuous wavelet transform, and improved mutual information were among the features collected for the study. Optimal features were selected using a hybrid optimization model. Heart disease detection utilized a three-layer framework including CNN, BiLSTM, and RNN. The methodology’s performance was assessed by utilizing metrics like accuracy, sensitivity, specificity, precision, recall, FPR, and FNR that were implemented on MATLAB.
Hannan et al. [18] implemented a wearable smart system for early heart attack diagnosis, employing a decentralized computational architecture to achieve minimal latency and rapid response time in detecting preliminary heart attack stages in home-bound patients. The system monitored and analyzed the patient’s prevailing heart status using parametric sensors integrated into the body and an android application. Three models, SVM, AdaBoost, and RF, were developed for classification. Performance metrics such as accuracy, error rate, and response time were used to evaluate the system. Research findings suggested its potential implementation in monitoring heart health remotely for patients at risk of heart attack, aiming to prevent sudden cardiac events without disrupting daily life.
Ullah et al. [19] developed a scalable ML-based architecture for early recognition of CVD, aiming to transform healthcare by facilitating timely diagnosis and treatment to reduce CVD- related fatalities. Initially, features were extracted from ECG signals, and optimal feature selection algorithms such as FCBF, MRMR, Relief, and PSO-optimization were applied. Classifiers like Extra Tree and RF achieved notable performance, each reaching 100% accuracy on selected features. The suggested method was compared to cutting-edge methodologies on both small and big datasets, and revealed its potential to transform patient care by considerably lowering CVD-related death rates and improving quality of life.
Naeem et al. [20] described a novel strategy for identifying people with heart disease using feature extraction methods and signal processing technologies. It used 10 metal oxide semiconductor sensors and an ANN to identify patterns in humans using scanned and extracted sensor data. Each participant underwent scanning for 1000 different characteristics across multiple investigations involving varying group sizes (5, 10, 15, and 20 people) over different time periods. Sensor signals were initially received in analog form and converted to digital form using Arduino. The dataset was utilized for training an architecture, and the model’s performance was assessed using metrics like sensitivity, f-measure, accuracy, and specificity for detecting human odors. Experimental results showed that the model attained an accuracy of more than 85% in most cases, demonstrating the efficacy of feature extraction strategies in improving individual identity and human odor recognition capabilities.
The rapidity, quality, and speed of ECG diagnosis are crucial immediate tasks that affect people's lives and health. In order to solve the challenge of identifying abnormal ECG data segment morphology, Razin et al. [21] examined the application of DL as a universal tool. The predictive ability of the network was improved by utilizing thresholding and replacements. It was also proven that learned DL models may be used within different ensembles. Furthermore, adding artificial models enhanced the ensemble's classification capabilities. However, this model required high computing power and more memory space.
CVD refers to disorders that involve constricted or obstructed blood vessels that lead to heart attack, chest pain, or stroke. The medical condition is predicted by the ML classifier based on the patient’s side effect state. In order to predict CVD, Kumar et al. [22] examined the presentation of ML tree classifiers. By employing a random forest ML classifier, the existing method exceeded all the classifiers examined in the classification of patients with cardiovascular disease, achieving higher accuracy of 85.71% with a ROC AUC score of 0.8675.
An ML-based cardiovascular disease diagnosis system (MaLCaDD) was presented by Rahim et al. [23] to accurately estimate CVDs. The framework specifically addressed data imbalance by using the synthetic minority oversampling technique (SMOTE) and missing values utilizing the mean replacement technique. The feature selection process then utilized the feature importance approach. For a more accurate prediction, a combination of kNN and LR classifiers was suggested. Three benchmark datasets, namely Cleveland, Framingham and heart disease, were used to validate the framework. The accuracies determined were 95.5%, 98.0% and 99.1%, respectively. However, this model had high complexity issues.
The ability of DL approaches to predict four main cardiac disorders was first presented by Abubaker et al. [24]. These included aberrant heartbeats, myocardial infarction, previous myocardial infarction, as well as typical individual classes. The study used a public ECG imaging dataset of cardiac patients. Initially, low-scale pretrained deep neural networks were employed to study the transfer learning (TL) method. Moreover, to accurately predict cardiac abnormality, an enhanced CNN was introduced. The previously described pretrained models were employed as ML algorithm feature extraction tools. Experimental findings showed that the suggested CNN model performed better than the existing works in terms of accuracy, recall, precision, and F1 score, achieving 98.23%, 98.22%, 98.31%, and 98.21%, respectively. Table 1 summarizes the highlights of these studies.
Author & Reference | Year | Diagnosis model | Dataset used | Results | Merits | Demerits |
---|---|---|---|---|---|---|
Yenurkar et al. [16] | 2024 | IoT-based ML | 1988 Heart disease dataset on Kaggle | Accuracy 98.3% | Early detection, high accuracy. | Dataset limitation, limited pre-processing impact. |
Mishra et al. [17] | 2024 | DL-based framework with IoT | Multi-variate dataset including Cleveland, Hungary | Accuracy 97.8%, high precision, recall | Robust performance flexibility. | Algorithmic limitations and scalability issues. |
Hannan et al. [18] | 2024 | IoT-based SEHAD-HC system | 1000 instances, Asian ethnicity | 97% accuracy, high reliability | Portability, resource efficiency, real-time monitoring. | Limited scalability. |
Ullah et al. [19] | 2024 | ML framework | HHDD, UCI ML Repository and Kaggle | 78% accuracy with FCBF | Enhanced performance and relevant feature selection. | Dataset dependency, complexity in feature engineering. |
Naeem et al. [20] | 2024 | Artificial Neural Network (ANN) | Sensor data, Dataset of 39 patients | Accuracy over 85% | High accuracy in odour identification. | Sensitivity to external factors. |
Razin et al. [21] | 2023 | DL | PTB-XL | - | Early detection. | Additional memory space required. |
Kumar et al. [22] | 2020 | ML classifiers | Cardiovascular disease | Precision-85%, ROC AUC-0.8675, Execution time-1.09 sec. | Better performance and resource efficiency. | High computational load. |
Rahim et al. [23] | 2021 | LR and kNN | Cardiovascular disease | Accuracy-99.1%, 98.0%, 95.5% | Effective Prediction, Highly reliable. | High complexity issues. |
Abubaker et al. [24] | 2022 | Lightweight CNN | Cardiovascular disease | Accuracy-98.23%, Recall-98.22%, Precision-98.31%, F1-Score-98.21% | Early detection, Better Performance. | High data dependency. |
An overview of the existing methods and their limitations.
Table 1 elucidates the significant aspects of the existing systems. Yenurkar et al. [16] achieved high accuracy in detecting heart-related issues with their IoT-based prototype, but scalability and generalizability to diverse populations were a concern. Mishra et al. [17] demonstrated effective heart disease prediction using DL and ECG signals; yet, the complexity of the three-layer framework and its computational requirements posed challenges for real-time applications in resource-constrained environments. Hannan et al. [18] developed a wearable smart system for early heart attack diagnosis with promising results. However, potential limitations in user acceptance and adherence to continuous monitoring protocols constrained its widespread adoption. Ullah et al. [19] suggested a scalable ML-based architecture for CVD detection, achieving high accuracy. However, the robustness of the model in handling noisy data and real-world variability remains to be fully evaluated. Naeem et al. [20] introduced a novel technique for human odour identification using sensor data and an ANN. Yet, concerns about sensor sensitivity and environmental factors affecting detection accuracy could influence its reliability in practical settings. To overcome the above-mentioned limitations, this study concentrated on developing a novel IoT-enabled wireless body area network (WBAN) system for smart cardiovascular disease diagnosis.
The dataset used in this model is accessible on Kaggle. Two features, namely heart rate and calories, were extracted from the Apple Watch and Fitbit data. The extracted features were added as new columns of the clinical data and saved as a CSV file. Finally, the combined clinical data and ECG signal data were passed into the pre-processing stage. Missing value imputation, one-hot encoding, and z-score normalization were performed at this initial stage of pre-processing to ensure data quality and consistency. The signals were denoised using an FIR filter to remove any unwanted noise and artifacts. The features from the pre-processed clinical data were extracted using the DPAM model, which efficiently captured intricate patterns in the data. For ECG data, features were extracted using the LMT-FD model, which effectively decomposed the signal into meaningful components. These extracted features from both clinical and ECG data were then fused to create a comprehensive dataset. In this model, the signal features (from bidirectional LSTM layers) and clinical features (from the transformer encoder block) were combined using a concatenate layer, merging them into a unified feature vector along the last axis. This combined representation underwent further processing through fully connected layers and a softmax output for classification. CVD diagnosis was performed using a hybrid DL model, OPA-BETN, which leveraged advanced attention mechanisms to concatenate the features of both signal and clinical data. Hyperparameter tuning was carried out using the LWO algorithm to optimize the model's performance, ensuring high accuracy and reliability in detecting cardiovascular conditions. This integrated approach significantly enhanced the diagnostic capabilities, leading to better patient outcomes in cardiovascular disease management. The block diagram of the proposed methodology is shown in Figure 1.
Block diagram of the proposed methodology.
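The fusion stage described above can be illustrated with a minimal Keras sketch. This is not the published implementation: the input shapes (a 1000-sample single-lead ECG window and 38 clinical fields), layer widths, and the simplified clinical branch are assumptions made only to show how the BiLSTM signal branch and the clinical branch are concatenated along the last axis and passed to a softmax classifier.

```python
# Minimal sketch of the fused classifier described above, NOT the published
# implementation: input shapes, layer widths, and branch details are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, Model

ecg_in = layers.Input(shape=(1000, 1), name="ecg_signal")
clin_in = layers.Input(shape=(38,), name="clinical_features")

# Signal branch: stacked bidirectional LSTMs capture forward/backward temporal context.
sig = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(ecg_in)
sig = layers.Bidirectional(layers.LSTM(32))(sig)

# Clinical branch: a compact stand-in for the transformer encoder block
# (a detailed pre-norm encoder layer is sketched later in the text).
clin = layers.Dense(64, activation="relu")(clin_in)
clin = layers.Dense(64, activation="relu")(clin)

# Fusion: concatenate both feature vectors along the last axis, then classify.
fused = layers.Concatenate(axis=-1)([sig, clin])
fused = layers.Dense(64, activation="relu")(fused)
fused = layers.Dropout(0.4)(fused)          # mlp_dropout value reported in Table 2
out = layers.Dense(2, activation="softmax", name="cvd_class")(fused)

model = Model([ecg_in, clin_in], out)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```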
There are several techniques for imputing missing data [25] in univariate variables that employ past observed data to fill in the missing values. These include advanced techniques, such as Kalman filter-based approaches and hybrid methods, as well as more basic and direct techniques, such as logistic regression, polynomial or spline interpolation, and last observation carried forward (LOCF). When there are several missing data points, simplistic methods are commonly used, such as imputing a fixed value (frequently the sample average) or applying LOCF.
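As a brief illustration of the two simple strategies mentioned above, the following pandas sketch imputes a time-ordered measurement with LOCF and a static field with its sample average; the column names and values are placeholders rather than data from the study.

```python
# Illustrative imputation for a univariate series; the DataFrame and column
# names are placeholders, not data from the study.
import numpy as np
import pandas as pd

df = pd.DataFrame({"heart_rate": [72, np.nan, 75, np.nan, 80],
                   "cholesterol": [190, 210, np.nan, 205, 198]})

# Last observation carried forward (LOCF) for a time-ordered measurement.
df["heart_rate"] = df["heart_rate"].ffill()

# Fixed-value imputation with the sample average for a static clinical field.
df["cholesterol"] = df["cholesterol"].fillna(df["cholesterol"].mean())
```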
In one-hot encoding, the original feature vector is extended into a multi-dimensional matrix, with every dimension representing a distinct state and the matrix's dimension equal to the number of states in the feature. For a given state, only one dimension of the feature matrix is set to 1, while all other state dimensions are 0. When digitizing a categorical feature, one-hot encoding [26] has the benefit of removing the impact of variations in the digitized value on the model's training effect. During model training, a feature value encoded as '1000' could otherwise receive a higher weight than one encoded as '1'; one-hot encoding is used in the coding process to avoid these types of negative effects. Additionally, one-hot encoding may address missing data issues by adding missing values as a new dimension, completing the missing dataset.
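A minimal example of this encoding, using a hypothetical categorical clinical field, is shown below; the dummy_na option creates the extra dimension for missing values mentioned above.

```python
# One-hot encoding of a hypothetical categorical clinical field; each state gets
# its own 0/1 dimension, and dummy_na adds a dimension for missing values.
import numpy as np
import pandas as pd

df = pd.DataFrame({"chest_pain_type": ["typical", "atypical", np.nan, "non-anginal"]})
encoded = pd.get_dummies(df["chest_pain_type"], prefix="cp", dummy_na=True)
print(encoded)  # columns: cp_atypical, cp_non-anginal, cp_typical, cp_nan
```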
Z-score normalization was utilized to ensure data quality and consistency. Z-score normalization uses the mean and standard deviation to generate normalized values or a range of data from the original unstructured data [27]. The z-score used to normalize the unstructured data is expressed in Equation (1).
It allows the models to find patterns by balancing the data and reducing bias from high values. It also regulates the input data, accelerating the convergence of DL and ML models, resulting in more reliable and effective training.
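A small sketch of the transformation described by Equation (1), which is not reproduced here: each value is centred by the column mean and scaled by the column standard deviation.

```python
# Z-score normalization: centre each value by the mean and scale by the
# standard deviation of the column.
import numpy as np

def zscore(x):
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

print(zscore([54, 61, 47, 70, 58]))  # result has zero mean and unit variance
```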
In order to eliminate unwanted noise present in clinical and ECG data, an FIR filter was utilized [28]. To determine the discrete-time state-space technique, it utilizes finite observations over the most recent time period or receding horizon. The use of FIR-based techniques provides multiple benefits for real-time applications. Even in the presence of inherent uncertainties that impact power system analysis, they yield more accurate predictions than other approaches. By lowering the estimation errors, the minimum variance harmonic (MVH-FIR) technique may also be used, which enhances filter accuracy. The time variance of inertia may be seen as a linear discrete-time state-space model by expressing the time q as q = hΔq. It is mathematically expressed in Equations (2) and (3).
Certain frequency bands may be accurately suppressed or passed by FIR filters, which can be used to isolate baseline drift or noise in ECG readings. These filters are easy to integrate into digital systems and can be employed for real-time processing in embedded or portable cardiovascular monitoring equipment.
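As a concrete illustration of this idea, the following SciPy sketch applies a band-pass FIR filter to a toy ECG-like signal; the sampling rate, cut-off band, and filter order are illustrative assumptions, not the values used in the study.

```python
# Band-pass FIR denoising of a toy ECG-like signal; sampling rate, cut-off band,
# and filter order are illustrative assumptions, not the study's settings.
import numpy as np
from scipy.signal import firwin, filtfilt

fs = 500.0                                     # assumed sampling rate in Hz
t = np.arange(0, 10, 1 / fs)
ecg = np.sin(2 * np.pi * 1.2 * t) + 0.3 * np.random.randn(t.size)  # toy noisy trace

# FIR taps that suppress baseline drift (<0.5 Hz) and high-frequency noise (>40 Hz).
taps = firwin(numtaps=201, cutoff=[0.5, 40.0], pass_zero=False, fs=fs)
clean = filtfilt(taps, [1.0], ecg)             # zero-phase filtering keeps wave timing
```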
The MobileNet model with additional DenseNet layers was utilized in this study to predict CVD. The two phases of MobileNet are depthwise and pointwise convolutions [29]. After the downsampling of every feature map was completed, both of these phases were carried out. Depthwise convolution applies a single filter to each input channel. MobileNet integrates 1 × 1 convolution into its processes, and pointwise convolution is applied after the depthwise layer to provide a linear combination of its outputs. The MobileNet architecture uses ReLU and batch normalization (BN) with these factorized layers rather than a single 3 × 3 convolution, which represents the main difference between MobileNet and a traditional CNN. Traditional convolution comprises two steps: filtering is the first step, and combining inputs into a new set of outputs is the second. This factorization significantly decreases the model's computation and size.
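A minimal Keras sketch of the depthwise separable block just described (a depthwise 3 × 3 convolution followed by a pointwise 1 × 1 convolution, each with BN and ReLU); filter counts and strides are illustrative choices.

```python
# Depthwise separable convolution block: a depthwise 3x3 convolution per input
# channel followed by a 1x1 pointwise convolution, each with batch normalization
# and ReLU. Filter counts and strides are illustrative.
from tensorflow.keras import layers

def depthwise_separable_block(x, pointwise_filters, stride=1):
    x = layers.DepthwiseConv2D(kernel_size=3, strides=stride, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(pointwise_filters, kernel_size=1, padding="same")(x)  # pointwise
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)
```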
Dense-MobileNet also utilizes depthwise separable convolution; the block comprises a double layer of depthwise separable convolution, and the collected outputs of the earlier depthwise separable convolution layers become the input data of the first layer. The structure of DPAM is shown in Figure 2.
Detailed architecture of dense-assisted parallel attention based MobileNet module.
The processing of clinical input features used a parallel attention component combined with the depthwise and pointwise convolution layers from MobileNet. The attention block operated in parallel with the model to improve the detection of essential clinical patterns from the input data before the application of two depthwise separable convolution sequences, which downsampled and maintained spatial characteristics. The series of pointwise convolution operations after parallel attention blocks created compact medical output features which improved the classification of CVDs.
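The exact DPAM layer arrangement is not reproduced here; the following sketch only conveys the idea of an attention branch running in parallel with the depthwise separable convolution path and being concatenated before a pointwise projection. All layer choices and sizes are assumptions.

```python
# Loose sketch of a parallel attention branch: channel weights derived from globally
# pooled statistics re-scale the input alongside a depthwise separable convolution
# path, and the two branches are concatenated before a pointwise projection.
from tensorflow.keras import layers

def parallel_attention_block(x, filters):
    # Convolutional path: depthwise 3x3 followed by pointwise 1x1 convolution.
    conv = layers.DepthwiseConv2D(3, padding="same")(x)
    conv = layers.Conv2D(filters, 1, padding="same", activation="relu")(conv)

    # Attention path: per-channel weights applied to a 1x1 projection of the input.
    w = layers.GlobalAveragePooling2D()(x)
    w = layers.Dense(filters, activation="sigmoid")(w)
    w = layers.Reshape((1, 1, filters))(w)
    attn = layers.Conv2D(filters, 1, padding="same")(x)
    attn = layers.Multiply()([attn, w])

    # Merge both branches and project to a compact feature map.
    merged = layers.Concatenate(axis=-1)([conv, attn])
    return layers.Conv2D(filters, 1, padding="same", activation="relu")(merged)
```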
The input data received by the convolution layer is expressed in Equation (4).
The traditional convolution’s computational cost is mathematically expressed in Equation (5).
The parallel attention mechanism was integrated into three modules, where two modules concentrated on detection and one focused on the improvement of the model. The dynamic relation within each parameter and the target CVD detection was adaptively captured by the first attention module. It is illustrated in Equations (6)–(7).
The mathematical representation of the second attention module is represented in Equations (8)–(9).
The dynamic relationship of CVD detection was concatenated with all observed parameters by utilizing the final attention module. It is mathematically expressed in Equations (10)–(12).
By utilizing dense connections to enhance feature propagation and reuse, the dense-assisted architecture enhanced the extraction of important clinical characteristics. By concentrating geographical and channel-wise information, the parallel attention process improved the model’s capacity to recognize minor signs of cardiovascular illness. The model’s performance in detecting abnormalities was improved by the combination of dense connections and attention processes, which lowered false positives and negatives.
ECG data were extracted based on frequency domain features. The energy distribution of signal over different frequency bands was examined through frequency-domain characteristics. These characteristics were very helpful for studying heart rate variability (HRV) and comprehending how both parasympathetic and sympathetic nervous system activities are balanced.
In order to extract characteristics from ECG data, the LMT-FD model [30] efficiently split the signal into significant components. The statistical and texture features were initially chosen to extract the defect features from the LW decomposition frequency domain sparsely. These characteristics consisted of Hu’s invariant moments, skewness, kurtosis, standard deviation, root mean square (RMS) and entropy. It was determined that the statistical and textural properties were effective and that sparse metrics can be easily calculated, which made them useful. This eliminated the need for an intricate feature extraction process that relied on optimum algorithms and extensive expert knowledge.
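The statistical features listed above can be computed per decomposed sub-band as sketched below. A Legendre multi-wavelet transform is not available off the shelf, so the decomposition output is represented by placeholder arrays, and Hu's invariant moments (which apply to 2-D feature maps) are omitted.

```python
# Statistical features per decomposed sub-band (skewness, kurtosis, standard
# deviation, RMS, entropy). The Legendre multi-wavelet decomposition itself is
# represented by placeholder arrays; Hu's invariant moments are omitted here.
import numpy as np
from scipy.stats import skew, kurtosis, entropy

def subband_features(subband):
    subband = np.asarray(subband, dtype=float)
    hist, _ = np.histogram(subband, bins=32, density=True)
    return {
        "skewness": skew(subband),
        "kurtosis": kurtosis(subband),
        "std": subband.std(),
        "rms": np.sqrt(np.mean(subband ** 2)),
        "entropy": entropy(hist + 1e-12),   # Shannon entropy of the amplitude histogram
    }

subbands = [np.random.randn(250) for _ in range(4)]   # placeholder decomposition output
features = [subband_features(b) for b in subbands]
```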
The subsequent detailed description of a + 1 → a resolution level of the LW decomposition technique is based on multi-resolution analysis theory and literature. It is expressed in Equations (13)–(16).
At the resolution level $m$, the coefficient matrices are denoted by $s^{yy'}_{mgg'}$, $\alpha^{yy'}_{mgg'}$, $\beta^{yy'}_{mgg'}$, and $\chi^{yy'}_{mgg'}$. The decomposition feature maps obtained through the LMT effectively extract the prominent defect characteristics for CVD detection.
LMT provides better time-frequency localization, and it is possible to precisely extract minor aspects from ECG signals that are essential for the diagnosis of CVD. Even in noisy conditions, it ensures accurate feature extraction by effectively denoising ECG data while maintaining key properties.
OPA-BETN demonstrated superior performance over basic transformer architectures because its progressive attention methods combined with bidirectional encoding boosted clinical and ECG data feature relationships. Unlike standard transformer models that struggle with localized temporal dependencies and multimodal fusion, OPA-BETN effectively captured fine-grained temporal patterns through its BiLSTM layer while leveraging transformer encoders for global context modeling. Additionally, the progressive attention mechanism selectively focused on the most salient features at each stage, reducing irrelevant noise and enhancing interpretability. This hybrid design allowed OPA-BETN to achieve superior performance in terms of accuracy, recall, and specificity, demonstrating its capability to handle the complexity and variability inherent in CVD.
OPA-BETN, a hybrid DL model, uses advanced attention techniques to concatenate the characteristics of clinical data and signals. The transformer technique is designed similarly to vision transformer (ViT). The detailed structure of OPA-BETN is illustrated in Figure 3.
Workflow of the optimized progressive attention based bidirectional encoder enclosed transformer network.
To ensure that the input values are not too large to analyze, layer normalization was applied prior to multihead self-attention (MSA) and the multilayer perceptron (MLP). After applying the transformer encoder with L layers to the token sequence y0, a series of contextualized encodings yL was generated. The MSA and MLP blocks formed a transformer layer. The MSA module was utilized to effectively model and collect global context information, while the MLP module provided the transformer with non-linear transformation capabilities, improving feature expression. The mathematical representation of the MLP module is given in Equation (17).
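A minimal Keras sketch of such a pre-norm transformer encoder layer is given below; the head count, key dimension, and MLP width are assumptions, and the MLP sub-layer loosely corresponds to the block referenced in Equation (17).

```python
# Pre-norm transformer encoder layer: layer normalization precedes both the MSA
# and MLP sub-layers, each wrapped in a residual connection. Head count, key
# dimension, and MLP width are assumptions.
from tensorflow.keras import layers

def transformer_encoder_layer(tokens, num_heads=4, key_dim=32, mlp_units=128):
    # MSA sub-layer with pre-normalization and a residual connection.
    n = layers.LayerNormalization()(tokens)
    attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=key_dim)(n, n)
    tokens = layers.Add()([tokens, attn])

    # MLP sub-layer with pre-normalization and a residual connection.
    n = layers.LayerNormalization()(tokens)
    mlp = layers.Dense(mlp_units, activation="gelu")(n)
    mlp = layers.Dense(tokens.shape[-1])(mlp)
    return layers.Add()([tokens, mlp])
```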
A BiLSTM is an LSTM network that processes the sequence in both directions: the data is passed from start to end and from end to start, allowing improved learning at every data timestep. A forward and a backward LSTM network form the BiLSTM. The forward LSTM hidden layer facilitates the extraction of forward features, while the backward one handles backward feature extraction. The forward and backward outputs of the BiLSTM are expressed in Equations (20)–(22).
The proposed system merged ECG signal features along with clinical features by applying transformer encoders and BiLSTM processing layers. The first step involved adding positional information to embedded data before subjecting it to a multi-head attention mechanism to discover global dependencies. The layer normalization, together with a multilayer perceptron (MLP), was built on top of learned representations. Progressive attention within the BiLSTM layer captured future and past temporal relations together with contextual information from the input. Softmax classification took place through an output aggregation powered by the combined output information.
A CNN was used in the progressive attention technique to focus on target objects of various sizes and shapes. The use of many CNN layers gradually eliminated portions of a picture that were insignificant. Progressive attention contained intra-modal attention, map-sequenced attention, and cross-modal attention. In intra-modal attention, two distinct transformer decoders were utilized to collect and improve important intra-modal aspects of B and C, respectively, as illustrated in Equation (23).
Owing to the diversity of temporal relationships, the differences across distinct frames are naturally more significant in some frames than in others. In order to squeeze N, a frame-wise attention mechanism is suggested that determines the weight of each element in Na using the feature Aa of Ia.
Using appearance or motion characteristics alone to differentiate generic event borders is challenging due to their complexity and diversity. This problem can be resolved by combining LSTM and BiLSTM. However, earlier fusion techniques are unable to fully use feature complementarity and simultaneously learn features across modalities. Cross-modal feature aggregation was therefore carried out to take advantage of the interdependence between two modalities and is illustrated in Equation (25).
For CVD detection, OPA-BETN has several advantages. More accurate and dependable illness detection results are seen due to increased bidirectional context awareness, effective feature extraction through progressive attention processes, and higher classification accuracy.
The LWO method was chosen for hyperparameter tuning due to its dynamic and adaptive search strategy: it emulates the randomly directed movement of leaves during wind phenomena to explore effectively while efficiently exploiting promising regions of the solution space. Its results are superior to grid search and random search because the breeze- and wind-driven movement randomness reaches the global optimum more quickly without getting trapped in suboptimal solutions. The OPA-BETN model benefits from this nature-based method because it optimizes the complex and extensive hyperparameter space for improved CVD detection reliability.
Leaf in wind optimization (LWO) [31] was employed to enhance the performance of the proposed model. The movement of leaves inspired the optimization technique presented in this section. The domain for leaf motion is the solution space, where each leaf represents a candidate solution. Let M be the total number of leaves and Y be the set of all leaves. E stands for the dimensionality of the leaf motion space, which is the same as the dimensionality of the optimization problem's solution vector. The connection between these three variables is shown in Equation (26).
The wind force affects every leaf equally when there is a light breeze. It is mathematically expressed in Equation (27).
The spatial coordinates of the leaves are modified using Equation (28) to accurately capture the intrinsic unpredictability in each leaf’s motion.
The mathematical representation of the ath leaf is given in Equations (29) and (30).
It is proposed that the effects of strong winds on leaves appear as the modification of a particular unidimensional spatial position, as opposed to the influence of mild breezes on foliage. At first, the ith leaf’s new location is maintained in its original state. It is mathematically expressed in Equation (34).
The location of the leaves then undergo unidirectional displacement as a result of dimension a that is randomly selected to be affected by severe winds. The mathematical representation for the resulting movement is illustrated in Equations (35)–(36).
These equations simulate the impact of strong wind, which drives the leaves along a single dimension a; they involve the current position of the ith leaf in dimension a, the best position in the population in dimension a, the spiral motion scaling factor s2, and the force applied to the leaf in dimension a. This formulation balances convergence pressure with exploration by using exponential wind scaling while preserving diversity across dimensions via randomized selection.
The pseudocode for the LWO is elucidated in Algorithm 1.
A uniformly distributed random number is denoted by s2. Another randomly chosen leaf is represented as Ym3. As shown in Equation (36), a probability mechanism is used to randomly reset the position in the ath dimension in order to guarantee the maintenance of leaf position variety in the face of strong wind, as expressed in Equation (37).
Fitness value is used to determine the final update method for the leaf position and is shown in Equation (38).
Initialization has a complexity of O(M) and requires M fitness evaluations. The complexity of O(M × Tmax) for updating the solutions results from the number of function computations caused by M leaves within each of the Tmax iterations.
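Because Equations (26)–(38) are not reproduced here, the following NumPy sketch is only a loose interpretation of the LWO loop described above: a gentle-breeze drift applied to every leaf, a strong-wind update along one randomly chosen dimension, an occasional random reset of that dimension to preserve diversity, and a greedy fitness-based replacement. The function and parameter names are placeholders.

```python
# Loose NumPy interpretation of the LWO loop; the exact update rules are assumptions.
import numpy as np

def lwo(fitness, bounds, n_leaves=20, max_iter=100, reset_prob=0.1, seed=0):
    rng = np.random.default_rng(seed)
    low, high = bounds
    dim = low.size
    leaves = rng.uniform(low, high, size=(n_leaves, dim))
    scores = np.array([fitness(l) for l in leaves])
    best = leaves[scores.argmin()].copy()

    for t in range(max_iter):
        for i in range(n_leaves):
            cand = leaves[i].copy()
            # Gentle breeze: small random drift applied to the whole leaf.
            cand += rng.normal(0.0, 0.05, dim) * (high - low)
            # Strong wind: move one randomly chosen dimension toward the best leaf,
            # with an exponentially decaying step scaled by a random factor s2.
            a = rng.integers(dim)
            s2 = rng.random()
            cand[a] += s2 * np.exp(-t / max_iter) * (best[a] - cand[a])
            # Occasional random reset of that dimension preserves diversity.
            if rng.random() < reset_prob:
                cand[a] = rng.uniform(low[a], high[a])
            cand = np.clip(cand, low, high)
            # Greedy, fitness-based replacement of the old position.
            cand_score = fitness(cand)
            if cand_score < scores[i]:
                leaves[i], scores[i] = cand, cand_score
        best = leaves[scores.argmin()].copy()
    return best, scores.min()

# Example: tuning two hypothetical hyperparameters (learning rate, dropout rate).
best, val = lwo(lambda p: (p[0] - 0.001) ** 2 + (p[1] - 0.4) ** 2,
                (np.array([1e-4, 0.0]), np.array([1e-2, 0.6])))
```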
The nature-inspired methodology of LWO speeds up convergence, enabling the identification of ideal hyperparameters more quickly without compromising accuracy. In order to provide a balanced search over the hyperparameter space, avoid local minima, and find optimum solutions, the algorithm’s dynamic movement simulates the natural motion of leaves in the wind. Reducing false positives and improving detection accuracy are the two objectives that LWO can handle, and are necessary for a precise medical diagnosis.
In this study, an enhanced hybrid DL approach with an effective feature extraction mechanism was proposed for CVD detection in an IoT environment. The performance of the proposed model was evaluated by using two datasets, namely, PTB-XL and cardiovascular disease. The hyperparameter details of the proposed model are given in Table 2.
Parameters | Values |
---|---|
Epoch | 300 |
batch size | 32 |
mlp_dropout | 0.4 |
head_size | 256 |
activation | Softmax |
Hyperparameter details of the proposed model.
The Physikalisch-Technische Bundesanstalt (PTB) undertook a long-term initiative to curate and transform the records into an organized database. The database has been utilized in several studies [1, 2], although access is still limited. Two characteristics, namely heart rate and calorie intake, were extracted from the Fitbit and Apple Watch data and incorporated into the PTB-XL dataset. The following features were then combined with the proposed dataset: ecg_id, patient_id, age, sex, height, weight, nurse, ap_hi, ap_lo, cholesterol, gluc, smoke, alco, active, site, device, recording_date, report, scp_codes, heart_axis, infarction_stadium1, infarction_stadium2, validated_by, second_opinion, initial_autogenerated_report, validated_by_human, baseline_drift, static_noise, burst_noise, electrodes_problems, extra_beats, pacemaker, strat_fold, filename_lr, filename_hr, and diagnostic_class. The extracted heart rate and calorie features were appended to the proposed dataset, and the updated dataset was saved as a CSV file.
Commercial wearable technology has significant potential for population-level physical activity measurement. The objective of the wearable dataset's originating study was to find out whether commercial wearable technology can accurately predict sitting, lying, and different levels of physical activity using a lab-based methodology. According to its findings, scalable physical activity type categorization at the population level may be achieved by combining ML techniques with minute-by-minute data from Fitbit and Apple Watch.
In order to detect CVD, the proposed model was compared with a few existing techniques, namely convolutional neural networks (CNN), CNN-long short-term memory (CNN-LSTM), residual network (ResNet-50), and vision transformer (ViT).
Evaluation of accuracy and precision of the proposed and existing models is shown in Figures 4(a)–(b). Compared with other existing models, the proposed model achieved higher accuracy and precision values of 99.0% and 97.2%, respectively. As shown in Figure 4(a), the existing CNN and CNN-LSTM attained higher accuracy values of 96.7% and 95.2%, respectively. However, these existing techniques have high computational loads. As shown in Figure 4(b), the existing ResNet-50 and ViT achieved low precision values of 92.1% and 90.0%, respectively. Moreover, these techniques are highly dependent on data.
(a)–(b) Accuracy and precision of the proposed and existing models.
Figures 5(a)–(b) elucidate the recall and F1-score of the proposed and existing models. As shown in Figure 5(a), the existing ResNet-50 and ViT achieved lower recall values of 92.32% and 90.1%, respectively. However, these existing techniques are computationally very expensive. The existing CNN and CNN-LSTM attained F1-scores of 96.7% and 95.4%, respectively, as shown in Figure 5(b), but these techniques have a higher possibility of errors. In contrast, the proposed model achieved recall and F1-score values of 97.2% each.
(a)–(b) Recall and F1-Score of the proposed and existing models.
The comparison of the proposed and existing models in terms of specificity is shown in Figure 6. The existing CNN and ResNet-50 achieved average specificity values of 96.1% and 92.5%, respectively. The existing CNN-LSTM and ViT attained lower specificity values of 95.2% and 89.9%, respectively. However, these existing methods have high computational loads. Compared to the existing techniques, the proposed model achieved a higher specificity value of 99.3%.
Performance evaluation of specificity.
Figures 7(a)–(b) depict the performance analysis of the proposed and existing models in terms of TPR and TNR. As shown in Figure 7(a), the existing ViT and ResNet-50 models achieved lower TPR values of 88.7% and 90.5%, respectively. However, these existing techniques have high interpretability issues. The existing CNN and CNN-LSTM attained TNR values of 95.5% and 94.3%, as shown in Figure 7(b). The proposed model outperformed the existing models with higher TPR and TNR values of 97.2% and 99.3%, respectively.
(a)–(b) Performance evaluation based on true positive rate (TPR) and true negative rate (TNR).
The performance analysis of the proposed and existing models based on FPR and FNR is illustrated in Figures 8(a)–(b). The existing ViT achieved the highest FPR value of 0.056, as shown in Figure 8(a). Figure 8(b) shows that the existing ViT and ResNet-50 models attained high FNR values of 0.3 and 0.278, respectively. The proposed model achieved the lowest FPR and FNR values of 0.006 and 0.027, respectively.
(a)–(b) Performance evaluation based on false positive rate (FPR) and false negative rate (FNR).
The training accuracy curves of the proposed and existing models are shown in Figure 9(a). The proposed model reached a training accuracy of approximately 0.95 over the 300 epochs. The training loss curves of the proposed and existing models are displayed in Figure 9(b). The proposed model achieved a desirable training loss of approximately 0.12 over the 300 epochs.
(a)–(b) Performance analysis of training accuracy and loss.
The receiver operating characteristic (ROC) curve for detecting CVD is demonstrated in Figure 10. ROC curves graphically represent the performance of a binary classifier at different threshold settings. To create an ROC curve, the true positive rate and false positive rate (TPR and FPR) are calculated at every threshold level. The ROC curve of the proposed model attained a high area-under-the-curve value of 98.9%. The comparison of the proposed and existing models for detecting CVD is elucidated in Table 3.
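A minimal scikit-learn example of this construction, using synthetic labels and scores rather than the study's predictions:

```python
# Computing TPR and FPR at every threshold with scikit-learn, using synthetic
# labels and scores rather than the study's predictions.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_score = np.array([0.10, 0.40, 0.85, 0.70, 0.30, 0.90, 0.65, 0.20])
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(roc_auc_score(y_true, y_score))   # area under the ROC curve
```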
ROC curve.
Model | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) | Specificity (%) | TPR (%) | TNR (%) | FPR | FNR |
---|---|---|---|---|---|---|---|---|---|
CNN | 96.7 | 96.2 | 96.3 | 96.7 | 96.1 | 93.67 | 95.56 | 0.021 | 0.18 |
CNN-LSTM | 95.2 | 95.3 | 95.2 | 95.4 | 95.2 | 92.5 | 94.3 | 0.032 | 0.24 |
ResNet-50 | 92.5 | 92.1 | 92.3 | 92.7 | 92.5 | 90.5 | 92.6 | 0.045 | 0.278 |
ViT | 90.5 | 90.08 | 90.1 | 90.8 | 89.9 | 88.7 | 89.8 | 0.056 | 0.3 |
Proposed | 99.0 | 97.2 | 97.2 | 97.2 | 99.3 | 97.2 | 99.3 | 0.006 | 0.027 |
Performance analysis of the proposed and existing models.
The image representation of the original and filtered ECG signal is illustrated in Figure 11.
Original and filtered ECG signal.
Table 4 shows the performance analysis for cross validation performances of the proposed model.
Fold | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) |
---|---|---|---|---|
1 | 98.7 | 96.8 | 96.5 | 96.5 |
2 | 98.9 | 97.1 | 97.0 | 97.1 |
3 | 99.0 | 97.6 | 97.4 | 97.3 |
4 | 98.8 | 97.4 | 96.9 | 96.5 |
5 | 98.9 | 97.0 | 97.2 | 97.2 |
Cross validation performance analysis.
The proposed model showed top-level accuracy, reaching 99%, yet there are valid overfitting concerns because of its elaborate deep layer structure, attention modules, and bidirectional processing elements. During model training, multiple regularization methods were used, including MLP-module dropout layers with a 0.4 dropout rate, early stopping based on validation loss, and batch normalization in all convolutional layers. The model utilized PTB-XL data that contained diverse ECG and clinical elements to achieve strong performance on new unseen data. The performance metrics are promising, yet 10-fold cross-validation experiments still need to be conducted to validate the model's capacity to generalize across multiple data splits. This performance estimation would establish reliability and strengthen the reported accuracy.
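A hedged sketch of such a validation and regularization protocol is given below, combining stratified k-fold splits with early stopping on validation loss; the epoch and batch-size values follow Table 2, while the model builder and data arrays are placeholders.

```python
# Stratified k-fold evaluation with early stopping on validation loss; epoch and
# batch-size values follow Table 2, while build_model, X, and y are placeholders
# for a compiled Keras model factory and the fused feature matrix / labels.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from tensorflow.keras.callbacks import EarlyStopping

def evaluate_cv(build_model, X, y, n_splits=10):
    accuracies = []
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    for train_idx, val_idx in skf.split(X, y):
        model = build_model()                      # returns a freshly compiled model
        model.fit(X[train_idx], y[train_idx],
                  validation_data=(X[val_idx], y[val_idx]),
                  epochs=300, batch_size=32,
                  callbacks=[EarlyStopping(monitor="val_loss", patience=10,
                                           restore_best_weights=True)],
                  verbose=0)
        accuracies.append(model.evaluate(X[val_idx], y[val_idx], verbose=0)[1])
    return float(np.mean(accuracies))
```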
The superior performance of the proposed OPA-BETN model over the existing models is attributed to several architectural features and methodological enhancements. First, the progressive attention mechanism played a critical role by adaptively selecting the most relevant features at each stage of processing, effectively reducing noise and enhancing the interpretability of the clinical and ECG data representations. The OPA-BETN model delivered better results than the traditional CNN, CNN-LSTM, ResNet-50, and ViT models on all performance measures. The BiLSTM layers within the model enabled it to track temporal patterns forward and backward, which is essential when both the dynamic characteristics of ECG signals and patient-history behavior matter in signal analysis, whereas spatial features take precedence in models such as CNN and ResNet. DPAM achieved better clinical feature extraction and superior dependency identification, a task at which a simple CNN performed poorly. Simultaneously, the LMT-FD ensured meaningful frequency-domain feature extraction from ECG signals, effectively isolating key components critical to CVD. The detection accuracy of the optimized ECG signal sequence analysis was enhanced by using OPA-BETN together with BiLSTM layers, which delivered 97.2% sensitivity and 99.3% specificity. The baseline models showed comparatively weaker performance: CNN and ResNet-50 performed well on image-based tasks, yet they were inefficient when processing time-dependent ECG data patterns, and OPA-BETN outperformed CNN-LSTM owing to its superior multimodal fusion, even though the latter captured temporal patterns. Small medical datasets made ViT unstable, as significant data is needed to achieve its best performance. The proposed model maintained high accuracy with computational feasibility, making it suitable for integration into IoT for real-time monitoring.
The proposed OPA-BETN system reached high accuracy levels for CVD diagnosis, but several important considerations remain. The complex combination of architectures within the proposed model demands major computational capabilities, which limits its implementation on resource-constrained platforms such as IoT devices and wearable technology. Usage restrictions on the PTB-XL and wearable-derived datasets introduce clinical and demographic limitations that affect how well the model generalizes to different populations. There are insufficient validation results demonstrating how the model would perform in real-world settings, where ECG signals are contaminated by noise and artifacts. Future work will address these limitations by developing lightweight deployment strategies, extending the CVD classification framework across multiple categories, employing additional informative datasets, adding transparency through explainable AI (XAI) methods, and assessing real-time performance using connected wearable systems. A comparison of CVD detection on clinical and ECG data is shown in Table 5.
Techniques | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) |
---|---|---|---|---|
Mishra et al. [17] | 97.8 | - | - | - |
Hannan et al. [18] | 97 | - | - | - |
Kumar et al. [22] | - | 85 | - | - |
Proposed model | 98 | 96 | 97.2 | 97.2 |
Comparative analysis of the existing models.
In order to reduce the negative impacts of CVD on individuals, this study highlights the significance of timely detection. This research significantly contributes to the area of ECG signal detection by obtaining greater accuracy and efficiency in CVD detection.
Several DL techniques, including CNN and LSTM, have shown promise in CVD detection. The purpose of this study was to assess the performance of the proposed model on distinct datasets. The proposed method demonstrated encouraging outcomes, reflecting its capacity to recognize and categorize CVD based on clinical and ECG data. Experimental results show that, on the clinical and ECG data, the model attains accuracy, precision, recall, F1-score, specificity, TPR, and TNR values of 99.0%, 97.2%, 97.2%, 97.2%, 99.3%, 97.2%, and 99.3%, respectively, with FPR and FNR values of 0.006 and 0.027. Additional work is needed to further develop DL approaches for CVD detection.
Divya, N.J.: Writing – original draft, Data Curation, Visualization, Methodology, Software, Validation, Formal analysis, Resources; Suresh Kumar, N.: Supervision, Project administration, Conceptualization, Writing – review & editing; Kanniga Devi, R.: Formal analysis, Investigation, Resources, Writing – review & editing.
This research did not receive external funding from any agencies.
Not applicable.
Source data is not available for this article.
The authors declare no conflict of interest.
© The Author(s) 2025. Licensee IntechOpen. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.