# Real-Time IC Aging Prediction via On-Chip Sensors

Ke Huang\*, Md Toufiq Hasan Anik<sup>†</sup>, Xinqiao Zhang\* and Naghmeh Karimi<sup>†</sup>

\* Electrical and Computer Engineering Dept., San Diego State University, San Diego, CA 92182, USA

<sup>†</sup> Computer Science and Electrical Engineering Dept., University of Maryland Baltimore County, Baltimore, MD 21250, USA

**Abstract:** Real-time aging prediction for nanoscale integrated circuits (ICs) is a crucial step for developing prevention and mitigation actions to avoid unexpected circuit failures in the field of operation. Current practices for predicting aging-related performance degradation in ICs consist of recording the operating conditions (e.g. workload, temperature, etc.) throughout the IC's usage time and building a learning model that maps historical operating conditions to actual performance degradation. While some operating conditions such as IC workload can be readily recorded using existing on-chip structures (e.g. registers), other operating conditions such as historical temperature values may not be available for real-time aging degradation prediction. In this paper, we develop a novel real-time IC aging prediction scheme using a set of on-chip sensors that can accurately record historical operating condition parameter values, which will in turn be used for aging-related performance degradation prediction. Experimental results show that by using a machine learning based prediction model and the notion of equivalent aging time, we can achieve accurate aging degradation prediction with the proposed on-chip sensor structure.

## I. INTRODUCTION

The performance degradation of an IC due to aging phenomena results in serious reliability and safety concerns, especially when ICs are deployed in safety-critical applications. Moreover, cumulative performance degradation over time can lead to complete device failure as a result of extensive usage.
Thus, developing real-time aging degradation monitoring and prediction approaches is of paramount importance to enhance IC lifetime safety and to prevent unexpected run-time failures due to aging.

Various approaches have been proposed for predicting real-time IC performance degradation. The key concept of such prediction schemes is to build a predictive model that maps the usage time and the operating conditions under which the IC has been used to the performance degradation level. The operating conditions that may affect the IC performance degradation rate include voltage bias, temperature, workload distribution, etc. Once the historical operating conditions and usage time are obtained, we can use different techniques for predicting IC performance degradation, including look-up tables [1], aging sensors [2], electromagnetic signature [3], or machine learning models [4], [5]. The predicted IC performance degradation level can then be used to guide aging mitigation/compensation actions, such as adaptively changing the maximum operating frequency or bias voltage [6], or giving warnings on circuits with timing guardband violation [2]. These aging prediction approaches often assume that operating condition parameters stay stationary over time. A recent study in [7] showed that time-variant operating condition parameters can be efficiently taken into account in the prediction scheme using the notion of equivalent aging time [8].

The major assumption in the aforementioned aging prediction techniques is that historical operating condition parameters, such as voltage bias, temperature, workload distribution, etc., are already available and can be readily used in the prediction model. While some operating condition parameters such as workload distribution can be sampled using on-chip structures such as registers, other parameters such as historical temperature values are in general not available.
The inclusion of various on-chip aging sensors has been proposed to facilitate detection of aging-related failures [9], [10]. However, these on-chip aging detectors are designed to monitor aging degradation on the integrated aging sensors without mapping the sensors' degradation to actual device performances. Thus, the impact of device performance-specific operating conditions (e.g. workload distribution) on aging cannot be efficiently taken into account by these aging detectors.

In this work, we propose a novel approach for real-time IC aging prediction using a set of on-chip sensors and machine learning techniques. The key concept of the proposed approach is to collect historical operating condition parameter values from on-chip sensors, and feed them into a machine learning model for accurate prediction of aging-related performance degradation. Previous on-chip aging and temperature sensor implementations are themselves subject to aging degradation, depending on the environmental condition as well as their activity rates (i.e., how frequently the sensor is on or off). In contrast, we propose to implement a smart control mechanism for our temperature sensor so that it is only powered on when sensing the actual environmental parameter values on the device. We show in our experimental results that aging degradation on the on-chip sensors is negligible while achieving accurate aging prediction for IC performances.

## II. BACKGROUND ON IC AGING

Aging mechanisms result in performance deterioration and subsequent failure of digital circuits over time. The mechanisms include Negative-Bias Temperature-Instability (NBTI), Hot-Carrier Injection (HCI), Time-Dependent Dielectric Breakdown (TDDB), and Electro-Migration (EM). Among these, NBTI and HCI are the two key contributors to the performance deterioration of digital circuits. Both mechanisms contribute to increased switching and path delays [11].

**NBTI Aging:** A PMOS transistor is affected by NBTI when a negative voltage is applied to its gate. Depending on its operating condition, a PMOS transistor encounters two phases of NBTI. The first phase (stress) happens when the transistor is on. Here, positive interface traps are created at the Si-SiO<sub>2</sub> interface, leading to an increase in the transistor's threshold voltage. The second phase (recovery) happens when the transistor is off. In the recovery phase, the threshold voltage drift that occurred during the stress phase recovers partially. The NBTI effects depend on the transistor's physical parameters, supply voltage, temperature, and stress time [11].

**HCI Aging:** HCI occurs when hot carriers are injected into the gate dielectric during transistor switching and remain trapped there. HCI is a switching-induced mechanism that degrades the circuit by changing the threshold voltage and the drain current of transistors under stress. HCI mainly affects NMOS transistors. The threshold voltage drift caused by HCI is sensitive to the number of transitions that occur at the transistor gate input. The HCI rate depends on the temperature, clock frequency, period of use, and the activity factor of the transistor under stress, i.e., the ratio of the number of cycles in which the transistor is switching to the total number of cycles in which the device is utilized [11].

## III. PRIOR WORK ON IC AGING PREDICTION

The aging degradation of IC performances is influenced by a variety of environmental factors such as supply voltage, temperature, workload distribution, stress time, etc. In [12], a worst-case scenario of IC environmental factors was considered for estimating aging-related performance degradation. A lookup table-based failure prediction method was proposed in [1] by considering random changes in the system workload and supply voltages in the aging estimation. However, lookup table-based techniques may not be suitable for large-scale devices.
A guardband technique was proposed in [2] which predicts circuit failure using aging sensors to capture impact of IC aging based on observation of timing guardband violation. In [3], aging effects in ICs were predicted by electromagnetic signatures which require expensive external equipment. In [4], [5], machine learning-based aging failure prediction techniques were proposed. In these techniques, a model was trained using a set of training samples that included operating condition parameter values such as workload and temperature, and aging indicator values such as delays of critical paths in a digital circuit. Once the aging prediction model is trained, it can then be used to predict aging degradation under new operating conditions. Traditional machine learning-based aging prediction approaches assume static operating condition, i.e., the condition parameters such as temperature under which the IC is operated are assumed to be constant between time 0 and time t when the prediction is performed. In a recent study in [7], time-variant operating condition parameters were taken into account in the prediction scheme using the notion of equivalent aging time [8] which considers time-variant information of operating conditions under which the IC is deployed. Note that historical time-variant operating condition parameters were assumed to be available a priori in [8] for aging degradation prediction. Once the aging degradation is predicted by a given prediction model, then actions for compensating aging degradation can be taken by adaptively changing maximum operating frequency [6], supply/bias voltage [13], device architecture [14], or by giving warnings on circuits with timing guardband violation [2]. Different dynamic adaptation techniques were explored in [15], including microarchitectural adaptation and dynamic voltage/frequency scaling. 

We propose a novel technique for real-time IC aging prediction that combines on-chip sensing techniques, which collect time-variant historical environmental parameters, with a machine learning prediction model. We show the details of our proposed technique in the following section.

## IV. PROPOSED APPROACH

We show the implementation of our on-chip temperature and workload sensors. We discuss how the machine learning models can be used to accurately predict the aging-induced performance degradation using historical operating condition parameter values collected from our proposed on-chip sensors.

Figure 1. Overview of the proposed approach.

## A. Proposed sensor-based aging prediction scheme

Fig. 1 shows an overview of the proposed sensor-based aging prediction scheme. As shown, we first train a supervised model $f_j$ that maps the n-dimensional operating condition vector (e.g. temperature, workload distribution, etc.) $O = [o_1, \ldots, o_n]$ to the j-th IC aging indicator (e.g. path delay value) $d_j$ : $f_j: O \mapsto d_j, j=1,\ldots,M$ , where M is the total number of considered IC performances used as aging indicators. In this work, we use a multivariate adaptive regression splines (MARS) model [16] to learn the function $f_j$ :

$$d_j = f_j(O,t) = a_0 + \sum_{i=1}^M a_i \cdot B_i(O,t) \tag{1}$$

where $a_0$ is the intercept, $a_i$ is the slope parameter, t is the usage time under O, and $B_i(O,t)$ is the i-th basis function. Note that a basis function can be a hinge function or an interaction product of different hinge functions. The main reason for using a MARS model for our performance degradation training is that it provides interpretable coefficients that can be used to quantify the contribution of input variables and their interactions to the performance degradation. This characteristic will assist process and test engineers in identifying and further mitigating aging degradation sources.
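
To make the form of (1) concrete, the following minimal sketch evaluates a MARS-style model built from hinge functions and one interaction product. The input layout, knots, and coefficients are hypothetical values chosen for illustration; they are not fitted parameters from this work, and the helper names are assumptions.

```python
from typing import Callable, List, Sequence

Basis = Callable[[Sequence[float]], float]

def hinge(index: int, knot: float, sign: int = 1) -> Basis:
    """Return the hinge basis max(0, sign * (x[index] - knot))."""
    def basis(x: Sequence[float]) -> float:
        return max(0.0, sign * (x[index] - knot))
    return basis

def product(*bases: Basis) -> Basis:
    """Interaction term: the product of several hinge functions."""
    def basis(x: Sequence[float]) -> float:
        value = 1.0
        for b in bases:
            value *= b(x)
        return value
    return basis

def mars_predict(x: Sequence[float], intercept: float,
                 coeffs: List[float], bases: List[Basis]) -> float:
    """Evaluate d = a_0 + sum_i a_i * B_i(x), the form used in (1)."""
    return intercept + sum(a * b(x) for a, b in zip(coeffs, bases))

# Assumed input layout: x = [alpha, T in Celsius, t in years].
ALPHA, TEMP, TIME = 0, 1, 2

# Hypothetical basis functions and coefficients (illustration only).
bases = [
    hinge(TIME, 0.0),                              # grows with usage time
    hinge(TEMP, 50.0),                             # active only above 50 C
    product(hinge(TEMP, 50.0), hinge(TIME, 2.0)),  # temperature-time interaction
]
coeffs = [0.5, 0.02, 0.01]
intercept = 0.0

# Predicted delay degradation (arbitrary units) at alpha=0.5, T=75 C, t=4 years.
delay_shift = mars_predict([0.5, 75.0, 4.0], intercept, coeffs, bases)
```

Because each term is a hinge or a product of hinges, the coefficient attached to it can be read directly as the contribution of that variable (or interaction) beyond its knot, which is the interpretability property mentioned above.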
Furthermore, the fact that a MARS model can handle both continuous and discrete inputs makes it suitable for digital IC performance prediction. The training samples used for the proposed prediction model in (1) can be obtained from circuit aging simulation.

There are two main phases in the training of the MARS model, namely the forward learning phase and the backward phase. In the forward learning phase, basis functions are added by searching over all possible combinations of variables of hinge functions until convergence is reached (e.g., the residual error becomes smaller than a predefined threshold value). The search can be done using a brute-force method or a heuristic approach that speeds up the search [17]. Then, in the backward phase, the model is pruned by removing basis functions with the smallest increase in generalized cross-validation error. The goal of this phase is to remove the least effective basis functions to avoid overfitting.

In case of deviation of the prediction model $f_j$ due to Process Variations (PV), as shown in Fig. 1, we employ a calibration technique to compensate for the effect of PV on the previously learned $f_j$ . In the calibration, the relative circuit performance deviation from the nominal value is calculated at the time of manufacturing, which serves as a basis for calibrating PV-related deviation over time. Details on PV calibration can be found in [7].

Once the basic prediction model is learned and calibrated for PV, we employ an aging prediction technique for time-variant operating conditions based on the notion of equivalent aging time. To implement this technique, we need to collect historical time-variant operating condition parameters on a regular basis, which is done by our proposed on-chip sensors as shown in the bottom right part of Fig. 1.
Specifically, the implemented sensors record the operating condition vector $O(t) = [o_1(t), \ldots, o_n(t)]$ , where each parameter $o_i$ is expressed by its own function $o_i(t)$ and n denotes the total number of operating condition parameters. The recorded O(t) is then fed into our aging prediction technique for time-variant operating conditions for accurate real-time prediction of circuit aging, as shown in Fig. 1.

## B. Temperature sensor implementation

Fig. 2(a) shows a block diagram of the deployed temperature sensor, inspired by [18]. It includes a low-cost Ring Oscillator (RO) structure to record the sensed temperature, where the frequency of the RO is proportional to the ambient temperature of the sensor. The RO includes a chain of an even number of inverters (N) and one NAND gate to initialize the RO. The RO's output frequency is determined by the size of the underlying inverter-based delay chain and the delay of the included wires, inverters, and the deployed NAND gate. As gate delays in the RO are affected by its ambient temperature, the RO's oscillation frequency changes with temperature. Fig. 2(b) depicts the RO output at $10^{\circ}\text{C}$ and $75^{\circ}\text{C}$ when the sensor is fresh (i.e., age=0). In this figure, the frequency of $R_1$ is $\frac{1}{2\eta}$ and $\frac{1}{2\beta}$ at $10^{\circ}\text{C}$ and $75^{\circ}\text{C}$ , respectively.

The RO's output feeds the clock signal of the counter circuitry shown in blue in Fig. 2(a). Thereby, when the temperature changes, the counting rate changes and a different value (10-bit in our design) is stored in the sensor's register (shown in green), representing the current temperature. Note that the recorded value is not a binary representation of the temperature, yet it is correlated with it [18]. In the ideal scenario, the RO's frequency is only a function of its ambient temperature.
However, in practice, RO-based temperature sensors are also affected by aging [19], [11], i.e., the RO output frequency, and in turn the frequency of the counter's clock, are subject to change during the field of operation due to aging degradation. This may result in metastability in the register storing the counter output (shown by the green block in Fig. 2(a)), as its data input (i.e., the counter's output) and its clock signal (fed by the external clock Clk) may change within a very short time interval, i.e., a setup- or hold-time violation occurs. This effect cannot be avoided at design time by choosing another clock frequency for the register, since the aging degradation of the RO's output, and hence of the counter's clock frequency, depends on arbitrary operating conditions encountered later in the field of operation.

Figure 2. Deployed sensor circuitry. (b) Ring Oscillator's output at different temperatures (age=0). The $\eta$ and $\beta$ values are not shown to make the waveform technology oblivious.

Figure 3. Waveforms at age=50 months. The $\gamma$ value is not shown to make the waveform technology oblivious.

To resolve this issue, as shown in Fig. 2(a), we have added control circuitry in the clock path of this register. This circuitry prevents metastability, and in turn setup- and hold-time violations in the related register, by clocking the register in a specific time interval during which its data input is constant (previously loaded at the register input). Note that the register shown in green is enabled when the second counter (in red) counts up to a value stored in the Fixed-Value2 location. At this point in time, the value of $C_1$ is read and is used in the following clock cycles as a temperature representative. The value of Fixed-Value2 can be stored in a non-volatile memory at design time. In this circuit, the register input $(C_1)$ becomes stable one clock cycle before the register clock input, to prevent metastability. Fig.
3 shows the intermediate signals of a 50-month-old sensor operating at 75°C. The frequency of $R_1$ will be $\frac{1}{2\gamma}$ after 50 months of aging at 75°C. Accordingly, $C_1$ is changed on the rising edge of $R_2$ if the second counter value (in red) is less than *Fixed-Value1* stored in a non-volatile memory cell.

As discussed earlier, to minimize the impact of aging on the temperature sensor itself, we implemented a smart control mechanism for our temperature sensor so that it is only powered on when sensing the actual environmental parameter values on the device. As shown in Fig. 2(a), when the signal PwC (Power Control), which controls the embedded switch, is '0', all underlying components are powered off and therefore do not age; otherwise, all components are functional. Thereby, by controlling PwC, we can turn on the sensor only when needed.

## C. Workload sensor implementation

As discussed in Sec. IV-A, device workload is another important operating condition parameter that affects the aging-induced IC performance degradation rate, as the duration for which the underlying transistors hold the value "0" or "1" changes when running different workloads. The workload parameter in our prediction model is expressed as the number of primary inputs that take the value "1" in each clock cycle. To efficiently extract the workload, we propose to implement an on-chip structure for computing the Hamming Weight (HW) (i.e., the number of "1" values) of the primary inputs in each clock cycle using a small tree of full adders. Fig. 4 depicts the circuitry needed to calculate the HW of the primary inputs using a tree of adders (here, our circuit has M+1 primary inputs).

Figure 4. Embedded circuitry to extract workload data of an (M+1)-input circuit to be used in our machine learning models for predicting aging effects.

## D. Model prediction under time-variant operating conditions

Once the time-variant operating condition parameters $O = [o_1, \ldots, o_n]$ are extracted from the on-chip sensors, we can use them to perform time-variant aging degradation prediction. As discussed in Sec. IV-A, during the model training stage, we train and calibrate M functions to predict M IC performances from the operating condition vector O. To this end, we propose to approximate the continuous time domain operating condition vector O(t) by the discrete time domain vector $\tilde{O}(t)$ using a piecewise-constant approximation derived from the Riemann sum [20]:

$$O(t) \approx \tilde{O}(t) = [O_1(t_1^*), O_2(t_2^*), \cdots, O_N(t_N^*)] \tag{2}$$

where $O_i(t_i^*)$ denotes the constant approximation of the function $O(t)$ in the $i$ -th time interval: $O_i(t_i^*) = [o_1(t_i^*), \ldots, o_n(t_i^*)]$ . For example, the left rule can be used to approximate the value of $O_i(t_i^*)$ at the left endpoint $t_i^*$ of the $i$ -th interval. The total number of intervals $N$ can be determined by the following optimization scheme:

$$\text{minimize} \ N \quad \text{subject to} \quad \int |O(t) - \tilde{O}(t)|^2 \, dt < \epsilon \tag{3}$$

where $\epsilon$ is a user-defined threshold value. Once $\tilde{O}(t)$ is estimated, we use it to predict aging degradation under time-variant operating conditions based on equivalent aging time [8], [21]. Algorithm 1 shows the detailed steps of the time-variant aging prediction scheme.
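
As a rough illustration of how the piecewise-constant profile in (2) can drive an equivalent-aging-time prediction, consider the minimal sketch below. It makes two assumptions that are not taken from this paper: `toy_model` is a hypothetical closed-form degradation function standing in for a trained $f_j$, and the mapping g is taken to be the inverse of the model in time, i.e., the usage time under the next condition that would have produced the degradation accumulated so far.

```python
import math
from typing import Callable, List, Sequence

def left_rule(profile: Callable[[float], float],
              edges: Sequence[float]) -> List[float]:
    """Piecewise-constant approximation: sample the profile at each left endpoint."""
    return [profile(t_left) for t_left in edges[:-1]]

def toy_model(temp_c: float, t_years: float) -> float:
    """Hypothetical degradation model, increasing in both T and t."""
    rate = 0.5 * math.exp(0.01 * (temp_c - 25.0))
    return rate * math.sqrt(t_years)

def equivalent_time(model: Callable[[float, float], float], degradation: float,
                    temp_c: float, t_max: float = 1000.0,
                    tol: float = 1e-9) -> float:
    """Assumed form of g: solve model(temp_c, t) = degradation for t by bisection.

    Relies on the model being increasing in t; the result saturates near
    t_max if the target degradation is not reachable within [0, t_max].
    """
    lo, hi = 0.0, t_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if model(temp_c, mid) < degradation:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def predict_time_variant(model: Callable[[float, float], float],
                         temps: Sequence[float],
                         edges: Sequence[float]) -> List[float]:
    """Degradation at the end of each interval for one aging indicator."""
    t_equ = 0.0
    history = []
    for i, temp in enumerate(temps):
        t_p = t_equ + (edges[i + 1] - edges[i])  # equivalent prediction time
        d = model(temp, t_p)                     # degradation at end of interval i
        history.append(d)
        if i + 1 < len(temps):
            # Re-express the accumulated degradation as time under the next condition.
            t_equ = equivalent_time(model, d, temps[i + 1])
    return history

# Example: a step temperature profile sampled with the left rule over 8 years.
edges = [0.0, 2.0, 4.0, 6.0, 8.0]
steps = [25.0, 10.0, 75.0, 50.0]
profile = lambda t: steps[min(int(t // 2), 3)]
temps = left_rule(profile, edges)
history = predict_time_variant(toy_model, temps, edges)
```

A useful sanity check on this construction is that, for a constant profile, the interval-by-interval prediction collapses to a single evaluation of the model at the total usage time.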

**Algorithm 1** Time-Variant Aging Prediction

- 1: **procedure** TIME\_VARIANT\_PREDICTION
- 2: Train the function $d_j = f_j(O, t)$ using simulation samples, calibrate the model for process variations
- 3: Select the total number of intervals N
- 4: Set inputs $\tilde{O}(t) = [O_1(t_1^*), O_2(t_2^*), \cdots, O_N(t_N^*)]$
- 5: Set $i = 1, j = 1, t_{i,equ} = 0, t_{N+1}^* = t_{end}$
- 6: Select desired prediction time t in the i-th time interval
- 7: Compute equivalent prediction time $t_p = t_{i,equ} + (t_{i+1}^* - t_i^*)$
- 8: Aging prediction of the j-th performance at the end of the i-th interval: $d_{j,i} = f_j(O_i, t_p)$
- 9: **If** i < N
- 10: Equivalent aging time computation: $t_{i+1,equ} = g(d_{j,i}, O_{i+1})$
- 11: i = i + 1
- 12: While i < N, repeat steps 6-12
- 13: j = j + 1
- 14: While j < M, repeat steps 2-14
- 15: **end procedure**

## V. EXPERIMENTAL RESULTS

We demonstrate the effectiveness of the proposed approach using five different ISCAS'89 benchmarks. We used Synopsys Design Compiler and PrimeTime for logic synthesis and extraction of time-critical paths at 45-nm technology using the open-source Nangate library [22]. We consider the delays of timing-critical paths as our aging indicators. These are the paths whose delay, if degraded by 20% over the course of aging, could cause circuit failure. We used HSPICE MOSRA to evaluate the effect of NBTI and HCI aging for a period of 8 years with a time step of 2 months. The number of considered critical paths in each benchmark is shown in parentheses in the 1st column of Table I. The operating condition vector considered in this study is $O = [\alpha, T]$ , where $\alpha$ denotes the workload distribution parameter, i.e., the average percentage X% of primary inputs taking the value '1' in each clock cycle, with $X \in \{1, 25, 50, 75, 99\}$ , and T denotes the operating temperature.
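
As an illustration of the workload parameter $\alpha$ defined above, the sketch below computes the per-cycle Hamming weight of the primary inputs with a pairwise reduction (a software analogue of the adder tree of Sec. IV-C) and averages it into a percentage. The function names and the example input trace are hypothetical.

```python
from typing import List, Sequence

def adder_tree_hw(bits: Sequence[int]) -> int:
    """Hamming weight of one input vector via pairwise (tree) addition."""
    level: List[int] = list(bits)
    if not level:
        return 0
    while len(level) > 1:
        # Add neighbouring pairs; an unpaired last element is carried over.
        nxt = [level[k] + level[k + 1] for k in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]

def workload_alpha(trace: Sequence[Sequence[int]]) -> float:
    """Average percentage of primary inputs at '1' per clock cycle."""
    if not trace:
        raise ValueError("trace must contain at least one clock cycle")
    per_cycle = [100.0 * adder_tree_hw(cycle) / len(cycle) for cycle in trace]
    return sum(per_cycle) / len(per_cycle)

# Hypothetical trace: four clock cycles of a 4-input circuit.
trace = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
alpha = workload_alpha(trace)
```

In hardware only the per-cycle count is produced; the averaging step shown here would correspond to accumulating those counts in a register over the observation window.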
We conducted Monte Carlo (MC) aging simulations for each benchmark considering a Gaussian distribution for the transistor gate length L: $3\sigma = 10\%$ , the threshold voltage $V_{TH}$ : $3\sigma = 30\%$ , and the gate-oxide thickness $t_{OX}$ : $3\sigma = 3\%$ .

## A. Temperature sensor performance without aging degradation

As shown in Sec. IV-C, the workload sensors extract binary workload data from the primary inputs of a device. Thereby, it is very unlikely that the workload sensor performances are impacted by aging degradation. Thus, we focus on the temperature sensors in this study to investigate the impact of aging degradation on the on-chip sensors. Fig. 5 shows a numerical example of the temperature sensor output plotted as a function of applied ambient temperature values without aging degradation. The 10-bit output values (output "O" in Fig. 2(a)), expressed as decimal numbers, and the applied ambient temperature values are plotted on the Y- and X-axis, respectively. We randomly sampled more than 30 temperature values in the range of $[10,75]^{\circ}C$ as our ambient temperature sample values, and for each sample we recorded the temperature sensor's 10-bit output and converted it to a decimal number. As Fig. 5 shows, our temperature sensor output closely tracks the changes in the ambient temperature, with the scatter points aligned along a straight line. The coefficient of determination between the temperature sensor output and the ambient temperature is 0.997, which confirms the accuracy of the temperature sensor. Hereafter, we will use the temperature sensor output values as the temperature values in the training and validation of our aging prediction model.

Figure 5. Example of temperature sensor output vs. temperature values.

## B. Temperature sensor performance with aging degradation

We then performed aging simulation with our temperature sensors to study the impact of aging on the proposed temperature sensor implementation.
We ran aging simulations using HSPICE MOSRA for a period of 8 years with a time step of 2 months. The aging simulation was performed 4 times for the following 4 temperature values applied across the 8 years of simulated aging: $10^{\circ}C$ , $25^{\circ}C$ , $50^{\circ}C$ , $75^{\circ}C$ . We then repeated the same aging simulation for 5 iterations in an MC simulation by considering the process variation parameters discussed previously. Figures 6(a)-(d) show the temperature sensor output values plotted as a function of usage time for the 4 considered applied temperature values. Each sub-figure in Fig. 6(a)-(d) also contains the 5 MC samples generated from our MC simulations. It can be observed that our temperature sensor performances are very reliable w.r.t. usage time and experience almost no aging degradation. The main reason for this promising result is that, by using the proposed smart control mechanism for our temperature sensors as discussed in Sec. IV-B, we only turn on the temperature sensors when needed, i.e. once every 2 months in our case. Thus, the impact of aging degradation on the temperature sensors is minimized. Note that the sensor performance deviation caused by process variations can be easily calibrated as discussed in Sec. IV-A.

Figure 6. Temperature sensor output aging degradation with smart control for applied temperature value of (a) $10^{\circ}C$ (b) $25^{\circ}C$ (c) $50^{\circ}C$ , and (d) $75^{\circ}C$ .

To illustrate the advantages of our proposed temperature sensor implementation with smart control as compared to existing approaches without smart control [9], [10], we performed the same aging degradation analysis for our temperature sensors, without smart control this time, i.e., the temperature sensors are turned on throughout the 8 years. The results are shown in Figures 7(a)-(d). It can be clearly observed that the aging degradation is very pronounced this time, which would result in inaccurate device performance predictions, especially with long usage time and high aging degradation.

Figure 7. Temperature sensor output aging degradation without smart control for applied temperature value of (a) $10^{\circ}C$ (b) $25^{\circ}C$ (c) $50^{\circ}C$ , and (d) $75^{\circ}C$ .

## C. Benchmark device aging degradation prediction

To train the aging prediction model, we generated a sample set of 2,000 devices by sampling the input space [O,t] using the Latin hypercube sampling (LHS) method. We sampled each model input parameter in the following ranges: $\alpha=[0,1], T=[25,75], t=[0,8yrs]$ , with T expressed in degrees Celsius. We then randomly split the 2,000 samples into equal training and validation sets to build the prediction model $f_j(O,t)$ , as shown in Eq. (1). The root mean square error (RMSE) for the validation set, averaged over all considered critical paths in each benchmark, is below 2%. Once the basic models $f_j$ are learned, they are validated and calibrated for process variations as shown in Sec. IV-A.

Figure 8. Aging prediction plot for a path delay in s5378 under scenario 3: (a) normalized aging degradation, and (b) prediction plot for this path.

Table I. Aging prediction results under time-variant operating conditions in scenario 3.

| Benchmark (# of critical paths) | RMSE, proposed model | RMSE, SVM [5] | RMSE, RNN |
|---------------------------------|----------------------|---------------|-----------|
| s510 (21)                       | 1.15%                | 4.38%         | 4.32%     |
| s1494 (57)                      | 1.19%                | 4.41%         | 4.33%     |
| s5378 (392)                     | 1.15%                | 4.59%         | 4.45%     |
| s9234 (179)                     | 1.18%                | 4.62%         | 4.68%     |
| s15850 (180)                    | 1.23%                | 4.71%         | 4.64%     |

To show the effectiveness of the proposed approach for predicting aging degradation under time-variant operating conditions, we generated the following scenario: four temperatures, $25^{\circ}C$ , $10^{\circ}C$ , $75^{\circ}C$ , $50^{\circ}C$ , applied in the time intervals [0, 2yrs], [2, 4yrs], [4, 6yrs], [6, 8yrs], respectively, during the 8 years. We then determine the number of time intervals as N=4 using the optimization procedure outlined in (3).
Thus, the equivalent aging time is computed three times according to Algorithm 1. We consider 5 different $\alpha$ values, $\alpha$ = 1%, 25%, 50%, 75%, 99%, under the same time-variant temperature profile for the 8 years: $25^{\circ}C$ , $10^{\circ}C$ , $75^{\circ}C$ , $50^{\circ}C$ applied in the time intervals [0, 2yrs], [2, 4yrs], [4, 6yrs], [6, 8yrs], respectively. Fig. 8(a) shows the normalized aging degradation for a path delay in s5378 under the operating conditions applied in this scenario. It can be observed that there is only one sharp increase in the path delay at the $4^{th}$ year, since the only sharp temperature increase in this scenario was applied in the $4^{th}$ year (from $10^{\circ}C$ to $75^{\circ}C$ ), while the temperature changes between all other consecutive time intervals were decreases (from $25^{\circ}C$ to $10^{\circ}C$ at the $2^{nd}$ year, and from $75^{\circ}C$ to $50^{\circ}C$ at the $6^{th}$ year), which resulted in an even lower aging degradation rate.

The $2^{nd}$ column in Table I shows the prediction RMSE averaged over all critical paths and all $\alpha$ values for each of the benchmarks using the proposed approach. For comparison with state-of-the-art prediction techniques, we also applied two other techniques for predicting the same path delay values, namely the Support Vector Machine (SVM) regression model [5], and the fully Recurrent Neural Network (RNN) model, which is very efficient in capturing dynamic behavior for a time sequence [23]. The prediction results using the two techniques are shown in the $3^{rd}$ and $4^{th}$ columns of Table I. It can be observed that the proposed model consistently outperforms the state-of-the-art models, with an RMSE approximately 3 percentage points lower, as the sharp increase of the aging degradation rate at the $4^{th}$ year results in a large prediction error if equivalent aging time is not considered in the prediction. Fig.
8(b) shows the aging prediction plot using the proposed approach for the same path from Fig. 8(a). It can be observed that the proposed model can accurately predict aging degradation under time-variant operating conditions.

## VI. CONCLUSIONS

We proposed a novel approach for real-time IC aging prediction using on-chip sensors and machine learning techniques. We collected historical operating condition parameter values from on-chip sensors, and fed them into a machine learning model for accurate prediction of aging-related performance degradation. Our proposed sensors are only powered on during collection of operating condition parameter values so that their aging-related degradation is minimized. Experimental results show that our approach outperforms existing methods in terms of aging prediction accuracy under different scenarios of time-variant operating conditions.

## ACKNOWLEDGMENT

This work was partly funded by the Strategic Awards for Research Transitions (START) program at UMBC.

## REFERENCES

- [1] Z. Yang et al., "Workload-aware failure prediction method for VLSI devices using an LUT based approach," in *I2MTC*, 2018, pp. 1-6.
- [2] M. Agarwal et al., "Circuit failure prediction and its application to transistor aging," in *VTS*, 2007, pp. 1-8.
- [3] S. Shinde et al., "Wideband microwave reflectometry for rapid detection of dissimilar and aged ICs," *TIM*, vol. 66, no. 8, pp. 2156-2165, 2017.
- [4] N. Karimi et al., "Prognosis of IC aging based on machine learning," in *DFT*, 2016, pp. 1-4.
- [5] A. Vijayan et al., "Fine-grained aging-induced delay prediction based on the monitoring of run-time stress," *TCAD*, vol. 37, no. 5, 2018.
- [6] A. Tiwari et al., "Facelift: Hiding and slowing down aging in multicores," in *MICRO*, 2008, pp. 129-140.
- [7] K. Huang et al., "Real-time prediction for IC aging based on machine learning," *TIM*, vol. 68, no. 12, pp. 4756-4764, 2019.
- [8] Y. Lu et al., "Statistical reliability analysis under process variation and aging effects," in *DAC*, 2009, pp. 514-519.
- [9] K. K. Kim et al., "On-chip aging sensor circuits for reliable nanometer MOSFET digital circuits," *TCAS-II*, vol. 57, no. 10, pp. 798-802, 2010.
- [10] J. Keane et al., "An all-in-one silicon odometer for separately monitoring HCI, BTI, and TDDB," *JSSC*, vol. 45, no. 4, pp. 817-829, 2010.
- [11] M. Anik et al., "On the effect of aging on digital sensors," in *VLSID*, 2020, pp. 189-194.
- [12] K. Kang et al., "Efficient transistor-level sizing technique under temporal performance degradation due to NBTI," in *ICCD*, 2006, pp. 216-221.
- [13] E. Karl et al., "Multi-mechanism reliability modeling and management in dynamic systems," *TVLSI*, vol. 16, no. 4, pp. 476-487, 2008.
- [14] J. Srinivasan et al., "Exploiting structural duplication for lifetime reliability enhancement," in *ISCA*, 2005, pp. 520-531.
- [15] J. Srinivasan et al., "The case for lifetime reliability-aware microprocessors," in *ISCA*, 2004, pp. 276-287.
- [16] J. Friedman, "Multivariate adaptive regression splines," *Ann. Stat.*, vol. 19, no. 1, pp. 1-67, 1991.
- [17] J. Friedman, "Fast MARS," Stanford University Department of Statistics, Technical Report 110, pp. 1-17, 1993.
- [18] R. Zhang et al., "Impact of front-end wearout mechanisms on the performance of a ring oscillator-based thermal sensor," in *IWASI*, 2019, pp. 258-263.
- [19] M. T. Rahman et al., "ARO-PUF: an aging-resistant ring oscillator PUF design," in *DATE*, 2014, pp. 1-6.
- [20] P. Roe, "Approximate Riemann solvers, parameter vectors, and difference schemes," *J. Comput. Phys.*, pp. 357-372, 1981.
- [21] K. Huang et al., "Recycled IC detection based on statistical methods," *TCAD*, vol. 34, no. 6, pp. 947-960, 2015.
- [22] "Nangate 45nm open cell library," http://www.nangate.com.
- [23] R. Williams et al., "A learning algorithm for continually running fully recurrent neural networks," *Neural Comp.*, vol. 1, no. 2, pp. 270-