# Real-Time Prediction for IC Aging Based on Machine Learning

Ke Huang, Member, IEEE, Xinqiao Zhang, and Naghmeh Karimi<sup>®</sup>, Member, IEEE

Abstract-Estimating the aging-related degradation and failure of nanoscale integrated circuits (ICs), before they actually occur, is crucial for developing aging prevention/mitigation actions and in turn avoiding unexpected in-field circuit failures. Real-time monitoring of IC operating conditions can be efficiently used for predicting aging degradation and in turn timing failures caused by device aging. The existing approaches only take some specific operating conditions (e.g., workload or temperature) into account. In this paper, we propose a novel method for real-time IC aging prediction by extending the prediction schemes to a comprehensive model which takes into account any time-variant dynamic operating conditions relevant to aging prediction. Using a machine learning prediction model and the notion of equivalent aging time, we show that our approach outperforms the existing methods in terms of aging-prediction accuracy under different scenarios of time-variant operating conditions.

Index Terms—Equivalent aging time, hot carrier injection (HCI), machine learning, negative/positive bias temperature instability (NBTI/PBTI), real-time IC aging prediction.

# I. INTRODUCTION

The robustness and reliability concerns of modern integrated circuits (ICs) grow significantly with aggressive scaling and process variations. The deviation of electrical behavior of transistors due to negative/positive bias temperature instability (NBTI/PBTI) and hot carrier injection (HCI) degrades the circuit performance over time and ultimately leads to device failure as a result of extensive usage. Thus, run-time prediction and prognosis of circuit performance degradation, before it actually occurs, are of paramount importance to ensure that no catastrophic consequences occur due to an unexpected run-time failure caused by aging degradation [1].

In practice, the rate of IC performance degradation is affected by a variety of IC operating conditions, such as voltage bias, temperature, and workload distribution. Moreover, these operating conditions often exhibit time-variant behaviors, i.e., their values may change over time. This in turn affects the rate of run-time IC degradation and makes aging failure prediction more uncertain and challenging. Classic approaches to evaluate the effect of aging degradation in the field of operation consist of analyzing IC run-time operating conditions (temperature, workload, and so on) and detecting/predicting performance degradation accordingly using lookup tables [2], aging sensors [3], or electromagnetic signature [4]. Then, actions for compensating aging degradation can be taken by adaptively changing the maximum operating frequency, bias voltage [5], or by giving warnings on circuits with timing guardband violation [3]. Recently, machine learning-based aging-prediction techniques have received much attention as they can efficiently predict device-aging-induced failures with generalization capability [6], [7]. The idea is to train a model using a set of samples with both operating condition parameter values (e.g., workload and temperature) and aging indicator values (e.g., the delays of critical paths). The trained model can then be used to predict the aging indicator values, given new operating conditions.

Manuscript received December 3, 2018; accepted January 1, 2019. The Associate Editor coordinating the review process was Leonid Belostotski. (Corresponding author: Naghmeh Karimi.)

K. Huang and X. Zhang are with the Department of Electrical and Computer Engineering, San Diego State University, San Diego, CA 92182 USA (e-mail: khuang@sdsu.edu; xzhang10@sdsu.edu).

N. Karimi is with the Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, Baltimore, MD 21250 USA (e-mail: naghmeh.karimi@umbc.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIM.2019.2899477

Among the challenges faced in the machine learning-based techniques for IC-aging prediction, one issue is that the models often consider a constant operating condition over time. Although a few models consider time-related information in some operating condition parameters, such as 0/1 signal probabilities in the workload, those models ignore the time-variant characteristics of other parameters, such as temperature [6], [7]. Intuitively, training a model without accurate time-variant information of *all* operating conditions under which the IC is deployed would result in inaccurate aging failure prediction and even unexpected failure (e.g., circuits may fail earlier with higher-than-expected temperature).

To fully exploit the benefit of machine learning-based aging-prediction techniques, we extend our previous work [6] by proposing a novel model that employs a generalized function which takes into account a comprehensive set of operating conditions without making any assumption on their time stationarity. In other words, our model is able to predict IC aging degradation with time-variant operating conditions. To achieve this, we show that any operating condition parameter expressed with respect to usage time can be divided into N intervals using piecewise-constant approximation [8]. Then, aging degradation prediction in the ith interval is performed using the notion of equivalent aging time [9], which denotes the stress time needed under the operating condition in the ith interval to obtain the same performance degradation as at the end of the (i-1)th interval. The main contributions of this paper are summarized as follows: 1) developing a generalized machine learning IC aging-prediction model that takes into account any operating condition parameters and 2) developing a time-variant prediction model that predicts the IC aging degradation from time-variant operating conditions.

Fig. 1. Bathtub curve illustrating typical device failure characteristics.

The rest of this paper is organized as follows. In Section II, we present the background on the NBTI aging mechanism in ICs. In Section III, we discuss the prior work on IC aging prediction. In Section IV, we illustrate the framework of the proposed approach. In Section V, we show the effectiveness of the proposed approach using several ISCAS'89 benchmarks, and the conclusions are drawn in Section VI.

# II. BACKGROUND ON NBTI AGING

During the lifetime of an IC, its performance continuously degrades due to various aging mechanisms. A typical device failure behavior curve is shown in Fig. 1. This failure curve is commonly known as the bathtub curve [10], where the failure rate is defined as the probability that a device will fail in the time interval between t and $t + \delta t$, given that it has survived until time t [11]. As can be observed in Fig. 1, aged devices are expected to have a shorter time to failure than new devices. Thus, predicting IC aging behavior before aging-related failures occur is of paramount importance for assuring the safety and reliability of ICs. In this section, we provide a brief description of NBTI, which is the most common IC aging phenomenon.

As one of the leading factors in the performance degradation of digital ICs, NBTI occurs in p-type metal-oxide-semiconductor (pMOS) devices stressed with negative gate voltages at elevated temperatures. In the reaction-diffusion model, the interface traps located near the gate oxide/silicon channel boundary are passivated with a hydrogen species. The bonds of the hydrogen species can be easily broken and so diffusion is allowed. This movement of charge impacts $V_{\rm th}$ of the transistor. For n-type metal-oxide-semiconductor transistors, the equivalent phenomenon is PBTI. For pure oxide and nitrided oxides, this has not been a dominant degradation mode, but this may change with high-k metal gate.

In practice, a pMOS transistor experiences two phases of NBTI depending on its bias condition. The first phase, i.e., the stress phase, occurs when the transistor is ON, i.e., when a negative voltage is applied to its gate. In the stress phase, positive interface traps are generated at the Si-SiO$_2$ interface. As a result, the magnitude of the threshold voltage $V_{\rm th}$ of the transistor is increased. In the second phase, i.e., the recovery phase, a positive voltage is applied to the gate of the transistor.



Fig. 2. Percentage change in threshold voltage of a pMOS transistor over time.

In this phase, the threshold voltage drift induced by NBTI during the stress phase can partially "recover".

Threshold voltage  $V_{\rm th}$  drifts of a pMOS transistor under stress depend on the physical parameters of the transistor, supply voltage, temperature, and stress time. Fig. 2 shows the threshold voltage drift of a pMOS transistor (at an operating temperature of 80 °C using 45-nm technology with high-k dielectric) that is continuously under stress for six months as well as a transistor that is under stress and recovery every other month. As can be observed, the NBTI effect is high in the first couple of months but the threshold voltage tends to saturate for long stress times.

The degradation of  $V_{\rm th}$  often exhibits logarithmic dependence on time [12]. Bhardwaj *et al.* [13] proposed a long-term aging model for characterizing NBTI. The model provides an analytical upper bound estimation of NBTI impact over time. As shown in [9], the NBTI-induced threshold shift model can be simplified to

$$\Delta V_{\text{th}}(T, \alpha, t) = b\, e^{-\frac{n E_a}{kT}} \left(\frac{\alpha}{1 - \alpha}\right)^n t^n \tag{1}$$

where T is the average temperature in kelvin, $\alpha$ is the average signal duty cycle, t denotes the usage time, n is the time exponent, k is the Boltzmann constant, $E_a = 0.49$ eV is the activation energy, and b is a fitting constant. A primary advantage of using (1) to characterize the aging effect is that, given a reference model precharacterized at $T_{\rm ref}$ and $\alpha_{\rm ref}$, the aging effect under any arbitrary T and $\alpha$ can be efficiently calculated using parameter scaling.
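The parameter scaling described above can be illustrated with a minimal Python sketch. Only the 0.49 eV energy term comes from the text; the fitting constant `b` and time exponent `n` used here are hypothetical placeholders, and the function names are chosen for illustration.

```python
import math

K_BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K
E_A_EV = 0.49                    # energy term of (1) in eV, as given in the text


def delta_vth(temp_k, alpha, t, b=1.0, n=1.0 / 6.0):
    """Threshold-voltage shift following (1); b and n are hypothetical placeholders."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    thermal = math.exp(-n * E_A_EV / (K_BOLTZMANN_EV * temp_k))
    duty = (alpha / (1.0 - alpha)) ** n
    return b * thermal * duty * t ** n


def scaling_factor(temp_k, alpha, temp_ref_k, alpha_ref, n=1.0 / 6.0):
    """Ratio delta_vth(T, alpha, t) / delta_vth(T_ref, alpha_ref, t).

    The ratio does not depend on t or b, which is what makes scaling from a
    precharacterized reference model cheap.
    """
    thermal = math.exp(
        -n * E_A_EV / K_BOLTZMANN_EV * (1.0 / temp_k - 1.0 / temp_ref_k)
    )
    duty = ((alpha / (1.0 - alpha)) / (alpha_ref / (1.0 - alpha_ref))) ** n
    return thermal * duty


# Scale a reference shift at 25 deg C to 75 deg C with the same duty cycle.
reference = delta_vth(298.15, 0.5, 1000.0)
scaled = reference * scaling_factor(348.15, 0.5, 298.15, 0.5)
```

Because the time and fitting-constant terms cancel in the ratio, one reference characterization suffices for any other pair of temperature and duty cycle under this simplified model.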

# III. PRIOR WORK ON IC AGING PREDICTION

Predicting device aging behavior in a proactive manner has always been a challenge for the safety and reliability enhancement of ICs. As discussed in Section II, the aging degradation of IC performances is influenced by a variety of environmental factors, such as supply voltage, temperature, workload distribution, and stress time. One can assume a worst case scenario of these environmental factors in estimating aging-related performance degradation [14], [15]. However, the pessimistic worst case scenario may not represent an accurate account of realistic performance degradation under various operating conditions.

Various approaches have been proposed to date to predict IC aging degradation based on its environmental factors. In [2], a lookup table-based failure prediction method was proposed by considering the random change in the system workload and supply voltages in the aging estimation. However, as the size of transistors continues to shrink, lookup table-based techniques may not be suitable for large-scale devices. In [3], circuit failure prediction was done by using aging sensors that capture the impact of IC aging based on the observation of guardband violation in timing. In addition to the extra overhead introduced by aging sensors, guardband techniques also face challenges in modern technology nodes [16]. In [4], aging effects in ICs were predicted by the electromagnetic signature, which requires expensive external equipment. Because aging degradation is predicted/inferred in the field of operation, such prediction approaches are often referred to as "on-line": real-time operating condition parameters are collected and used for predicting aging. Once the aging degradation is predicted by a given prediction model, actions for compensating aging degradation can be taken by adaptively changing the maximum operating frequency [5], supply/bias voltage [17], device architecture [18], or by giving warnings on circuits with timing guardband violation [3]. Different dynamic adaptation techniques were explored in [19], including microarchitectural adaptation and dynamic voltage/frequency scaling.

Recently, machine learning-based aging-prediction techniques have received much attention as they can efficiently predict device-aging-induced failures with generalization capability [6], [7]. The idea is to train a model using a set of samples with both the operating condition parameter values (e.g., workload and temperature) and the aging indicator values (e.g., the delays of critical paths in a digital circuit). The training samples are often obtained from aging simulation in which a degradation model, such as (1), is applied to the transistor threshold voltage $V_{\rm th}$ according to various applied operating conditions. Once the aging-prediction model is trained, it can then be used to predict aging degradation, given new operating conditions. The previously proposed machine learning models often assume static operating conditions, i.e., the condition parameters, such as temperature, under which the IC is operated are assumed to be constant between time 0 and time t when the prediction is performed. Training a model without accurate time-variant information of all operating conditions under which the IC is deployed would result in inaccurate aging prediction and even unexpected failures (e.g., circuits may fail earlier with higher-than-expected temperature).

In this paper, we propose a novel model that employs a generalized function which takes into account a comprehensive set of time-variant operating conditions. The proposed method will be discussed in Section IV.

# IV. PROPOSED APPROACH

In this section, we describe the details of our proposed model. We present an overview of the model and show how it is calibrated to compensate for process variation effects and how the equivalent time technique is used to predict aging degradation under time-variant operating conditions.

### A. Overview of the General MARS Prediction Model

Let $O = [o_1, \ldots, o_n]$ denote a set of operating conditions (workload distribution, temperature, and so on) under which the circuit operates, where n is the total number of considered operating conditions. We show that for a given device, we can employ a supervised learning scheme to learn a function $f_j$ that maps the operating condition vector O to the jth IC aging indicator (e.g., path delay value) $d_j$: $f_j \colon O \mapsto d_j$, $j = 1, \ldots, M$, where M is the total number of considered IC performances used as aging indicators. The top part of Fig. 3 shows the training of function $f_j$. In this paper, we use a multivariate adaptive regression splines (MARS) model [20] to learn the function $f_j$

$$d_j = f_j(O, t) = a_0 + \sum_{i=1}^{M} a_i \cdot B_i(O, t) \tag{2}$$

where $a_0$ is the intercept, $a_i$ denotes the slope parameter, t represents the usage time under O, and $B_i(O, t)$ denotes the ith basis function, which can take the form of a hinge function or an interaction product of different hinge functions. In addition to its ease of implementation and straightforward learning phase, one big advantage of the MARS model compared with other models is its ability to provide interpretable coefficients in its basis functions that quantitatively describe the contribution of each input variable and their interactions to the output variable. These interpretable coefficients can assist the process/test engineer in understanding and moderating the source of aging-related performance degradation. Moreover, the MARS model can accurately handle both continuous and categorical data [21], which provides a higher level of flexibility especially for digital ICs with discrete binary signal values. Using a set of L training samples $[d_j, O, t]_p$, $p = 1, \dots, L$, the MARS builds the regression in two phases: the forward and the backward pass. In the forward pass, the MARS starts with an empty model and then repeatedly adds basis functions to the model by minimizing the sum-of-squares residual error. The basis functions are added by searching over all possible combinations of variables of hinge functions until convergence is reached (e.g., the sum-of-squares residual error becomes smaller than a predefined threshold value or the maximum number of terms is reached). The search at each step can be done using the brute force method or a heuristic approach to speed up the searching process [22]. Then, during the backward pass, the model is pruned by removing the basis functions whose removal causes the smallest increase in generalized cross-validation error. The goal of this pass is to remove the least effective basis functions to avoid overfitting.
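To make the form of (2) concrete, the following Python sketch evaluates a MARS-style model built from hinge functions and their products. It covers only the evaluation of an already fitted model, not the forward/backward passes, and all coefficients, knots, and variable names are hypothetical values on inputs normalized to [0, 1].

```python
def hinge(x, knot, direction=+1):
    """Hinge function: max(0, x - knot) for direction=+1, max(0, knot - x) for -1."""
    return max(0.0, direction * (x - knot))


def mars_predict(sample, intercept, terms):
    """Evaluate a_0 + sum_i a_i * B_i(sample).

    Each term is (coefficient, factors); each factor is (variable, knot, direction),
    and a basis function B_i is the product of its hinge factors.
    """
    total = intercept
    for coeff, factors in terms:
        basis = 1.0
        for var, knot, direction in factors:
            basis *= hinge(sample[var], knot, direction)
        total += coeff * basis
    return total


# Hypothetical fitted terms (not taken from the paper).
example_terms = [
    (0.40, [("t", 0.10, +1)]),                   # main effect of usage time
    (0.25, [("T", 0.50, +1), ("t", 0.10, +1)]),  # temperature x time interaction
    (-0.05, [("alpha", 0.50, -1)]),              # effect of a low duty cycle
]
delay = mars_predict({"alpha": 0.5, "T": 0.75, "t": 0.6}, 0.2, example_terms)
```

Each coefficient can be read directly as the contribution of one variable, or one interaction, beyond its knot, which is the interpretability property discussed above.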

The samples used to learn  $f_j(O, t)$  can be obtained from the circuit aging simulation by sampling the input space of [O, t]. To generate a representative set of samples, we employ a Latin hypercube sampling (LHS) method to partition the sampling space into different equally probable regions.
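A minimal sketch of Latin hypercube sampling over the input space [O, t] is given below, using only the Python standard library. The function name and seed are illustrative assumptions; the ranges mirror the input space reported in the experimental section ($\alpha$ in [0, 1], T in [25, 75] °C, t in [0, 8] years).

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points with one point per stratum in every dimension.

    Each dimension is split into n_samples equally wide strata; the strata are
    shuffled independently per dimension and one value is drawn uniformly
    inside each stratum.
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        strata = list(range(n_samples))
        rng.shuffle(strata)
        width = (high - low) / n_samples
        columns.append([low + (s + rng.random()) * width for s in strata])
    return [tuple(col[k] for col in columns) for k in range(n_samples)]


# alpha in [0, 1], T in [25, 75] deg C, t in [0, 8] years.
samples = latin_hypercube(2000, [(0.0, 1.0), (25.0, 75.0), (0.0, 8.0)])
```

Each returned tuple would then be passed to the aging simulator to obtain the corresponding $d_j$ value for training.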

### B. Model Calibration

Fig. 3. Overview of the proposed approach.

Note that process variations cause the prediction model $f_j$ to deviate. As shown in the middle two blocks in Fig. 3, we can employ a calibration technique to compensate for the effect of process variations on the previously learned $f_j$. The idea is to calculate the relative circuit performance deviation from the nominal value with respect to its total possible corner variation range at time $t_0$ and to calibrate its predicted performance using the relative performance variation [6]. Our calibration technique consists of the following three steps:

- 1) performing the corner simulation to obtain the best/worst cases of circuit performances;
- 2) computing the compensation factor of a new device under a nominal operating condition at time $t = t_0$;
- 3) calibrating for process variation using the compensation factor computed in step 2.

Specifically, we propose to learn the model  $f_j$  ( $j=1,\ldots,M$ ) described previously for the best/worst corner cases during corner simulations. The best/worst models are denoted by  $f_{j,\min}/f_{j,\max}$ , which are learned from best/worst corner circuit instances sampled at corner operating conditions  $O_{\min}/O_{\max}$ .

The second step of the calibration process is to compute the compensation factor for a manufactured circuit at time $t=t_0$. For any newly fabricated circuit, we first need to measure its performance values at a nominal operating condition denoted by $O_{\text{nom}}$. The jth measured performance value at $t_0$ is denoted by $d_{j,n}$. Then, using the corner models $f_{j,\text{min}}$ and $f_{j,\text{max}}$ learned from step 1, we can predict the jth best and worst performance values at $t_0$: $d_{j,\text{min}} = f_{j,\text{min}}(O_{\text{min}}, t_0)$ and $d_{j,\text{max}} = f_{j,\text{max}}(O_{\text{max}}, t_0)$, $j = 1, \ldots, M$. Based on the above-mentioned definitions, we define the process variation compensation scaling factor $s_j$ as follows:

$$s_j = (d_{j,n} - d_{j,\min})/(d_{j,\max} - d_{j,\min}). \tag{3}$$

The process variation compensation scaling factor $s_j$ defined above allows us to calibrate the circuit and compensate for the impact of process variations on aging prediction at time $t = t_0$. Once $s_j$ is computed, we can use it to calibrate new aging degradation prediction under any operating condition O and usage time t

$$\hat{d}_j = d_{jo,\min} + (d_{jo,\max} - d_{jo,\min}) \times s_j \tag{4}$$

where  $d_{jo,\text{min}}$  and  $d_{jo,\text{max}}$  denote the best and worst performance values predicted under the new operating condition O and usage time t.
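The two calibration formulas (3) and (4) can be sketched in a few lines of Python. The function names and the numeric delay values below are hypothetical and serve only to illustrate the arithmetic.

```python
def compensation_factor(d_measured, d_min, d_max):
    """Scaling factor s_j from (3), computed once at t = t_0."""
    if d_max == d_min:
        raise ValueError("best and worst corner predictions must differ")
    return (d_measured - d_min) / (d_max - d_min)


def calibrated_prediction(d_o_min, d_o_max, s_j):
    """Calibrated prediction from (4) under a new operating condition and usage time."""
    return d_o_min + (d_o_max - d_o_min) * s_j


# Hypothetical delays in ns: measured value and corner predictions at t_0,
# followed by corner predictions under a new (O, t).
s = compensation_factor(d_measured=1.06, d_min=1.00, d_max=1.20)
d_hat = calibrated_prediction(d_o_min=1.10, d_o_max=1.35, s_j=s)
```

In this example the device sits 30% of the way between the best and worst corners at $t_0$, and the calibrated prediction keeps it at the same relative position between the aged corner predictions.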

### C. Model Prediction Under Time-Variant Operating Conditions

Once the basic prediction model is learned and calibrated for process variations as shown in Fig. 3, we can use it to predict aging degradation under time-variant operating conditions. To this end, we propose to approximate a continuous time-variant operating condition vector O(t) using piecewise-constant approximation. Specifically, let  $O(t) = [o_1(t), \ldots, o_n(t)]$  denote the operating condition vector expressed as a function of usage time, where each parameter  $o_i$  is expressed by its own function  $o_i(t)$ , and n denotes the total number of operating condition parameters (e.g., workload distribution, temperature, bias voltage, and so on). Using a piecewise-constant approximation derived from the Riemann sum, we can approximate O(t) as [8]

$$O(t) \approx \tilde{O}(t) = [O_1(t_1^*), O_2(t_2^*), \cdots, O_N(t_N^*)] \tag{5}$$

where  $O_i(t_i^*)$  denotes the constant approximation of the function O(t) in the *i*th time interval, i.e., all operating condition parameters are considered constant and denoted by  $O_i$  during the *i*th time interval:  $O_i(t_i^*) = [o_1(t_i^*), \ldots, o_n(t_i^*)]$ . The value of  $O_i(t_i^*)$  can be determined by the left rule which takes the value of O(t) at the left endpoint  $t_i^*$  in the *i*th interval. The total number of intervals N can be determined by a threshold value  $\epsilon$  such that N is minimized while the integral squared error between O(t) and  $\tilde{O}(t)$  is less than  $\epsilon$ 

$$\min N \quad \text{s.t.} \quad \int \left| O(t) - \tilde{O}(t) \right|^2 dt < \epsilon. \tag{6}$$
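The selection of N in (6) can be sketched as a simple search. The sketch below makes two assumptions that the text does not state: the intervals are equally wide, and the operating condition is a single scalar parameter. The integral is estimated with a midpoint rule, and the two-level temperature profile is a hypothetical example mirroring the 25 °C / 50 °C step used in the experiments.

```python
def piecewise_constant(profile, t_end, n_intervals):
    """Left-rule levels: the profile sampled at the left endpoint of each interval."""
    width = t_end / n_intervals
    return [profile(i * width) for i in range(n_intervals)]


def squared_error(profile, t_end, n_intervals, n_grid=4800):
    """Midpoint-rule estimate of the integral of |O(t) - O~(t)|^2 over [0, t_end]."""
    levels = piecewise_constant(profile, t_end, n_intervals)
    dt = t_end / n_grid
    total = 0.0
    for k in range(n_grid):
        t = (k + 0.5) * dt
        i = min(int(t * n_intervals / t_end), n_intervals - 1)
        total += (profile(t) - levels[i]) ** 2 * dt
    return total


def min_intervals(profile, t_end, eps, n_max=64):
    """Smallest N of equal-width intervals whose squared error is below eps."""
    for n in range(1, n_max + 1):
        if squared_error(profile, t_end, n) < eps:
            return n
    raise ValueError("no N <= n_max meets the error threshold")


def step_profile(t):
    """Hypothetical temperature profile: 25 deg C before year 4, 50 deg C afterwards."""
    return 25.0 if t < 4.0 else 50.0


n_opt = min_intervals(step_profile, t_end=8.0, eps=1.0)
```

With equal-width intervals the search may return a larger N than a partition whose boundaries follow the actual change points, so this sketch should be read as an upper bound on the N that a non-uniform partition would need.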

Once the approximated operating condition vector $\tilde{O}(t)$ is obtained, for aging degradation prediction under time-variant operating conditions, we use the notion of equivalent aging time $t_{\text{equ}}$ [9], [23], which denotes the stress time needed under the operating condition $O_{i+1}$ to obtain the same performance degradation as that reached at the end of the *i*th interval under $O_i$. Fig. 4 shows the concept of equivalent aging time for two sets of operating condition vectors, namely $O_i$ and $O_{i+1}$. The aging degradation for one performance measurement under $O_i$ and $O_{i+1}$ is shown by the blue and red curves, respectively. It can be observed that the operating condition $O_{i+1}$ results in a higher degradation rate, as the red curve degrades faster than the blue curve. Suppose that we apply a time-variant operating condition with $O_i$ applied during the interval $[t_0, t_2]$ and $O_{i+1}$ applied after the time point $t_2$. As can be observed in Fig. 4, the degradation level at $t_2$ under $O_i$ is equivalent to the level at $t_1$ if $O_{i+1}$ had been applied from $t_0$. Thus, a change of the operating condition to $O_{i+1}$ at $t_2$ will result in an equivalent degradation curve, as shown by the blue dotted curve in Fig. 4. As a result, the prediction of aging degradation after $t_2$ is equivalent to the aging prediction after $t_1$ by assuming that $O_{i+1}$ was applied during $[t_0, t_1]$ and the same $O_{i+1}$ will be applied after $t_1$.

Fig. 4. Illustration of equivalent aging time.

Specifically, let $d_{j,i} = f_j(O_i, t_{i+1}^*)$ denote the predicted aging degradation for the jth aging performance indicator at the end of the ith interval. Then, the equivalent aging time to obtain the same $d_{j,i}$ under $O_{i+1}$ is computed as $t_{i+1,\text{equ}} = g(d_{j,i}, O_{i+1})$, where g denotes the function that computes $t_{i+1,\text{equ}}$ from $d_{j,i}$ and $O_{i+1}$: $g(d_{j,i}, O_{i+1}) = \underset{t}{\operatorname{argmin}} |d_{j,i} - f_j(O_{i+1}, t)|$. Finally, the aging prediction at the end of the (i+1)th interval $t_{\text{end}}$ is performed as: $d_{j,i+1} = f_j(O_{i+1}, (t_{i+1,\text{equ}} + (t_{\text{end}} - t_{i+1}^*)))$. The detailed steps for time-variant aging prediction are summarized in Algorithm 1.

**Algorithm 1** Time-Variant Aging Prediction

- 1: procedure Time\_Variant\_Prediction
- 2: Train the function $d_j = f_j(O, t)$ using simulation samples and calibrate the model for process variations
- 3: Select the total number of intervals N
- 4: Set inputs $\tilde{O}(t) = [O_1(t_1^*), O_2(t_2^*), \cdots, O_N(t_N^*)]$
- 5: Set $i = 1$, $j = 1$, $t_{i,\text{equ}} = 0$, $t_{N+1}^* = t_{\text{end}}$
- 6: Select desired prediction time t in the *i*th time interval
- 7: Compute equivalent prediction time $t_p = t_{i,\text{equ}} + (t_{i+1}^* - t_i^*)$
- 8: Aging prediction of the *j*th performance at the end of the *i*th interval: $d_{j,i} = f_j(O_i, t_p)$
- 9: **If** $i < N$
- 10: Equivalent aging time computation: $t_{i+1,\text{equ}} = g(d_{j,i}, O_{i+1})$
- 11: **End**
- 12: $i = i + 1$
- 13: While $i \le N$, repeat steps 6-12
- 14: $j = j + 1$
- 15: While $j \le M$, repeat steps 2-14
- 16: end procedure

The time-variant aging-prediction scheme outlined in Algorithm 1 allows us to estimate aging-related circuit performance degradation under arbitrary time-variant operating conditions, which provides a higher level of flexibility in predicting aging failures. As will be shown in Section V, our proposed scheme outperforms the existing approaches when the time-variant operating conditions are considered.
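The interval-by-interval procedure can be sketched in Python for a single aging indicator. The sketch assumes that the model f is non-decreasing in t, so that the argmin defining g can be found by bisection. The `toy_model` below is a hypothetical power-law stand-in with a temperature-dependent rate, not the trained MARS model, and the search bound `t_hi` is an arbitrary choice.

```python
def equivalent_time(f, target, cond, t_hi=1e3, iters=200):
    """g(d, O): time t at which f(cond, t) reaches target, found by bisection.

    Assumes f is non-decreasing in t and target <= f(cond, t_hi).
    """
    lo, hi = 0.0, t_hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(cond, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def time_variant_prediction(f, conditions, boundaries):
    """Predict the degradation at the end of each interval.

    conditions: [O_1, ..., O_N]; boundaries: [t_1*, ..., t_N*, t_end].
    """
    t_equ = 0.0
    predictions = []
    for i, cond in enumerate(conditions):
        # Equivalent prediction time: carried-over aging plus the interval length.
        t_p = t_equ + (boundaries[i + 1] - boundaries[i])
        d = f(cond, t_p)
        predictions.append(d)
        if i + 1 < len(conditions):
            # Stress time under the next condition that yields the same degradation.
            t_equ = equivalent_time(f, d, conditions[i + 1])
    return predictions


def toy_model(temp_c, t_years, n=0.25):
    """Hypothetical degradation model: rate proportional to temperature, power law in t."""
    return (0.01 * temp_c) * t_years ** n


# 25 deg C for years 0-4, then 50 deg C for years 4-8.
preds = time_variant_prediction(toy_model, [25.0, 50.0], [0.0, 4.0, 8.0])
```

For this toy model the four years at 25 °C correspond to only 0.25 equivalent years at 50 °C, so the second prediction is evaluated at 4.25 years under the 50 °C curve and lies between the two constant-temperature predictions at eight years.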

# V. EXPERIMENTAL RESULTS

In this section, we demonstrate the effectiveness of the proposed approach by using five different ISCAS'89 benchmarks: s510, s1494, s5378, s9234, and s15850. We used Synopsys Design Compiler and PrimeTime for logic synthesis and extraction of time-critical paths at the 45-nm technology using the open-source Nangate library [24]. We consider the delays of timing-critical paths in each of the benchmarks as our aging indicators. These are the paths whose delay, if degraded by 20% during the course of aging, would possibly cause circuit failure. We used HSPICE MOSRA to conduct aging simulations and we consider the effect of NBTI/PBTI and HCI aging. The number of considered critical paths in each benchmark is shown in the parentheses in the first column of Table I. The operating condition vector considered in this paper is $O = [\alpha, T]$, where $\alpha$ denotes the workload distribution parameter, which is the average percentage X% of primary inputs getting the value of "1" in each clock cycle (X% = 1%, 25%, 50%, 75%, 99%), and T denotes the operating temperature value. We ran aging simulations using HSPICE MOSRA to evaluate the aging degradation for a period of eight years with a time step of two months. In addition, we ran Monte Carlo (MC) simulations for each benchmark using HSPICE MC by considering the following process-variation parameters with a Gaussian distribution: transistor gate length L: $3\sigma = 10\%$; threshold voltage $V_{\text{th}}$: $3\sigma = 30\%$; and gate-oxide thickness $t_{OX}$: $3\sigma = 3\%$.

For each considered benchmark, the basic aging-prediction models  $f_j(O,t)$ , as shown in (2), are learned from a generated sample set of 2000 devices which are obtained by sampling the input space [O,t] using the LHS method and subsequently performing aging simulation to obtain  $d_j$ . The sampling ranges in the input space are:  $\alpha = [0,1]$ , T = [25,75], and t = [0,8yrs] with T expressed in degree Celsius. We normalize each data parameter in the range of [0,1] for model learning purpose, and we randomly split the 2000 samples into equal training and validation sets to build the prediction model in (2). In this paper, we use the root mean square error (RMSE) as the metric for evaluating the prediction accuracy, which provides a robust indicator of the accuracy of predicted delay values. For the basic model  $f_j(O,t)$ , the RMSE value averaged from all considered critical paths in each benchmark is below 2%.
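The normalization and error metric described above can be sketched as follows. The sample values are hypothetical, and reading the reported percentages as fractions of the normalized [0, 1] range is an assumption made here for illustration.

```python
import math


def min_max_normalize(values):
    """Scale a list of numbers linearly to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical normalized delays; an RMSE of 0.02 on [0, 1] data reads as 2%.
temperatures = min_max_normalize([25.0, 50.0, 75.0])
err = rmse([0.10, 0.40, 0.70], [0.12, 0.38, 0.72])
```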

Fig. 5(a) shows a typical example of path delay aging degradation for one critical path randomly chosen from benchmark s5378 for a period from zero to eight years under constant workload distribution and temperature values: $\alpha = 50\%$ and T = 25 °C. The delay values are normalized to [0, 1]. It can be observed that the delay degradation follows an exponential pattern, which is consistent with the NBTI-induced transistor threshold shift model in (1). Fig. 5(b) shows the prediction scatter plot of this path using the proposed learned MARS model with actual values shown on the x-axis and predicted values shown on the y-axis. It can be observed that the proposed model can accurately predict aging degradation with scatter points closely following the 45° line. Note that in the case of process variations that would lead to deviation in $f_j(O,t)$, as discussed in Section IV, we employ a calibration technique to compensate for process variation-related model deviations [6].



Fig. 5. Example of a path delay prediction under constant operating condition:  $\alpha=50\%$  and T=25 °C from benchmark s5378. (a) Normalized aging degradation plotted as a function of usage time. (b) Prediction scatter plot using the proposed model.



Fig. 6. Aging-prediction plot for a path delay in s5378 under Scenario 1. (a) Normalized aging degradation. (b) Prediction plot for this path.

Once the basic models  $f_j$  are learned, validated, and calibrated for process variations, we use them to predict aging degradation under time-variant operating conditions. In this paper, we generate three scenarios for validating our approach: 1) two temperature values: 25 °C and 50 °C applied at the time intervals [0, 4 yrs] and [4, 8 yrs], respectively, during the eight years of simulated aging period; 2) three temperature values: 25 °C, 50 °C, and 75 °C applied at the time intervals [0, 2 yrs], [2, 6 yrs], and [6, 8 yrs], respectively, during the simulated eight years; and 3) four temperature values: 25 °C, 10 °C, 75 °C, and 50 °C applied at the time intervals [0, 2 yrs], [2, 4 yrs], [4, 6 yrs], and [6, 8 yrs], respectively, during the simulated eight years. The experimental results for the three considered scenarios are shown in the following.

### A. Scenario 1

In this scenario, we determine the number of time intervals as N=2 using the procedure outlined in (6). Thus, the equivalent aging time is computed once according to Algorithm 1. Each benchmark is simulated with five different $\alpha$ values ($\alpha=1\%, 25\%, 50\%, 75\%$, and 99%) under the same time-variant temperature profile for the eight years: 25 °C and 50 °C applied in the time intervals [0, 4 yrs] and [4, 8 yrs], respectively. Fig. 6(a) shows the normalized aging degradation for a path delay in s5378 under the operating conditions applied in Scenario 1. It can be observed that there is a sharp increase in the path delay at the four-year mark, which is the time when the temperature changes from 25 °C to 50 °C. This sharp temperature increase, according to (1), would result in a sharp aging degradation in addition to the normal exponential degradation pattern, as can be observed in Fig. 6(a).

TABLE I AGING-PREDICTION RESULTS UNDER TIME-VARIANT OPERATION CONDITION IN SCENARIO 1

| Benchmark (# of critical paths) | RMSE, proposed model | RMSE, SVM [7] | RMSE, RNN |
|---------------------------------|----------------------|---------------|-----------|
| s510 (21)                       | 1.35%                | 2.49%         | 2.78%     |
| s1494 (57)                      | 1.29%                | 2.31%         | 2.63%     |
| s5378 (392)                     | 1.45%                | 2.61%         | 2.91%     |
| s9234 (179)                     | 1.42%                | 2.64%         | 2.85%     |
| s15850 (180)                    | 1.53%                | 2.67%         | 2.92%     |

Fig. 7. Prediction scatter plots for the same path delay shown in Fig. 6(b) using (a) SVM regression and (b) RNN models.

Using the procedure outlined in Algorithm 1, we predict the aging degradation at a time step of two months for the eight years for each considered critical path in each benchmark. The second column in Table I shows the averaged RMSE from all critical paths and all  $\alpha$  values for each of the benchmarks. It can be observed that the proposed approach can accurately predict aging degradation even under time-variant operating conditions with average RMSE below 2% for all benchmarks. Fig. 6(b) shows the aging-prediction plot for the same path from Fig. 6(a). It can be observed that the proposed model can accurately predict aging degradation under time-variant operating conditions applied in Scenario 1. The two separated groups of scatter points observed in Fig. 6(b) correspond to the degraded path delay values at time intervals [0, 4 yrs] and [4, 8 yrs] with 25 °C and 50 °C applied temperatures, respectively.

To illustrate the advantages of the proposed model over the existing models, we also applied two other techniques for predicting the same path delay values under the operating conditions applied in Scenario 1 without computing the equivalent aging time: 1) the support vector machine (SVM) regression model proposed in [7] and 2) the fully recurrent neural network (RNN) model, which is well suited to capturing the dynamic behavior of a time sequence [25]. The third column in Table I shows the averaged RMSE obtained using the SVM regression model in [7], and the fourth column shows the RMSE obtained using the RNN model. It can be observed that the proposed approach resulted in lower prediction errors than both the SVM and RNN models, by roughly 1 to 1.5 percentage points. To gain some insights on the aging prediction using different models, Fig. 7 shows the prediction scatter plots for the same path delay shown in Fig. 6(a) using the SVM regression and RNN models, respectively. It can be



Fig. 8. Aging-prediction plot for a path delay in s5378 under Scenario 2. (a) Normalized aging degradation. (b) Prediction plot for this path.

observed that both the SVM regression and RNN models accurately predict the lower left group of delay values, which corresponds to the degradation between zero and four years at 25 °C, i.e., the first half of the time axis in Fig. 6(a). However, both models fail to accurately predict the path delays in the period [4, 8 yrs], where the applied temperature is 50 °C: the scatter points in the upper right part of the plots in Fig. 7(a) and (b) are not well aligned with the 45° line. This observation confirms the superiority of the proposed model and the need for computing the equivalent aging time during prediction.
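One way to quantify how far each group of scatter points departs from the 45° line is to compute the RMSE separately for each time interval. The sketch below is a hypothetical helper, not part of the original evaluation; the half-open interval convention (a sample taken exactly at a boundary belongs to the later interval) is an assumption.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equally long series."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


def rmse_by_interval(times, actual, predicted, boundaries):
    """Return one RMSE per interval [boundaries[i], boundaries[i+1]).

    The last interval also includes its right end point. Intervals that
    receive no sample yield None.
    """
    n_intervals = len(boundaries) - 1
    groups = [([], []) for _ in range(n_intervals)]
    for t, a, p in zip(times, actual, predicted):
        for i in range(n_intervals):
            is_last = i == n_intervals - 1
            inside = boundaries[i] <= t < boundaries[i + 1]
            if inside or (is_last and t == boundaries[-1]):
                groups[i][0].append(a)
                groups[i][1].append(p)
                break
    return [rmse(a, p) if a else None for a, p in groups]
```

For Scenario 1, `boundaries` would be `[0, 4, 8]` (in years); a baseline that tracks only the first interval would show a small first entry and a clearly larger second one.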

#### B. Scenario 2

In this scenario, we determine the number of time intervals as N=3 using the procedure outlined in (6). Thus, the equivalent aging time is computed twice according to Algorithm 1. Each benchmark is simulated with five different $\alpha$ values, $\alpha = 1\%$, 25%, 50%, 75%, and 99%, under the same time-variant temperature profile for the eight years: 25 °C, 50 °C, and 75 °C applied in the time intervals [0, 2 yrs], [2, 6 yrs], and [6, 8 yrs], respectively. Fig. 8(a) shows the normalized aging degradation for a path delay in s5378 under the operating conditions applied in Scenario 2. Two sharp increases in the path delay can be observed at the two-year and six-year points, when a sharp temperature increase was applied. These two delay increases correspond to the temperature changes from 25 °C to 50 °C and from 50 °C to 75 °C, respectively.
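The temperature profiles of Scenarios 1 and 2 can be written down as piecewise-constant functions, which also makes explicit why the equivalent aging time is computed N-1 times: once per change of operating condition. The data layout and function names below are assumptions chosen for illustration, as is the convention that the new temperature applies from the boundary instant onward.

```python
from bisect import bisect_right

# (start_year, temperature_C) pairs taken from the scenario descriptions.
SCENARIO_1 = [(0, 25), (4, 50)]
SCENARIO_2 = [(0, 25), (2, 50), (6, 75)]
END_YEAR = 8


def temperature_at(profile, t_years):
    """Temperature applied at time t (piecewise constant, right-continuous)."""
    if not 0 <= t_years <= END_YEAR:
        raise ValueError("time outside the simulated eight years")
    starts = [start for start, _ in profile]
    return profile[bisect_right(starts, t_years) - 1][1]


def condition_changes(profile):
    """List (year, old_T, new_T) for every change of operating condition."""
    return [(profile[i][0], profile[i - 1][1], profile[i][1])
            for i in range(1, len(profile))]
```

Here `len(condition_changes(SCENARIO_1))` is 1 and `len(condition_changes(SCENARIO_2))` is 2, matching the number of equivalent-aging-time computations stated for N=2 and N=3.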

We predict the aging degradation at a time step of two months for the eight years for all considered critical paths in each benchmark using the proposed model as before. The second column in Table II shows the averaged RMSE from all the critical paths and all  $\alpha$  values for each of the benchmarks. It can be observed that the proposed model consistently predicts aging degradation with high accuracy with average RMSE below 2% for all benchmarks. Fig. 8(b) shows the aging-prediction plot for the same path from Fig. 8(a). It can be observed that the proposed model can accurately predict aging degradation under time-variant operating conditions applied in Scenario 2. The three separated groups of scatter points observed in Fig. 8(b) correspond to the degraded path delay values at time intervals [0, 2 yrs], [2, 6 yrs], and [6, 8 yrs] with 25 °C, 50 °C, and 75 °C applied temperatures, respectively.

TABLE II AGING-PREDICTION RESULTS UNDER TIME-VARIANT OPERATION CONDITION IN SCENARIO 2

| Benchmark (# of critical paths) | RMSE, proposed model | RMSE, SVM [7] | RMSE, RNN |
|---------------------------------|----------------------|---------------|-----------|
| s510 (21)             | 1.29%          | 3.9%    | 4.8%  |
| s1494 (57)            | 1.25%          | 3.85%   | 4.63% |
| s5378 (392)           | 1.39%          | 3.98%   | 4.86% |
| s9234 (179)           | 1.43%          | 5.26%   | 5.46% |
| s15850 (180)          | 1.55%          | 5.3%    | 5.54% |



Fig. 9. Prediction scatter plots for the same path delay shown in Fig. 8(a) using (a) SVM regression and (b) RNN models.

The third and fourth columns in Table II show the average RMSE values obtained using the SVM regression and RNN models, respectively. It can be observed that the proposed model consistently outperforms the state-of-the-art models, with RMSE values that are lower by more than 2 percentage points. Fig. 9 shows the prediction scatter plots for the same path delay shown in Fig. 8(a) using the SVM regression and RNN models, respectively. It can be observed that both the SVM regression and RNN models accurately predict the lower left group of delay values, which corresponds to the degradation between zero and two years at 25 °C in Fig. 8(a). However, both models fail to accurately predict the path delay values in the periods [2, 6 yrs] and [6, 8 yrs], where the applied temperatures are 50 °C and 75 °C, respectively.

#### C. Scenario 3

In this scenario, we determine the number of time intervals as N=4 using the procedure outlined in (6). Thus, the equivalent aging time is computed three times according to Algorithm 1. Each benchmark is simulated with five different $\alpha$ values, $\alpha = 1\%$, 25%, 50%, 75%, and 99%, under the same time-variant temperature profile for the eight years: 25 °C, 10 °C, 75 °C, and 50 °C applied in the time intervals [0, 2 yrs], [2, 4 yrs], [4, 6 yrs], and [6, 8 yrs], respectively. Fig. 10(a) shows the normalized aging degradation for a path delay in s5378 under the operating conditions applied in Scenario 3. It can be observed in Fig. 10(a) that only one sharp increase in the path delay, at the fourth year, can be identified. The reason for this observation is that the only sharp temperature increase in this scenario was applied in the fourth year (from 10 °C to 75 °C), while the temperature value changes between all other consecutive time intervals were a temperature decrease (from 25 °C to 10 °C at the second year



Fig. 10. Aging-prediction plot for a path delay in s5378 under Scenario 3. (a) Normalized aging degradation. (b) Prediction plot for this path.

TABLE III AGING-PREDICTION RESULTS UNDER TIME-VARIANT OPERATION CONDITION IN SCENARIO 3

| Benchmark (# of critical paths) | RMSE, proposed model | RMSE, SVM [7] | RMSE, RNN |
|---------------------------------|----------------------|---------------|-----------|
| s510 (21)             | 1.18%          | 4.56%   | 4.51% |
| s1494 (57)            | 1.21%          | 4.49%   | 4.43% |
| s5378 (392)           | 1.15%          | 4.62%   | 4.55% |
| s9234 (179)           | 1.26%          | 4.89%   | 4.86% |
| s15850 (180)          | 1.31%          | 4.86%   | 4.72% |



Fig. 11. Prediction scatter plots for the same path delay shown in Fig. 10(a) using (a) SVM regression and (b) RNN models.

and from 75 °C to 50 °C at the sixth year). As we discussed in the equivalent aging time analysis in Section IV-C, a decrease in the temperature value would result in a lower aging degradation rate, which explains the relatively flat degradation patterns in the time intervals [2, 4 yrs] and [6, 8 yrs] in Fig. 10(a).
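The qualitative behavior described above (a sharp rise after a temperature increase, a nearly flat curve after a decrease) can be reproduced with a toy model. The sketch below is not Eq. (1) or Algorithm 1 of this paper: it assumes a generic power-law degradation with an Arrhenius-type temperature factor, and the constants `A`, `EA_EV`, and `N_EXP` are hypothetical. Under that assumption, the equivalent aging time is read as the time the new temperature would have needed to produce the degradation accumulated so far.

```python
import math

K_BOLTZMANN_EV = 8.617e-5              # Boltzmann constant in eV/K
A, EA_EV, N_EXP = 1.0, 0.1, 1.0 / 6.0  # hypothetical fitting constants


def rate_factor(temp_c):
    """Temperature-dependent prefactor of the toy power-law model."""
    return A * math.exp(-EA_EV / (K_BOLTZMANN_EV * (temp_c + 273.15)))


def degradation(temp_c, t_years):
    """Toy degradation after t_years of stress at a constant temperature."""
    return rate_factor(temp_c) * t_years ** N_EXP


def equivalent_time(accumulated, new_temp_c):
    """Time at new_temp_c that would yield the accumulated degradation."""
    return (accumulated / rate_factor(new_temp_c)) ** (1.0 / N_EXP)


def simulate(profile, end_year, step_years=1.0 / 6.0):
    """Degradation on a two-month grid for a piecewise-constant profile.

    profile: list of (start_year, temp_c); returns a list of (t, value).
    """
    trace, accumulated = [], 0.0
    bounds = [start for start, _ in profile] + [end_year]
    for (start, temp_c), stop in zip(profile, bounds[1:]):
        t_eq = equivalent_time(accumulated, temp_c)  # 0.0 in the first interval
        n_steps = round((stop - start) / step_years)
        for k in range(1, n_steps + 1):
            elapsed = k * step_years
            trace.append((start + elapsed, degradation(temp_c, t_eq + elapsed)))
        accumulated = trace[-1][1]
    return trace


# Scenario 3 profile as described in the text.
scenario_3 = [(0, 25), (2, 10), (4, 75), (6, 50)]
trace = simulate(scenario_3, end_year=8)
```

In this toy setting the curve stays continuous at each switching point, because the new interval restarts from the equivalent time rather than from zero. A temperature decrease maps the accumulated degradation to a large equivalent time, where the power law is flat; an increase maps it to a small equivalent time, where the power law is steep.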

We predict the aging degradation in each benchmark using the proposed model as before. The second column in Table III shows the averaged prediction RMSE from all critical paths and all  $\alpha$  values for each of the benchmarks. It can be observed that the proposed model consistently predicts aging degradation with a high accuracy with an average RMSE below 2% for all benchmarks. Fig. 10(b) shows the aging-prediction plot for the same path from Fig. 10(a). It can be observed that the proposed model can accurately predict the aging degradation under time-variant operating conditions applied in Scenario 3.

The third and fourth columns in Table III show the average RMSE values obtained using the SVM regression and RNN models, respectively. It can be observed that the proposed model consistently outperforms the state-of-the-art models, with RMSE values that are lower by more than 3 percentage points. Fig. 11 depicts the prediction scatter plots for the same path delay shown in Fig. 10(a) using the SVM regression and RNN models, respectively. It can be observed that the SVM regression and RNN models fail to accurately predict the path delay values under the operating conditions applied in Scenario 3, which justifies the use of the proposed model for aging degradation prediction.
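The margins between the proposed model and the two baselines can be checked directly from Tables I to III. The short script below only transcribes the table values and subtracts them; the dictionary layout and the function name are illustrative choices.

```python
# RMSE values in percent, transcribed from Tables I-III:
# benchmark -> (proposed model, SVM [7], RNN)
TABLES = {
    "Scenario 1": {
        "s510": (1.35, 2.49, 2.78), "s1494": (1.29, 2.31, 2.63),
        "s5378": (1.45, 2.61, 2.91), "s9234": (1.42, 2.64, 2.85),
        "s15850": (1.53, 2.67, 2.92),
    },
    "Scenario 2": {
        "s510": (1.29, 3.9, 4.8), "s1494": (1.25, 3.85, 4.63),
        "s5378": (1.39, 3.98, 4.86), "s9234": (1.43, 5.26, 5.46),
        "s15850": (1.55, 5.3, 5.54),
    },
    "Scenario 3": {
        "s510": (1.18, 4.56, 4.51), "s1494": (1.21, 4.49, 4.43),
        "s5378": (1.15, 4.62, 4.55), "s9234": (1.26, 4.89, 4.86),
        "s15850": (1.31, 4.86, 4.72),
    },
}


def rmse_gaps(table):
    """Smallest and largest gap (percentage points) of a baseline over the proposed model."""
    gaps = [baseline - proposed
            for proposed, svm, rnn in table.values()
            for baseline in (svm, rnn)]
    return round(min(gaps), 2), round(max(gaps), 2)


for name, table in TABLES.items():
    print(name, rmse_gaps(table))
```

For these values, the gap ranges from 1.02 to 1.46 percentage points in Scenario 1, from 2.59 to 4.03 in Scenario 2, and from 3.22 to 3.63 in Scenario 3, so the advantage of the proposed model grows once more than one change of operating condition is involved.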

# VI. CONCLUSION

In this paper, we proposed a general-purpose model for predicting IC aging degradation during the runtime of the IC based on machine learning and equivalent aging time. We extended the existing prediction scheme to a comprehensive model which takes into account arbitrary time-variant dynamic operating conditions relevant to aging prediction. The proposed model can be readily implemented offline and online with only a few operating-condition sensing circuits (e.g., temperature sensors). The experimental results showed that our approach outperforms existing methods in terms of aging-prediction accuracy under different scenarios of time-variant operating conditions.

# REFERENCES

- [1] M. Baybutt et al., "Improving digital system diagnostics through prognostic and health management (PHM) technology," *IEEE Trans. Instrum. Meas.*, vol. 58, no. 2, pp. 255–262, Feb. 2009.
- [2] Z. Yang, P. Sun, Y. Yu, H. Zhang, G. Gao, and X. Peng, "Workload-aware failure prediction method for VLSI devices using an LUT based approach," in *Proc. IEEE Int. Instrum. Meas. Technol. Conf.*, May 2018, pp. 1–6.
- [3] M. Agarwal, B. C. Paul, M. Zhang, and S. Mitra, "Circuit failure prediction and its application to transistor aging," in *Proc. IEEE VLSI Test Symp.*, May 2007, pp. 277–286.
- [4] S. Shinde, S. Jothibasu, M. T. Ghasr, and R. Zoughi, "Wideband microwave reflectometry for rapid detection of dissimilar and aged ICs," *IEEE Trans. Instrum. Meas.*, vol. 66, no. 8, pp. 2156–2165, Aug. 2017.
- [5] A. Tiwari and J. Torrellas, "Facelift: Hiding and slowing down aging in multicores," in *Proc. 41st Annu. IEEE/ACM Int. Symp. Microarchitecture*, Nov. 2008, pp. 129–140.
- [6] N. Karimi and K. Huang, "Prognosis of NBTI aging using a machine learning scheme," in *Proc. Int. Symp. Defect Fault Tolerance VLSI Nanotechnol. Syst.*, 2016, pp. 7–10.
- [7] A. Vijayan, A. Koneru, S. Kiamehr, K. Chakrabarty, and M. B. Tahoori, "Fine-grained aging-induced delay prediction based on the monitoring of run-time stress," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 37, no. 5, pp. 1064–1075, May 2018.
- [8] P. L. Roe, "Approximate Riemann solvers, parameter vectors, and difference schemes," *J. Comput. Phys.*, vol. 43, no. 2, pp. 357–372, 1981.
- [9] Y. Lu, L. Shang, H. Zhou, H. Zhu, F. Yang, and X. Zeng, "Statistical reliability analysis under process variation and aging effects," in *Proc. 46th ACM/IEEE Design Automat. Conf.*, Jul. 2009, pp. 514–519.
- [10] D. Pantic, "Benefits of integrated-circuit burn-in to obtain high reliability parts," *IEEE Trans. Rel.*, vol. R-35, no. 1, pp. 3–6, Apr. 1986.
- [11] J. M. Carulli and T. J. Anderson, "Test connections—Tying application to process," in *Proc. IEEE Int. Conf. Test*, Nov. 2005, p. 8 and p. 686.
- [12] A. T. Krishnan et al., "Material dependence of hydrogen diffusion: Implications for NBTI degradation," in *IEEE Int. Electron Devices Meeting*, Tech. Dig., Dec. 2005, p. 4 and p. 691.
- [13] S. Bhardwaj, W. Wang, R. Vattikonda, Y. Cao, and S. Vrudhula, "Predictive modeling of the NBTI effect for reliable design," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, Sep. 2006, pp. 189–192.
- [14] K. Kang, H. Kufluoglu, M. A. Alam, and K. Roy, "Efficient transistor-level sizing technique under temporal performance degradation due to NBTI," in *Proc. IEEE Int. Conf. Comput. Design*, Oct. 2006, pp. 216–221.

- [15] B. C. Paul, K. Kang, H. Kufluoglu, M. A. Alam, and K. Roy, "Temporal performance degradation under NBTI: Estimation and design for improved reliability of nanoscale circuits," in *Proc. Design Automat. Test Eur. Conf.*, Mar. 2006, pp. 1–6.
- [16] J. W. McPherson, "Reliability challenges for 45 nm and beyond," in *Proc. 43rd ACM/IEEE Design Automat. Conf.*, Jul. 2006, pp. 176–181.
- [17] E. Karl, D. Blaauw, D. Sylvester, and T. Mudge, "Multi-mechanism reliability modeling and management in dynamic systems," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 16, no. 4, pp. 476–487,
- [18] J. Srinivasan, S. V. Adve, P. Bose, and J. A. Rivers, "Exploiting structural duplication for lifetime reliability enhancement," in *Proc. 32nd Annu. Int. Symp. Comput. Archit.*, Jun. 2005, pp. 520–531.
- [19] J. Srinivasan, S. V. Adve, P. Bose, and J. A. Rivers, "The case for lifetime reliability-aware microprocessors," in *Proc. 31st Annu. Int. Symp. Comput. Archit.*, Jun. 2004, pp. 276–287.
- [20] J. H. Friedman, "Multivariate adaptive regression splines," *Ann. Statist.*, vol. 19, no. 1, pp. 1–67, Mar. 1991.
- [21] J. H. Friedman, "Estimating functions of mixed ordinal and categorical variables using adaptive splines," Dept. Comput. Statist., Stanford Univ., Stanford, CA, USA, Tech. Rep. 108, 1991, pp. 1–42.
- [22] J. Friedman, "Fast MARS," Dept. Statist., Stanford Univ., Stanford, CA, USA, Tech. Rep. 110, 1993, pp. 1–17.
- [23] K. Huang, Y. Liu, N. Korolija, J. M. Carulli, and Y. Makris, "Recycled IC detection based on statistical methods," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 34, no. 6, pp. 947–960, Jun. 2015.
- [24] Nangate 45 nm Open Cell Library. [Online]. Available: http://www.nangate.com
- [25] R. J. Williams and D. Zipser, "A learning algorithm for continually running fully recurrent neural networks," *Neural Comput.*, vol. 1, no. 2, pp. 270–280, 1989.



Ke Huang (S'10-M'12) received the B.S., M.S., and Ph.D. degrees in electrical engineering from the Université Grenoble Alpes, Grenoble, France, in 2006, 2008, and 2011, respectively.

He was a Post-Doctoral Research Associate with the University of Texas at Dallas, Richardson, TX, USA, from 2012 to 2014. In 2014, he joined the Department of Electrical and Computer Engineering, San Diego State University, San Diego, CA, USA, where he is currently an Assistant Professor. He has published over 30 journal and conference papers.

His current research interests include machine learning applications for very-large-scale integration (VLSI) testing, reliability and security, computer-aided design of integrated circuits, and intelligent vehicles.

Dr. Huang served as a program committee member for a number of IEEE conferences. He was a recipient of the Best Paper Award from the 2013 Design Automation and Test in Europe Conference and the 2015 IEEE VLSI Test Symposium. He served as a Guest Editor for the Journal of Electronic Testing: Theory and Applications (Springer).



Xinqiao Zhang received the B.S. degree in electrical and electronics engineering from Northeastern University, Shenyang, China, in 2017. He is currently pursuing the master's degree in electrical engineering with the Department of Electrical and Computer Engineering, San Diego State University, San Diego, CA, USA.

His current research interests include machine learning applications in very-large-scale integration



Naghmeh Karimi (M'05) received the B.Sc., M.Sc., and Ph.D. degrees in computer engineering from the University of Tehran, Tehran, Iran, in 1997, 2002, and 2010, respectively.

She was a Visiting Researcher with Yale University, New Haven, CT, USA, from 2007 to 2009, and a Post-Doctoral Researcher with Duke University, Durham, NC, USA, from 2011 to 2012. She was a Visiting Assistant Professor with New York University, New York, NY, USA, and Rutgers University, New Brunswick, NJ, USA, from 2012 to 2016. She joined the University of Maryland Baltimore County, Baltimore, MD, USA, as an Assistant Professor in 2017, where she currently leads the SECure, REliable and Trusted Systems (SECRETS) research laboratory. She has published three book chapters and has authored or coauthored more than 40 papers in refereed conference proceedings and journals. Her current research interests include hardware security, very-large-scale integration (VLSI) testing, design for trust, design for testability, and design for reliability.