CN116401719A

CN116401719A - Method for positioning hardware Trojan horse in gate-level netlist based on machine learning

Info

Publication number: CN116401719A
Application number: CN202310395996.XA
Authority: CN
Inventors: 王泉; 黄钊; 周丽榕; 谢昌健; 李泽宇; 王骏君; 刘锦辉; 樊璐; 刘潇; 万波; 李少峰; 吴自力; 田玉敏
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2023-04-13
Filing date: 2023-04-13
Publication date: 2023-07-07

Abstract

The invention discloses a machine-learning-based method for detecting and positioning hardware Trojans in a gate-level netlist. It mainly addresses the problems of the prior art: low hardware Trojan positioning accuracy and efficiency, and the need for an ideal model as a reference. The implementation scheme is as follows: dividing each integrated circuit in the sample into a plurality of maximum single-output sub-modules, extracting their feature vectors, and constructing a data set; training existing machine learning models by cross-validation to obtain classifiers; using the classifiers to perform Trojan detection on the integrated circuit under test; and performing Trojan positioning on each maximum single-output sub-module detected as containing a hardware Trojan, using a Trojan search method based on layer-by-layer difference analysis. Because machine learning is carried out with the maximum single-output sub-module as the unit, the performance of the classifiers and the Trojan detection accuracy are markedly improved; comparing and analyzing the maximum single-output sub-modules improves the accuracy and efficiency of positioning the Trojan circuit in the gate-level netlist. The method can be used for hardware Trojan protection in the gate-level netlist design of integrated circuits.

Description

Method for positioning hardware Trojan horse in gate-level netlist based on machine learning

Technical Field

The invention belongs to the technical field of integrated circuits, and particularly relates to a method for detecting and positioning a hardware Trojan in a gate-level netlist, which can be used to protect against hardware Trojans at the gate-level netlist design stage of an integrated circuit.

Background

A hardware Trojan is a malicious circuit that can be implanted at any stage of the design and manufacturing process of an integrated circuit. Its real-world use has already affected key fields such as mobile communication, medical treatment, aerospace, and civil infrastructure, and even national security. Currently, protection measures against hardware Trojans mainly focus on nondestructive detection, whose essential idea is to use the change in certain characteristics after a hardware Trojan is implanted to determine whether a given integrated circuit contains one. Nondestructive hardware Trojan detection can be classified into dynamic detection and static detection according to the stage from which the integrated circuit characteristics are taken.

Dynamic detection determines whether a hardware Trojan has been implanted by observing characteristics of the integrated circuit during operation, such as the side-channel parameters power consumption and path delay. Under the influence of a hardware Trojan, certain operating characteristics change noticeably and are easy to observe, so dynamic detection can generally achieve high detection accuracy. However, dynamic detection generally requires a carefully selected set of test vectors, which is difficult to construct and time-consuming when the integrated circuit has many input pins.

Static detection determines whether a hardware Trojan has been implanted by extracting features of the integrated circuit at the design stage, such as fan-in and the number of loop structures. Static detection does not require a complete integrated circuit design, which facilitates hardware Trojan detection stage by stage and module by module. Furthermore, static detection does not require the integrated circuit to actually run, and therefore does not require test vectors. However, for most design-stage features the extraction time grows in proportion to the circuit scale, and it is difficult to ensure that these features correlate with hardware Trojans, which results in lower detection accuracy.

The patent document with publication number CN 110287735A discloses a Trojan-infected circuit identification method based on chip netlist features, which comprises extracting the SCOAP metric values of the nodes, detecting a suspicious node set by k-means++ clustering, correcting the suspicious node set using the topological structure of the chip netlist, and recovering the Trojan trigger nodes through node reachability analysis. The method does not consider Trojans whose trigger nodes mix rare nodes and common nodes, so Trojans with a large number of common nodes are easily missed; it is also time-consuming and has low accuracy.

The patent document with publication number CN 114065308A discloses a gate-level hardware Trojan positioning method and system based on deep learning, which comprises extracting gate-level netlist information, constructing a feature path set, and detecting and positioning with TextCNN. In this method the path features take a long time to construct, and the Trojan path features are not clearly distinguishable from ordinary path features, so the Trojan detection accuracy is low.

The patent document with publication number CN 109740348A discloses a hardware Trojan positioning method based on machine learning, which comprises extracting gate-level netlist features, classifying the hardware Trojan types, and detecting and positioning with a One-class SVM and a BPNN respectively. In this method it is difficult to classify the hardware Trojan types in the preprocessing stage, and Trojan nets (signal lines) are easily mislocated.

In summary, existing hardware Trojan detection and positioning methods still suffer from low positioning accuracy and efficiency and from the need for an ideal model as a reference.

Disclosure of Invention

Aiming at the defects of the prior art, the invention provides a hardware Trojan horse positioning method in a gate level netlist based on machine learning, so that the positioning accuracy and efficiency of the hardware Trojan horse are improved under the condition that an ideal model is not required to be referenced.

In order to achieve the above purpose, the technical scheme of the invention comprises the following steps:

(1) Dividing each integrated circuit in the sample into a plurality of maximum single-output sub-modules;

(2) Performing feature extraction with each maximum single-output sub-module of the integrated circuit as a unit to form a data set, and dividing the data set into a training set and a test set at a ratio of 7:3;

(3) Training a machine learning model by using a cross-validation method to obtain a trained classifier;

(4) Selecting a gate-level netlist to be detected for Trojan horse detection, and outputting a detection result;

(5) Judging whether the output result of the step (4) contains a hardware Trojan horse or not:

if the hardware Trojan is not contained, the positioning is completed;

otherwise, executing the step (6);

(6) Positioning the detected hardware Trojan horse:

(6a) Denoting the current gate-level netlist as C and the corresponding Trojan-free golden design version as C', dividing C and C' into a plurality of maximum single-output sub-modules, and extracting the feature vector of each maximum single-output sub-module;

(6b) Taking one maximum single-output sub-module a of C that was detected in step (4) as containing a Trojan, finding the maximum single-output sub-module in C' that is closest to a according to the Euclidean distance between feature vectors, and denoting it as a';

(6c) Performing a Trojan search based on layer-by-layer difference analysis on a and a' to obtain a plurality of Trojan regions;

(6d) Performing steps (6b) to (6c) on all maximum single-output sub-modules detected in step (4) to obtain the Trojan regions.
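The matching in step (6b) can be illustrated with a minimal sketch. It assumes each sub-module is already represented by a numeric feature vector; the function name `nearest_submodule`, the module names `m1` and `m2`, and all feature values are hypothetical and chosen only for illustration.

```python
import math

def nearest_submodule(target, candidates):
    """Return (name, distance) of the candidate feature vector closest to target."""
    best_name, best_dist = None, math.inf
    for name, vec in candidates.items():
        dist = math.dist(target, vec)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name, best_dist

# Hypothetical 10-dimensional feature vectors (values invented for illustration).
a = [2, 3, 1, 1, 7, 0, 9, 12, 12, 0]           # suspicious sub-module a of C
golden = {                                      # sub-modules of the golden design C'
    "m1": [2, 3, 1, 1, 4, 0, 6, 8, 8, 0],
    "m2": [1, 1, 1, 1, 1, 0, 2, 1, 1, 0],
}
name, dist = nearest_submodule(a, golden)       # the match plays the role of a'
```

In this toy data the sub-module `m1` is selected, since it differs from a only in the gate, signal line, and fan-in/fan-out counts.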

Compared with the prior art, the invention has the following advantages:

First, the invention divides the integrated circuit under test into a plurality of maximum single-output sub-modules. This further subdivides the overlapping regions between different logic cones that arise in the traditional method of dividing an integrated circuit by logic cones, which simplifies the operations in hardware Trojan detection and positioning and improves their efficiency.

Second, in hardware Trojan detection, because the gate-level netlist of the integrated circuit is divided into a plurality of maximum single-output sub-modules, the sub-modules are mutually independent during detection. This facilitates parallel detection of multiple sub-modules and can effectively shorten the time required to check a large-scale integrated circuit. Meanwhile, because the data set is built with the maximum single-output sub-module rather than the whole gate-level netlist as the unit, the size of the data set increases by tens of times, the performance of the trained classifier improves markedly, and the Trojan detection accuracy improves accordingly.

Third, in hardware Trojan positioning, because the gate-level netlist of the integrated circuit is divided into a plurality of maximum single-output sub-modules, most of the logic gates and signal lines in each finally located Trojan region belong to the hardware Trojan, which improves the Trojan positioning accuracy.

Drawings

FIG. 1 is a flow chart of an implementation of the present invention;

FIG. 2 is a schematic diagram of the maximum single-output sub-module division in the present invention;

FIG. 3 is a schematic diagram of the Trojan search based on layer-by-layer difference analysis in the present invention.

Detailed Description

Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

Referring to fig. 1, the implementation steps of this example are as follows:

Step 1, acquiring an integrated circuit sample, and dividing each integrated circuit in the sample into a plurality of maximum single-output sub-modules.

(1.1) selecting the gate-level netlists of a plurality of integrated circuits as the sample, comprising a plurality of Trojan-free "golden designs" and, for each golden design, the versions obtained by implanting different Trojans;

(1.2) dividing each gate-level netlist in the sample into a plurality of maximum single-output sub-modules:

(1.2.1) abstracting the gate-level netlist into a directed graph, with each logic gate as a vertex and each signal line branch as a directed edge whose starting point is an input pin of a logic gate and whose end point is an output pin of a logic gate; the gate to which each primary output pin of the netlist belongs corresponds to a "sink node" of the directed graph, and these nodes form the sink node set T;

(1.2.2) performing a breadth-first traversal starting from one sink node t in the sink node set to build the maximum single-output sub-module rooted at t, and judging, for each node i visited during the traversal, whether all output nodes connected to node i belong to this maximum single-output sub-module:

if yes, adding node i to the maximum single-output sub-module, and executing step (1.2.3);

otherwise, treating node i as a sink node, adding it to the sink node set T, and executing step (1.2.3);

(1.2.3) repeating step (1.2.2) on all nodes in the maximum single-output sub-module rooted at sink node t until all nodes have been traversed, which forms the maximum single-output sub-module whose vertex is the logic gate corresponding to sink node t;

(1.2.4) repeating steps (1.2.2) to (1.2.3) for all sink nodes in the sink node set T, so that the gate-level netlist is divided into a plurality of maximum single-output sub-modules.

(1.3) performing step (1.2) on all gate-level netlists in the sample to obtain the maximum single-output sub-modules of the sample.
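The division in step (1.2) can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes the netlist is given as a map from each node to the nodes it drives, treats primary input pins as ordinary nodes, and defers the creation of new sink (converging) nodes until the current sub-module has stopped growing, which is a fixed-point variant of the per-node test described above. The function name `partition` and the example netlist are hypothetical and are not the circuit of FIG. 2.

```python
from collections import deque

def partition(fanout, primary_output_gates):
    """Split a netlist graph into single-output sub-modules, one per sink node."""
    # Derive fan-in lists (drivers of each node) from the fan-out lists.
    fanin = {n: [] for n in fanout}
    for src, dsts in fanout.items():
        for dst in dsts:
            fanin[dst].append(src)

    sinks = list(primary_output_gates)  # sink node set T, grows during the run
    modules = {}
    idx = 0
    while idx < len(sinks):
        t = sinks[idx]
        idx += 1
        module, queue = {t}, deque([t])
        while queue:                    # breadth-first traversal toward the inputs
            node = queue.popleft()
            for i in fanin[node]:
                if i in module or i in sinks:
                    continue
                # Node i joins only if every node it drives is already inside.
                if all(o in module for o in fanout[i]):
                    module.add(i)
                    queue.append(i)
        # Drivers left outside the finished sub-module become new sink nodes.
        for node in sorted(module):
            for i in fanin[node]:
                if i not in module and i not in sinks:
                    sinks.append(i)
        modules[t] = module
    return modules

# Hypothetical netlist: node -> nodes it drives (G4 and G8 drive primary outputs).
fanout = {
    "PI1": ["G1"], "PI2": ["G1"], "PI3": ["G3"], "PI4": ["G2", "G5"],
    "PI5": ["G5"], "PI6": ["G7"], "G1": ["G4"], "G2": ["G3", "G6"],
    "G3": ["G4"], "G4": [], "G5": ["G6"], "G6": ["G8"], "G7": ["G8"], "G8": [],
}
modules = partition(fanout, ["G4", "G8"])
```

In this toy netlist G2 and PI4 each drive nodes in two different sub-modules, so they end up as sink nodes of their own one-node sub-modules.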

Step 2, performing feature extraction on the maximum single-output sub-modules of the integrated circuits to form a data set, and dividing the data set into a training set and a test set.

(2.1) extracting the static structural features of each maximum single-output sub-module to construct its feature vector, appending a label to the tail of the feature vector according to whether the sub-module contains a hardware Trojan (1 for present, 0 for absent), and combining the feature vectors of all maximum single-output sub-modules into a matrix;

(2.2) performing step (2.1) on the maximum single-output sub-modules of all gate-level netlists in the sample to obtain a plurality of matrices, merging the matrices by rows, and removing repeated rows to obtain a single matrix, namely the data set;

(2.3) dividing the data set into a training part and a test part at a ratio of 7:3.
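Steps (2.1) to (2.3) can be sketched as below. The sketch assumes each netlist has already been reduced to a list of (feature vector, contains-Trojan) pairs; the helper names `build_dataset` and `split_7_3`, the shuffling with a fixed seed, and the three-component feature vectors are illustrative assumptions, not details given in the source.

```python
import random

def build_dataset(netlists):
    """Append the 0/1 label to each feature vector, merge by rows, drop repeats."""
    rows, seen = [], set()
    for modules in netlists:
        for features, has_trojan in modules:
            row = tuple(features) + (1 if has_trojan else 0,)  # label at the tail
            if row not in seen:                                # remove repeated rows
                seen.add(row)
                rows.append(row)
    return rows

def split_7_3(rows, seed=0):
    """Shuffle (assumed, not stated in the source) and split 7:3."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 7 // 10
    return shuffled[:cut], shuffled[cut:]

# Hypothetical shortened feature vectors; rows for i = 4 and i = 5 occur twice.
netlists = [
    [([i, i + 1, 2 * i], i % 4 == 0) for i in range(6)],
    [([i, i + 1, 2 * i], i % 4 == 0) for i in range(4, 10)],
]
dataset = build_dataset(netlists)
train, test = split_7_3(dataset)
```

With the twelve input rows above, two are duplicates, so the data set has ten rows, split into seven for training and three for testing.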

Step 3, training the machine learning models to obtain trained classifiers.

K-nearest-neighbor, decision tree, and naive Bayes classifiers are trained using cross-validation. The parameters of each classifier are adjusted dynamically according to the cross-validation score to achieve the best result, yielding the trained classifiers.
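As a rough illustration of parameter selection by cross-validation score, the following sketch tunes only the neighbor count k of a hand-written K-nearest-neighbor classifier. The fold scheme, the candidate values of k, the function names, and the toy two-cluster data are all assumptions; the source does not state which parameters are tuned or how the folds are formed, and the decision tree and naive Bayes classifiers would be handled analogously.

```python
import math
from collections import Counter

def knn_predict(train, x, k):
    """Majority vote among the k training rows nearest to x."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def cv_accuracy(data, k, folds=5):
    """Mean accuracy over `folds` interleaved folds."""
    correct = 0
    for f in range(folds):
        val = data[f::folds]  # every folds-th row is held out
        trn = [row for i, row in enumerate(data) if i % folds != f]
        correct += sum(knn_predict(trn, x, k) == y for x, y in val)
    return correct / len(data)

def select_k(data, candidates=(1, 3, 5), folds=5):
    """Pick the candidate k with the best cross-validation score."""
    scores = {k: cv_accuracy(data, k, folds) for k in candidates}
    best = max(scores, key=scores.get)
    return best, scores

# Toy data: two well-separated clusters, interleaved so every fold sees both labels.
data = []
for j in range(10):
    data.append(((j % 3, j // 3), 0))
    data.append(((10 + j % 3, 10 + j // 3), 1))
best_k, scores = select_k(data)
```

On real feature vectors the scores would differ between candidates; here the clusters are trivially separable, so the first candidate already reaches the best score.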

Step 4, dividing the gate-level netlist of the integrated circuit under test into four maximum single-output sub-modules.

Referring to fig. 2, the implementation of this step is as follows:

(4.1) abstracting the gate-level netlist into a directed graph with each logic gate as a vertex and each signal line branch as a directed edge, in which the two sink nodes t1 and t2 correspond respectively to the logic gate G4, to which the primary output pin PO1 belongs, and the logic gate G8, to which the primary output pin PO2 belongs, forming the sink node set T;

(4.2) performing a breadth-first traversal starting from sink node t1 to build its maximum single-output sub-module; when the traversal reaches the logic gate G2, because one output node G6 of G2 does not belong to the current maximum single-output sub-module, G2 is added to the sink node set T as sink node t3 and the breadth-first traversal continues; after the traversal is finished, the primary input pins PI1-PI3, the primary output pin PO1, the signal lines W1-W2, and the logic gates G1, G3 and G4 obtained by traversing from sink node t1 are divided into maximum single-output sub-module a;

(4.3) performing a breadth-first traversal starting from sink node t2 to build its maximum single-output sub-module; when the traversal reaches the primary input pin PI4, because one output node G2 of PI4 does not belong to the current maximum single-output sub-module, PI4 is added to the sink node set T as sink node t4 and the traversal continues; after the traversal is finished, the primary input pins PI5-PI6, the primary output pin PO2, the signal lines W4-W6, and the logic gates G5-G8 obtained by traversing from sink node t2 are divided into maximum single-output sub-module b;

(4.4) performing a breadth-first traversal starting from sink node t3 to build its maximum single-output sub-module; after the traversal is finished, the primary input pin PI3 and the logic gate G2 obtained by traversing from this sink node are divided into maximum single-output sub-module c;

(4.5) performing a breadth-first traversal starting from sink node t4 to build its maximum single-output sub-module; because the traversal from PI4 visits no further nodes, only the primary input pin PI4 is divided into maximum single-output sub-module d.

Step 5, extracting the feature vectors of the maximum single-output sub-modules a, b, c and d obtained by traversing from the four sink nodes, and forming a feature matrix from these feature vectors.

(5.1) traversing the four maximum single-output sub-modules a, b, c and d respectively, and calculating for each the 10 hardware-Trojan-related features: the number of primary inputs, the number of primary input branches, the number of primary outputs, the number of primary output branches, the number of logic gates, the number of flip-flops, the number of signal lines, the total fan-in, the total fan-out, and the number of loops; the specific definition of each feature is shown in Table 1:

TABLE 1 Definitions of the hardware-Trojan-related features

Feature name	Description
Number of primary inputs	The number of primary input pins contained in the maximum single-output sub-module
Number of primary input branches	The number of logic gate input pins connected to the primary input pins in the maximum single-output sub-module
Number of primary outputs	The number of primary output pins contained in the maximum single-output sub-module
Number of primary output branches	The number of logic gate input pins connected to the primary output pins in the maximum single-output sub-module
Number of logic gates	The number of basic logic gates contained in the maximum single-output sub-module
Number of flip-flops	The number of flip-flop-type logic gates contained in the maximum single-output sub-module
Number of signal lines	The number of interconnects contained in the maximum single-output sub-module
Total fan-in	The sum of the numbers of input logic gates of all logic gates in the maximum single-output sub-module
Total fan-out	The sum of the numbers of output logic gates of all logic gates in the maximum single-output sub-module
Number of loops	The number of loops (i.e., simple cycles in the directed graph) contained in the maximum single-output sub-module
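Several of the Table 1 features can be computed from a small graph model, as sketched below. The model is an assumption: each node carries a kind ("PI", "GATE" or "FF") and each signal line branch is a (driver, load) pair. Fan-in and fan-out are counted here as incoming and outgoing branches of gates and flip-flops, which is only one possible reading of the definitions. The primary-output and signal-line counts are omitted because they depend on how output pins and nets are modeled. All names and the example sub-module are hypothetical.

```python
def count_simple_cycles(nodes, succ):
    """Count simple directed cycles; each cycle is rooted at its smallest node."""
    order = {n: i for i, n in enumerate(sorted(nodes))}
    count = 0

    def dfs(start, node, visited):
        nonlocal count
        for nxt in succ.get(node, []):
            if nxt == start:
                count += 1                       # closed a cycle back to the root
            elif order[nxt] > order[start] and nxt not in visited:
                dfs(start, nxt, visited | {nxt})

    for s in nodes:
        dfs(s, s, {s})
    return count

def extract_features(kinds, edges):
    """kinds: node -> 'PI' | 'GATE' | 'FF'; edges: (driver, load) branches."""
    succ = {}
    for src, dst in edges:
        succ.setdefault(src, []).append(dst)
    gates = {n for n, k in kinds.items() if k in ("GATE", "FF")}
    return {
        "primary_inputs": sum(k == "PI" for k in kinds.values()),
        "primary_input_branches": sum(kinds[s] == "PI" for s, _ in edges),
        "logic_gates": sum(k == "GATE" for k in kinds.values()),
        "flip_flops": sum(k == "FF" for k in kinds.values()),
        "total_fan_in": sum(d in gates for _, d in edges),
        "total_fan_out": sum(s in gates for s, _ in edges),
        "loops": count_simple_cycles(kinds, succ),
    }

# Hypothetical sub-module with one feedback loop G1 -> FF1 -> G2 -> G1.
kinds = {"PI1": "PI", "PI2": "PI", "G1": "GATE", "G2": "GATE", "FF1": "FF"}
edges = [("PI1", "G1"), ("PI2", "G1"), ("PI2", "G2"),
         ("G1", "FF1"), ("FF1", "G2"), ("G2", "G1")]
features = extract_features(kinds, edges)
```

The cycle counter is exponential in the worst case, which is acceptable only because it runs on one sub-module at a time in this sketch.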

(5.2) arranging the 10 hardware-Trojan-related features of each of the four maximum single-output sub-modules a, b, c and d in order to form its feature vector, as shown in Table 2:

TABLE 2 Feature vector values of the different maximum single-output sub-modules

(5.3) combining the feature vectors of the maximum single output sub-module a, the maximum single output sub-module b, the maximum single output sub-module c and the maximum single output sub-module d to form a feature Matrix, wherein the feature Matrix is expressed as follows:

Step 6, feeding the feature matrix Matrix to the classifiers to predict the label vectors of the four maximum single-output sub-modules a, b, c and d obtained by traversing from the four sink nodes.

(6.1) inputting the feature matrix Matrix into the three classifiers, namely the K-nearest-neighbor, decision tree and naive Bayes classifiers, each of which predicts a four-dimensional label column vector, giving three different four-dimensional label column vectors;

(6.2) constructing the label vector v1 of maximum single-output sub-module a from the first components of the three four-dimensional label column vectors, the label vector v2 of sub-module b from the second components, the label vector v3 of sub-module c from the third components, and the label vector v4 of sub-module d from the fourth components, expressed as follows:

v1 = (1 1 0),

v2 = (0 0 0),

v3 = (0 0 0),

v4 = (0 0 0).

and 7, detecting the hardware Trojan horse in the gate-level netlist to be detected according to the label vectors corresponding to the four maximum single output sub-modules, and outputting a hardware Trojan horse set M.

According to the method, whether at least one tag exists in a tag vector corresponding to a maximum single output sub-module or not is judged to be 1, whether a hardware Trojan is implanted into the maximum single output sub-module or not is judged, and a hardware Trojan set M is set to be empty:

(7.1) judging the label vectors of the four maximum single-output sub-modules obtained from the four sink nodes in the gate-level netlist under test:

because two labels in the label vector v1 of maximum single-output sub-module a are 1, a hardware Trojan is considered to be implanted in a, a is added to the hardware Trojan set M, and the label vector of b is judged next;

because the labels in the label vector v2 of maximum single-output sub-module b are all 0, no hardware Trojan is considered to be implanted in b, and the label vector of c is judged next;

because the labels in the label vector v3 of maximum single-output sub-module c are all 0, no hardware Trojan is considered to be implanted in c, and the label vector of d is judged next;

because the labels in the label vector v4 of maximum single-output sub-module d are all 0, no hardware Trojan is considered to be implanted in d, and step (7.2) is executed.

(7.2) outputting the hardware Trojan set M.
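Steps 6 and 7 amount to regrouping the three per-classifier predictions into one label vector per sub-module and then flagging any sub-module with at least one label equal to 1. A minimal sketch follows. The per-classifier outputs are assumed values chosen to be consistent with v1 to v4 above; which classifier produced the 0 in v1 is not stated in the source, and the dictionary keys are hypothetical names.

```python
# Assumed per-classifier predictions for sub-modules a, b, c, d (in that order).
predictions = {
    "knn":           [1, 0, 0, 0],
    "decision_tree": [1, 0, 0, 0],
    "naive_bayes":   [0, 0, 0, 0],
}
submodules = ["a", "b", "c", "d"]

# Step 6.2: the i-th components of the three column vectors form the
# label vector of the i-th sub-module.
label_vectors = {
    name: tuple(labels)
    for name, labels in zip(submodules, zip(*predictions.values()))
}

# Step 7: a sub-module joins the hardware Trojan set M if any label is 1.
M = [name for name, labels in label_vectors.items() if any(labels)]
```

The "at least one label" rule favors recall: a single classifier is enough to send a sub-module on to the positioning step.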

Step 8, performing Trojan positioning on the maximum single-output sub-module a in the hardware Trojan set M, in which a hardware Trojan has been implanted.

Referring to fig. 3, the implementation of this step is as follows:

(8.1) denoting the current gate-level netlist as C and the corresponding Trojan-free golden design version as C', dividing C' into a plurality of maximum single-output sub-modules, and extracting the feature vector of each maximum single-output sub-module;

(8.2) finding the maximum single-output sub-module in C' that is closest to a according to the Euclidean distance between feature vectors, and denoting it as a';

(8.3) performing a Trojan search based on layer-by-layer difference analysis on a and a' to obtain a plurality of Trojan regions.

(8.3.1) comparing layer 1: in layer 1, a and a' each contain only the logic gate G5, and the two gates are of the same type, both being two-input OR gates, so the traversal continues to the logic gates at the next position to obtain layer 2;

(8.3.2) comparing layer 2: in layer 2, the logic gates of a are G4 and T3 and the logic gates of a' are G4 and G3; the logic gates G4 of a and a' are both two-input OR gates, but the logic gate T3 of a is a two-input AND gate whereas the logic gate G3 of a' is an inverter, so the logic gate T3 of a is recorded as a Trojan output gate, the logic gate T3 of a and the logic gate G3 of a' are excluded from further traversal, and the comparison continues with the logic gates at the next position to obtain layer 3;

(8.3.3) comparing layer 3: in layer 3, a and a' both contain the logic gates G1 and G2; the two gates G1 are both two-input AND gates and the two gates G2 are both two-input NAND gates, so the traversal continues to the logic gates at the next position to obtain layer 4;

(8.3.4) comparing layer 4: layer 4 of both a and a' is empty, so the traversal ends;

(8.3.5) performing a breadth-first traversal of depth 8 starting from the only Trojan output gate T3 to obtain the only Trojan region D in a, which contains the logic gates T3, T2, T1, G3 and the primary input pins PI5, PI6.
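The layer-by-layer difference analysis of step (8.3) can be sketched as follows. The connectivity of a and a' below is hypothetical: it is chosen so that the layers match the description above, but it is not taken from FIG. 3. The sketch assumes each sub-module is given as a map from a gate to its drivers plus a map from gate to gate type, treats nodes without a type as primary input pins, and compares gates position by position within a layer; gates beyond the shorter layer are simply not compared.

```python
from collections import deque

def next_layer(layer, skip, fanin, types, seen):
    """Gates feeding the current layer, ignoring excluded gates and input pins."""
    nxt = []
    for g in layer:
        if g in skip:
            continue
        for src in fanin.get(g, []):
            if src in types and src not in seen:  # only logic gates have a type
                seen.add(src)
                nxt.append(src)
    return nxt

def trojan_output_gates(fanin_a, types_a, root_a, fanin_g, types_g, root_g):
    """Compare a (suspect) and a' (golden) layer by layer from their roots."""
    found = []
    layer_a, layer_g = [root_a], [root_g]
    seen_a, seen_g = {root_a}, {root_g}
    while layer_a and layer_g:
        skip_a, skip_g = set(), set()
        for g, g2 in zip(layer_a, layer_g):
            if types_a[g] != types_g[g2]:  # type mismatch marks a Trojan output gate
                found.append(g)
                skip_a.add(g)
                skip_g.add(g2)
        layer_a = next_layer(layer_a, skip_a, fanin_a, types_a, seen_a)
        layer_g = next_layer(layer_g, skip_g, fanin_g, types_g, seen_g)
    return found

def trojan_region(fanin, start, max_depth=8):
    """Breadth-first traversal toward the inputs, limited to max_depth levels."""
    region, queue = [start], deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for src in fanin.get(node, []):
            if src not in region:
                region.append(src)
                queue.append((src, depth + 1))
    return region

# Hypothetical connectivity (gate -> its drivers) and gate types.
fanin_a = {"G5": ["G4", "T3"], "G4": ["G1", "G2"], "T3": ["T2", "G3"],
           "T2": ["T1", "PI5"], "T1": ["PI5", "PI6"], "G3": ["PI6"],
           "G1": ["PI1", "PI2"], "G2": ["PI3", "PI4"]}
types_a = {"G5": "OR2", "G4": "OR2", "T3": "AND2", "T2": "OR2", "T1": "AND2",
           "G3": "INV", "G1": "AND2", "G2": "NAND2"}
fanin_g = {"G5": ["G4", "G3"], "G4": ["G1", "G2"], "G3": ["PI5"],
           "G1": ["PI1", "PI2"], "G2": ["PI3", "PI4"]}
types_g = {"G5": "OR2", "G4": "OR2", "G3": "INV", "G1": "AND2", "G2": "NAND2"}

outputs = trojan_output_gates(fanin_a, types_a, "G5", fanin_g, types_g, "G5")
region = trojan_region(fanin_a, outputs[0])
```

In this toy case the only mismatch is T3 against G3 in layer 2, and the region grown from T3 contains T3, T2, T1, G3, PI5 and PI6.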

The above description is only one specific example of the invention and does not constitute any limitation of the invention, and it will be apparent to those skilled in the art that various modifications and changes in form and details may be made without departing from the principles, construction of the invention, but these modifications and changes based on the idea of the invention are still within the scope of the claims of the invention.

Claims

1. The method for positioning the hardware Trojan in the gate-level netlist based on machine learning is characterized by comprising the following steps of:

if the hardware Trojan is not contained, the detection is completed;

otherwise, executing the step (6);

(6) Positioning the detected hardware Trojan horse:

2. The method of claim 1, wherein dividing the integrated circuit under test into a plurality of maximum single-output sub-modules in step (1) comprises the following steps:

1a) Abstracting the gate-level netlist into a directed graph, with each logic gate as a vertex and each signal line branch as a directed edge whose starting point is an input pin of a logic gate and whose end point is an output pin of a logic gate, wherein the gate to which each primary output pin of the gate-level netlist belongs corresponds to a "sink node" of the directed graph, and these nodes form the sink node set T;

1b) Performing a breadth-first traversal starting from one sink node t in the sink node set to build the maximum single-output sub-module rooted at t, and judging, for each node i visited during the traversal, whether all output nodes connected to node i belong to this maximum single-output sub-module:

if yes, adding node i to the maximum single-output sub-module, and executing step 1c);

otherwise, treating node i as a sink node, adding it to the sink node set T, and executing step 1d);

1c) Repeating step 1b) on all nodes in the maximum single-output sub-module rooted at sink node t until all nodes have been traversed, which forms the maximum single-output sub-module whose vertex is the logic gate corresponding to sink node t;

1d) Repeating steps 1b) to 1c) for all sink nodes in the sink node set T, so that the gate-level netlist is divided into a plurality of maximum single-output sub-modules.

3. The method of claim 1, wherein the feature extraction in step (2), performed with each maximum single-output sub-module of the integrated circuit as a unit to form a data set, is implemented as follows:

2a) Selecting the gate-level netlists of a plurality of integrated circuits as the sample, comprising a plurality of Trojan-free golden designs and, for each golden design, the versions obtained by implanting different Trojans;

2b) Dividing each gate-level netlist in the sample into a plurality of maximum single-output sub-modules, extracting the static structural features of each maximum single-output sub-module to construct its feature vector, appending a label to the tail of the feature vector according to whether the sub-module contains a hardware Trojan (1 for present, 0 for absent), and merging the feature vectors of all maximum single-output sub-modules into a matrix;

2c) Performing step 2b) on all gate-level netlists in the sample to obtain a plurality of matrices, merging the matrices by rows, and removing repeated rows to obtain a single matrix, namely the data set.

4. The method of claim 1, wherein performing the hardware Trojan detection and outputting the detection result in step (4) is implemented as follows:

4a) Selecting a gate-level netlist, denoting its hardware Trojan set as M, and initializing M to empty;

4b) Dividing the gate-level netlist into a plurality of maximum single-output sub-modules, extracting the feature vector of each maximum single-output sub-module, and forming a feature matrix from the feature vectors of all maximum single-output sub-modules;

4c) Inputting the feature matrix into the K-nearest-neighbor, decision tree and naive Bayes classifiers respectively, and predicting the corresponding label vectors;

4d) For each maximum single-output sub-module, judging whether at least one label in the label vector predicted by the classifiers is 1:

if yes, considering that a hardware Trojan is implanted in the sub-module, adding the sub-module to the hardware Trojan set M, and continuing to judge the next maximum single-output sub-module;

otherwise, continuing to judge the next maximum single-output sub-module;

4e) Repeating step 4d) on all maximum single-output sub-modules in the gate-level netlist until the judging operation is finished, and judging whether the hardware Trojan set M of the gate-level netlist is empty:

if yes, reporting that the netlist does not contain a hardware Trojan;

otherwise, outputting the hardware Trojan set M.

5. The method according to claim 1, wherein performing the Trojan search based on layer-by-layer difference analysis on a and a' in step (6c) is implemented as follows:

6c1) Taking the corresponding vertices as starting points, performing a breadth-first traversal (BFS) on the directed graphs corresponding to a and a' respectively, comparing in order, during the BFS, the logic gate sequences visited in a and a' at the same traversal depth, and denoting the logic gates at the same position in the two sequences as g and g' respectively;

6c2) Comparing whether the types of g and g' are the same:

if the types of g and g' are different, first recording g as a Trojan output gate, excluding g and g' respectively from the traversal of the next depth of a and a', and then continuing to compare the types of the logic gates g and g' at the next position;

if the types of g and g' are the same, directly continuing to compare the types of the logic gates g and g' at the next position;

6c3) Repeating step 6c2) until all logic gates of a or a' have been traversed, obtaining a plurality of Trojan output gates;

6c4) Performing a breadth-first traversal from each Trojan output gate, and dividing all non-repeated logic gates, signal lines, primary input pins and primary output pins visited during the traversal into one Trojan region;

6c5) Performing step 6c4) on all the obtained Trojan output gates to obtain a plurality of Trojan regions.

6. The method according to claim 3, wherein the static structural features of each maximum single-output sub-module extracted in step 2b) comprise: the number of primary inputs, the number of primary input branches, the number of primary output branches, the number of logic gates, the number of flip-flops, the number of signal lines, the total fan-in, the total fan-out, and the number of loops.