CN113794683A

CN113794683A - Industrial Internet of things intrusion detection method, device, equipment and storage medium

Info

Publication number: CN113794683A
Application number: CN202110906425.9A
Authority: CN
Inventors: 李贝贝; 杜卿芸; 李涛; 高飞; 黄猛
Original assignee: Sichuan University
Current assignee: Sichuan University
Priority date: 2021-08-06
Filing date: 2021-08-06
Publication date: 2021-12-14

Abstract

The invention discloses an industrial Internet of things intrusion detection method, which comprises: obtaining original network data of a target industrial Internet of things; performing feature screening on the original network data to obtain network feature data; preprocessing the network feature data to obtain data to be detected; and inputting the data to be detected into a trained intrusion detection agent to obtain an intrusion detection result. The invention also discloses an industrial Internet of things intrusion detection device, equipment and a storage medium. In the invention, the data to be detected is obtained by processing the original network data and is then input into a trained intrusion detection agent whose training strategy was continuously updated through a loss function. The agent can therefore dynamically adjust its identification strategy as the environment and structure of the industrial Internet of things change, which yields accurate intrusion detection results and strong adaptability.

Description

Industrial Internet of things intrusion detection method, device, equipment and storage medium

Technical Field

The invention relates to the technical field of network security, in particular to an industrial Internet of things intrusion detection method, device, equipment and storage medium.

Background

The industrial Internet of things is a complex network in which a failure or abnormality in any part of the system can cause great damage to the whole system in a short time. Early detection of network attacks is therefore critical to a timely and effective response. An Intrusion Detection System (IDS) is an important component of network security protection and helps the system discover network intrusion behavior effectively.

In recent years, however, the operating environment and structure of the industrial Internet of things have changed continuously. A traditional intrusion detection model (such as one based on simple machine learning) often cannot adapt to network threats: it cannot dynamically adjust its identification strategy when the network risk environment of the industrial Internet of things changes, and therefore cannot provide adaptive detection, response and defense against complex network attacks.

Therefore, how to provide adaptive intrusion detection for the industrial internet of things with continuously changing operating environments and structures is a technical problem which needs to be solved urgently.

The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.

Disclosure of Invention

The invention mainly aims to provide an industrial Internet of things intrusion detection method, device, equipment and storage medium, and aims to solve the technical problem that a traditional intrusion detection model in the prior art cannot provide self-adaptive intrusion detection for industrial Internet of things with continuously changing operating environment and structure.

In order to achieve the purpose, the invention provides an industrial internet of things intrusion detection method, which comprises the following steps:

acquiring original network data of a target Internet of things, wherein the original network data is text data for recording interaction and operation of the target Internet of things;

performing characteristic screening on the original network data to obtain network characteristic data;

preprocessing the network characteristic data to obtain data to be detected;

inputting the data to be detected into a trained intrusion detection agent to obtain an intrusion detection result; wherein the intrusion detection agent is obtained by training according to a training strategy and a loss function, the training strategy and the loss function are obtained according to a reward accumulation function simulated from an environment state, and the environment state is obtained according to historical industrial Internet of things data.

Optionally, the step of performing feature screening on the original network data to obtain network feature data specifically includes:

extracting the characteristics of the original network data to obtain initial network characteristic data;

and performing characteristic screening on the initial network characteristic data by adopting a LightGBM algorithm to obtain the network characteristic data.

Optionally, the step of performing feature screening on the initial network feature data by using a LightGBM algorithm to obtain the network feature data specifically includes:

deleting target network feature data from the initial network feature data, wherein the target network feature data comprises: network feature data whose missing values exceed a first preset value; network feature data having only a single unique value; either member of each pair of strongly correlated network feature data; and network feature data whose feature importance rank, obtained according to the LightGBM algorithm, is lower than a second preset value.

Optionally, the step of preprocessing the network characteristic data to obtain data to be detected specifically includes:

carrying out normalization processing on the network characteristic data to obtain normalized characteristic data;

performing feature vectorization on the normalized feature data to obtain feature vector data;

and performing one-hot encoding on the feature vector data to obtain the data to be detected.
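The three preprocessing operations listed above can be illustrated with a minimal Python sketch. It is not the patented implementation: the text does not state which normalization is used, so min-max scaling is assumed here, and the field names (`time_interval`, `setpoint`, `function_code`) and the helper names are hypothetical.

```python
from typing import Dict, List, Sequence, Union

Record = Dict[str, Union[float, str]]


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(value: str, categories: Sequence[str]) -> List[float]:
    """Return a one-hot vector for value over a fixed category order."""
    return [1.0 if value == c else 0.0 for c in categories]


def preprocess(records: List[Record],
               numeric: Sequence[str],
               categorical: Sequence[str]) -> List[List[float]]:
    # Normalization (assumed min-max): each numeric column is scaled independently.
    scaled = {name: min_max_normalize([float(r[name]) for r in records])
              for name in numeric}
    # Fix one category order per column so every vector has the same layout.
    cats = {name: sorted({str(r[name]) for r in records}) for name in categorical}
    vectors = []
    for i, r in enumerate(records):
        # Vectorization: numeric features first, then one-hot encoded categories.
        vec = [scaled[name][i] for name in numeric]
        for name in categorical:
            vec.extend(one_hot(str(r[name]), cats[name]))
        vectors.append(vec)
    return vectors


# Hypothetical records, for illustration only.
records = [
    {"time_interval": 0.5, "setpoint": 10.0, "function_code": "read"},
    {"time_interval": 1.5, "setpoint": 30.0, "function_code": "write"},
    {"time_interval": 1.0, "setpoint": 20.0, "function_code": "read"},
]
vectors = preprocess(records, ["time_interval", "setpoint"], ["function_code"])
```

Every resulting vector has the same length and value range, which is the property the preprocessing step is meant to guarantee before detection.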

Optionally, the intrusion detection agent is obtained by cyclic training on a training set, wherein the loss function is used in each training cycle to optimize the training strategy, and the intrusion detection agent is obtained when a preset convergence condition is satisfied.

Optionally, the expression of the reward accumulation function is:

wherein γ ∈ [0,1] is the discount coefficient, t is the time step in the environment state, and R_{t+k+1} is the accumulated sum of the rewards from time step t to time step t+k+1.
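The expression itself is not reproduced in this text. As an assumption only, the standard discounted return of reinforcement learning is consistent with the symbols defined here and may serve as a reading aid. The symbol G_t is introduced for illustration and does not appear in the source. Under this reading, R_{t+k+1} is the reward received at time step t+k+1 and the sum accumulates the discounted rewards from time step t onward.

```latex
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^{2} R_{t+3} + \cdots
    = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}, \qquad \gamma \in [0, 1]
```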

Optionally, the training strategy is obtained according to a state value function and an action value function obtained by the reward cumulative function; wherein:

the expression of the state value function is:

the expression of the action value function is:

in the formula, P is the state transition probability, S is the state space set, A is the action space set, R is the reward function, s is the state, a is the action, s' is the next state reached from state s, and a' is the next action performed after action a.
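The two expressions are likewise not reproduced in this text. As an assumption only, the standard Bellman expectation equations use exactly the quantities named here: a state transition probability P, a state space set S, an action space set A, a reward function R, and the discount coefficient γ. The policy symbol π and the subscripted names V_π and Q_π are introduced for illustration and do not appear in the source.

```latex
V_{\pi}(s) = \sum_{a \in A} \pi(a \mid s)
  \Big( R(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V_{\pi}(s') \Big)

Q_{\pi}(s, a) = R(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a)
  \sum_{a' \in A} \pi(a' \mid s')\, Q_{\pi}(s', a')
```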

In addition, in order to achieve the above object, the present invention further provides an industrial internet of things intrusion detection device, including:

the data acquisition module is used for acquiring original network data of the target Internet of things, wherein the original network data is text data for recording interaction and operation of the target Internet of things;

the characteristic processing module is used for carrying out characteristic screening on the original network data to obtain network characteristic data;

the preprocessing module is used for preprocessing the network characteristic data to obtain data to be detected;

the detection module is used for inputting the data to be detected into a trained intrusion detection agent to obtain an intrusion detection result; wherein the intrusion detection agent is obtained by training according to a training strategy and a loss function, the training strategy and the loss function are obtained according to a reward accumulation function simulated from an environment state, and the environment state is obtained according to historical industrial Internet of things data.

In addition, in order to achieve the above object, the present invention further provides industrial Internet of things intrusion detection equipment, including: a memory, a processor, and an industrial Internet of things intrusion detection program that is stored in the memory and can run on the processor, wherein the industrial Internet of things intrusion detection program, when executed by the processor, implements the steps of the industrial Internet of things intrusion detection method.

In addition, in order to achieve the above object, the present invention further provides a storage medium, where an industrial internet of things intrusion detection program is stored on the storage medium, and the industrial internet of things intrusion detection program, when executed by a processor, implements the steps of the industrial internet of things intrusion detection method.

In the invention, original network data of a target Internet of things is obtained; feature screening is performed on the original network data to obtain network feature data; the network feature data is preprocessed to obtain data to be detected; and the data to be detected is input into a trained intrusion detection agent to obtain an intrusion detection result. The intrusion detection agent is trained with a training strategy that is continuously updated through a loss function. The agent can therefore dynamically adjust its identification strategy as the environment and structure of the industrial Internet of things change, which yields accurate intrusion detection results and strong adaptability.

Drawings

Fig. 1 is a schematic structural diagram of the industrial Internet of things intrusion detection equipment in the hardware operating environment according to an embodiment of the present invention.

Fig. 2 is a diagram of a communication network system architecture according to an embodiment of the present invention.

Fig. 3 is a schematic flow chart of a first embodiment of an intrusion detection method for an industrial internet of things according to the present invention.

Fig. 4 is a detailed flowchart of step S200 in fig. 3.

Fig. 5 is a detailed flowchart of step S300 in fig. 3.

Fig. 6 is a flowchart illustrating an intrusion detection method for an industrial internet of things according to a second embodiment of the present invention.

Fig. 7 is a schematic diagram of a training process of the intrusion detection method for the industrial internet of things.

Fig. 8 is a flowchart illustrating an intrusion detection method for an industrial internet of things according to a third embodiment of the present invention.

Fig. 9 is a block diagram of an embodiment of an intrusion detection device for an industrial internet of things.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The industrial Internet of things is a complex network in which a failure or abnormality in any part of the system can cause great damage to the whole system in a short time. Early detection of network attacks is therefore critical to a timely and effective response. An Intrusion Detection System (IDS) is an important component of network security protection and helps the system discover network intrusion behavior effectively. In recent years, however, the operating environment and structure of the industrial Internet of things have changed continuously. A traditional intrusion detection model (such as one based on simple machine learning) often cannot adapt to network threats: it cannot dynamically adjust its identification strategy when the network risk environment of the industrial Internet of things changes, and therefore cannot provide adaptive detection, response and defense against complex network attacks.

To solve this problem, the present invention provides various embodiments of the industrial Internet of things intrusion detection method. The method obtains original network data of the target Internet of things, processes the original network data to obtain data to be detected, and inputs the data to be detected into a trained intrusion detection agent, whose training strategy was continuously updated through a loss function, to judge whether intrusion behavior exists.

Referring to Fig. 1, Fig. 1 is a schematic structural diagram of the industrial Internet of things intrusion detection equipment in the hardware operating environment according to an embodiment of the present invention.

The device may be a User Equipment (UE) such as a Mobile phone, smart phone, laptop, digital broadcast receiver, Personal Digital Assistant (PDA), tablet computer (PAD), handheld device, vehicular device, wearable device, computing device or other processing device connected to a wireless modem, Mobile Station (MS), or the like. The device may be referred to as a user terminal, portable terminal, desktop terminal, etc.

Generally, the apparatus comprises: at least one processor 301, a memory 302, and an industrial internet of things intrusion detection program stored on the memory and executable on the processor, the industrial internet of things intrusion detection program being configured to implement the steps of the industrial internet of things intrusion detection method as described above.

The processor 301 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 301 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 301 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 301 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. The processor 301 may further include an AI (Artificial Intelligence) processor for processing operations related to industrial Internet of things intrusion detection, so that the industrial Internet of things intrusion detection model can be trained and can learn autonomously, thereby improving efficiency and accuracy.

Memory 302 may include one or more computer-readable storage media, which may be non-transitory. Memory 302 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory 302 is used to store at least one instruction for execution by the processor 301 to implement the industrial Internet of things intrusion detection method provided by the method embodiments herein.

In some embodiments, the terminal may further include: a communication interface 303 and at least one peripheral device. The processor 301, the memory 302 and the communication interface 303 may be connected by a bus or signal lines. Various peripheral devices may be connected to communication interface 303 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 304, a display screen 305, and a power source 306.

The communication interface 303 may be used to connect at least one peripheral device related to I/O (Input/Output) to the processor 301 and the memory 302. The communication interface 303 is used for receiving the movement tracks of the plurality of mobile terminals uploaded by the user and other data through the peripheral device. In some embodiments, processor 301, memory 302, and communication interface 303 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 301, the memory 302 and the communication interface 303 may be implemented on a single chip or circuit board, which is not limited in this embodiment.

The Radio Frequency circuit 304 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 304 communicates with a communication network and other communication devices through electromagnetic signals, so as to obtain the movement tracks and other data of a plurality of mobile terminals. The rf circuit 304 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 304 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuitry 304 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, various generation mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the rf circuit 304 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.

The display screen 305 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 305 is a touch display screen, the display screen 305 also has the ability to capture touch signals on or over the surface of the display screen 305. The touch signal may be input to the processor 301 as a control signal for processing. In this case, the display screen 305 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 305, disposed on the front panel of the electronic device; in other embodiments, there may be at least two display screens 305, respectively disposed on different surfaces of the electronic device or in a folded design; in still other embodiments, the display screen 305 may be a flexible display screen disposed on a curved surface or a folded surface of the electronic device. Furthermore, the display screen 305 may be arranged in a non-rectangular irregular shape, i.e. a shaped screen. The display screen 305 may be made of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like.

The power supply 306 is used to power various components in the electronic device. The power source 306 may be alternating current, direct current, disposable or rechargeable. When the power source 306 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.

Those skilled in the art will appreciate that the configuration shown in Fig. 1 does not limit the industrial Internet of things intrusion detection equipment, which may include more or fewer components than shown, combine some components, or use a different arrangement of components.

In order to facilitate understanding of the embodiment of the present invention, a communication network system on which the industrial internet of things intrusion detection device of the present invention is based is described below.

Referring to fig. 2, fig. 2 is an architecture diagram of a communication Network system according to an embodiment of the present invention, where the communication Network system is an LTE system of a universal mobile telecommunications technology, and the LTE system includes a UE (User Equipment) 201, an E-UTRAN (Evolved UMTS Terrestrial Radio Access Network) 202, an EPC (Evolved Packet Core) 203, and an IP service 204 of an operator, which are in communication connection in sequence.

Specifically, the UE201 may be the terminal 100 described above, and is not described herein again.

The E-UTRAN202 includes eNodeB2021 and other eNodeBs 2022, among others. Among them, the eNodeB2021 may be connected with other eNodeB2022 through backhaul (e.g., X2 interface), the eNodeB2021 is connected to the EPC203, and the eNodeB2021 may provide the UE201 access to the EPC 203.

The EPC203 may include an MME (Mobility Management Entity) 2031, an HSS (Home Subscriber Server) 2032, other MMEs 2033, an SGW (Serving gateway) 2034, a PGW (PDN gateway) 2035, and a PCRF (Policy and Charging Rules Function) 2036, and the like. The MME2031 is a control node that handles signaling between the UE201 and the EPC203, and provides bearer and connection management. HSS2032 is used to provide registers to manage functions such as home location register (not shown) and holds subscriber specific information about service characteristics, data rates, etc. All user data may be sent through SGW2034, PGW2035 may provide IP address assignment for UE201 and other functions, and PCRF2036 is a policy and charging control policy decision point for traffic data flow and IP bearer resources, which selects and provides available policy and charging control decisions for a policy and charging enforcement function (not shown).

The IP services 204 may include the internet, intranets, IMS (IP Multimedia Subsystem), or other IP services, among others.

Although the LTE system is described as an example, it should be understood by those skilled in the art that the present invention is not limited to the LTE system, but may also be applied to other wireless communication systems, such as GSM, CDMA2000, WCDMA, TD-SCDMA, and future new network systems.

Based on the hardware structure of the industrial Internet of things intrusion detection equipment and the communication network system, the embodiment of the industrial Internet of things intrusion detection method is provided.

An embodiment of the invention provides an industrial internet of things intrusion detection method, and referring to fig. 3, fig. 3 is a schematic flow diagram of a first embodiment of the industrial internet of things intrusion detection method.

In this embodiment, the method for implementing intrusion detection of the industrial internet of things includes the following steps:

Step S100: acquiring original network data of the target Internet of things.

Specifically, in practical application, an execution main body of the method of the embodiment is an industrial internet of things intrusion detection device, and the industrial internet of things intrusion detection device may be various types of electronic devices such as a mobile phone, a tablet, a computer, or a wearable device. Of course, other devices with similar functions may also be used, and the present embodiment is not limited thereto.

The target internet of things is an industrial internet of things to be subjected to intrusion detection, and includes but is not limited to high-automation industrial field internet of things systems such as an electric power internet of things system, an oil and gas internet of things system, an urban rail transit internet of things system and the like. The original network data of the industrial Internet of things is text data and the like for recording the interaction and operation of the target Internet of things.

In addition, it is worth mentioning that, in this embodiment, the action of acquiring the original network data of the target internet of things may be specified by a user. For example, for an industrial internet of things with high real-time requirement and fast response to an intrusion action, the original network data can be acquired every day, every hour or every minute, or even acquired without intervals; for industrial internet of things with low real-time requirements and slow response of intrusion actions, the original network data can be acquired every three days, every week or every month, or even can be acquired manually by a user when the intrusion detection is required.

It is easy to understand that, according to the user's requirements, the original network data of the target Internet of things can be acquired at preset time intervals, continuously, or manually, and the original network data is then processed to judge whether any access to the industrial Internet of things involves intrusion actions or behaviors.

Step S200: performing feature screening on the original network data to obtain network feature data.

It should be noted that intrusion actions and behaviors are judged from feature data of the traffic information rather than from the traffic information data itself. Therefore, after the original network data of the target Internet of things is obtained, feature extraction and feature screening are performed on it to obtain the network feature data used for detecting intrusion actions.

Specifically, features are extracted and screened from the text data recording the interaction and operation of the target Internet of things to obtain interaction feature data and operation feature data of the target Internet of things. The interaction feature data includes the time interval between adjacent messages, the device ID in a command message, the device ID in a response message, the values of sub-function codes in commands/responses, and the like; the operation feature data includes sensor measurement values, response functions, set values, command function code values, device state values, and the like. Of course, other network feature data derived from the text data recording the interaction and operation of the target Internet of things may also be used, which is not limited in this embodiment.
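As an illustration of how such interaction and operation features might be derived from text records, the following Python sketch parses a hypothetical comma-separated log line of the form `timestamp,kind,device_id,function_code,value`. The log format, the `Message` type and the function names are assumptions made for this example; the source does not specify the layout of the original network data.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Message:
    timestamp: float     # seconds
    kind: str            # "command" or "response"
    device_id: int
    function_code: int
    value: float         # e.g. a sensor measurement value or a set value


def parse_line(line: str) -> Message:
    """Parse one hypothetical log line: timestamp,kind,device_id,function_code,value."""
    ts, kind, dev, code, val = line.strip().split(",")
    return Message(float(ts), kind, int(dev), int(code), float(val))


def extract_features(lines: List[str]) -> List[Dict[str, float]]:
    messages = sorted((parse_line(l) for l in lines), key=lambda m: m.timestamp)
    features = []
    previous = None
    for m in messages:
        # Interaction feature: time elapsed since the previous message
        # (0.0 for the first message in the log).
        interval = 0.0 if previous is None else m.timestamp - previous.timestamp
        features.append({
            "time_interval": interval,
            "is_command": 1 if m.kind == "command" else 0,
            "device_id": m.device_id,
            "function_code": m.function_code,
            "value": m.value,  # operation feature
        })
        previous = m
    return features


# Hypothetical log excerpt, for illustration only.
log = [
    "100.00,command,4,3,0.0",
    "100.25,response,4,3,7.5",
    "101.25,command,4,16,8.0",
]
features = extract_features(log)
```

Each message becomes one feature record, so the later screening and preprocessing steps can operate on columns such as `time_interval` or `function_code`.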

It is easy to understand that, in this embodiment, after the original data of the target internet of things is obtained, feature extraction and feature screening are performed on the original network data to obtain network feature data for detecting an intrusion behavior, and then whether the access of the target internet of things is an intrusion action and behavior is determined according to the network feature data.

Step S300: preprocessing the network feature data to obtain data to be detected.

It is easy to understand that the acquired network feature data may come from different terminal devices and different communication networks, so its data dimensions and magnitudes differ. The network feature data therefore needs to be preprocessed into data to be detected before it is input into the detection model for judging intrusion actions and behaviors.

Specifically, after network characteristic data extracted from original network data of a target internet of things are obtained, in order to use the network characteristic data with different dimensions and different magnitudes as detection data of the same detection model, adaptive preprocessing needs to be performed on the network characteristic data, so that the data to be detected obtained through preprocessing can be directly input into the detection model to obtain an intrusion detection result, time consumed by feature engineering of the detection model in the detection process is reduced, and influence of unprocessed network characteristic data on the detection rate of the detection model is reduced.

It is easy to understand that, in this embodiment, after the network feature data extracted from the original network data of the target Internet of things is obtained, the network feature data is preprocessed to obtain data to be detected that can be directly input into the detection model; that is, the data to be detected is the network feature data of different dimensions and magnitudes after being unified to the same dimension and the same magnitude.

Step S400: inputting the data to be detected into a trained intrusion detection agent to obtain an intrusion detection result.

It should be noted that, in this embodiment, after the data to be detected is obtained, an intrusion detection agent trained by deep reinforcement learning is used to judge whether the data to be detected contains intrusion behaviors and actions, so as to obtain an intrusion detection result.

Specifically, in this embodiment, the intrusion detection agent that produces the intrusion detection result is obtained by training according to a training strategy and a loss function, the training strategy and the loss function are obtained according to a reward accumulation function simulated from an environment state, and the environment state is obtained according to historical industrial Internet of things data.

In this embodiment, the data to be detected is obtained by processing the original network data and is input into a trained intrusion detection agent whose training strategy was continuously updated through the loss function. The agent can therefore dynamically adjust its identification strategy as the environment and structure of the industrial Internet of things change, which yields accurate intrusion detection results and strong adaptability.
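To make the idea of an agent whose strategy is updated through a loss function more concrete, the following Python sketch frames detection as a reinforcement learning task: the state is a feature vector, the action is the predicted class, and the reward is +1 for a correct judgment and -1 otherwise. It is a simplified stand-in, not the patented agent. A linear Q-function replaces the deep network, and the reward values, the squared temporal-difference loss, the uniformly random exploration during training and all names are assumptions made for this example.

```python
import random
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (feature vector, label): 0 = normal, 1 = intrusion


class LinearQAgent:
    """Q(s, a) = w_a . [s, 1]; a linear stand-in for a deep Q-network."""

    def __init__(self, n_features: int, n_actions: int = 2,
                 lr: float = 0.1, gamma: float = 0.1, seed: int = 0) -> None:
        self.w = [[0.0] * (n_features + 1) for _ in range(n_actions)]
        self.lr = lr
        self.gamma = gamma
        self.rng = random.Random(seed)

    def q_values(self, state: List[float]) -> List[float]:
        x = state + [1.0]  # append a bias term
        return [sum(wi * xi for wi, xi in zip(w, x)) for w in self.w]

    def act(self, state: List[float], epsilon: float = 0.0) -> int:
        # Explore with probability epsilon, otherwise take the greedy action.
        if self.rng.random() < epsilon:
            return self.rng.randrange(len(self.w))
        q = self.q_values(state)
        return max(range(len(q)), key=q.__getitem__)

    def update(self, state: List[float], action: int,
               reward: float, next_state: List[float]) -> float:
        # Temporal-difference target: r + gamma * max_a' Q(s', a').
        target = reward + self.gamma * max(self.q_values(next_state))
        td_error = target - self.q_values(state)[action]
        # Semi-gradient step on the assumed loss 0.5 * td_error ** 2.
        for i, xi in enumerate(state + [1.0]):
            self.w[action][i] += self.lr * td_error * xi
        return 0.5 * td_error ** 2


def train(agent: LinearQAgent, samples: List[Sample],
          epochs: int = 300, epsilon: float = 1.0) -> float:
    """Cycle over the training set; returns the loss of the last update."""
    loss = 0.0
    for _ in range(epochs):
        for i, (state, label) in enumerate(samples):
            action = agent.act(state, epsilon)
            reward = 1.0 if action == label else -1.0
            # The next sample in the set plays the role of the next state.
            next_state = samples[(i + 1) % len(samples)][0]
            loss = agent.update(state, action, reward, next_state)
    return loss


# Hypothetical, already preprocessed samples, for illustration only.
samples = [([0.1, 0.2], 0), ([0.9, 0.8], 1), ([0.2, 0.1], 0), ([0.8, 0.9], 1)]
agent = LinearQAgent(n_features=2)
final_loss = train(agent, samples)
predictions = [agent.act(state) for state, _ in samples]
```

Because the weights are adjusted after every interaction, continuing to call `update` on newly labeled traffic is one way such an agent could keep adapting after deployment; a fixed number of epochs is used here only to keep the example short.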

For convenience of understanding, referring to fig. 4, fig. 4 is a flowchart illustrating a specific implementation scheme of step S200 of the industrial internet of things intrusion detection method according to the present invention. The embodiment provides a specific implementation scheme for performing feature screening on the original network data to obtain network feature data, which is specifically as follows:

S201: extracting features from the original network data to obtain initial network feature data.

S202: and performing characteristic screening on the initial network characteristic data by adopting a LightGBM algorithm to obtain the network characteristic data.

Specifically, in this step, feature screening is performed on the original network data to obtain network feature data, which includes feature extraction of the original network data and screening of the network feature data.

It should be noted that the feature extraction of the original network data is used to extract feature data in the original network data to obtain initial network feature data for detection by the detection model. The feature extraction may use feature extraction algorithms such as Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Linear Discriminant Analysis (LDA) to process the original network data, and of course, other algorithms with feature extraction may also be used, which is not limited in this embodiment.

It should be noted that after the feature extraction is performed on the original network data, feature screening is further performed on the initial network feature data to obtain the processed network feature data. In this embodiment, the feature screening adopts a LightGBM algorithm to process the initial network feature data and delete the target network feature data from it, so as to obtain the network feature data used by the detection model.

Specifically, in this embodiment, the target network feature data includes: network feature data whose missing-value rate is greater than a first preset value; network feature data having only a single unique value; either one of each pair of strongly correlated network feature data; and network feature data whose feature importance rank, obtained according to the LightGBM algorithm, is lower than a second preset value. It should be noted that a pair of strongly correlated network feature data in this embodiment is a pair of network feature data with a Pearson correlation coefficient not lower than 0.99.

In this embodiment, the first preset value is set to 60%: when the missing rate of a feature is greater than 60%, the feature is considered to have a large influence on the accuracy of intrusion detection and needs to be deleted. The second preset value is set to 70%: when the importance rank of a feature is lower than 70% of the number of all features, the feature is considered to have little influence on the detection result of intrusion detection and needs to be deleted. It should be noted that, when different industrial internet of things process different feature data, different first preset values and second preset values may be set to improve the applicability of the feature data for intrusion detection, which is not limited in this embodiment.

In addition, it is easy to understand that, in order to reduce noise redundancy of the original network data and improve multi-class detection accuracy of the model, the feature selection is performed on the original network data first, and the redundancy dimension of the data is effectively reduced while the intrusion detection performance is ensured. After the original data of the target Internet of things are obtained, feature extraction and feature screening are carried out on the original network data to obtain network feature data for detecting intrusion behaviors, and whether the current access of the target Internet of things is an intrusion action or behavior is judged according to the network feature data.
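The four deletion rules described above can be sketched in plain Python. This is only an illustrative sketch: the function names `pearson` and `screen_features`, the dictionary layout of the input, and the way the importance scores are passed in are assumptions, and the scores are assumed to come from a LightGBM model trained elsewhere. It also assumes that the 70% rank threshold is applied to the features that survive the first three rules.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def screen_features(columns, importance, max_missing=0.60,
                    corr_limit=0.99, rank_fraction=0.70):
    """Return the names of the features that survive screening.

    columns:    dict mapping feature name -> list of values (None = missing).
    importance: dict mapping feature name -> importance score; assumed to be
                produced by a LightGBM model trained elsewhere (hypothetical input).
    """
    # Rules 1 and 2: drop features that are mostly missing or single-valued.
    kept = []
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        missing_rate = 1 - len(present) / len(values)
        if missing_rate > max_missing or len(set(present)) <= 1:
            continue
        kept.append(name)

    # Rule 3: for each strongly correlated pair, drop the second feature.
    dropped = set()
    for i, a in enumerate(kept):
        if a in dropped:
            continue
        for b in kept[i + 1:]:
            if b in dropped:
                continue
            pairs = [(x, y) for x, y in zip(columns[a], columns[b])
                     if x is not None and y is not None]
            if len(pairs) >= 2 and pearson(*zip(*pairs)) >= corr_limit:
                dropped.add(b)
    kept = [name for name in kept if name not in dropped]

    # Rule 4: keep only features ranked within the top rank_fraction by
    # importance (assumption: ranks are computed over the surviving features).
    ranked = sorted(kept, key=lambda n: importance.get(n, 0.0), reverse=True)
    top = set(ranked[:math.ceil(rank_fraction * len(ranked))])
    return [name for name in kept if name in top]
```

In a real deployment the thresholds would be tuned per network, as the embodiment notes for the first and second preset values.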

For convenience of understanding, referring to fig. 5, fig. 5 is a flowchart illustrating a specific implementation scheme of step S300 of the method for detecting an intrusion of an industrial internet of things according to the present invention. This embodiment provides a specific implementation scheme for preprocessing the network characteristic data to obtain data to be detected, which is specifically as follows:

S301: carrying out normalization processing on the network feature data to obtain normalized feature data.

S302: carrying out feature vectorization on the normalized feature data to obtain feature vector data.

S303: carrying out one-hot encoding on the feature vector data to obtain the data to be detected.

Data normalization, also called data standardization, limits the data to be processed to a certain range after processing by some algorithm. Data standardization is a basic task in data mining. Different evaluation indexes often have different dimensions and dimensional units, which affects the result of data analysis; in order to eliminate the dimensional influence among the indexes, the data needs to be normalized so that the data indexes become comparable.

Specifically, in the present embodiment, data normalization uses the min-max function to scale the feature values to the [0, 1] interval, so that all variables lying in different intervals are normalized. The normalization formula is:

$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$

wherein x is the original value, x' is the normalized value, and min(x) and max(x) are the minimum and maximum values of the feature.

It should be noted that the feature vectorization is used to extract and combine attributes in the network feature data, each piece of network feature data has its attribute, different attributes are represented by different attribute values, and a feature vector is obtained by combining multiple attribute values.

It should be noted that one-hot encoding encodes N states by using an N-bit state register, each state having its own register bit, and only one of the bits is active at any time. In this embodiment, the feature vector data is processed through one-hot encoding so that only one feature in the feature vector is activated, sparse feature vector data is obtained, and the data to be detected that can be directly detected by the detection model is thereby obtained.

It is easy to understand that the purpose of the preprocessing in this embodiment is to obtain data to be detected that the detection model can use directly. After the network feature data extracted from the original network data of the target Internet of things is obtained, the network feature data is preprocessed to obtain data to be detected that can be directly input into the detection model. Because the obtained network feature data has different dimensions and different orders of magnitude, the preprocessing unifies it into the same dimension and the same order of magnitude.
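As an illustration of steps S301 to S303, the following minimal Python sketch scales numeric attributes with the min-max formula and one-hot encodes categorical attributes into one sparse vector per record. The function names and the record layout are assumptions made for the example; the sketch also assumes that one-hot encoding is applied to categorical attributes only, which the text does not state explicitly.

```python
def min_max_scale(values):
    """Scale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(value, categories):
    """Return a list with a single 1 at the position of `value`."""
    return [1 if value == c else 0 for c in categories]


def preprocess(records, numeric_keys, categorical_keys):
    """Turn a list of dict records into numeric feature vectors."""
    # S301: normalize every numeric attribute column-wise.
    scaled = {k: min_max_scale([r[k] for r in records]) for k in numeric_keys}
    # Collect the category vocabulary of each categorical attribute.
    cats = {k: sorted({r[k] for r in records}) for k in categorical_keys}
    vectors = []
    for i, r in enumerate(records):
        # S302: combine the attribute values into one feature vector.
        vec = [scaled[k][i] for k in numeric_keys]
        # S303: append the one-hot code of each categorical attribute.
        for k in categorical_keys:
            vec.extend(one_hot(r[k], cats[k]))
        vectors.append(vec)
    return vectors


# Hypothetical records; the attribute names echo the message length and
# function code fields, but the values are made up for illustration.
records = [
    {"message_length": 10, "function_code": "read"},
    {"message_length": 20, "function_code": "write"},
    {"message_length": 30, "function_code": "read"},
]
vectors = preprocess(records, ["message_length"], ["function_code"])
```

In practice the minimum, maximum and category vocabulary would be fitted on the training data and reused at detection time, so that live traffic is encoded consistently.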

Further, based on the first embodiment of the industrial internet of things intrusion detection method, a second embodiment of the industrial internet of things intrusion detection method is provided. Referring to fig. 6, fig. 6 is a schematic flow chart of a second embodiment of the intrusion detection method for the industrial internet of things according to the present invention.

In practical application, after data to be detected are obtained, an intrusion detection intelligent body based on deep reinforcement learning training is adopted to judge intrusion behaviors and actions of the data to be detected so as to obtain an intrusion detection result.

Specifically, in this embodiment, intrusion behaviors and actions are judged on the data to be detected to obtain an intrusion detection result. The intrusion detection agent that produces the result is trained according to a training strategy and a loss function, both of which are obtained from a reward cumulative function simulated from an environment state, and the environment state is obtained from historical industrial internet of things data.

This embodiment provides a specific implementation scheme for training the intrusion detection agent with the training strategy and loss function obtained from the reward cumulative function simulated from the environment state, which specifically includes the following steps:

Step A100: obtaining network sample data of the target Internet of things.

Step A200: inputting the network sample data into an intrusion detection model to obtain an expected intrusion detection result.

Step A300: judging whether the intrusion detection model meets a convergence condition according to the expected intrusion detection result.

Step A400: if not, adjusting the training strategy of the intrusion detection model by using a reward cumulative function, returning to the step of inputting the network sample data into the intrusion detection model, and repeating until the expected intrusion detection result meets the convergence condition, so as to obtain the intrusion detection agent.

It should be noted that the network sample data is acquired target internet of things historical data, the target internet of things historical data is used for training the acquired intrusion detection model, the data type required for training is the same as the data to be detected, which is acquired by performing feature screening and preprocessing on the target internet of things original network data in the foregoing embodiment, and details are not repeated in this embodiment.

It should be noted that, when the intrusion detection model is trained according to network sample data, the intrusion detection model after each training needs to be determined, and therefore, an expected intrusion detection result of the network sample data processed by the intrusion detection model after each training needs to be obtained. Judging whether the intrusion detection model meets a convergence condition or not according to an expected intrusion detection result; if the intrusion detection model meets the convergence condition, acquiring an intrusion detection intelligent agent for intrusion detection; and if the intrusion detection model does not meet the convergence condition, adjusting the training strategy of the intrusion detection model by using a reward cumulative function, and returning to the step of inputting the network sample data into the intrusion detection model.

Specifically, in this embodiment, the convergence condition used to judge the intrusion detection model is either that the intrusion detection model has converged or that the intrusion detection model has been trained for a preset number of steps.
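The loop formed by steps A200 to A400 can be sketched as follows. This is a hedged illustration only: `train_until_converged`, `evaluate` and `adjust_policy` are hypothetical names, and the text does not define how convergence is measured, so the sketch assumes that the model has converged when the last few evaluation scores differ by less than a tolerance.

```python
def train_until_converged(evaluate, adjust_policy, max_steps=1000,
                          tol=1e-3, patience=5):
    """Repeat evaluate/adjust until convergence or a preset step limit.

    evaluate:      callable returning a score for the current model (A200).
    adjust_policy: callable that updates the training strategy (A400).
    Returns (number_of_steps, reason).
    """
    history = []
    for step in range(1, max_steps + 1):
        history.append(evaluate())
        window = history[-patience:]
        # A300: assumed convergence test, scores have stopped changing.
        if len(history) >= patience and max(window) - min(window) < tol:
            return step, "converged"
        adjust_policy()
    # The second convergence condition: the preset number of steps is reached.
    return max_steps, "step_limit"
```

Either return value corresponds to one of the two convergence conditions named above; which one fires first depends on the chosen tolerance and step limit.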

It should be noted that, in this embodiment, the step of adjusting the training strategy of the intrusion detection model by using the reward cumulative function specifically includes:

A401: processing the strategy network and the value network of the training strategy based on the reward cumulative function to obtain the advantage function value of the training strategy and the update ratio between the new strategy and the old strategy.

A402: constructing a loss function of the training strategy according to the advantage function value and the update ratio between the new strategy and the old strategy.

A403: updating the strategy parameters of the training strategy by using the loss function so as to adjust the training strategy of the intrusion detection model.

Specifically, when the loss function of the training strategy is constructed according to the advantage function value and the update ratio between the new strategy and the old strategy, this embodiment uses the PPO2 policy gradient algorithm to process the advantage function value and the update ratio, so as to obtain a gradient value of the training strategy; the gradient value of the training strategy is then processed by a stochastic gradient ascent algorithm to obtain the loss function of the policy gradient.

It is easily understood that the PPO2 policy gradient algorithm calculates an estimate of the policy gradient and inserts it into a stochastic gradient ascent algorithm; by performing stochastic gradient ascent on the policy parameter θ, the policy gradient loss is calculated to update the parameters of the policy network.

Specifically, the expression of the policy gradient algorithm is:

$$\hat{g} = \hat{\mathbb{E}}_t\left[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\,\hat{A}_t\right]$$

The advantage function used in updating the policy network is:

$$A^{\pi}(s, a) = Q^{\pi}(s, a) - V^{\pi}(s)$$

wherein $\hat{A}_t$ is the advantage function estimate at time step t. When $\hat{A}_t$ is positive, the gradient is positive and the probability of the corresponding actions should be increased; conversely, it should be decreased. The expectation $\hat{\mathbb{E}}_t[\cdot]$ denotes the empirical average over a finite batch of samples. The policy $\pi_\theta$ is generally a neural network that takes the state observed from the environment as input and outputs the action to be taken; $\log \pi_\theta$ is the logarithm of the probability output by the policy network, $a_t$ is the action at time step t, $s_t$ is the state at time step t, $Q^{\pi}(s, a)$ is the action value function, and $V^{\pi}(s)$ is the state value function.

In this embodiment, in order to prevent excessive oscillation during the training of the intrusion detection model, PPO2 introduces a clipped surrogate objective function to constrain the update ratio between the new policy and the old policy, so that updates are carried out in small batches over multiple steps. Define

$$r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}$$

as the ratio between the new and old policies. The Conservative Policy Iteration (CPI) objective is then:

$$L^{CPI}(\theta) = \hat{\mathbb{E}}_t\left[r_t(\theta)\,\hat{A}_t\right]$$

It should be noted that, without a constraint, maximizing the CPI objective may cause gradient explosion. A clip function is therefore used to penalize ratios that move far from 1, and the policy is updated with the resulting constrained objective:

$$L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\left(r_t(\theta),\, 1-\varepsilon,\, 1+\varepsilon\right)\hat{A}_t\right)\right]$$

where ε = 0.2 is a hyperparameter. The first term inside the minimum is the CPI objective; the second term modifies the surrogate objective by clipping the ratio, which removes the incentive for $r_t$ to move outside the interval [1 − ε, 1 + ε]. clip is the clipping function, and the min function makes the final objective a lower bound of the CPI objective. A change in the ratio is ignored only when it would improve the objective, and is taken into account when it makes the objective worse.
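To make the clipping behaviour concrete, the following minimal Python sketch evaluates a clipped surrogate objective for a batch of samples, using ε = 0.2 as in this embodiment. The function name `clipped_surrogate` and the choice of passing log-probabilities are assumptions made for illustration; a real implementation would compute this inside an automatic differentiation framework so that the objective can be maximized by gradient ascent.

```python
import math


def clipped_surrogate(new_logp, old_logp, advantages, eps=0.2):
    """Average clipped surrogate objective over a batch.

    new_logp / old_logp: log-probabilities of the taken actions under the
    new and old policies; advantages: advantage estimates per sample.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)             # new/old policy ratio
        clipped = max(1 - eps, min(ratio, 1 + eps))   # clip to [1-eps, 1+eps]
        total += min(ratio * adv, clipped * adv)      # pessimistic lower bound
    return total / len(advantages)
```

With a positive advantage, a ratio of 1.5 contributes only 1.2 times the advantage, so pushing the ratio further brings no gain; with a negative advantage the unclipped, worse value is kept, which is the lower-bound behaviour described above.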

Further, as shown in fig. 7, in this embodiment, in order to update the neural network in time, the policy network and the value network are fused, and the weights of the two networks are shared and updated at the same time. PPO2 works on fixed-length trajectory segments, each represented by a set of historical records of states and actions. In each iteration, N parallel agents each collect data for T steps. The loss is constructed on these NT time steps and optimized by minibatch gradient descent or the Adam optimizer. The policy network and the value network share an MLP network with 3 hidden layers, having 128 neurons in layer 1, 64 neurons in layer 2 and 64 neurons in layer 3, and a rectified linear unit (ReLU) activation function is added after each hidden layer.
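The shared network described above can be illustrated with a small forward-pass sketch in plain Python. Only the 128-64-64 hidden layout and the ReLU activations come from the text; the class name `SharedActorCritic`, the weight initialization, the softmax policy head and the scalar value head are assumptions, and a practical implementation would use a deep learning framework with trainable parameters.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Create (weights, bias) for a fully connected layer."""
    scale = 1 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, layer):
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


class SharedActorCritic:
    """Policy and value heads on top of one shared 128-64-64 MLP trunk."""

    def __init__(self, n_features, n_actions, seed=0):
        rng = random.Random(seed)
        sizes = [n_features, 128, 64, 64]
        self.trunk = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
        self.policy_head = init_layer(64, n_actions, rng)  # assumed head
        self.value_head = init_layer(64, 1, rng)           # assumed head

    def forward(self, state):
        h = state
        for layer in self.trunk:
            h = relu(linear(h, layer))  # ReLU after each hidden layer
        action_probs = softmax(linear(h, self.policy_head))
        state_value = linear(h, self.value_head)[0]
        return action_probs, state_value
```

Because both heads read the same trunk output, a single backward pass through the trunk would update the weights used by the policy and the value estimate at the same time, which is the sharing the embodiment describes.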

And it is easy to understand that, this embodiment provides a training process for an intrusion detection agent for detecting data to be detected, in the training process, network sample data is processed through an intrusion detection model, before the intrusion detection model meets a convergence condition, a reward cumulative function is used to adjust a training strategy of the intrusion detection model and return to the step of inputting the network sample data into the intrusion detection model, and the process is circulated until the expected intrusion detection result meets the convergence condition to obtain the intrusion detection agent. The method can dynamically adjust the identification strategy of the industrial Internet of things according to the characteristics of continuous change of the environment and the structure of the industrial Internet of things, and obtain an accurate intrusion detection result.

Further, based on the first embodiment and the second embodiment of the industrial internet of things intrusion detection method, a third embodiment of the industrial internet of things intrusion detection method is provided. Referring to fig. 8, fig. 8 is a schematic flowchart of a third embodiment of the intrusion detection method for the industrial internet of things according to the present invention.

In practical application, the intrusion detection model used for training according to network sample data is an intrusion detection network architecture obtained based on historical internet of things data, and the network sample data is trained by using the intrusion detection network architecture, so that an intrusion detection intelligent agent is obtained.

The embodiment provides a specific implementation scheme of an intrusion detection network architecture based on historical internet of things data acquisition, which is specifically as follows:

Step B100: obtaining environmental state data based on historical internet of things data.

Step B200: acquiring a reward cumulative function of the intrusion detection network framework based on the environmental state data.

Step B300: obtaining a value function of the intrusion detection network framework according to the reward cumulative function, and obtaining a training strategy of the intrusion detection network framework according to the value function.

Step B400: obtaining a loss function of the intrusion detection network architecture by utilizing the reward cumulative function, so as to obtain the intrusion detection network architecture.

It should be noted that the environmental state data is simulated by using a real industrial internet of things data set, and an environmental state required by the intrusion detection network architecture structure of the embodiment is formed. In this embodiment, the historical data set of the industrial internet of things comprises network interaction data and system state data; the network interaction data comprises an equipment address, a function code, a message length, message error check information, a message interval and the like; the system state data includes sensor measurements, monitoring inputs, distributed control states, and the like.

Specifically, based on historical internet of things data, obtaining environmental status data includes:

Step B101: obtaining the ordinary network users in the environmental state data according to network user data in historical internet of things data.

Step B102: obtaining the malicious attackers in the environmental state data according to attack intrusion data in historical internet of things data.

Step B103: obtaining the detection manager in the environmental state data according to intrusion detection data in historical internet of things data.

It will be readily appreciated that, based on the obtained environmental state data, the intrusion detection network architecture is able to sense the environmental state and obtain its reward cumulative function from the feedback signals provided by the environment. The intrusion detection network architecture senses the environmental state and, according to the feedback signal r_t provided by the environment, selects the action that maximizes the future reward; that is, the cumulative reward from the current time step t up to the reward r_{t,n} of the final state is R_t = r_{t,1} + r_{t,2} + … + r_{t,n}.

Specifically, the step of obtaining the reward cumulative function of the intrusion detection network framework based on the environment state data comprises the following steps:

Based on the environmental state data, obtaining a feedback signal r_t for each time step t; and acquiring the reward cumulative function of the intrusion detection network framework according to the feedback signal r_t of each time step t.

Wherein the step of obtaining the reward cumulative function of the intrusion detection network framework according to the feedback signal r_t of each time step t specifically comprises:

when the intrusion detection network framework detects attack intrusion data and successfully classifies the type of the attack intrusion data, obtaining a positive feedback signal r_t = +1; or when the intrusion detection network framework fails to detect attack intrusion data, or detects attack intrusion data but fails to classify it successfully, obtaining a negative feedback signal r_t = −1; or when the intrusion detection network framework detects network user data and does not issue intrusion detection data, no signal is fed back.
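The three feedback cases can be written as a small Python function. This is a sketch under stated assumptions: labels are integers with 0 meaning normal traffic, "no signal" is modelled as 0, and the case of normal traffic being flagged as an attack is not covered by the text, so it is exposed as the hypothetical parameter `false_alarm_signal`.

```python
NORMAL = 0  # assumed label for ordinary network user traffic


def feedback_signal(true_label, predicted_label, false_alarm_signal=0):
    """Return the feedback r_t for one detection decision."""
    if true_label != NORMAL:
        # Attack traffic: +1 only if detected and classified as the right type.
        return 1 if predicted_label == true_label else -1
    if predicted_label == NORMAL:
        # Ordinary traffic and no alert raised: no signal is fed back.
        return 0
    # Ordinary traffic flagged as an attack: not specified in the text,
    # handled here by an assumed, configurable value.
    return false_alarm_signal
```

For example, an attack of type 2 predicted as type 2 yields +1, while the same attack predicted as type 3 or as normal traffic yields −1.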

Further, to reduce uncertainty and randomness, the present embodiment uses a discount factor to reduce the strong correlation between steps, and uses the discounted future cumulative reward G_t instead of the future reward.
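As a worked illustration of discounting, the sketch below accumulates a sequence of feedback signals from the last step backwards, so that a reward k steps in the future is weighted by the discount factor raised to the power k. The function name `discounted_return` and the default discount factor of 0.9 are assumptions; the text does not give a numerical value.

```python
def discounted_return(rewards, gamma=0.9):
    """Discounted cumulative reward of a reward sequence, first reward undiscounted."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g  # fold in one more step of discounting
    return g
```

For the feedback sequence [1, −1, 1] with a discount factor of 0.5, the result is 1 − 0.5 + 0.25 = 0.75, whereas the undiscounted sum is 1.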

Specifically, after the step of obtaining the reward cumulative function of the intrusion detection network framework according to the feedback signal r_t of each time step t, the method further comprises the following step:

processing the reward cumulative function obtained by the intrusion detection network framework by using a discount factor to obtain the processed reward cumulative function; wherein the processed reward cumulative function expression is:

$$G_t = r_{t,1} + \gamma\, r_{t,2} + \dots + \gamma^{\,n-1} r_{t,n} = \sum_{k=1}^{n} \gamma^{\,k-1} r_{t,k}$$

where γ is the discount factor.

It should be noted that, after the reward cumulative function is obtained, the value function of the intrusion detection network architecture can be obtained according to the reward cumulative function, and then the training strategy and the loss function of the intrusion detection network architecture are obtained, so as to obtain the intrusion detection network architecture for intrusion detection.

Specifically, the step of obtaining a value function of the intrusion detection network framework according to the reward cumulative function and obtaining a training strategy of the intrusion detection network framework according to the value function includes:

Step B301: obtaining a value function of the intrusion detection network framework according to the reward cumulative function.

Step B302: and acquiring a state value function and an action value function of the intrusion detection network framework based on the value function.

Step B303: and acquiring the value of each environment state data and the value of different actions under each environment state data according to the state value function and the action value function of the intrusion detection network framework.

Step B304: and selecting the action which enables the maximum value of the state value function and the action value function under the current environment state data to obtain the training strategy of the intrusion detection network framework.

It should be understood that the value function evaluates how good the state of the intrusion detection agent is at a given time t. The values obtained for different states guide the agent's selection of actions and influence the next action decision of the intrusion detection network architecture. In this embodiment, Q_π(s, a) is defined as the action value function and V_π(s) as the state value function. The former evaluates the expected return of the current agent when it starts from state s, performs action a, and thereafter follows strategy π; the latter represents the expected return when the agent starts from state s and follows strategy π, where

$$V_\pi(s) = \mathbb{E}_\pi\left[G_t \mid S_t = s\right], \qquad Q_\pi(s, a) = \mathbb{E}_\pi\left[G_t \mid S_t = s,\ A_t = a\right]$$

In this embodiment, the action space in the intrusion detection network framework consists of discrete values: "0" indicates that the traffic is predicted to be normal traffic, and "1, 2, …, n" indicate the n types of attacks. Specifically, a Markov decision process is adopted to define the state value function and the action value function of the intrusion detection agent in the action decision process, and the state value function or the action value function is then formally expressed through a Bellman equation to complete the action decision process of the intrusion detection network framework.

In particular, the Markov decision process has the Markov property, namely: at time step t+1, the feedback of the environment depends only on the state s and action a of the previous time step t, and has no correlation with time step t−1 or any earlier time step. The next state of the system is only related to the current state. Therefore, the decision-making process of the intrusion detection network architecture can be simplified. The Markov decision process of the system consists of five elements: S is the state space set, A is the action space set, P_sa represents the state transition probability (the probability distribution of transitioning to another state S' after performing action a in state S, written together with the action reward as P(S', R | S, a)), R is the reward function, and γ is the discount factor. The Bellman equation adds the immediate reward R_t and the state value V(S_{t+1}) of time step t+1 discounted by γ, reflecting the relationship between the state value function V(S_t) in the current state and the state value function V(S_{t+1}) at the next time step.

The Markov decision process is expressed as:

MDP = (S, A, P_sa, R, γ); wherein S = {S₁, S₂, …, S_n}; A = {A₁, A₂, …, A_n}.

The expression of the Bellman equation is as follows:

$$V(s) = \mathbb{E}\left[R_t + \gamma\, V(S_{t+1}) \mid S_t = s\right]$$

Similarly, the expression of the action value can be obtained as follows:

$$Q(s, a) = \mathbb{E}\left[R_t + \gamma\, Q(S_{t+1}, A_{t+1}) \mid S_t = s,\ A_t = a\right]$$

Considering the recursive updating of the Bellman equation, the Bellman equation expression is divided into an action value function and a state value function. When the next action is taken, the two value functions are each updated following the strategy π, wherein $P_{sa}(s')$ represents the state transition probability.

The expression of the state value function is:

$$V_\pi(s) = \sum_{a \in A} \pi(a \mid s)\left(R(s, a) + \gamma \sum_{s' \in S} P_{sa}(s')\, V_\pi(s')\right)$$

The expression of the action value function is:

$$Q_\pi(s, a) = R(s, a) + \gamma \sum_{s' \in S} P_{sa}(s') \sum_{a' \in A} \pi(a' \mid s')\, Q_\pi(s', a')$$

In the formulas, $P_{sa}(s')$ is the state transition probability, S is the state space set, A is the action space set, R is the reward function, s is the state, a is the action, s' is the next state to which state s transitions, and a' is the next action performed after action a.
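The recursive relationship between the state value function and the action value function, together with the selection of the value-maximizing action in step B304, can be sketched on a tiny tabular Markov decision process. Everything in the sketch is hypothetical: the function names, the dictionary layout of the transition probabilities and rewards, the discount factor of 0.9, and the use of repeated sweeps to approximate the fixed point. A deep reinforcement learning agent would approximate these functions with a neural network instead of tables.

```python
def evaluate_policy(states, actions, P, R, policy, gamma=0.9, sweeps=200):
    """Approximate the state values of `policy` by repeated Bellman updates.

    P[s][a] is a dict {next_state: probability}; R[s][a] is the reward;
    policy[s][a] is the probability of choosing action a in state s.
    """
    V = {s: 0.0 for s in states}
    for _ in range(sweeps):
        V = {
            s: sum(
                policy[s][a]
                * (R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a].items()))
                for a in actions
            )
            for s in states
        }
    return V


def action_values(states, actions, P, R, V, gamma=0.9):
    """Compute action values from state values with a one-step lookahead."""
    return {
        s: {
            a: R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a].items())
            for a in actions
        }
        for s in states
    }


def greedy_action(Q, state):
    """Select the action with the largest action value in `state`."""
    return max(Q[state], key=Q[state].get)
```

For a single state in which action 1 earns reward 1, action 0 earns reward 0 and the policy picks either with probability 0.5, the state value converges to 0.5 / (1 − 0.9) = 5, the action values are 4.5 and 5.5, and the greedy choice is action 1.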

It should be noted that after the training strategy of the intrusion detection network architecture is obtained, the loss function of the intrusion detection network architecture can be obtained according to the reward cumulative function, and the intrusion detection network architecture for intrusion detection is then obtained through the training strategy and the loss function.

Specifically, the step of obtaining the loss function of the intrusion detection network architecture by using the reward cumulative function to obtain the intrusion detection network architecture includes:

Step B401: processing the strategy network and the value network of the training strategy based on the reward cumulative function to obtain the advantage function value of the training strategy and the update ratio between the new strategy and the old strategy.

Step B402: constructing a loss function of the training strategy according to the advantage function value and the update ratio between the new strategy and the old strategy.

Step B403: obtaining the intrusion detection network framework according to the training strategy and the loss function.

It should be noted that the PPO2 strategy gradient algorithm is described in detail in the foregoing embodiments, and will not be described herein.

Referring to fig. 9, fig. 9 is a block diagram of a structure of an intrusion detection device for an industrial internet of things according to the present invention.

As shown in fig. 9, an intrusion detection device for an industrial internet of things according to an embodiment of the present invention includes:

the data acquisition module 10 is used for acquiring original network data of a target internet of things;

a feature processing module 20, configured to perform feature screening on the original network data to obtain network feature data;

the preprocessing module 30 is configured to preprocess the network characteristic data to obtain data to be detected;

the detection module 40 is used for inputting the data to be detected into an intrusion detection agent obtained by training so as to obtain an intrusion detection result; the intrusion detection agent is obtained according to a training strategy and a loss function in a training mode, the training strategy and the loss function are obtained according to a reward cumulative function simulated by an environment state, and the environment state is obtained according to historical industrial Internet of things data.

Other embodiments or specific implementation manners of the industrial internet of things intrusion detection device according to the present invention may refer to the above method embodiments, and are not described herein again.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. An element defined by the phrase "comprising", without further limitation, does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the terms first, second, etc. does not denote any order; these terms may be interpreted as names.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, where the computer software product is stored in a storage medium (e.g., a Read Only Memory (ROM)/Random Access Memory (RAM), a magnetic disk, an optical disk), and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. The industrial Internet of things intrusion detection method is characterized by comprising the following steps:

preprocessing the network characteristic data to obtain data to be detected;

2. The intrusion detection method for the industrial internet of things according to claim 1, wherein the step of performing feature screening on the original network data to obtain network feature data specifically comprises:

3. The intrusion detection method for the industrial internet of things according to claim 2, wherein the step of performing feature screening on the initial network feature data by using a LightGBM algorithm to obtain the network feature data specifically comprises:

4. The intrusion detection method for the industrial internet of things according to claim 1, wherein the step of preprocessing the network characteristic data to obtain the data to be detected specifically comprises:

5. The industrial internet of things intrusion detection method of claim 1, wherein the intrusion detection agent is obtained by loop training on a training set, wherein the training strategy is optimized with the loss function in each training loop, and the intrusion detection agent is obtained when a preset convergence condition is satisfied.
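As an illustrative, non-limiting sketch of the loop training recited in claim 5, the following Python fragment repeats a loss-driven update until a preset convergence condition is met. Everything specific here is an assumption made for illustration: the linear scorer standing in for the agent, the reward of +1 for a correct decision and -1 otherwise, the squared-error loss, the function names `q_values`, `train_agent` and `detect`, and the parameters `lr`, `tol` and `max_loops`. None of these are taken from the claims.

```python
def q_values(weights, sample):
    """Score each action (0 = normal, 1 = intrusion) with a linear model."""
    x = list(sample) + [1.0]  # append a bias term
    return [sum(w * v for w, v in zip(ws, x)) for ws in weights]


def train_agent(samples, labels, lr=0.5, tol=1e-8, max_loops=10_000):
    """Repeat loss-driven updates until the loss change drops below tol."""
    n, dim = len(samples), len(samples[0]) + 1
    weights = [[0.0] * dim for _ in range(2)]
    prev_loss = float("inf")
    loss = prev_loss
    for loop in range(1, max_loops + 1):
        grads = [[0.0] * dim for _ in range(2)]
        loss = 0.0
        for sample, label in zip(samples, labels):
            x = list(sample) + [1.0]
            for action, q in enumerate(q_values(weights, sample)):
                # Hypothetical reward: +1 for the correct decision, -1 otherwise.
                reward = 1.0 if action == label else -1.0
                error = q - reward
                loss += 0.5 * error * error / n
                for j, v in enumerate(x):
                    grads[action][j] += error * v / n
        # Preset convergence condition (assumed): the loss has stopped changing.
        if abs(prev_loss - loss) < tol:
            break
        # Optimize the strategy with the gradient of the loss.
        for action in range(2):
            for j in range(dim):
                weights[action][j] -= lr * grads[action][j]
        prev_loss = loss
    return weights, loop, loss


def detect(weights, sample):
    """Return the action with the highest score for one preprocessed sample."""
    scores = q_values(weights, sample)
    return 0 if scores[0] >= scores[1] else 1


# Toy, made-up data: one feature per sample, label 1 marks an intrusion.
samples = [[0.0], [0.1], [0.9], [1.0]]
labels = [0, 0, 1, 1]
weights, loops, loss = train_agent(samples, labels)
print(loops, round(loss, 4), [detect(weights, s) for s in samples])
```

The loop stops either when the change in loss falls below `tol` or when `max_loops` is reached, so the convergence threshold and the loop cap can be tuned independently.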

6. The industrial internet of things intrusion detection method of claim 1, wherein the expression of the reward accumulation function is:

7. The industrial internet of things intrusion detection method according to claim 6, wherein the training strategy is obtained according to a state value function and an action value function obtained by a reward accumulation function; wherein:

the expression of the state value function is:

the expression of the action value function is:

in the formula,

is the state transition probability,

is the state space set,
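For orientation only, the quantities named in claims 6 and 7 are conventionally written as follows in the reinforcement learning literature. The symbols $G_t$, $\gamma$, $r$, $\pi$, $P$, $R$ and $\mathcal{S}$ are assumed textbook notation and should not be read as the expressions recited in the claims.

```latex
% Reward accumulation (discounted return) from time step t, with 0 <= \gamma <= 1
G_t = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}

% State value function and action value function under a strategy \pi
V^{\pi}(s)   = \mathbb{E}_{\pi}\left[\, G_t \mid S_t = s \,\right]
Q^{\pi}(s,a) = \mathbb{E}_{\pi}\left[\, G_t \mid S_t = s,\ A_t = a \,\right]

% Relation between the two, using the state transition probability P
% and the state space set \mathcal{S}
V^{\pi}(s)   = \sum_{a} \pi(a \mid s)\, Q^{\pi}(s,a)
Q^{\pi}(s,a) = \sum_{s' \in \mathcal{S}} P(s' \mid s,a)\,
               \left[\, R(s,a,s') + \gamma\, V^{\pi}(s') \,\right]
```

In this conventional form, a strategy can be improved by choosing in each state the action with the largest $Q^{\pi}(s,a)$.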

8. An industrial Internet of things intrusion detection device, characterized in that the industrial Internet of things intrusion detection device comprises:

9. Industrial Internet of things intrusion detection equipment, characterized in that the industrial Internet of things intrusion detection equipment comprises: a memory, a processor, and an industrial Internet of things intrusion detection program that is stored on the memory and executable on the processor, wherein when the industrial Internet of things intrusion detection program is executed by the processor, the steps of the industrial Internet of things intrusion detection method according to any one of claims 1 to 7 are implemented.

10. A storage medium having an industrial internet of things intrusion detection program stored thereon, wherein the industrial internet of things intrusion detection program, when executed by a processor, implements the steps of the industrial internet of things intrusion detection method according to any one of claims 1 to 7.