CN116663499A - Intelligent industrial data processing method and system - Google Patents

Intelligent industrial data processing method and system

Info

Publication number
CN116663499A
Authority
CN
China
Prior art keywords
feature vector
semantic understanding
industrial data
classification
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310598570.4A
Other languages
Chinese (zh)
Inventor
刘红军
李冠军
张雅暄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan Tongli Smart Technology Co ltd
Original Assignee
Henan Tongli Smart Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan Tongli Smart Technology Co ltd
Priority to CN202310598570.4A
Publication of CN116663499A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer And Data Communications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

An intelligent industrial data processing method and system acquire the five-tuple information, external environmental features, and business behavior of industrial data. Using deep-learning-based artificial intelligence, the method and system mine the semantic-understanding association feature distribution among the five-tuple information, the external environmental features, and the business behavior, and on that basis comprehensively classify the confidentiality level of the industrial data, thereby improving the security of the industrial data.

Description

Intelligent industrial data processing method and system
Technical Field
The present application relates to the field of intelligent processing technologies, and in particular, to a method and a system for intelligent industrial data processing.
Background
With the spread of the industrial internet, production management data now spans enterprise campuses, private data centers, public clouds, and industry regulators, yet an effective supervision mechanism is lacking. Existing 5G UPF (User Plane Function) technology can split data flows based on the five-tuple, but it offers few classification methods and insufficient security, and it requires an operator to configure or build a virtual private network for management. Moreover, it supports only simple splitting operations; any other operation still has to be chained through additional network elements, which reduces the security and reliability of the data. Existing SDN service chains are mainly used for distributed deployment in cloud and wide-area networks and are not suitable for protection at the enterprise egress.
Accordingly, an optimized intelligent industrial data processing scheme is desired to improve the security of industrial data.
Disclosure of Invention
The present application has been made to solve the above technical problems. Embodiments of the application provide an intelligent industrial data processing method and system that acquire the five-tuple information, external environmental features, and business behavior of industrial data. Using deep-learning-based artificial intelligence, the method and system mine the semantic-understanding association feature distribution among the five-tuple information, the external environmental features, and the business behavior, and on that basis comprehensively classify the confidentiality level of the industrial data, thereby improving the security of the industrial data.
In a first aspect, a method for intelligent industrial data processing is provided, which includes:
acquiring five-tuple information, external environmental features, and business behavior of industrial data;
passing the five-tuple information of the industrial data through a context encoder comprising an embedding layer to obtain a five-tuple semantic understanding feature vector;
passing the external environmental features through the context encoder comprising an embedding layer to obtain an external environmental feature semantic understanding feature vector;
passing the business behavior through the context encoder comprising an embedding layer to obtain a business behavior semantic understanding feature vector;
fusing the five-tuple semantic understanding feature vector, the external environmental feature semantic understanding feature vector, and the business behavior semantic understanding feature vector to obtain a classification feature vector; and
passing the classification feature vector through a classifier to obtain a classification result, wherein the classification result represents the level label of the industrial data.
In the above intelligent industrial data processing method, passing the five-tuple information of the industrial data through the context encoder comprising an embedding layer to obtain the five-tuple semantic understanding feature vector includes: performing word segmentation on the five-tuple information of the industrial data to convert it into a first word sequence consisting of a plurality of first words; mapping each first word in the first word sequence to a word vector using the embedding layer of the context encoder to obtain a sequence of first word vectors; and performing global context semantic encoding on the sequence of first word vectors using the context encoder to obtain the five-tuple semantic understanding feature vector.
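The first two sub-steps (word segmentation, then an embedding lookup) can be sketched as follows. This is only an illustration under stated assumptions: the patent does not specify a segmentation rule, a vocabulary, or an embedding size, so the separator-based `segment` function, the `EmbeddingLayer` class, the 8-dimensional random vectors, and the sample record string are all hypothetical.

```python
import numpy as np


def segment(text: str) -> list[str]:
    """Hypothetical word segmentation: split on whitespace and a few separators."""
    for sep in (",", ";", "=", "|"):
        text = text.replace(sep, " ")
    return text.lower().split()


class EmbeddingLayer:
    """Lookup table mapping each word to a fixed-size vector.

    Assumption: vectors are randomly initialised and the vocabulary grows on
    first sight of a word; a trained system would learn these vectors instead.
    """

    def __init__(self, dim: int = 8, seed: int = 0):
        self.dim = dim
        self.rng = np.random.default_rng(seed)
        self.vocab: dict[str, int] = {}
        self.rows: list[np.ndarray] = []

    def __call__(self, words: list[str]) -> np.ndarray:
        for w in words:
            if w not in self.vocab:
                self.vocab[w] = len(self.rows)
                self.rows.append(self.rng.normal(size=self.dim))
        # One row per word, in sequence order: shape (num_words, dim).
        return np.stack([self.rows[self.vocab[w]] for w in words])


# Illustrative five-tuple record rendered as text (made-up values).
first_words = segment(
    "timestamp=2023-05-25T08:00:00 device=pump-01 sensor=t-17 "
    "type=temperature value=71.5"
)
first_word_vectors = EmbeddingLayer(dim=8)(first_words)
```

The resulting `first_word_vectors` array holds one row per word and is the kind of input the global context encoding step operates on.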
In the above intelligent industrial data processing method, performing global context semantic encoding on the sequence of first word vectors using the context encoder comprising an embedding layer to obtain the five-tuple semantic understanding feature vector includes: arranging the sequence of first word vectors one-dimensionally to obtain a global word feature vector; calculating the product of the global word feature vector and the transpose of each first word vector in the sequence to obtain a plurality of self-attention association matrices; standardizing each of the self-attention association matrices to obtain a plurality of standardized self-attention association matrices; passing each standardized self-attention association matrix through a Softmax classification function to obtain a plurality of probability values; and weighting each first word vector in the sequence with the corresponding probability value to obtain the five-tuple semantic understanding feature vector.
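A minimal NumPy sketch of these five sub-steps follows. The text leaves several details open, so the sketch fills them with labeled assumptions: "standardization" is taken to mean scaling each association matrix by its Frobenius norm, each standardized matrix is reduced to one score by summing its entries, Softmax is applied across the per-word scores, and the weighted word vectors are summed into a single vector. The function name `global_context_encode` is hypothetical.

```python
import numpy as np


def global_context_encode(word_vectors: np.ndarray):
    """Return (feature_vector, weights) for a (num_words, dim) array."""
    n, d = word_vectors.shape
    # 1. One-dimensional arrangement: flatten the sequence into a global vector.
    global_vec = word_vectors.reshape(-1)              # shape (n * d,)
    scores = np.empty(n)
    for i, w in enumerate(word_vectors):
        # 2. Product of the global vector and the transposed word vector.
        assoc = np.outer(global_vec, w)                # shape (n * d, d)
        # 3. Standardisation (assumption: divide by the Frobenius norm).
        assoc = assoc / (np.linalg.norm(assoc) + 1e-12)
        # Assumption: summarise each matrix by the sum of its entries.
        scores[i] = assoc.sum()
    # 4. Softmax over the per-word scores gives one probability per word.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # 5. Weight each word vector and sum (assumption: sum, not concatenation).
    return weights @ word_vectors, weights


word_vectors = np.random.default_rng(0).normal(size=(5, 8))
feature_vector, attention = global_context_encode(word_vectors)
```

Whatever the exact normalization, the output has the same dimension as a single word vector, which is what lets the three encoded inputs be combined position-wise later.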
In the above intelligent industrial data processing method, passing the external environmental features through the context encoder comprising an embedding layer to obtain the external environmental feature semantic understanding feature vector includes: performing word segmentation on the external environmental features to convert them into a second word sequence consisting of a plurality of second words; mapping each second word in the second word sequence to a word vector using the embedding layer of the context encoder to obtain a sequence of second word vectors; and performing global context semantic encoding on the sequence of second word vectors using the context encoder to obtain the external environmental feature semantic understanding feature vector.
In the above intelligent industrial data processing method, passing the business behavior through the context encoder comprising an embedding layer to obtain the business behavior semantic understanding feature vector includes: performing word segmentation on the business behavior to convert it into a third word sequence consisting of a plurality of third words; mapping each third word in the third word sequence to a word vector using the embedding layer of the context encoder to obtain a sequence of third word vectors; and performing global context semantic encoding on the sequence of third word vectors using the context encoder to obtain the business behavior semantic understanding feature vector.
In the above method for processing intelligent industrial data, fusing the five-tuple semantic understanding feature vector, the external environment feature semantic understanding feature vector and the business behavior semantic understanding feature vector to obtain a classification feature vector includes: fusing the five-tuple semantic understanding feature vector, the external environment feature semantic understanding feature vector and the business behavior semantic understanding feature vector by using the following fusion formula to obtain a classification feature vector; wherein, the fusion formula is:
$V_s = \lambda V_a + \beta V_b + \alpha V_c$
where $V_s$ denotes the classification feature vector, $V_a$ denotes the five-tuple semantic understanding feature vector, $V_b$ denotes the external environmental feature semantic understanding feature vector, $V_c$ denotes the business behavior semantic understanding feature vector, "+" denotes position-wise addition of the elements of the three feature vectors, and $\lambda$, $\beta$ and $\alpha$ are weighting parameters that control the balance among the five-tuple semantic understanding feature vector, the external environmental feature semantic understanding feature vector, and the business behavior semantic understanding feature vector.
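As a small worked illustration of the fusion formula, the sketch below computes the position-wise weighted sum in NumPy. The function name `fuse`, the default weights, and the sample vectors are assumptions chosen only for the example; the patent does not give values for the weighting parameters.

```python
import numpy as np


def fuse(v_a, v_b, v_c, lam=0.4, beta=0.3, alpha=0.3):
    """Position-wise weighted sum: V_s = lam * V_a + beta * V_b + alpha * V_c."""
    v_a, v_b, v_c = (np.asarray(v, dtype=float) for v in (v_a, v_b, v_c))
    if not (v_a.shape == v_b.shape == v_c.shape):
        raise ValueError("all three feature vectors must have the same shape")
    return lam * v_a + beta * v_b + alpha * v_c


# Illustrative two-dimensional vectors and weights (made-up values).
v_s = fuse([1.0, 2.0], [3.0, 4.0], [5.0, 6.0], lam=0.5, beta=0.25, alpha=0.25)
```

Because the addition is position-wise, the three input vectors must share one dimension, which holds when they come from encoders of the same structure.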
The intelligent industrial data processing method further includes training the context encoder comprising an embedding layer and the classifier, wherein the training includes: acquiring training data, the training data comprising training five-tuple information, training external environmental features, and training business behavior of industrial data, together with the true value of the level label of the industrial data; passing the training five-tuple information through the context encoder comprising an embedding layer to obtain a training five-tuple semantic understanding feature vector; passing the training external environmental features through the context encoder to obtain a training external environmental feature semantic understanding feature vector; passing the training business behavior through the context encoder to obtain a training business behavior semantic understanding feature vector; fusing the training five-tuple semantic understanding feature vector, the training external environmental feature semantic understanding feature vector, and the training business behavior semantic understanding feature vector to obtain a training classification feature vector; performing feature distribution optimization on the training classification feature vector to obtain an optimized training classification feature vector; passing the optimized training classification feature vector through the classifier to obtain a classification loss function value; and training the context encoder comprising an embedding layer and the classifier based on the classification loss function value through backpropagation with gradient descent.
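The final training step (minimizing a classification loss by gradient descent) can be illustrated with a deliberately reduced sketch. It is a simplification, not the procedure described above: only a single-layer Softmax classifier is trained on fixed, made-up feature vectors, whereas the text trains the context encoder and the classifier jointly and applies a feature distribution optimization first. All names, the learning rate, and the toy data are assumptions.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=1, keepdims=True)      # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(probs: np.ndarray, labels: np.ndarray) -> float:
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked + 1e-12).mean())


def train_classifier(features, labels, num_classes, lr=0.5, epochs=200, seed=0):
    """Plain gradient descent on the cross-entropy of a linear Softmax classifier."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(features.shape[1], num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    losses = []
    for _ in range(epochs):
        probs = softmax(features @ W + b)
        losses.append(cross_entropy(probs, labels))
        # Gradient of the mean cross-entropy with respect to the logits.
        grad_logits = (probs - onehot) / len(labels)
        W -= lr * features.T @ grad_logits
        b -= lr * grad_logits.sum(axis=0)
    return W, b, losses


# Toy "classification feature vectors" with three level labels (made-up data).
features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0],
                     [0.1, 0.9], [-1.0, -1.0], [-0.9, -1.1]])
labels = np.array([0, 0, 1, 1, 2, 2])
W, b, losses = train_classifier(features, labels, num_classes=3)
predictions = softmax(features @ W + b).argmax(axis=1)
```

In a full implementation the same loss gradient would be propagated further back, through the fusion step and into the encoder and embedding parameters.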
In the above intelligent industrial data processing method, performing feature distribution optimization on the training classification feature vector to obtain the optimized training classification feature vector includes: performing Gumbel normal periodic re-parameterization on the training classification feature vector using the following optimization formula to obtain the optimized training classification feature vector; wherein the optimization formula is:
where $v_i$ denotes the feature value at each position of the training classification feature vector, $\mu$ and $\sigma$ are respectively the mean and the variance of the set of feature values at all positions of the training classification feature vector, $\log$ denotes the base-2 logarithm, $\arcsin(\cdot)$ denotes the arcsine function, $\arccos(\cdot)$ denotes the arccosine function, and $v_i'$ denotes the feature value at each position of the optimized training classification feature vector.
In the above intelligent industrial data processing method, passing the optimized training classification feature vector through the classifier to obtain the classification loss function value includes: processing, by the classifier, the optimized training classification feature vector with the following classification formula to generate a classification result, wherein the classification formula is $\mathrm{softmax}\{(W_n, B_n) : \cdots : (W_1, B_1) \mid X\}$, where $X$ denotes the optimized training classification feature vector, $W_1$ to $W_n$ are weight matrices, and $B_1$ to $B_n$ are bias matrices; and calculating the cross-entropy between the classification result and the true value as the classification loss function value.
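One way to read the classification formula is as a stack of fully connected layers applied in order from $(W_1, B_1)$ to $(W_n, B_n)$, followed by Softmax. The sketch below follows that reading literally, with no nonlinearity between layers because the formula names none; the layer shapes, the sample input, and the function names are assumptions for illustration.

```python
import numpy as np


def classify(x, layers):
    """Apply (W_1, B_1), ..., (W_n, B_n) in order, then Softmax."""
    h = np.asarray(x, dtype=float)
    for W, B in layers:
        h = W @ h + B                     # one fully connected layer
    e = np.exp(h - h.max())               # subtract max for numerical stability
    return e / e.sum()


def cross_entropy(probs, true_index: int) -> float:
    """Cross-entropy between the predicted distribution and a one-hot true label."""
    return float(-np.log(probs[true_index] + 1e-12))


# Two illustrative layers mapping a 2-d feature vector to 3 class scores.
layers = [
    (np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]), np.zeros(3)),
    (np.eye(3), np.array([0.0, 0.0, -1.0])),
]
probs = classify([2.0, 0.5], layers)
loss = cross_entropy(probs, true_index=0)
```

The loss is small when the probability assigned to the true level label is close to one, which is the quantity the training procedure drives down.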
In a second aspect, a system for intelligent industrial data processing is provided, comprising:
a data acquisition module, configured to acquire five-tuple information, external environmental features, and business behavior of industrial data;
a five-tuple encoding module, configured to pass the five-tuple information of the industrial data through a context encoder comprising an embedding layer to obtain a five-tuple semantic understanding feature vector;
an external environment encoding module, configured to pass the external environmental features through the context encoder comprising an embedding layer to obtain an external environmental feature semantic understanding feature vector;
a business behavior encoding module, configured to pass the business behavior through the context encoder comprising an embedding layer to obtain a business behavior semantic understanding feature vector;
a fusion module, configured to fuse the five-tuple semantic understanding feature vector, the external environmental feature semantic understanding feature vector, and the business behavior semantic understanding feature vector to obtain a classification feature vector; and
an industrial data level generation module, configured to pass the classification feature vector through a classifier to obtain a classification result, wherein the classification result represents the level label of the industrial data.
Compared with the prior art, the intelligent industrial data processing method and system provided by the application acquire the five-tuple information, external environmental features, and business behavior of industrial data. Using deep-learning-based artificial intelligence, they mine the semantic-understanding association feature distribution among the five-tuple information, the external environmental features, and the business behavior, and on that basis comprehensively classify the confidentiality level of the industrial data, thereby improving the security of the industrial data.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic view of a scenario of a method of intelligent industrial data processing according to an embodiment of the present application.
Fig. 2 is a flowchart of a method of intelligent industrial data processing according to an embodiment of the present application.
Fig. 3 is a schematic diagram of an intelligent industrial data processing method according to an embodiment of the present application.
Fig. 4 is a flowchart of the sub-steps of step 120 in a method of intelligent industrial data processing according to an embodiment of the present application.
Fig. 5 is a flowchart of the sub-steps of step 123 in a method of intelligent industrial data processing according to an embodiment of the present application.
Fig. 6 is a flowchart of the sub-steps of step 130 in a method of intelligent industrial data processing according to an embodiment of the present application.
Fig. 7 is a flowchart of the sub-steps of step 140 in a method of intelligent industrial data processing according to an embodiment of the present application.
Fig. 8 is a flowchart of the sub-steps of step 170 in a method of intelligent industrial data processing according to an embodiment of the present application.
Fig. 9 is a block diagram of a system for intelligent industrial data processing according to an embodiment of the present application.
Detailed Description
The following description of the technical solutions according to the embodiments of the present application will be given with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Unless defined otherwise, all technical and scientific terms used in the embodiments of the application have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to limit the scope of the present application.
In describing embodiments of the present application, unless otherwise indicated and limited thereto, the term "connected" should be construed broadly, for example, it may be an electrical connection, or may be a communication between two elements, or may be a direct connection, or may be an indirect connection via an intermediate medium, and it will be understood by those skilled in the art that the specific meaning of the term may be interpreted according to circumstances.
It should be noted that the terms "first", "second", and "third" in the embodiments of the present application merely distinguish similar objects and do not imply a specific order. Where permitted, objects distinguished by "first", "second", and "third" may be interchanged, so that the embodiments of the application described herein may be practiced in sequences other than those illustrated or described herein.
As described above, although existing 5G UPF technology can split data flows based on the five-tuple, it offers few classification methods and insufficient security, and it requires an operator to configure or build a virtual private network for management. It also supports only simple splitting operations; any other operation still has to be chained through additional network elements, which reduces the security and reliability of the data. Existing SDN service chains are mainly used for distributed deployment in cloud and wide-area networks and are not suitable for protection at the enterprise egress. Accordingly, an optimized intelligent industrial data processing scheme is desired to improve the security of industrial data.
With this in mind, industrial data in an actual processing pipeline mainly comprises three categories: five-tuple information, external environmental features, and business behavior. The five-tuple information includes: a timestamp, which records the time point or time period at which the data was acquired; a device identifier, which is a number or name that uniquely identifies the device; a sensor identifier, which is a number or name that uniquely identifies the sensor from which the data originates; a data type, which records the kind of data, such as temperature, pressure, or flow; and a data value, which is the specific value of the data. The external environmental features include meteorological factors such as temperature and humidity, noise and interference, electromagnetic interference, vibration and shock, and light and radiation. These external environmental features reflect the production process and the operating condition of the equipment, so that appropriate measures can be taken to optimize production efficiency and product quality. The business behavior covers equipment control, fault detection and maintenance, quality control, production planning and scheduling, energy management, and energy conservation and emission reduction. These business behaviors reflect the production process and management efficiency, so that appropriate measures can be taken to optimize production efficiency, improve product quality, and protect the environment.
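The five fields listed for the five-tuple information map naturally onto a small record type. The sketch below is a hypothetical data structure: the class name `FiveTuple`, the field names, the `to_text` rendering, and the sample values are all assumptions, since the patent names the fields but not a concrete format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FiveTuple:
    """The five fields named in the text; names and types are illustrative."""
    timestamp: datetime   # time point of data acquisition
    device_id: str        # uniquely identifies the device
    sensor_id: str        # uniquely identifies the source sensor
    data_type: str        # e.g. temperature, pressure, flow
    value: float          # the specific data value

    def to_text(self) -> str:
        """Render the record as a word sequence for a text encoder (assumed format)."""
        return (
            f"timestamp {self.timestamp.isoformat()} device {self.device_id} "
            f"sensor {self.sensor_id} type {self.data_type} value {self.value}"
        )


record = FiveTuple(datetime(2023, 5, 25, 8, 0, 0), "pump-01", "t-17",
                   "temperature", 71.5)
text = record.to_text()
```

Rendering the record as text is one simple way to feed structured fields into an encoder that expects a word sequence.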
On this basis, the technical solution of the present application processes industrial data according to the semantic understanding features of its five-tuple information, external environmental features, and business behavior, and classifies the data into levels, for example a general level, an enterprise secret level, and a state secret level, so as to improve the security of the industrial data. However, each data item in the industrial data carries its own semantic understanding feature information, all of these features bear on the security level of the data, and the data items are also linked by contextual semantic associations, which makes security level classification difficult. The difficulty in this process therefore lies in mining the semantic-understanding association feature distribution among the five-tuple information, the external environmental features, and the business behavior, so that the security level of the industrial data can be classified comprehensively and its security improved.
In recent years, deep learning and neural networks have been widely used in the fields of computer vision, natural language processing, text signal processing, and the like. The development of deep learning and neural networks provides new solutions and schemes for mining quintuple information of the industrial data, external environment characteristics and semantic understanding relevance characteristic distribution information of business behaviors.
Specifically, in the technical scheme of the application, firstly, five-tuple information, external environment characteristics and business behaviors of industrial data are acquired. The quintuple information comprises a time stamp, a device identifier, a sensor identifier, a data type and a data value; the external environmental characteristics include meteorological factors such as temperature and humidity, noise and interference, electromagnetic interference, vibration and shock, light and radiation. The business behavior comprises equipment control, fault detection and maintenance, quality control, production planning and scheduling, energy management and energy conservation and emission reduction. It should be understood that the quintuple information reflects basic information of industrial data, including data acquisition information, data type information and the like; the external environment characteristics can reflect the production process and the equipment operation condition, and corresponding measures can be adopted by utilizing the external environment characteristics to optimize the production efficiency and the product quality; the business behavior reflects the production process and the management efficiency, and can be used for taking corresponding measures to optimize the production efficiency, improve the product quality and protect the environment.
Next, the five-tuple information, external environmental features, and business behavior of the industrial data are all composed of words, and these words are linked by contextual semantic associations. In industrial data processing these three inputs also tend to be unstructured data, and a context encoder can convert them into vector representations of fixed dimension, which makes feature extraction and classification more convenient. To judge the security level of industrial data, the semantic understanding features of the five-tuple information, external environmental features, and business behavior must therefore be expressed sufficiently. Accordingly, in the technical solution of the present application, a context encoder comprising an embedding layer performs semantic understanding of the five-tuple information, the external environmental features, and the business behavior separately, extracting the global context semantic association features of each, to obtain the five-tuple semantic understanding feature vector, the external environmental feature semantic understanding feature vector, and the business behavior semantic understanding feature vector.
Further, the five-tuple semantic understanding feature vector, the external environmental feature semantic understanding feature vector, and the business behavior semantic understanding feature vector are fused to obtain a classification feature vector. Fusion integrates the semantic understanding features of the five-tuple information, of the external environmental features, and of the business behavior into one complete feature vector, capturing the interactions and dependencies among the semantic understanding features of the different data types and better characterizing the implicit security level features of the industrial data. Fusing the information in the three feature vectors can also reduce information loss and error propagation during data processing, which improves the stability and reliability of the model and the classification accuracy.
Then, the classification feature vector is passed through a classifier to obtain a classification result representing the level label of the industrial data. That is, in the technical solution of the present application, the labels of the classifier are the security level labels of the industrial data, specifically a general level, an enterprise secret level and a national secret level, and the classifier uses a Softmax function to determine which classification label the classification feature vector belongs to, thereby judging the security level of the industrial data.
In particular, in the technical solution of the present application, in order to make full use of the semantic association information among the basic five-tuple information of the industrial data (including a timestamp, a device identifier, a sensor identifier, a data type and a data value) expressed by the five-tuple semantic understanding feature vector, the semantic information of the external environmental features expressed by the external environment feature semantic understanding feature vector, and the business behavior semantic information expressed by the business behavior semantic understanding feature vector, the classification feature vector is preferably obtained by directly concatenating the five-tuple semantic understanding feature vector, the external environment feature semantic understanding feature vector and the business behavior semantic understanding feature vector. However, this introduces a distribution gap at the positions where the three feature vectors are concatenated.
On the other hand, although the five-tuple semantic understanding feature vector, the external environment feature semantic understanding feature vector and the business behavior semantic understanding feature vector are all obtained by context semantic encoding with a semantic encoder of the same structure (i.e. the context encoder comprising the embedded layer), the five-tuple information, the external environmental features and the business behavior of the industrial data differ greatly at the data source domain end in basic characteristics such as data expression mode and data length. As a result, the data manifolds of the three feature vectors in the high-dimensional feature space have poor similarity and consistency. The superposition of these two aspects leads to poor continuity of the overall feature distribution of the classification feature vector, which affects the training effect during model training.
Based on this, the applicant of the present application performs a Gumbel normal periodic re-parameterization on the classification feature vector, denoted for example as $V$, to obtain an optimized classification feature vector $V'$, specifically expressed as follows:
where $\mu$ and $\sigma$ are respectively the mean and variance of the feature value set $v_i \in V$, and $v_i' \in V'$ is the feature value of the corresponding position of the optimized classification feature vector.
Here, the Gumbel normal periodic re-parameterization applies a random periodic operation based on the Gumbel distribution to the feature value $v_i$ at each position of the classification feature vector $V$, introducing a random periodic distribution into the normal distribution of the feature value set to obtain a periodic, continuously differentiable approximation of the original feature distribution. This periodic re-parameterization of the features improves the dynamic continuity with which the gradient of the loss function back-propagates through the model when the optimized classification feature vector $V'$ is used in training, and so improves the dynamic applicability of the context encoder comprising the embedded layer during training, compensating for the influence of the poor continuity of the feature distribution of the classification feature vector on training effects such as training speed and the accuracy of the convergence result. In this way, the security level of the industrial data can be detected and judged accurately, thereby improving the security of the industrial data.
Fig. 1 is a schematic view of a scenario of a method of intelligent industrial data processing according to an embodiment of the present application. As shown in fig. 1, in the application scenario, first, five-tuple information (e.g., C1 as illustrated in fig. 1), external environmental characteristics (e.g., C2 as illustrated in fig. 1), and business behavior (e.g., C3 as illustrated in fig. 1) of industrial data are acquired; the obtained quintuple information, external environmental characteristics, and business behavior are then input to a server (e.g., S as illustrated in fig. 1) deployed with an intelligent industrial data processing algorithm, wherein the server is capable of processing the quintuple information, the external environmental characteristics, and the business behavior based on the intelligent industrial data processing algorithm to generate a classification result for a level tag representing industrial data.
Having described the basic principles of the present application, various non-limiting embodiments of the present application will now be described in detail with reference to the accompanying drawings.
In one embodiment of the present application, FIG. 2 is a flow chart of a method of intelligent industrial data processing according to an embodiment of the present application. As shown in fig. 2, a method 100 for intelligent industrial data processing according to an embodiment of the present application includes: 110, acquiring five-tuple information, external environment characteristics and business behaviors of industrial data; 120, passing the quintuple information of the industrial data through a context encoder comprising an embedded layer to obtain a quintuple semantic understanding feature vector; 130, passing the external environment feature through the context encoder comprising an embedded layer to obtain an external environment feature semantic understanding feature vector; 140, passing the business behavior through the context encoder comprising the embedded layer to obtain a business behavior semantic understanding feature vector; 150, fusing the five-tuple semantic understanding feature vector, the external environment feature semantic understanding feature vector and the business behavior semantic understanding feature vector to obtain a classification feature vector; and 160, passing the classification feature vector through a classifier to obtain a classification result, wherein the classification result is used for representing the grade label of the industrial data.
FIG. 3 is a schematic diagram of an intelligent industrial data processing method according to an embodiment of the present application. As shown in fig. 3, in the network architecture, first, five-tuple information, external environmental characteristics, and business behavior of industrial data are acquired; then, passing the quintuple information of the industrial data through a context encoder comprising an embedded layer to obtain a quintuple semantic understanding feature vector; then, the external environment features pass through the context encoder comprising the embedded layer to obtain external environment feature semantic understanding feature vectors; then, the business behavior passes through the context encoder comprising the embedded layer to obtain a business behavior semantic understanding feature vector; then, fusing the five-tuple semantic understanding feature vector, the external environment feature semantic understanding feature vector and the business behavior semantic understanding feature vector to obtain a classification feature vector; and finally, the classification feature vector is passed through a classifier to obtain a classification result, wherein the classification result is used for representing the grade label of the industrial data.
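The flow of steps 110 to 160 can be sketched end to end as follows. This is only an illustrative skeleton under stated assumptions: the function names (`encode`, `fuse`, `classify`, `process`), the label strings, the vector dimension and the CRC32-seeded word vectors are all hypothetical stand-ins, and the weights are random rather than trained, so the predicted label carries no meaning.

```python
import zlib
import numpy as np

DIM = 8  # assumed feature dimension
LABELS = ("general", "enterprise_secret", "national_secret")  # assumed label names

def encode(text, dim=DIM):
    """Stand-in for the context encoder (steps 120-140).

    Averages deterministic pseudo-random word vectors; a real encoder
    would be a trained model with an embedding layer.
    """
    vectors = [
        np.random.default_rng(zlib.crc32(word.encode("utf-8"))).normal(size=dim)
        for word in text.lower().split()
    ]
    return np.mean(vectors, axis=0)

def fuse(v_a, v_b, v_c):
    """Step 150, assuming all fusion weights equal 1."""
    return v_a + v_b + v_c

def classify(v, weights, bias):
    """Step 160: one linear layer followed by Softmax."""
    logits = weights @ v + bias
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    probs = exp / exp.sum()
    return LABELS[int(np.argmax(probs))], probs

def process(five_tuple, environment, behavior, weights, bias):
    """Run encode -> fuse -> classify on the three text inputs (step 110 supplies them)."""
    v_s = fuse(encode(five_tuple), encode(environment), encode(behavior))
    return classify(v_s, weights, bias)

rng = np.random.default_rng(0)
weights, bias = rng.normal(size=(len(LABELS), DIM)), np.zeros(len(LABELS))  # untrained
label, probs = process(
    "2023-05-25T08:00:00 pump-07 temp-01 temperature 73.5",
    "humidity 40 vibration low",
    "fault detection and maintenance",
    weights, bias,
)
```

The same encoder function is reused for all three inputs, mirroring the description that one context encoder comprising the embedded layer handles each data category.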
Specifically, in step 110, five-tuple information, external environmental features and business behavior of the industrial data are acquired. As described above, although the existing 5G UPF technology can split data flows based on the five-tuple, it offers few classification methods and insufficient security, requires an operator to configure it or to build a virtual private network for management, and can only perform a simple splitting operation; other operations still require connecting other network elements in series, which reduces the security and reliability of the data. Existing SDN service chains are mainly used for distributed deployment in cloud and wide area networks and are not suitable for protection at the enterprise egress. Accordingly, an optimized intelligent industrial data processing scheme is desired to improve the security of industrial data.
Accordingly, it is considered that in actual industrial data processing, industrial data mainly includes the following categories: five-tuple information, external environmental features and business behavior. The five-tuple information includes: a timestamp, which records the time point or time period of data acquisition; a device identifier, which is a number or name that uniquely identifies the device; a sensor identifier, which is a number or name that uniquely identifies the sensor that is the data source; a data type, which records the kind of data, such as temperature, pressure or flow; and a data value, which is the specific value of the data. The external environmental features include: meteorological factors such as temperature and humidity; noise and interference; electromagnetic interference; vibration and shock; and light and radiation. It should be appreciated that these external environmental features reflect the production process and the operating condition of the equipment, so that appropriate measures can be taken to optimize production efficiency and product quality. The business behavior includes: equipment control; fault detection and maintenance; quality control; production planning and scheduling; energy management; and energy conservation and emission reduction. It should be appreciated that business behavior reflects the production process and management efficiency, so that appropriate measures can be taken to optimize production efficiency, improve product quality and protect the environment.
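As an illustration of the five-tuple described above, the record could be held in a small data structure such as the following. The class name `FiveTuple`, the field names, the `to_text` helper and the sample values are hypothetical and are not taken from the application; the text serialization is one assumed way of preparing the record for a word-based encoder.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FiveTuple:
    timestamp: str     # time point or time period of data acquisition
    device_id: str     # number or name uniquely identifying the device
    sensor_id: str     # number or name uniquely identifying the source sensor
    data_type: str     # kind of data, e.g. temperature, pressure, flow
    data_value: float  # the specific value of the data

    def to_text(self) -> str:
        """Serialize the record as one string for a downstream text encoder."""
        return (
            f"timestamp {self.timestamp} device {self.device_id} "
            f"sensor {self.sensor_id} type {self.data_type} value {self.data_value}"
        )

# Sample values are invented for illustration only.
record = FiveTuple("2023-05-25T08:00:00", "pump-07", "temp-01", "temperature", 73.5)
text = record.to_text()
```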
Based on this, in the technical solution of the present application, it is desirable to process industrial data based on the semantic understanding features of its five-tuple information, external environmental features and business behavior, so as to classify the level of the industrial data, for example into a general level, an enterprise secret level and a national secret level, and thereby improve the security of the industrial data. However, each data item in the industrial data has its own semantic understanding feature information, these semantic understanding features all carry information characterizing the security level of the industrial data, and the data items also have contextual semantic association features, which makes classifying the security level of the industrial data difficult. That is, the difficulty in this process lies in mining the semantic understanding association feature distribution information of the five-tuple information, external environmental features and business behavior of the industrial data, so that the security level of the industrial data can be classified comprehensively and its security improved.
In recent years, deep learning and neural networks have been widely used in the fields of computer vision, natural language processing, text signal processing, and the like. The development of deep learning and neural networks provides new solutions and schemes for mining quintuple information of the industrial data, external environment characteristics and semantic understanding relevance characteristic distribution information of business behaviors.
Specifically, in the technical scheme of the application, firstly, five-tuple information, external environment characteristics and business behaviors of industrial data are acquired. The quintuple information comprises a time stamp, a device identifier, a sensor identifier, a data type and a data value; the external environmental characteristics include meteorological factors such as temperature and humidity, noise and interference, electromagnetic interference, vibration and shock, light and radiation. The business behavior comprises equipment control, fault detection and maintenance, quality control, production planning and scheduling, energy management and energy conservation and emission reduction. It should be understood that the quintuple information reflects basic information of industrial data, including data acquisition information, data type information and the like; the external environment characteristics can reflect the production process and the equipment operation condition, and corresponding measures can be adopted by utilizing the external environment characteristics to optimize the production efficiency and the product quality; the business behavior reflects the production process and the management efficiency, and can be used for taking corresponding measures to optimize the production efficiency, improve the product quality and protect the environment.
Specifically, in step 120, step 130 and step 140, passing the quintuple information of the industrial data through a context encoder comprising an embedded layer to obtain a quintuple semantic understanding feature vector; passing the external environmental features through the context encoder comprising an embedded layer to obtain external environmental feature semantic understanding feature vectors; and passing the business behavior through the context encoder comprising the embedded layer to obtain a business behavior semantic understanding feature vector.
Next, it is considered that the five-tuple information, the external environmental features and the business behavior of the industrial data are all composed of words, and that these words have contextual semantic associations with one another. It is also considered that, in industrial data processing, the five-tuple information, external environmental features and business behavior of the industrial data tend to be unstructured data, which a context encoder can convert into vector representations of fixed dimension so that feature extraction and classification become more convenient. Therefore, in order to classify the security level of the industrial data, the semantic understanding features of the five-tuple information, the external environmental features and the business behavior need to be fully expressed. Based on this, in the technical scheme of the application, a context encoder comprising an embedded layer is used to perform semantic understanding of the five-tuple information, the external environmental features and the business behavior of the industrial data respectively, so as to extract the global context-based semantic association feature information of each, thereby obtaining a five-tuple semantic understanding feature vector, an external environment feature semantic understanding feature vector and a business behavior semantic understanding feature vector.
FIG. 4 is a flowchart of sub-steps of step 120 in a method of intelligent industrial data processing according to an embodiment of the present application, as shown in FIG. 4, for passing quintuple information of the industrial data through a context encoder comprising an embedded layer to obtain a quintuple semantic understanding feature vector, comprising: 121, performing word segmentation processing on the five-tuple information of the industrial data to convert the five-tuple information of the industrial data into a first word sequence composed of a plurality of first words; 122 mapping each first word in the sequence of first words to a word vector using an embedding layer of the context encoder including the embedding layer to obtain a sequence of first word vectors; and 123, performing global-based context semantic coding on the sequence of the first word vectors by using the context encoder comprising the embedded layer to obtain the five-tuple semantic understanding feature vector.
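Sub-steps 121 and 122 can be sketched as follows. The sketch assumes whitespace word segmentation and a randomly initialized embedding table; the application does not specify the segmenter or the embedding dimension, and the helper names `segment`, `build_vocab` and `embed` are hypothetical.

```python
import numpy as np

def segment(text):
    """Step 121 (sketch): split the text into words on whitespace."""
    return text.lower().split()

def build_vocab(words):
    """Assign each distinct word an integer index; index 0 is kept for unknown words."""
    vocab = {"<unk>": 0}
    for word in words:
        vocab.setdefault(word, len(vocab))
    return vocab

def embed(words, vocab, table):
    """Step 122 (sketch): look up one row of the embedding table per word."""
    ids = [vocab.get(word, 0) for word in words]
    return table[ids]

rng = np.random.default_rng(0)
words = segment(
    "timestamp 2023-05-25 device pump-07 sensor temp-01 type temperature value 73.5"
)
vocab = build_vocab(words)
table = rng.normal(size=(len(vocab), 8))   # assumed embedding dimension of 8
word_vectors = embed(words, vocab, table)  # sequence of first word vectors
```

In a trained system the embedding table would be learned together with the rest of the encoder rather than sampled at random.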
FIG. 5 is a flowchart showing the sub-steps of step 123 in the intelligent industrial data processing method according to the embodiment of the present application, as shown in FIG. 5, for performing global-based context semantic coding on the sequence of the first word vectors using the context encoder including the embedded layer to obtain the five-tuple of semantic understood feature vectors, including: 1231, performing one-dimensional arrangement on the sequence of the first word vectors to obtain global word feature vectors; 1232, calculating the product between the global word feature vector and the transpose vector of each first word vector in the sequence of first word vectors to obtain a plurality of self-attention association matrices; 1233, respectively performing standardization processing on each self-attention correlation matrix in the plurality of self-attention correlation matrices to obtain a plurality of standardized self-attention correlation matrices; 1234, passing each normalized self-attention correlation matrix of the plurality of normalized self-attention correlation matrices through a Softmax classification function to obtain a plurality of probability values; and 1235, weighting each first word vector in the sequence of first word vectors with each probability value in the plurality of probability values as a weight to obtain the five-tuple semantic understanding feature vector.
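Sub-steps 1231 to 1235 might be sketched as below. Two points are assumptions that the text does not settle: the standardization in step 1233 is taken to be division by the square root of the word-vector dimension, and each standardized association matrix is reduced to a single score by averaging its entries before the Softmax in step 1234. The function names are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable Softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def global_context_encode(word_vectors):
    """Weight each word vector by a Softmax score derived from its
    association with the global word feature vector."""
    n, d = word_vectors.shape
    global_vec = word_vectors.reshape(-1)    # 1231: one-dimensional arrangement, shape (n*d,)
    scores = np.empty(n)
    for i, w in enumerate(word_vectors):
        assoc = np.outer(global_vec, w)      # 1232: association matrix, shape (n*d, d)
        assoc = assoc / np.sqrt(d)           # 1233: assumed form of standardization
        scores[i] = assoc.mean()             # assumed reduction of the matrix to one score
    weights = softmax(scores)                # 1234: one probability value per word
    return weights @ word_vectors, weights   # 1235: weighted sum of the word vectors

rng = np.random.default_rng(0)
feature, weights = global_context_encode(rng.normal(size=(5, 4)))
```

The returned `feature` has the same dimension as a single word vector, so the three encoded inputs have a fixed size regardless of how many words each contains.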
FIG. 6 is a flowchart of the substeps of step 130 in the method for intelligent industrial data processing according to an embodiment of the present application, as shown in FIG. 6, the step of passing the external environmental feature through the context encoder including the embedded layer to obtain an external environmental feature semantic understanding feature vector, including: 131, performing word segmentation processing on the external environment feature to convert the external environment feature into a second word sequence composed of a plurality of second words; 132 mapping each second word in the sequence of second words to a word vector using an embedding layer of the context encoder including the embedding layer to obtain a sequence of second word vectors; and, 133, performing global-based context semantic coding on the sequence of second word vectors using the context encoder including the embedded layer to obtain the external environment feature semantic understanding feature vector.
FIG. 7 is a flowchart showing sub-steps of step 140 in a method for intelligent industrial data processing according to an embodiment of the present application, where, as shown in FIG. 7, the step of passing the business behavior through the context encoder including the embedded layer to obtain a business behavior semantic understanding feature vector includes: 141, word segmentation processing is carried out on the business behavior so as to convert the business behavior into a third word sequence composed of a plurality of third words; 142 mapping each third word in the sequence of third words to a word vector using the embedding layer of the context encoder including the embedding layer to obtain a sequence of third word vectors; and 143, performing global-based context semantic coding on the sequence of the third word vectors by using the context encoder comprising the embedded layer to obtain the business behavior semantic understanding feature vector.
The context encoder aims to mine hidden patterns between contexts in the word sequence. Optionally, the encoder includes a CNN (Convolutional Neural Network), a Recursive NN (Recursive Neural Network), a language model (Language Model), and the like. CNN-based methods extract local features well but handle long-term dependency (Long-Term Dependency) in sentences poorly, so encoders based on Bi-LSTM (bidirectional Long Short-Term Memory) are widely used. A Recursive NN processes a sentence as a tree structure rather than a sequence and in theory has stronger representation capability, but it suffers from weaknesses such as high sample annotation difficulty, vanishing gradients in deep networks and difficulty of parallel computation, so it is less used in practice. The Transformer is a widely used network structure that has characteristics of both CNN and RNN (Recurrent Neural Network), extracts global features well, and has a certain advantage over RNN in parallel computation.
Specifically, in step 150, the five-tuple semantic understanding feature vector, the external environment feature semantic understanding feature vector and the business behavior semantic understanding feature vector are fused to obtain a classification feature vector. In this way, the semantic understanding features of the five-tuple information, of the external environmental features and of the business behavior of the industrial data are integrated into one complete feature vector, which captures the interactions and dependencies among the semantic understanding features of the different data types and better characterizes the implicit security-level feature information of the industrial data. In addition, fusing the feature information in the three feature vectors can reduce information loss and error propagation during data processing, which improves the stability and reliability of the model and the classification accuracy.
Fusing the five-tuple semantic understanding feature vector, the external environment feature semantic understanding feature vector and the business behavior semantic understanding feature vector to obtain a classification feature vector, wherein the method comprises the following steps of: fusing the five-tuple semantic understanding feature vector, the external environment feature semantic understanding feature vector and the business behavior semantic understanding feature vector by using the following fusion formula to obtain a classification feature vector; wherein, the fusion formula is:
$$V_s = \lambda V_a + \beta V_b + \alpha V_c$$
where $V_s$ represents the classification feature vector, $V_a$ represents the five-tuple semantic understanding feature vector, $V_b$ represents the external environment feature semantic understanding feature vector, $V_c$ represents the business behavior semantic understanding feature vector, "+" represents addition of the elements at corresponding positions of the five-tuple semantic understanding feature vector, the external environment feature semantic understanding feature vector and the business behavior semantic understanding feature vector, and $\lambda$, $\beta$ and $\alpha$ represent weighting parameters for controlling the balance among the three feature vectors.
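The weighted position-wise fusion can be written in a few lines. The function name `fuse`, the default weights and the example values are hypothetical; the application does not state how the weighting parameters are chosen.

```python
import numpy as np

def fuse(v_a, v_b, v_c, lam=1.0, beta=1.0, alpha=1.0):
    """Weighted element-wise sum of three feature vectors of equal length."""
    v_a, v_b, v_c = (np.asarray(v, dtype=float) for v in (v_a, v_b, v_c))
    if not (v_a.shape == v_b.shape == v_c.shape):
        raise ValueError("position-wise addition needs vectors of equal shape")
    return lam * v_a + beta * v_b + alpha * v_c

# Illustrative values only.
v_s = fuse([1.0, 2.0], [0.5, 0.5], [2.0, 0.0], lam=0.5, beta=1.0, alpha=0.25)
```

Because the addition is position-wise, this form of fusion requires the three feature vectors to have the same dimension, which holds when one encoder with a fixed output size produces all three.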
Specifically, in step 160, the classification feature vector is passed through a classifier to obtain a classification result, which is used to represent the level label of the industrial data. That is, in the technical solution of the present application, the labels of the classifier are the security level labels of the industrial data, specifically a general level, an enterprise secret level and a national secret level, and the classifier uses a Softmax function to determine which classification label the classification feature vector belongs to, thereby judging the security level of the industrial data.
Wherein, pass the classification feature vector through the classifier to obtain classification result, the classification result is used for representing the level label of industrial data, include: performing full-connection coding on the classification feature vectors by using a plurality of full-connection layers of the classifier to obtain coded classification feature vectors; and passing the coding classification feature vector through a Softmax classification function of the classifier to obtain the classification result.
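A minimal forward pass for such a classifier might look like the following. The layer sizes, the ReLU activation between fully connected layers, the label strings and the random weights are assumptions made for illustration; the text only specifies fully connected layers followed by Softmax.

```python
import numpy as np

LABELS = ("general", "enterprise_secret", "national_secret")  # assumed label names

def softmax(z):
    """Numerically stable Softmax over a 1-D array."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def classify(x, layers):
    """Apply each (weight, bias) pair in turn, then Softmax on the last output."""
    h = np.asarray(x, dtype=float)
    for k, (weight, bias) in enumerate(layers):
        h = weight @ h + bias
        if k < len(layers) - 1:
            h = np.maximum(h, 0.0)  # ReLU between layers (an assumption)
    probs = softmax(h)
    return LABELS[int(np.argmax(probs))], probs

rng = np.random.default_rng(0)
layers = [  # untrained weights, for shape illustration only
    (rng.normal(size=(16, 8)), np.zeros(16)),
    (rng.normal(size=(3, 16)), np.zeros(3)),
]
label, probs = classify(rng.normal(size=8), layers)
```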
The intelligent industrial data processing method further comprises a step 170 of training the context encoder comprising the embedded layer and the classifier. FIG. 8 is a flowchart of the sub-steps of step 170 in a method of intelligent industrial data processing according to an embodiment of the present application. As shown in FIG. 8, step 170 of training the context encoder comprising the embedded layer and the classifier includes: 171, obtaining training data, wherein the training data comprises training five-tuple information, training external environmental features and training business behavior of industrial data, and a true value of the level label of the industrial data; 172, passing the training five-tuple information of the industrial data through the context encoder comprising the embedded layer to obtain a training five-tuple semantic understanding feature vector; 173, passing the training external environmental features through the context encoder comprising the embedded layer to obtain a training external environment feature semantic understanding feature vector; 174, passing the training business behavior through the context encoder comprising the embedded layer to obtain a training business behavior semantic understanding feature vector; 175, fusing the training five-tuple semantic understanding feature vector, the training external environment feature semantic understanding feature vector and the training business behavior semantic understanding feature vector to obtain a training classification feature vector; 176, performing feature distribution optimization on the training classification feature vector to obtain an optimized training classification feature vector; 177, passing the optimized training classification feature vector through the classifier to obtain a classification loss function value; and 178, training the context encoder comprising the embedded layer and the classifier based on the classification loss function value by back propagation of gradient descent.
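Steps 177 and 178 can be illustrated with a reduced example that trains only a single-layer Softmax classifier by gradient descent. This is a simplification under stated assumptions: the feature vectors are synthetic stand-ins for the optimized training classification feature vectors, the encoder is not trained, the feature distribution optimization of step 176 is omitted, and the learning rate and epoch count are arbitrary.

```python
import numpy as np

def softmax_rows(z):
    """Row-wise numerically stable Softmax."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, y):
    """Mean cross entropy between predicted probabilities and integer labels."""
    return float(-np.log(probs[np.arange(len(y)), y] + 1e-12).mean())

def train_classifier(X, y, num_classes=3, lr=0.1, epochs=200):
    n, d = X.shape
    W, b = np.zeros((d, num_classes)), np.zeros(num_classes)
    onehot = np.eye(num_classes)[y]
    losses = []
    for _ in range(epochs):
        probs = softmax_rows(X @ W + b)      # 177: forward pass
        losses.append(cross_entropy(probs, y))
        grad_logits = (probs - onehot) / n   # gradient of the mean loss w.r.t. the logits
        W -= lr * (X.T @ grad_logits)        # 178: gradient descent update
        b -= lr * grad_logits.sum(axis=0)
    return W, b, losses

# Synthetic data: three clusters standing in for the three level labels.
rng = np.random.default_rng(0)
y = rng.integers(0, 3, size=90)
centers = rng.normal(scale=3.0, size=(3, 6))
X = centers[y] + rng.normal(size=(90, 6))
X = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize columns for a stable step size
W, b, losses = train_classifier(X, y)
```

In the full method the gradient would also flow back through the fusion and the context encoder; an automatic differentiation framework would normally handle that part.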
In particular, in the technical solution of the present application, in order to make full use of the semantic association information among the basic five-tuple information of the industrial data (including a timestamp, a device identifier, a sensor identifier, a data type and a data value) expressed by the five-tuple semantic understanding feature vector, the semantic information of the external environmental features expressed by the external environment feature semantic understanding feature vector, and the business behavior semantic information expressed by the business behavior semantic understanding feature vector, the classification feature vector is preferably obtained by directly concatenating the five-tuple semantic understanding feature vector, the external environment feature semantic understanding feature vector and the business behavior semantic understanding feature vector. However, this introduces a distribution gap at the positions where the three feature vectors are concatenated.
On the other hand, although the five-tuple semantic understanding feature vector, the external environment feature semantic understanding feature vector and the business behavior semantic understanding feature vector are all obtained by context semantic encoding with a semantic encoder of the same structure (i.e. the context encoder comprising the embedded layer), the five-tuple information, the external environmental features and the business behavior of the industrial data differ greatly at the data source domain end in basic characteristics such as data expression mode and data length. As a result, the data manifolds of the three feature vectors in the high-dimensional feature space have poor similarity and consistency. The superposition of these two aspects leads to poor continuity of the overall feature distribution of the classification feature vector, which affects the training effect during model training.
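The distribution gap at the concatenation positions can be made concrete with synthetic vectors. In this sketch the three segments are drawn from normal distributions with different, arbitrarily chosen means and spreads purely to mimic encoded inputs whose statistics differ; none of the numbers come from the application.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for the three semantic understanding feature vectors.
v_a = rng.normal(loc=0.0, scale=1.0, size=64)   # five-tuple
v_b = rng.normal(loc=3.0, scale=0.5, size=64)   # external environment
v_c = rng.normal(loc=-2.0, scale=2.0, size=64)  # business behavior

v = np.concatenate([v_a, v_b, v_c])             # direct concatenation

segments = {
    "five_tuple": v[:64],
    "external_env": v[64:128],
    "business": v[128:],
}
# Per-segment mean and standard deviation; they jump at indices 64 and 128.
stats = {
    name: (float(seg.mean()), float(seg.std()))
    for name, seg in segments.items()
}
```

Printing `stats` shows three clearly different (mean, standard deviation) pairs, which is the kind of discontinuity in the overall feature distribution that the feature distribution optimization is meant to address.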
Based on this, the applicant of the present application performs a Gumbel normal periodic re-parameterization on the classification feature vector, denoted for example as $V$, to obtain an optimized classification feature vector $V'$. Specifically, the Gumbel normal periodic re-parameterization is performed on the training classification feature vector using the following optimization formula to obtain the optimized training classification feature vector, wherein the optimization formula is:
where $v_i$ represents the feature value of each position of the training classification feature vector, $\mu$ and $\sigma$ are respectively the mean and variance of the set of feature values of the positions of the training classification feature vector, $\log$ represents the base-2 logarithm, $\arcsin(\cdot)$ represents the arcsine function, $\arccos(\cdot)$ represents the arccosine function, and $v_i'$ represents the feature value of each position of the optimized training classification feature vector.
Here, the Gumbel normal periodic re-parameterization applies a random periodic operation based on the Gumbel distribution to the feature value $v_i$ at each position of the classification feature vector $V$, introducing a random periodic distribution into the normal distribution of the feature value set to obtain a periodic, continuously differentiable approximation of the original feature distribution. This periodic re-parameterization of the features improves the dynamic continuity with which the gradient of the loss function back-propagates through the model when the optimized classification feature vector $V'$ is used in training, and so improves the dynamic applicability of the context encoder comprising the embedded layer during training, compensating for the influence of the poor continuity of the feature distribution of the classification feature vector on training effects such as training speed and the accuracy of the convergence result. In this way, the security level of the industrial data can be detected and judged accurately, thereby improving the security of the industrial data.
Further, passing the optimized training classification feature vector through the classifier to obtain a classification loss function value includes: processing, by the classifier, the optimized training classification feature vector according to the following classification formula to generate a classification result, wherein the classification formula is: $\mathrm{softmax}\{(W_n, B_n) : \cdots : (W_1, B_1) \mid X\}$, where $X$ represents the optimized training classification feature vector, $W_1$ to $W_n$ are weight matrices, and $B_1$ to $B_n$ are bias matrices; and calculating a cross entropy value between the classification result and a true value as the classification loss function value.
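The classification formula and the cross entropy loss can be illustrated with a minimal sketch. It reads the formula literally as a stack of fully connected layers (W_1, B_1) to (W_n, B_n) followed by Softmax, with no activation between layers, since the text names none. The layer sizes, weights, and input vector are hypothetical values chosen only for illustration.

```python
import math

def dense(x, weights, bias):
    """One fully connected layer: returns W x + B for a single input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def softmax(z):
    """Numerically stable Softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def classify(x, layers):
    """Apply (W_1, B_1) ... (W_n, B_n) in order, then Softmax."""
    for weights, bias in layers:
        x = dense(x, weights, bias)
    return softmax(x)

def cross_entropy(probs, true_index):
    """Cross entropy between predicted probabilities and a one-hot true label."""
    return -math.log(probs[true_index])

# Hypothetical example: 3-dimensional feature vector, two layers, two level labels.
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),  # (W_1, B_1)
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),            # (W_2, B_2)
]
x = [1.0, 2.0, 3.0]
probs = classify(x, layers)
loss = cross_entropy(probs, true_index=0)
```

The loss is small when the probability assigned to the true level label is close to 1 and grows without bound as that probability approaches 0.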
In summary, a method 100 for intelligent industrial data processing according to an embodiment of the present application is illustrated, which obtains five-tuple information, external environmental features, and business behavior of industrial data; by adopting an artificial intelligence technology based on deep learning, quintuple information, external environment characteristics and semantic understanding relevance characteristic distribution information of business behaviors of industrial data are mined, so that confidentiality level classification of the industrial data is comprehensively carried out, and the safety of the industrial data is improved.
In one embodiment of the present application, FIG. 9 is a block diagram of a system for intelligent industrial data processing according to an embodiment of the present application. As shown in fig. 9, a system 200 for intelligent industrial data processing according to an embodiment of the present application includes: the data acquisition module 210 is configured to acquire quintuple information, external environmental characteristics and service behavior of industrial data; the quintuple encoding module 220 is configured to pass quintuple information of the industrial data through a context encoder including an embedded layer to obtain a quintuple semantic understanding feature vector; an external environment coding module 230, configured to pass the external environment feature through the context encoder including the embedded layer to obtain an external environment feature semantic understanding feature vector; a business behavior coding module 240, configured to pass the business behavior through the context encoder including the embedded layer to obtain a business behavior semantic understanding feature vector; the fusion module 250 is configured to fuse the five-tuple semantic understanding feature vector, the external environment feature semantic understanding feature vector, and the business behavior semantic understanding feature vector to obtain a classification feature vector; and a level generation module 260 for the industrial data, configured to pass the classification feature vector through a classifier to obtain a classification result, where the classification result is used to represent a level label of the industrial data.
In a specific example, in the above system for intelligent industrial data processing, the five-tuple encoding module includes: a five-tuple word segmentation unit, configured to perform word segmentation processing on the five-tuple information of the industrial data to convert the five-tuple information of the industrial data into a first word sequence consisting of a plurality of first words; a five-tuple word mapping unit, configured to map each first word in the first word sequence to a word vector by using the embedding layer of the context encoder including the embedding layer to obtain a sequence of first word vectors; and a five-tuple semantic coding unit, configured to perform global-based context semantic coding on the sequence of first word vectors by using the context encoder including the embedding layer to obtain the five-tuple semantic understanding feature vector.
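The word segmentation and word mapping units can be sketched as follows. This is only an illustration under stated assumptions: whitespace splitting stands in for a real word segmenter, the embedding table is randomly initialised rather than learned, and the five-tuple text itself is a hypothetical string, since the concrete fields are not given in this passage.

```python
import random

def segment(text):
    """Naive word segmentation: lowercase and split on whitespace (stand-in for a real segmenter)."""
    return text.lower().split()

def build_embedding_table(vocab, dim, seed=0):
    """Hypothetical embedding layer: one randomly initialised vector per vocabulary word."""
    rng = random.Random(seed)
    return {word: [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for word in sorted(vocab)}

def embed(words, table, dim):
    """Map each word to its word vector; unknown words map to the zero vector."""
    unknown = [0.0] * dim
    return [table.get(w, unknown) for w in words]

DIM = 4
five_tuple_text = "sensor-07 gateway-2 line-3 write-setpoint tcp"  # hypothetical input
first_words = segment(five_tuple_text)                  # first word sequence
table = build_embedding_table(set(first_words), DIM)
first_word_vectors = embed(first_words, table, DIM)     # sequence of first word vectors
unseen = embed(["never-seen"], table, DIM)
```

In a trained system the table would be part of the context encoder and would be updated during training rather than fixed at initialisation.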
In a specific example, in the above system for intelligent industrial data processing, the five-tuple semantic coding unit includes: a one-dimensional arrangement subunit, configured to perform one-dimensional arrangement on the sequence of first word vectors to obtain a global word feature vector; an association matrix calculation subunit, configured to calculate the product between the global word feature vector and the transpose vector of each first word vector in the sequence of first word vectors to obtain a plurality of self-attention association matrices; a normalization processing subunit, configured to normalize each of the plurality of self-attention association matrices to obtain a plurality of normalized self-attention association matrices; a classification subunit, configured to pass each of the plurality of normalized self-attention association matrices through a Softmax classification function to obtain a plurality of probability values; and a weighting subunit, configured to weight each first word vector in the sequence of first word vectors by using the corresponding probability value among the plurality of probability values as a weight to obtain the five-tuple semantic understanding feature vector.
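The idea of scoring each word vector against a global representation, turning the scores into probabilities with Softmax, and using those probabilities as weights can be sketched as below. Note that this is a simplified dot-product attention pooling, not the exact procedure above: it uses a mean-pooled global vector and one scalar score per word instead of forming and normalizing a full association matrix per word, because the text does not state how each matrix is reduced to a single probability value. The input vectors are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable Softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(word_vectors):
    """Weight each word vector by a Softmax probability and sum the results."""
    n = len(word_vectors)
    dim = len(word_vectors[0])
    # Assumption: the global word feature is the mean of all word vectors.
    global_vec = [sum(v[j] for v in word_vectors) / n for j in range(dim)]
    # Scaled dot product between the global vector and each word vector.
    scores = [sum(g * x for g, x in zip(global_vec, v)) / math.sqrt(dim)
              for v in word_vectors]
    weights = softmax(scores)
    pooled = [sum(w * v[j] for w, v in zip(weights, word_vectors))
              for j in range(dim)]
    return weights, pooled

word_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical word vectors
weights, pooled = attention_pool(word_vectors)
```

Here the third word vector is most aligned with the global vector, so it receives the largest weight in the pooled feature vector.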
In a specific example, in the above system for intelligent industrial data processing, the external environment coding module includes: the external environment word segmentation unit is used for carrying out word segmentation processing on the external environment characteristics so as to convert the external environment characteristics into a second word sequence composed of a plurality of second words; an external environment word mapping unit, configured to map each second word in the second word sequence to a word vector using an embedding layer of the context encoder including the embedding layer to obtain a sequence of second word vectors; and an external environment semantic coding unit, configured to perform global-based context semantic coding on the sequence of the second word vectors using the context encoder including the embedding layer to obtain the external environment feature semantic understanding feature vector.
In a specific example, in the above system for intelligent industrial data processing, the business behavior encoding module includes: a business behavior word segmentation unit, configured to perform word segmentation processing on the business behavior to convert the business behavior into a third word sequence consisting of a plurality of third words; a business behavior word mapping unit, configured to map each third word in the third word sequence to a word vector by using the embedding layer of the context encoder including the embedding layer to obtain a sequence of third word vectors; and a business behavior semantic coding unit, configured to perform global-based context semantic coding on the sequence of third word vectors by using the context encoder including the embedding layer to obtain the business behavior semantic understanding feature vector.
In a specific example, in the above system for intelligent industrial data processing, the fusion module is configured to: fusing the five-tuple semantic understanding feature vector, the external environment feature semantic understanding feature vector and the business behavior semantic understanding feature vector by using the following fusion formula to obtain a classification feature vector; wherein, the fusion formula is:
$V_s = \lambda V_a + \beta V_b + \alpha V_c$
wherein $V_s$ represents the classification feature vector, $V_a$ represents the five-tuple semantic understanding feature vector, $V_b$ represents the external environment feature semantic understanding feature vector, $V_c$ represents the business behavior semantic understanding feature vector, "+" represents addition of the elements at corresponding positions of the five-tuple semantic understanding feature vector, the external environment feature semantic understanding feature vector, and the business behavior semantic understanding feature vector, and $\lambda$, $\beta$, and $\alpha$ represent weighting parameters for controlling the balance among the five-tuple semantic understanding feature vector, the external environment feature semantic understanding feature vector, and the business behavior semantic understanding feature vector.
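The fusion formula is a position-wise weighted sum, which a short sketch makes concrete. The vector values and the weights 0.5, 0.25, and 0.25 are hypothetical; the text does not state how the weighting parameters are chosen.

```python
def fuse(v_a, v_b, v_c, lam, beta, alpha):
    """Position-wise weighted sum: V_s = lam * V_a + beta * V_b + alpha * V_c."""
    if not (len(v_a) == len(v_b) == len(v_c)):
        raise ValueError("feature vectors must have the same length")
    return [lam * a + beta * b + alpha * c for a, b, c in zip(v_a, v_b, v_c)]

v_a = [1.0, 2.0]  # five-tuple semantic understanding feature vector (hypothetical)
v_b = [0.0, 4.0]  # external environment feature semantic understanding feature vector
v_c = [2.0, 2.0]  # business behavior semantic understanding feature vector
v_s = fuse(v_a, v_b, v_c, lam=0.5, beta=0.25, alpha=0.25)
```

Because the sum is taken position by position, the three feature vectors must share the same dimension, which holds when they come from the same context encoder.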
In a specific example, in the above system for intelligent industrial data processing, the system further includes a training module for training the context encoder including the embedded layer and the classifier; wherein the training module includes: a training data acquisition unit, configured to acquire training data, wherein the training data includes training quintuple information of industrial data, training external environment characteristics, and training business behaviors, as well as a true value of the level label of the industrial data; a training quintuple encoding unit, configured to pass the training quintuple information of the industrial data through the context encoder including the embedded layer to obtain a training quintuple semantic understanding feature vector; a training external environment coding unit, configured to pass the training external environment characteristics through the context encoder including the embedded layer to obtain a training external environment feature semantic understanding feature vector; a training business behavior coding unit, configured to pass the training business behaviors through the context encoder including the embedded layer to obtain a training business behavior semantic understanding feature vector; a training fusion unit, configured to fuse the training quintuple semantic understanding feature vector, the training external environment feature semantic understanding feature vector, and the training business behavior semantic understanding feature vector to obtain a training classification feature vector; a training optimization unit, configured to optimize the feature distribution of the training classification feature vector to obtain an optimized training classification feature vector; a loss function value calculation unit, configured to pass the optimized training classification feature vector through the classifier to obtain a classification loss function value; and a training unit, configured to train the context encoder including the embedded layer and the classifier based on the classification loss function value through back propagation of gradient descent.
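The final step of the training module, updating parameters from the classification loss by gradient descent, can be sketched on a deliberately small case. The sketch trains only a single-layer Softmax classifier on two hypothetical, already-fused feature vectors; in the system described above the gradient would also flow back into the context encoder, and the feature distribution optimization step is omitted here because its formula is not reproduced in the text. The learning rate and epoch count are arbitrary.

```python
import math

def softmax(z):
    """Numerically stable Softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, W, b):
    """Single-layer classifier: Softmax(W x + b)."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) + bi
                    for row, bi in zip(W, b)])

def train_step(x, label, W, b, lr):
    """One gradient-descent update on the cross entropy loss; returns the loss before the update."""
    probs = forward(x, W, b)
    loss = -math.log(probs[label])
    for k in range(len(W)):
        # For Softmax with cross entropy, dLoss/dz_k = p_k - y_k.
        grad_z = probs[k] - (1.0 if k == label else 0.0)
        for j in range(len(x)):
            W[k][j] -= lr * grad_z * x[j]
        b[k] -= lr * grad_z
    return loss

# Hypothetical training data: (classification feature vector, true level label).
samples = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
W = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]
losses = []
for epoch in range(50):
    losses.append(sum(train_step(x, y, W, b, lr=0.5) for x, y in samples))

predictions = [max(range(2), key=lambda k: forward(x, W, b)[k]) for x, _ in samples]
```

The recorded loss falls over the epochs, and both training samples end up assigned their true labels.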
In a specific example, in the above system for intelligent industrial data processing, the training optimization unit is configured to: perform Gumbel normal periodic re-parameterization on the training classification feature vector using the following optimization formula to obtain the optimized training classification feature vector; wherein the optimization formula is:
wherein $v_i$ represents the feature value at each position of the training classification feature vector, $\mu$ and $\sigma$ are respectively the mean and the variance of the set of feature values at the positions of the training classification feature vector, $\log$ represents the base-2 logarithm, $\arcsin(\cdot)$ represents the arcsine function, $\arccos(\cdot)$ represents the arccosine function, and $v_i'$ represents the feature value at each position of the optimized training classification feature vector.
In a specific example, in the above-described system for intelligent industrial data processing, the loss function value calculation unit includes: a classification subunit, configured to process the optimized training classification feature vector by the classifier according to the following classification formula to generate a classification result, wherein the classification formula is: $\mathrm{softmax}\{(W_n, B_n) : \cdots : (W_1, B_1) \mid X\}$, where $X$ represents the optimized training classification feature vector, $W_1$ to $W_n$ are weight matrices, and $B_1$ to $B_n$ are bias matrices; and a calculation subunit, configured to calculate a cross entropy value between the classification result and a true value as the classification loss function value.
Here, it will be understood by those skilled in the art that the specific functions and operations of the respective units and modules in the above-described system for intelligent industrial data processing have been described in detail in the above description of the method for intelligent industrial data processing with reference to fig. 1 to 8, and thus, repetitive descriptions thereof will be omitted.
As described above, the system 200 for intelligent industrial data processing according to the embodiment of the present application may be implemented in various terminal devices, such as a server for intelligent industrial data processing, and the like. In one example, the system 200 for intelligent industrial data processing according to embodiments of the present application may be integrated into a terminal device as a software module and/or hardware module. For example, the intelligent industrial data processing system 200 may be a software module in the operating system of the terminal device, or may be an application developed for the terminal device; of course, the intelligent industrial data processing system 200 can also be one of a plurality of hardware modules of the terminal device.
Alternatively, in another example, the system 200 for intelligent industrial data processing and the terminal device may be separate devices, and the system 200 for intelligent industrial data processing may be connected to the terminal device through a wired and/or wireless network and transmit interactive information in an agreed data format.
The present application also provides a computer program product comprising instructions which, when executed, cause an apparatus to perform operations corresponding to the above-described method.
In one embodiment of the present application, there is also provided a computer-readable storage medium storing a computer program for executing the above-described method.
It should be appreciated that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the forms of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects may be utilized. Furthermore, the computer program product may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Methods, systems, and computer program products of embodiments of the present application are described in the flow diagrams and/or block diagrams. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The basic principles of the present application have been described above in connection with specific embodiments, however, it should be noted that the advantages, benefits, effects, etc. mentioned in the present application are merely examples and not intended to be limiting, and these advantages, benefits, effects, etc. are not to be considered as essential to the various embodiments of the present application. Furthermore, the specific details disclosed herein are for purposes of illustration and understanding only, and are not intended to be limiting, as the application is not necessarily limited to practice with the above described specific details.
The block diagrams of the devices, apparatuses, equipment, and systems referred to in the present application are only illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. As will be appreciated by one of skill in the art, these devices, apparatuses, equipment, and systems may be connected, arranged, and configured in any manner. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including but not limited to," and are used interchangeably therewith. The terms "or" and "and" as used herein refer to the term "and/or" and are used interchangeably therewith, unless the context clearly indicates otherwise. The term "such as" as used herein refers to the phrase "such as, but not limited to," and is used interchangeably therewith.
It is also noted that in the apparatus, devices and methods of the present application, the components or steps may be disassembled and/or assembled. Such decomposition and/or recombination should be considered as equivalent aspects of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or terminal device comprising the element.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the application to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, a person of ordinary skill in the art will recognize certain variations, modifications, alterations, additions, and subcombinations thereof.

Claims (10)

1. A method of intelligent industrial data processing, comprising:
acquiring five-tuple information, external environment characteristics and business behaviors of industrial data;
passing the quintuple information of the industrial data through a context encoder comprising an embedded layer to obtain a quintuple semantic understanding feature vector;
passing the external environmental features through the context encoder comprising an embedded layer to obtain external environmental feature semantic understanding feature vectors;
passing the business behavior through the context encoder comprising the embedded layer to obtain a business behavior semantic understanding feature vector;
fusing the five-tuple semantic understanding feature vector, the external environment feature semantic understanding feature vector and the business behavior semantic understanding feature vector to obtain a classification feature vector; and
passing the classification feature vector through a classifier to obtain a classification result, wherein the classification result is used for representing the level label of the industrial data.
2. The method of intelligent industrial data processing according to claim 1, wherein passing the quintuple information of the industrial data through a context encoder comprising an embedded layer to obtain a quintuple semantic understanding feature vector, comprises:
performing word segmentation processing on the quintuple information of the industrial data to convert the quintuple information of the industrial data into a first word sequence consisting of a plurality of first words;
mapping each first word in the first word sequence to a word vector using an embedding layer of the context encoder including the embedding layer to obtain a sequence of first word vectors; and
performing global-based context semantic coding on the sequence of first word vectors by using the context encoder comprising the embedded layer to obtain the five-tuple semantic understanding feature vector.
3. The method of intelligent industrial data processing according to claim 2, wherein performing global-based context semantic coding on the sequence of first word vectors by using the context encoder comprising an embedded layer to obtain the five-tuple semantic understanding feature vector comprises:
performing one-dimensional arrangement on the sequence of first word vectors to obtain a global word feature vector;
calculating the product between the global word feature vector and the transpose vector of each first word vector in the sequence of first word vectors to obtain a plurality of self-attention association matrices;
performing standardization processing on each of the plurality of self-attention association matrices to obtain a plurality of standardized self-attention association matrices;
passing each of the plurality of standardized self-attention association matrices through a Softmax classification function to obtain a plurality of probability values; and
weighting each first word vector in the sequence of first word vectors by using the corresponding probability value among the plurality of probability values as a weight to obtain the five-tuple semantic understanding feature vector.
4. The method of intelligent industrial data processing according to claim 3, wherein passing the external environmental feature through the context encoder including an embedded layer to obtain an external environmental feature semantic understanding feature vector, comprises:
performing word segmentation processing on the external environment features to convert the external environment features into a second word sequence composed of a plurality of second words;
mapping each second word in the second word sequence to a word vector using the embedding layer of the context encoder including the embedding layer to obtain a sequence of second word vectors; and
performing global-based context semantic coding on the sequence of second word vectors by using the context encoder comprising the embedded layer to obtain the external environment feature semantic understanding feature vector.
5. The method of intelligent industrial data processing according to claim 4, wherein passing the business behavior through the context encoder including an embedded layer to obtain a business behavior semantic understanding feature vector comprises:
performing word segmentation processing on the business behavior to convert the business behavior into a third word sequence composed of a plurality of third words;
mapping each third word in the third word sequence to a word vector using the embedding layer of the context encoder including the embedding layer to obtain a sequence of third word vectors; and
performing global-based context semantic coding on the sequence of third word vectors by using the context encoder comprising the embedded layer to obtain the business behavior semantic understanding feature vector.
6. The method of intelligent industrial data processing according to claim 5, wherein fusing the five-tuple semantic understanding feature vector, the external environment feature semantic understanding feature vector, and the business behavior semantic understanding feature vector to obtain a classification feature vector, comprises:
fusing the five-tuple semantic understanding feature vector, the external environment feature semantic understanding feature vector and the business behavior semantic understanding feature vector by using the following fusion formula to obtain a classification feature vector;
wherein the fusion formula is:
$V_s = \lambda V_a + \beta V_b + \alpha V_c$
wherein $V_s$ represents the classification feature vector, $V_a$ represents the five-tuple semantic understanding feature vector, $V_b$ represents the external environment feature semantic understanding feature vector, $V_c$ represents the business behavior semantic understanding feature vector, "+" represents addition of the elements at corresponding positions of the five-tuple semantic understanding feature vector, the external environment feature semantic understanding feature vector, and the business behavior semantic understanding feature vector, and $\lambda$, $\beta$, and $\alpha$ represent weighting parameters for controlling the balance among the five-tuple semantic understanding feature vector, the external environment feature semantic understanding feature vector, and the business behavior semantic understanding feature vector.
7. The method of intelligent industrial data processing according to claim 6, further comprising training the context encoder including an embedded layer and the classifier;
wherein training the context encoder including the embedded layer and the classifier comprises:
acquiring training data, wherein the training data comprises training quintuple information of industrial data, training external environment characteristics and training business behaviors, and a true value of a level label of the industrial data;
passing the training quintuple information of the industrial data through the context encoder comprising the embedded layer to obtain training quintuple semantic understanding feature vectors;
passing the training external environment features through the context encoder comprising an embedded layer to obtain training external environment feature semantic understanding feature vectors;
passing the training business behavior through the context encoder comprising the embedded layer to obtain a training business behavior semantic understanding feature vector;
fusing the training quintuple semantic understanding feature vector, the training external environment feature semantic understanding feature vector and the training business behavior semantic understanding feature vector to obtain a training classification feature vector;
performing feature distribution optimization on the training classification feature vector to obtain an optimized training classification feature vector;
passing the optimized training classification feature vector through the classifier to obtain a classification loss function value; and
training the context encoder comprising an embedded layer and the classifier based on the classification loss function value through back propagation of gradient descent.
8. The method of intelligent industrial data processing according to claim 7, wherein performing feature distribution optimization on the training classification feature vector to obtain an optimized training classification feature vector, comprises:
performing Gumbel normal periodic re-parameterization on the training classification feature vector by using the following optimization formula to obtain the optimized training classification feature vector;
wherein the optimization formula is:
wherein $v_i$ represents the feature value at each position of the training classification feature vector, $\mu$ and $\sigma$ are respectively the mean and the variance of the set of feature values at the positions of the training classification feature vector, $\log$ represents the base-2 logarithm, $\arcsin(\cdot)$ represents the arcsine function, $\arccos(\cdot)$ represents the arccosine function, and $v_i'$ represents the feature value at each position of the optimized training classification feature vector.
9. The method of intelligent industrial data processing according to claim 8, wherein passing the optimized training classification feature vector through the classifier to obtain a classification loss function value comprises:
processing, by the classifier, the optimized training classification feature vector with a classification formula to generate a classification result, wherein the classification formula is: $\mathrm{softmax}\{(W_n, B_n) : \cdots : (W_1, B_1) \mid X\}$, where $X$ represents the optimized training classification feature vector, $W_1$ to $W_n$ are weight matrices, and $B_1$ to $B_n$ are bias matrices; and
calculating a cross entropy value between the classification result and a true value as the classification loss function value.
10. A system for intelligent industrial data processing, comprising:
the data acquisition module is used for acquiring quintuple information, external environment characteristics and business behaviors of the industrial data;
the quintuple encoding module is used for enabling quintuple information of the industrial data to pass through a context encoder comprising an embedded layer to obtain a quintuple semantic understanding feature vector;
the external environment coding module is used for enabling the external environment characteristics to pass through the context coder containing the embedded layer to obtain external environment characteristic semantic understanding characteristic vectors;
the business behavior coding module is used for enabling the business behavior to pass through the context encoder comprising the embedded layer to obtain a business behavior semantic understanding feature vector;
the fusion module is used for fusing the five-tuple semantic understanding feature vector, the external environment feature semantic understanding feature vector and the business behavior semantic understanding feature vector to obtain a classification feature vector; and
the level generation module of the industrial data is used for passing the classification feature vector through a classifier to obtain a classification result, wherein the classification result is used for representing the level label of the industrial data.
CN202310598570.4A 2023-05-19 2023-05-19 Intelligent industrial data processing method and system Pending CN116663499A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310598570.4A CN116663499A (en) 2023-05-19 2023-05-19 Intelligent industrial data processing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310598570.4A CN116663499A (en) 2023-05-19 2023-05-19 Intelligent industrial data processing method and system

Publications (1)

Publication Number Publication Date
CN116663499A true CN116663499A (en) 2023-08-29

Family

ID=87714662

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310598570.4A Pending CN116663499A (en) 2023-05-19 2023-05-19 Intelligent industrial data processing method and system

Country Status (1)

Country Link
CN (1) CN116663499A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117314709A (en) * 2023-11-30 2023-12-29 吉林省拓达环保设备工程有限公司 Intelligent monitoring system for sewage treatment progress

Similar Documents

Publication Publication Date Title
CN116627708B (en) Storage fault analysis system and method thereof
Wilson et al. Deep learning-aided cyber-attack detection in power transmission systems
CN113420296B (en) C source code vulnerability detection method based on Bert model and BiLSTM
CN109635928A (en) A kind of voltage sag reason recognition methods based on deep learning Model Fusion
CN113961759B (en) Abnormality detection method based on attribute map representation learning
CN113094200A (en) Application program fault prediction method and device
Yuan et al. Learning-based real-time event identification using rich real PMU data
CN116245513B (en) Automatic operation and maintenance system and method based on rule base
Narodytska Formal Analysis of Deep Binarized Neural Networks.
CN116405326B (en) Information security management method and system based on block chain
CN114443899A (en) Video classification method, device, equipment and medium
CN114462520A (en) Network intrusion detection method based on traffic classification
CN115951883B (en) Service component management system of distributed micro-service architecture and method thereof
CN116663499A (en) Intelligent industrial data processing method and system
CN117237559B (en) Digital twin city-oriented three-dimensional model data intelligent analysis method and system
CN116341518A (en) Data processing method and system for big data statistical analysis
CN116663540A (en) Financial event extraction method based on small sample
Fonseca et al. Model-agnostic approaches to handling noisy labels when training sound event classifiers
CN115982037A (en) Software defect prediction method based on abstract syntax tree
CN111159424A (en) Method, device, storage medium and electronic equipment for labeling knowledge graph entities
CN116757773A (en) Clothing electronic commerce sales management system and method thereof
Yan et al. Electricity theft identification algorithm based on auto-encoder neural network and random forest
CN115423105A (en) Pre-training language model construction method, system and device
Meng et al. Classification of customer service tickets in power system based on character and word level semantic understanding
CN118152358A (en) Data storage method and system based on network technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination