CN112639828A - Data processing method, method and equipment for training neural network model


Info

Publication number
CN112639828A
CN112639828A (application CN201980010339.0A)
Authority
CN
China
Prior art keywords
data
neural network
network model
trained
association
Prior art date
Legal status
Pending
Application number
CN201980010339.0A
Other languages
Chinese (zh)
Inventor
李成 (Li Cheng)
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN112639828A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A data processing method, comprising: acquiring a plurality of pieces of data to be processed (501); processing the data to be processed using a first neural network model to obtain a plurality of first vectors in one-to-one correspondence with the data to be processed (502), wherein the first neural network model is obtained by training on general-purpose data; acquiring first association relation information (503), wherein the first association relation information indicates at least one first vector group, and each first vector group comprises two first vectors that satisfy a prior assumption; and inputting the plurality of first vectors and the first association relation information into a second neural network model to obtain a processing result (504) for first data to be processed, wherein the first data to be processed is any one of the plurality of data to be processed. The data processing method aims to weaken the dependence of the neural network model on the data to be trained.

Description

Data processing method, method and equipment for training neural network model

Technical Field
The present application relates to the field of neural networks, and in particular, to a method for processing data in a neural network system, and a method and an apparatus for training a neural network model.
Background
Artificial intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. Deep learning (DL), an important branch of artificial intelligence, has received wide attention and in-depth study in academia and industry, producing not only many theoretical innovations but also many practical industrial applications, such as image processing, speech recognition and motion analysis.
A trained neural network model often depends on its training data and cannot solve problems in fields other than the field the training data belong to. For example, when training data are input into a deep neural network model, the obtained processing results usually match the characteristics of the input data well; when the deep neural network model is actually used, however, the match between its output and the characteristics of the input data is poor. Therefore, in order to weaken the degree to which a neural network model depends on the data to be trained, a new method for constructing neural network models needs to be provided.
Disclosure of Invention
This application provides a data processing method, and a method and a device for training a neural network model, with the aim of weakening the dependence of the neural network model on the data to be trained.
In a first aspect, a data processing method is provided, including: acquiring a plurality of pieces of data to be processed; processing the plurality of data to be processed using a first neural network model to obtain a plurality of first vectors in one-to-one correspondence with the plurality of data to be processed, wherein the first neural network model is obtained by training on general-purpose data; acquiring first association relation information, wherein the first association relation information indicates at least one first vector group, and each first vector group comprises two first vectors that satisfy a prior assumption; and inputting the plurality of first vectors and the first association relation information into a second neural network model to obtain a processing result for first data to be processed, wherein the first data to be processed is any one of the plurality of data to be processed.
Optionally, the first neural network model is a convolutional neural network model or a graph neural network model. For example, the first neural network model may be a deep convolutional neural network model or a graph attention network model.
In a possible implementation, the second neural network model is a graph network model; accordingly, the plurality of first vectors serve as the nodes of the graph network model, and the first association relation information serves as the edges of the graph network model.
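For illustration, the following Python sketch shows how the first vectors can serve as graph nodes and the first vector groups as edges in a simplified message-passing layer. The names, shapes and the mean-aggregation rule are illustrative assumptions; the application does not specify an implementation.

```python
import numpy as np

def graph_layer(first_vectors, vector_groups, weight):
    """One simplified message-passing step: each node aggregates the vectors of the
    nodes it is associated with, then applies a shared linear map (illustrative only)."""
    n, d = first_vectors.shape
    adjacency = np.eye(n)                      # keep each node's own feature
    for i, j in vector_groups:                 # pairs satisfying the prior assumption
        adjacency[i, j] = adjacency[j, i] = 1.0
    # Row-normalise so each node averages over itself and its neighbours.
    adjacency /= adjacency.sum(axis=1, keepdims=True)
    return np.tanh(adjacency @ first_vectors @ weight)

first_vectors = np.random.randn(4, 8)          # 4 pieces of data, 8-dim first vectors
vector_groups = [(0, 1), (1, 2)]               # first association relation information
weight = np.random.randn(8, 3)                 # 3 output classes (assumed)
print(graph_layer(first_vectors, vector_groups, weight).shape)  # (4, 3)
```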
The first neural network model and the second neural network model can be two submodels of a certain neural network model.
The first neural network model and the second neural network model may be stored on two different devices; that is, the steps of the data processing method provided herein may be performed by multiple devices. For example, a first device stores the first neural network model and may perform the steps of acquiring the plurality of data to be processed and processing them with the first neural network model to obtain the plurality of first vectors in one-to-one correspondence with the data to be processed. A second device stores the second neural network model and may perform the steps of acquiring the first association relation information, which indicates at least one first vector group, each first vector group comprising two first vectors that satisfy the prior assumption, and inputting the plurality of first vectors and the first association relation information into the second neural network model to obtain the processing result for the first data to be processed, the first data to be processed being any one of the plurality of data to be processed. The plurality of first vectors may be transmitted over a communication interface between the first device and the second device.
In the embodiments of this application, training the first neural network model on general-purpose data yields a general model that is not affected, or is only slightly affected, by the scenario, so the first neural network model can be applied to various scenarios. However, precisely because the first neural network model is not limited to any scenario, it is difficult to achieve high-accuracy recognition of an arbitrary scenario using the first neural network model alone. Therefore, the plurality of feature vectors output by the first neural network model can be input into the second neural network model, so that the results of the first neural network model can be applied in a relatively specialized scenario and the second neural network model can learn the differences and associations between the general scenario and the specialized scenario. An existing neural network model can generally recognize only one specialized scenario, and once it is applied to another field, most of its parameters can no longer be used. Because the second neural network model can learn the differences and associations between the general scenario and the specialized scenario, and because the data input into the first neural network model can be general-purpose data, the method provided in this application can weaken the restriction that the scenario of the data to be processed places on the architecture and parameters of the neural network model. In addition, to improve the recognition accuracy of the second neural network model, the data associated with the first data to be processed are taken into account while the first data to be processed are recognized; the larger amount of processed data improves the recognition accuracy of the second neural network model. Moreover, because the associations between the data are taken into account, the second neural network model's learning of the relations among the data can be strengthened.
With reference to the first aspect, in certain implementations of the first aspect, the first association relation information indicates N first vector groups, where N is an integer greater than 1, and before the plurality of first vectors and the first association relation information are input into the second neural network model to obtain the processing result for the first data to be processed, the method further includes: acquiring second association relation information, wherein the second association relation information indicates n second vector groups, the n second vector groups belong to the N first vector groups, n is smaller than N, and n is a positive integer. Inputting the plurality of first vectors and the first association relation information into the second neural network model to obtain the processing result for the first data to be processed includes: inputting the plurality of first vectors, the first association relation information and the second association relation information into the second neural network model to obtain the processing result for the first data to be processed.
In the embodiment of the present application, when the first association relationship information only indicates that an association relationship exists between two first vectors, the first association relationship information cannot reflect the strength of the association between the two first vectors. The second association relation information may indicate one or more first vector groups of the plurality of first vector groups, which have a stronger association relation or a weaker association relation, so that the second neural network model may, in addition to considering the to-be-processed data associated with the first to-be-processed data, strengthen the influence of the to-be-processed data closely associated with the first to-be-processed data on the first to-be-processed data, or weaken the influence of the to-be-processed data distantly associated with the first to-be-processed data on the first to-be-processed data, and thus may obtain more data volume to identify the first to-be-processed data.
With reference to the first aspect, in certain implementations of the first aspect, the acquiring a plurality of pieces of data to be processed includes: acquiring target data, wherein the target data is one of the plurality of data to be processed; acquiring association data, wherein the association data and the target data have an association relation meeting the prior assumption, and the plurality of data to be processed comprise the association data.
In the embodiment of the application, the associated data can be flexibly introduced according to the data to be processed, so that the flexibility of acquiring the data to be processed is improved, and unnecessary redundant data is prevented from being introduced.
With reference to the first aspect, in certain implementations of the first aspect, the first association relationship information includes a second association relationship matrix, a vector in the second association relationship matrix in the first dimension includes multiple elements in one-to-one correspondence with the multiple first vectors, and a vector in the second association relationship matrix in the second dimension includes multiple elements in one-to-one correspondence with the multiple first vectors, where any element in the second association relationship matrix is used to indicate whether a correlation that satisfies the prior assumption exists between a vector corresponding to the any element in the first dimension and a vector corresponding to the any element in the second dimension.
In the embodiment of the application, the incidence relation among a plurality of first vectors is expressed by using the matrix, so that a plurality of different types of data structures are prevented from being introduced into the second neural network model, and the simplicity and convenience in calculation are facilitated.
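As an illustration of such an association relation matrix, the sketch below sets element [i, j] to 1 when the i-th and j-th first vectors form a vector group that satisfies the prior assumption. Treating the relation as symmetric is an assumption made for this example.

```python
import numpy as np

def build_association_matrix(num_vectors, vector_groups):
    """Element [i, j] indicates whether first vectors i and j satisfy the prior assumption."""
    matrix = np.zeros((num_vectors, num_vectors), dtype=np.int8)
    for i, j in vector_groups:
        matrix[i, j] = 1
        matrix[j, i] = 1   # assume the relation is undirected
    return matrix

print(build_association_matrix(4, [(0, 1), (1, 2)]))
```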
With reference to the first aspect, in certain implementations of the first aspect, the processing the plurality of data to be processed using the first neural network model includes: processing the plurality of data to be processed and fifth incidence relation information by using the first neural network model, wherein the fifth incidence relation information is used for indicating at least one data group to be processed, and each data group to be processed comprises two data to be processed which meet the prior assumption.
In the embodiment of the application, in order to enhance the identification accuracy of the first neural network model, the data associated with the first data to be processed is considered while identifying the first data to be processed, so that the identification accuracy of the first neural network model is increased due to the increase of the processing data amount. And, since the relevance between the data is taken into consideration, the learning of the data relation by the first neural network model can be enhanced.
With reference to the first aspect, in certain implementations of the first aspect, the fifth association information includes a first association matrix, a vector in the first dimension in the first association matrix includes a plurality of elements in one-to-one correspondence with the plurality of pieces of data to be processed, and a vector in the second dimension in the first association matrix includes a plurality of elements in one-to-one correspondence with the plurality of pieces of data to be processed, where any element in the first association matrix is used to indicate whether a correlation that satisfies the prior assumption exists between a vector corresponding to the any element in the first dimension and a vector corresponding to the any element in the second dimension.
In the embodiment of the application, the incidence relation among a plurality of data to be processed is expressed by using the matrix, so that a plurality of different types of data structures are prevented from being introduced into the first neural network model, and the simplicity and convenience in calculation are facilitated.
With reference to the first aspect, in certain implementations of the first aspect, the weight parameters of the second neural network model are obtained by: acquiring a plurality of data to be trained; processing the multiple data to be trained by using the first neural network model to obtain multiple fourth vectors which correspond to the multiple data to be trained one by one; obtaining third association relation information, wherein the third association relation information is used for indicating at least one third vector group, and each third vector group comprises two fourth vectors meeting the prior hypothesis; and inputting the fourth vectors and the third correlation information into the second neural network model to obtain a first processing result aiming at first data to be trained, wherein the first data to be trained is any one of the data to be trained, and the first processing result is used for correcting the weight parameters of the second neural network model.
In the embodiment of the application, the first neural network model is trained by using the general data, and a general model which is not influenced by a scene or is slightly influenced by the scene can be obtained, so that the first neural network model can be applied to various scenes. And inputting a plurality of feature vectors output by the first neural network model into the second neural network model, so that the second neural network model can realize the identification of relatively special scenes on the basis of the identification result of the first neural network model. The second neural network model can thus learn the differences and associations between generic scenarios and special scenarios. In order to enhance the recognition accuracy of the second neural network model, the data associated with the first data to be trained is considered while recognizing the first data to be trained. The amount of processing data is increased, so that the identification accuracy of the second neural network model is increased. And, since the relevance between the data is taken into consideration, the learning of the data relation by the second neural network model can be enhanced.
With reference to the first aspect, in certain implementations of the first aspect, the obtaining a first processing result for the first data to be trained includes: obtaining the first processing result and a second processing result aiming at second data to be trained, wherein the label of the first data to be trained is a first label, and the label of the second data to be trained is a second label; the method further comprises the following steps: and matching the similarity between the first label and the second label with the similarity between the first processing result and the second processing result to obtain a matching result, wherein the matching result is used for correcting the weight parameter of the second neural network model.
In the embodiment of the application, whether the similarity between the two processing results is proper or not can be judged through the similarity between the tags, and the learning of the second neural network model on the incidence relation between the data can be strengthened.
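A hedged sketch of this matching idea follows; the application does not specify the similarity measures, so cosine similarity and a squared difference are illustrative choices.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def similarity_matching_loss(result_1, result_2, label_1, label_2):
    # Penalise the gap between how similar the two processing results are
    # and how similar their labels are (illustrative formulation).
    return (cosine(result_1, result_2) - cosine(label_1, label_2)) ** 2

r1, r2 = np.random.rand(5), np.random.rand(5)   # two processing results
y1, y2 = np.eye(5)[1], np.eye(5)[1]             # one-hot labels of the same class
print(similarity_matching_loss(r1, r2, y1, y2))
```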
With reference to the first aspect, in certain implementations of the first aspect, the third association relation information indicates M third vector groups, where M is an integer greater than 1, and before the plurality of fourth vectors and the third association relation information are input into the second neural network model to obtain the first processing result for the first data to be trained, the method further includes: acquiring fourth association relation information, wherein the fourth association relation information indicates m fourth vector groups, the m fourth vector groups belong to the M third vector groups, m is smaller than M, and m is a positive integer. Inputting the plurality of fourth vectors and the third association relation information into the second neural network model to obtain the first processing result for the first data to be trained includes: inputting the plurality of fourth vectors, the third association relation information and the fourth association relation information into the second neural network model to obtain the first processing result.
In the embodiments of this application, when the third association relation information only indicates that an association exists between two fourth vectors, it cannot reflect the strength of that association. The fourth association relation information may indicate one or more third vector groups, among the plurality of third vector groups, whose association is stronger or weaker, so that in addition to considering the data to be trained that are associated with the first data to be trained, the second neural network model may strengthen the influence on the first data to be trained of the data to be trained that are closely associated with it, or weaken the influence of the data to be trained that are only loosely associated with it, and may thus use a larger amount of data to recognize the first data to be trained.
With reference to the first aspect, in certain implementations of the first aspect, the first processing result is further used to modify a weight parameter of the first neural network model.
In the embodiment of the application, since the association relationship between the data and the data can be learned in the training process, if the first processing result is also used for correcting the first neural network model, the capability of the first neural network model for learning the association relationship between the data and the data can be strengthened.
With reference to the first aspect, in certain implementations of the first aspect, the plurality of data to be trained includes one or more target type data, each target type data having a label for modifying the weight parameter.
In the embodiments of this application, the second neural network model may be trained in a semi-supervised manner. That is, some of the plurality of data to be trained have labels and the others may not. The two parts of the data can be fused according to the third association relation information, so that even the unlabeled data can still be taken into account when the second neural network model is modified. Therefore, the number of labels required for the data to be trained can be reduced, and the amount of data processing needed to train the second neural network model is reduced.
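The sketch below illustrates one way such semi-supervised training could restrict the loss to labeled data: unlabeled data still flow through the model, but only labeled nodes contribute to the loss used to modify the weight parameters. The cross-entropy loss and the mask are assumptions for illustration, not the specific method of the application.

```python
import numpy as np

def masked_cross_entropy(predictions, labels, labeled_mask):
    """Average cross-entropy over labeled samples only; unlabeled samples are ignored."""
    eps = 1e-12
    per_node = -np.sum(labels * np.log(predictions + eps), axis=1)
    return per_node[labeled_mask].mean()

predictions = np.full((4, 3), 1 / 3)            # outputs of the second model for 4 samples
labels = np.eye(3)[[0, 2, 0, 1]]                # placeholder labels; only 0 and 2 count
labeled_mask = np.array([True, False, True, False])
print(masked_cross_entropy(predictions, labels, labeled_mask))
```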
With reference to the first aspect, in certain implementations of the first aspect, the third association information includes a fourth association matrix, a vector in the fourth association matrix in the first dimension includes multiple elements in one-to-one correspondence with the multiple fourth vectors, and a vector in the fourth association matrix in the second dimension includes multiple elements in one-to-one correspondence with the multiple fourth vectors, where any element in the fourth association matrix is used to indicate whether a correlation satisfying the prior assumption exists between a vector corresponding to the any element in the first dimension and a vector corresponding to the any element in the second dimension.
In the embodiment of the application, the incidence relation among a plurality of fourth vectors is expressed by using the matrix, so that a plurality of different types of data structures are prevented from being introduced into the second neural network model, and the simplicity and convenience in calculation are facilitated.
With reference to the first aspect, in certain implementations of the first aspect, the processing the plurality of data to be trained using the first neural network model includes: and processing the plurality of data to be trained and sixth incidence relation information by using the first neural network model, wherein the sixth incidence relation information is used for indicating at least one data group to be trained, and each data group to be trained comprises two data to be trained which meet the prior assumption.
In the embodiment of the application, in order to enhance the identification accuracy of the first neural network model, the data associated with the first data to be trained is considered while identifying the first data to be trained. The quantity of processing data is increased, so that the identification accuracy of the first neural network model is increased. And, since the relevance between the data is taken into consideration, the learning of the data relation by the first neural network model can be enhanced.
With reference to the first aspect, in certain implementations of the first aspect, the sixth association information includes a third association matrix, a vector in the third association matrix in the first dimension includes multiple elements in one-to-one correspondence to the multiple data to be trained, and a vector in the third association matrix in the second dimension includes multiple elements in one-to-one correspondence to the multiple data to be trained, where any element in the third association matrix is used to indicate whether a correlation that satisfies the prior assumption exists between a vector corresponding to the any element in the first dimension and a vector corresponding to the any element in the second dimension.
In the embodiment of the application, the incidence relation among a plurality of data to be trained is expressed by using the matrix, so that a plurality of different types of data structures are prevented from being introduced into the first neural network model, and the simplicity and convenience in calculation are facilitated.
In a second aspect, a method for training a neural network model is provided, including: acquiring a plurality of data to be trained; processing the multiple data to be trained by using a first neural network model to obtain multiple fourth vectors which correspond to the multiple data to be trained one by one; obtaining third association relation information, wherein the third association relation information is used for indicating at least one third vector group, and each third vector group comprises two fourth vectors meeting the prior hypothesis; and inputting the fourth vectors and the third correlation information into a second neural network model to obtain a first processing result aiming at first data to be trained, wherein the first data to be trained is any one of the data to be trained, and the first processing result is used for correcting the weight parameters of the second neural network model.
In the embodiment of the present application, the first neural network model may be obtained by training the training data of scenario 1. Inputting the data to be trained of the scene 2 into the first neural network model, and outputting a plurality of feature vectors; and inputting the plurality of feature vectors into a second neural network model, so that the second neural network model can realize the identification of the scene 2 on the basis of the identification result of the first neural network model. Thus, the second neural network model may learn the differences and associations between scenario 1 and scenario 2. In order to enhance the recognition accuracy of the second neural network model, the data associated with the first data to be trained is considered while recognizing the first data to be trained. The amount of processing data is increased, so that the identification accuracy of the second neural network model is increased. And, since the relevance between the data is taken into consideration, the learning of the data relation by the second neural network model can be enhanced.
With reference to the second aspect, in some implementations of the second aspect, the obtaining a first processing result for the first data to be trained includes: obtaining the first processing result and a second processing result aiming at second data to be trained, wherein the label of the first data to be trained is a first label, the label of the second data to be trained is a second label, and the first data to be trained and the second data to be trained are any two data in the plurality of data to be trained; the method further comprises the following steps: and matching the similarity between the first label and the second label with the similarity between the first processing result and the second processing result to obtain a matching result, wherein the matching result is used for correcting the weight parameter of the second neural network model.
With reference to the second aspect, in certain implementations of the second aspect, the third association relation information indicates M third vector groups, and before the plurality of fourth vectors and the third association relation information are input into the second neural network model to obtain the first processing result for the first data to be trained, the method further includes: acquiring fourth association relation information, wherein the fourth association relation information indicates m fourth vector groups, the m fourth vector groups belong to the M third vector groups, m is smaller than M, and m is a positive integer. Inputting the plurality of fourth vectors and the third association relation information into the second neural network model to obtain the first processing result for the first data to be trained includes: inputting the plurality of fourth vectors, the third association relation information and the fourth association relation information into the second neural network model to obtain the first processing result.
With reference to the second aspect, in certain implementations of the second aspect, the first processing result is further used to modify a weight parameter of the first neural network model.
With reference to the second aspect, in certain implementations of the second aspect, the plurality of data to be trained includes one or more target type data, each target type data having a label for modifying the weight parameter.
With reference to the second aspect, in certain implementations of the second aspect, the third association information includes a fourth association matrix, a vector in the fourth association matrix in the first dimension includes multiple elements in one-to-one correspondence to the multiple fourth vectors, and a vector in the fourth association matrix in the second dimension includes multiple elements in one-to-one correspondence to the multiple fourth vectors, where any element in the fourth association matrix is used to indicate whether a correlation satisfying the prior assumption exists between the vector corresponding to the any element in the first dimension and the vector corresponding to the any element in the second dimension.
With reference to the second aspect, in some implementations of the second aspect, the processing the plurality of data to be trained using the first neural network model includes: and processing the plurality of data to be trained and sixth incidence relation information by using the first neural network model, wherein the sixth incidence relation information is used for indicating at least one data group to be trained, and each data group to be trained comprises two data to be trained which meet the prior assumption.
With reference to the second aspect, in certain implementations of the second aspect, the sixth association relationship information includes a third association relationship matrix, a vector in the third association relationship matrix in the first dimension includes multiple elements in one-to-one correspondence to the multiple data to be trained, and a vector in the third association relationship matrix in the second dimension includes multiple elements in one-to-one correspondence to the multiple data to be trained, where any element in the third association relationship matrix is used to indicate whether a correlation that satisfies the prior assumption exists between a vector corresponding to the any element in the first dimension and a vector corresponding to the any element in the second dimension.
With reference to the second aspect, in certain implementations of the second aspect, the first neural network model is obtained based on generic data training.
In the embodiment of the application, the first neural network model is trained by using the general data, and a general model which is not influenced by a scene or is slightly influenced by the scene can be obtained, so that the first neural network model can be applied to various scenes. And inputting a plurality of feature vectors output by the first neural network model into the second neural network model, so that the second neural network model can realize the identification of relatively special scenes on the basis of the identification result of the first neural network model. The second neural network model can thus learn the differences and associations between generic scenarios and special scenarios.
In a third aspect, a method for training a neural network model is provided, including: acquiring a plurality of data to be trained; inputting the multiple data to be trained and the seventh incidence relation information into a second neural network model to obtain a first processing result for first data to be trained and a second processing result for second data to be trained, wherein a label of the first data to be trained is a first label, a label of the second data to be trained is a second label, and the first data to be trained and the second data to be trained are any two data in the multiple data to be trained; the method further comprises the following steps: and matching the similarity between the first label and the second label with the similarity between the first processing result and the second processing result to obtain a matching result, wherein the matching result is used for correcting the weight parameter of the second neural network model.
In the embodiment of the application, whether the similarity between the two processing results is proper or not can be judged through the similarity between the tags, and the learning of the second neural network model on the incidence relation between the data can be strengthened.
With reference to the third aspect, in certain implementations of the third aspect, the method further includes: acquiring seventh incidence relation information, wherein the seventh incidence relation information is used for indicating at least one first training data group, and each first training data group comprises two data to be trained which meet the prior assumption.
In the embodiment of the application, in order to enhance the identification accuracy of the second neural network model, the data associated with the first data to be trained is considered while identifying the first data to be trained. The amount of processing data is increased, so that the identification accuracy of the second neural network model is increased. And, since the relevance between the data is taken into consideration, the learning of the data relation by the second neural network model can be enhanced.
With reference to the third aspect, in certain implementations of the third aspect, the seventh association relation information indicates H first training data groups, and before the plurality of data to be trained and the seventh association relation information are input into the second neural network model to obtain the first processing result for the first data to be trained, the method further includes: acquiring eighth association relation information, wherein the eighth association relation information indicates h second data groups to be trained, the h second data groups to be trained belong to the H first training data groups, h is smaller than H, and h is a positive integer. Inputting the plurality of data to be trained and the seventh association relation information into the second neural network model to obtain the first processing result for the first data to be trained includes: inputting the plurality of data to be trained, the seventh association relation information and the eighth association relation information into the second neural network model to obtain the first processing result.
With reference to the third aspect, in certain implementations of the third aspect, the plurality of data to be trained includes one or more target type data, each target type data having a label for modifying the weight parameter.
With reference to the third aspect, in certain implementations of the third aspect, the seventh association relationship information includes a fifth association relationship matrix, a vector in the fifth association relationship matrix in the first dimension includes multiple elements in one-to-one correspondence with the multiple data to be trained, and a vector in the fifth association relationship matrix in the second dimension includes multiple elements in one-to-one correspondence with the multiple data to be trained, where any element in the fifth association relationship matrix is used to indicate whether there is an association relationship, which satisfies the prior assumption, between the data to be trained, which corresponds to the any element in the first dimension, and the data to be trained, which corresponds to the any element in the second dimension.
In a fourth aspect, an apparatus for data processing is provided, the apparatus comprising means for performing the method of the first aspect or any possible implementation manner of the first aspect.
Optionally, the device may be a cloud server or a terminal device.
In a fifth aspect, there is provided an apparatus for training a neural network model, the apparatus comprising means for performing the method of the second aspect or any possible implementation manner of the second aspect.
Optionally, the device may be a cloud server or a terminal device.
In a sixth aspect, there is provided an apparatus for training a neural network model, the apparatus comprising means for performing the method of the third aspect or any possible implementation manner of the third aspect.
Optionally, the device may be a cloud server or a terminal device.
In a seventh aspect, there is provided an apparatus for data processing, the apparatus comprising: a memory for storing a program; a processor for executing the memory-stored program, the processor being configured to perform the method of any one of the implementations of the first aspect when the memory-stored program is executed.
Optionally, the device may be a cloud server or a terminal device.
In an eighth aspect, there is provided an apparatus for training a neural network model, the apparatus comprising: a memory for storing a program; a processor for executing the memory-stored program, the processor being configured to perform the method of any one of the implementations of the second aspect when the memory-stored program is executed.
Optionally, the device may be a cloud server or a terminal device.
In a ninth aspect, there is provided an apparatus for training a neural network model, the apparatus comprising: a memory for storing a program; a processor for executing the program stored in the memory, and when the program stored in the memory is executed, the processor is configured to perform the method in any one of the implementation manners of the third aspect.
Optionally, the device may be a cloud server or a terminal device.
A tenth aspect provides a computer readable medium storing program code for execution by a device, the program code comprising instructions for performing the method of any one of the implementations of the first to third aspects.
In an eleventh aspect, a computer program product containing instructions is provided, which when run on a computer causes the computer to perform the method in any one of the implementations of the first to third aspects.
In a twelfth aspect, a chip is provided, where the chip includes a processor and a data interface, and the processor reads instructions stored in a memory through the data interface, and executes the method in any one implementation manner of the first aspect to the third aspect.
Optionally, as an implementation manner, the chip may further include a memory, where instructions are stored in the memory, and the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to execute the method in any one implementation manner of the first aspect to the third aspect.
Drawings
Fig. 1 is a schematic diagram of a convolutional neural network architecture according to an embodiment of the present disclosure.
Fig. 2 is a schematic diagram of a graph model provided in an embodiment of the present application.
Fig. 3 is a schematic diagram of a system architecture according to an embodiment of the present application.
Fig. 4 is a schematic diagram of a hardware structure of a chip according to an embodiment of the present disclosure.
Fig. 5 is a schematic diagram of a system architecture according to an embodiment of the present application.
Fig. 6 is a schematic flow chart of a method for data processing according to an embodiment of the present application.
Fig. 7 is a schematic flow chart of a method for training a neural network model according to an embodiment of the present application.
Fig. 8 is a schematic block diagram of a data processing apparatus according to an embodiment of the present application.
Fig. 9 is a schematic block diagram of an apparatus for training a neural network model according to an embodiment of the present disclosure.
Detailed Description
The technical solution in the present application will be described below with reference to the accompanying drawings.
(1) Neural network
The neural network may be composed of neural units. A neural unit may be an operation unit that takes x_s and an intercept of 1 as inputs, and the output of the operation unit may be:

output = f\left( \sum_{s=1}^{n} W_s x_s + b \right)

where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, used to introduce a nonlinear characteristic into the neural network so as to convert the input signal of the neural unit into an output signal. The output signal of the activation function may serve as the input of the next convolutional layer. The activation function may be a sigmoid function. A neural network is a network formed by joining many such single neural units together, i.e., the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field may be a region composed of several neural units.
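As a small numerical illustration of the neural-unit formula above, with sigmoid chosen as the activation function f:

```python
import numpy as np

def neural_unit(x, w, b):
    # f(sum_s W_s * x_s + b) with f = sigmoid
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

x = np.array([0.5, -1.0, 2.0])   # inputs x_s
w = np.array([0.2, 0.4, -0.1])   # weights W_s
print(neural_unit(x, w, b=0.3))  # output of a single neural unit
```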
(2) Deep neural network
Deep Neural Networks (DNNs), also known as multi-layer neural networks, can be understood as neural networks having many hidden layers, where "many" has no particular metric. From the division of DNNs by the location of different layers, neural networks inside DNNs can be divided into three categories: input layer, hidden layer, output layer. Generally, the first layer is an input layer, the last layer is an output layer, and the middle layers are hidden layers. The layers are all connected, that is, any neuron of the ith layer is necessarily connected with any neuron of the (i + 1) th layer. Although DNN appears complex, it is not really complex in terms of the work of each layer, simply the following linear relational expression:
\vec{y} = \alpha(W \vec{x} + \vec{b})

where \vec{x} is the input vector, \vec{y} is the output vector, \vec{b} is the bias vector, W is the weight matrix (also called the coefficients), and \alpha() is the activation function. Each layer simply performs this operation on the input vector \vec{x} to obtain the output vector \vec{y}. Because a DNN has many layers, the number of coefficient matrices W and bias vectors \vec{b} is also large. These parameters are defined in the DNN as follows, taking the coefficient W as an example: assume that in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as W^3_{24}, where the superscript 3 is the index of the layer in which the coefficient W is located, and the subscripts correspond to the output index 2 in the third layer and the input index 4 in the second layer. In general, the coefficient from the kth neuron of layer L-1 to the jth neuron of layer L is defined as W^L_{jk}. Note that the input layer has no W parameters. In a deep neural network, more hidden layers enable the network to better depict complex situations in the real world. In theory, a model with more parameters has higher complexity and a larger "capacity", which means that it can accomplish more complex learning tasks. Training the deep neural network is the process of learning the weight matrices, and its final goal is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors W of the many layers).
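The following sketch stacks the per-layer operation \vec{y} = \alpha(W \vec{x} + \vec{b}) for a few layers; the layer sizes and the tanh activation are arbitrary choices for illustration.

```python
import numpy as np

def forward(x, weights, biases):
    for W, b in zip(weights, biases):
        x = np.tanh(W @ x + b)       # each layer: activation(W x + b)
    return x

sizes = [8, 16, 16, 4]               # input layer, two hidden layers, output layer
weights = [np.random.randn(o, i) for i, o in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(o) for o in sizes[1:]]
print(forward(np.random.randn(8), weights, biases).shape)  # (4,)
```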
(3) Convolutional neural network
A Convolutional Neural Network (CNN) is a deep neural Network with a Convolutional structure. The convolutional neural network includes a feature extractor consisting of convolutional layers and sub-sampling layers. The feature extractor may be viewed as a filter and the convolution process may be viewed as convolving an input image or convolved feature plane (feature map) with a trainable filter. The convolutional layer is a neuron layer for performing convolutional processing on an input signal in a convolutional neural network. In convolutional layers of convolutional neural networks, one neuron may be connected to only a portion of the neighbor neurons. In a convolutional layer, there are usually several characteristic planes, and each characteristic plane may be composed of several neural units arranged in a rectangular shape. The neural units of the same feature plane share weights, where the shared weights are convolution kernels. Sharing weights may be understood as the way in which image information is extracted is location independent. The underlying principle is: the statistics of a certain part of the image are the same as the other parts. Meaning that image information learned in one part can also be used in another part. The same learned image information can be used for all positions on the image. In the same convolution layer, a plurality of convolution kernels can be used to extract different image information, and generally, the greater the number of convolution kernels, the more abundant the image information reflected by the convolution operation.
The convolution kernel can be initialized in the form of a matrix of random size, and can be learned to obtain reasonable weights in the training process of the convolutional neural network. In addition, sharing weights brings the direct benefit of reducing connections between layers of the convolutional neural network, while reducing the risk of overfitting.
As shown in fig. 1, Convolutional Neural Network (CNN)400 may include an input layer 410, a convolutional/pooling layer 420 (where the pooling layer is optional), and a neural network layer 430.
Convolutional/pooling layers 420:
Convolutional layers:
the convolutional/pooling layer 420 shown in fig. 1 may include layers such as examples 421 and 426, for example: in one implementation, 421 layers are convolutional layers, 422 layers are pooling layers, 423 layers are convolutional layers, 424 layers are pooling layers, 425 are convolutional layers, 426 are pooling layers; in another implementation, 421, 422 are convolutional layers, 423 are pooling layers, 424, 425 are convolutional layers, and 426 are pooling layers. I.e., the output of a convolutional layer may be used as input to a subsequent pooling layer, or may be used as input to another convolutional layer to continue the convolution operation.
The inner working principle of one convolution layer will be described below by taking convolution layer 421 as an example.
Convolution layer 421 may include a plurality of convolution operators, also called kernels, whose role in image processing is equivalent to a filter for extracting specific information from the input image matrix, and the convolution operator may be essentially a weight matrix, which is usually predefined, and during the convolution operation on the image, the weight matrix is usually processed on the input image pixel by pixel (or two pixels by two pixels … …, depending on the value of step size stride) in the horizontal direction, so as to complete the task of extracting specific features from the image. The size of the weight matrix should be related to the size of the image, and it should be noted that the depth dimension (depth dimension) of the weight matrix is the same as the depth dimension of the input image, and the weight matrix extends to the entire depth of the input image during the convolution operation. Thus, convolving with a single weight matrix will produce a single depth dimension of the convolved output, but in most cases not a single weight matrix is used, but a plurality of weight matrices of the same size (row by column), i.e. a plurality of matrices of the same type, are applied. The outputs of each weight matrix are stacked to form the depth dimension of the convolved image, where the dimension is understood to be determined by "plurality" as described above. Different weight matrices may be used to extract different features in the image, e.g., one weight matrix to extract image edge information, another weight matrix to extract a particular color of the image, yet another weight matrix to blur unwanted noise in the image, etc. The plurality of weight matrices have the same size (row × column), the feature maps extracted by the plurality of weight matrices having the same size also have the same size, and the extracted feature maps having the same size are combined to form the output of the convolution operation.
The weight values in these weight matrices need to be obtained through a large amount of training in practical application, and each weight matrix formed by the trained weight values can be used to extract information from the input image, so that the convolutional neural network 400 can make correct prediction.
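A simplified, single-channel sketch of the convolution just described (real convolutional layers also span the depth dimension of the input, as noted above): each weight matrix slides over the image with a given stride, and the outputs of several weight matrices are stacked along the depth dimension.

```python
import numpy as np

def conv2d(image, kernels, stride=1):
    """Naive single-channel convolution; kernels has shape (num_kernels, k, k)."""
    k = kernels.shape[-1]
    out_h = (image.shape[0] - k) // stride + 1
    out_w = (image.shape[1] - k) // stride + 1
    out = np.zeros((out_h, out_w, len(kernels)))
    for d, kernel in enumerate(kernels):
        for i in range(out_h):
            for j in range(out_w):
                patch = image[i*stride:i*stride+k, j*stride:j*stride+k]
                out[i, j, d] = np.sum(patch * kernel)   # one weight matrix, one output value
    return out

image = np.random.rand(8, 8)
kernels = np.random.randn(3, 3, 3)              # three 3x3 weight matrices
print(conv2d(image, kernels, stride=1).shape)   # (6, 6, 3): depth 3 from 3 kernels
```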
When convolutional neural network 400 has multiple convolutional layers, the initial convolutional layers (e.g., layer 421) tend to extract more general features, which may also be referred to as low-level features. As the depth of convolutional neural network 400 increases, the later convolutional layers (e.g., layer 426) extract increasingly complex features, such as features with high-level semantics; features with higher-level semantics are better suited to the problem to be solved.
A pooling layer:
Since it is often desirable to reduce the number of training parameters, a pooling layer often needs to be periodically introduced after a convolutional layer. In the layers 421 to 426 illustrated as 420 in fig. 1, one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers. During image processing, the sole purpose of the pooling layer is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to obtain a smaller-sized image. The average pooling operator may compute the pixel values of the image within a certain range to produce an average value as the result of average pooling. The maximum pooling operator may take the pixel with the largest value within a certain range as the result of maximum pooling. In addition, just as the size of the weight matrix used in a convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The size of the image output after processing by the pooling layer may be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average or maximum value of a corresponding sub-region of the image input to the pooling layer.
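A minimal sketch of the two pooling operators described above, using non-overlapping 2×2 windows as an illustrative choice:

```python
import numpy as np

def pool2d(feature_map, window=2, mode="max"):
    """Replace each non-overlapping window with its maximum or average value."""
    h, w = feature_map.shape[0] // window, feature_map.shape[1] // window
    blocks = feature_map[:h*window, :w*window].reshape(h, window, w, window)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

fm = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(fm, mode="max"))    # 2x2 output; each pixel summarises a 2x2 sub-region
print(pool2d(fm, mode="avg"))
```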
The neural network layer 430:
after processing by convolutional layer/pooling layer 420, convolutional neural network 400 is not sufficient to output the required output information. Since, as previously mentioned, the convolutional/pooling layer 420 will only extract features and reduce the parameters brought by the input image. However, to generate the final output information (class information required or other relevant information), convolutional neural network 400 needs to utilize neural network layer 430 to generate one or a set of the number of required classes of output. Accordingly, a plurality of hidden layers (e.g., 431, 432, 43n shown in fig. 1) and an output layer 440 may be included in the neural network layer 430, and parameters included in the plurality of hidden layers may be pre-trained according to associated training data of a specific task type, for example, the task type may include image recognition, image classification, image super-resolution reconstruction, and the like.
After the hidden layers in the neural network layer 430, the last layer of the whole convolutional neural network 400 is the output layer 440. The output layer 440 has a loss function similar to categorical cross-entropy, which is specifically used to calculate the prediction error. Once the forward propagation of the whole convolutional neural network 400 (i.e., the propagation from 410 to 440 in fig. 1) is completed, backward propagation (i.e., the propagation from 440 to 410 in fig. 1) starts to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 400 and the error between the result output by the convolutional neural network 400 through the output layer and the ideal result.
It should be noted that the convolutional neural network 400 shown in fig. 1 is only an example of a convolutional neural network, and in a specific application, the convolutional neural network may also exist in the form of other network models.
(4) Recurrent neural networks (RNNs) are used to process sequence data. In the traditional neural network model, from the input layer to the hidden layer to the output layer, the layers are fully connected to one another, while the nodes within each layer are unconnected. Although this common neural network solves many problems, it is still incapable of solving many others. For example, to predict the next word in a sentence, the previous words are generally needed, because the words in a sentence are not independent of one another. An RNN is called a recurrent neural network because the current output of a sequence is also related to the previous outputs. Concretely, the network memorizes the previous information and applies it to the calculation of the current output; that is, the nodes in the hidden layer are no longer unconnected but connected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment. In theory, an RNN can process sequence data of any length. Training an RNN is the same as training a conventional CNN or DNN: the error back-propagation algorithm is also used, but with one difference: if the RNN is unrolled, the parameters in it, such as W, are shared, whereas this is not the case in the conventional neural networks exemplified above. Moreover, when the gradient descent algorithm is used, the output of each step depends not only on the network of the current step but also on the network states of the previous steps. This learning algorithm is called back-propagation through time (BPTT).
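A minimal recurrent-cell sketch matching this description (shapes are illustrative): the hidden state at each step depends on the current input and on the previous hidden state, and the same weights are reused at every step.

```python
import numpy as np

def rnn(inputs, W_xh, W_hh, b):
    h = np.zeros(W_hh.shape[0])
    for x in inputs:                          # the same W_xh, W_hh are shared across steps
        h = np.tanh(W_xh @ x + W_hh @ h + b)  # current input plus previous hidden state
    return h

seq = [np.random.randn(4) for _ in range(6)]  # a sequence of six 4-dim inputs
W_xh, W_hh, b = np.random.randn(8, 4), np.random.randn(8, 8), np.zeros(8)
print(rnn(seq, W_xh, W_hh, b).shape)          # final hidden state, shape (8,)
```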
(5) Loss function
In the process of training a deep neural network, because the output of the deep neural network is expected to be as close as possible to the value that is really desired to be predicted, the weight vector of each layer of the neural network can be updated according to the difference between the predicted value of the current network and the really desired target value (of course, there is usually an initialization process before the first update, that is, parameters are preconfigured for each layer in the deep neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted to make the prediction lower, and the adjustment continues until the deep neural network can predict the really desired target value or a value very close to it. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value"; this is the purpose of the loss function (loss function) or objective function (objective function), which are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so training the deep neural network becomes a process of reducing this loss as much as possible. The loss function is usually a multivariable function, and the gradient reflects the rate of change of the output value of the loss function as the variables change: the larger the absolute value of the gradient, the larger the rate of change of the output value of the loss function. Therefore, the gradient of the loss function with respect to the different parameters can be calculated, and the parameters are continuously updated along the direction in which the loss descends fastest, so that the output value of the loss function decreases as quickly as possible.
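To make this concrete, the following is a minimal sketch in Python/NumPy of a loss function and a gradient-descent update (the mean squared error, the learning rate of 0.1, and the toy data are illustrative assumptions; the embodiments are not limited to this loss).

```python
import numpy as np

def mse_loss(pred, target):
    # The loss measures the gap between the prediction and the truly desired value.
    return np.mean((pred - target) ** 2)

def gradient_step(w, x, target, lr=0.1):
    """One update along the direction in which the loss descends fastest."""
    grad = 2 * x.T @ (x @ w - target) / len(target)   # d(loss)/dw for the MSE above
    return w - lr * grad

w = np.zeros(2)
x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
t = np.array([1.0, 2.0, 3.0])
for _ in range(200):
    w = gradient_step(w, x, t)
print(w, mse_loss(x @ w, t))   # w approaches [1, 2], the loss approaches 0
```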
(6) Back propagation algorithm
A convolutional neural network may use the back propagation (BP) algorithm to correct the values of the parameters in the initial super-resolution model during training, so that the reconstruction error loss of the super-resolution model becomes smaller and smaller. Specifically, forward propagation of the input signal up to the output produces an error loss, and the parameters in the initial super-resolution model are updated by propagating the error loss information backwards, so that the error loss converges. The back propagation algorithm is a backward propagation process dominated by the error loss, and aims to obtain the optimal parameters of the super-resolution model, such as the weight matrices.
(7) Generative adversarial network
A generative adversarial network (GAN) is a deep learning model. The model comprises at least two modules: one module is a generative model (generative model), and the other is a discriminative model (discriminative model); the two modules learn through a mutual game, thereby producing better output. Both the generative model and the discriminative model may be neural networks, specifically deep neural networks or convolutional neural networks. The basic principle of a GAN is as follows. Taking a GAN that generates pictures as an example, assume that there are two networks, G (Generator) and D (Discriminator), where G is a network that generates pictures: it receives a random noise z and generates a picture from this noise, denoted G(z); D is a discrimination network used to determine whether a picture is "real". Its input parameter is x, where x represents a picture, and the output D(x) represents the probability that x is a real picture: if the output is 1, the picture is certainly a real picture, and if the output is 0, the picture cannot be a real picture. In the process of training the generative adversarial network, the goal of the generation network G is to generate pictures that are as realistic as possible in order to deceive the discrimination network D, while the goal of the discrimination network D is to distinguish the pictures generated by G from real pictures as well as possible. Thus, G and D constitute a dynamic "game" process, which is the "adversarial" in "generative adversarial network". As a result of the final game, in an ideal state, G can generate pictures G(z) that are difficult to distinguish from real ones, while D can hardly determine whether the pictures generated by G are real or not, that is, D(G(z)) = 0.5. An excellent generative model G is thus obtained, which can be used to generate pictures.
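The game between G and D can be illustrated numerically with the standard minimax objective. The sketch below (Python/NumPy, with hypothetical discriminator scores) only evaluates the two losses, far from and at the ideal equilibrium D(G(z)) = 0.5.

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    # D wants d_real -> 1 and d_fake -> 0.
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def generator_loss(d_fake):
    # G wants the discriminator to score its samples as real, i.e. d_fake -> 1.
    return -np.mean(np.log(d_fake))

# Far from equilibrium: D separates real and fake easily, G's loss is large.
print(discriminator_loss(np.array([0.9]), np.array([0.1])),
      generator_loss(np.array([0.1])))
# Ideal equilibrium: D(G(z)) = 0.5, neither side can easily improve further.
print(discriminator_loss(np.array([0.5]), np.array([0.5])),
      generator_loss(np.array([0.5])))
```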
(8) Graph neural network
In computer science, a graph is a data structure composed of two parts: nodes, and the edges between nodes. A graph can therefore be represented by the formula G = (V, E), where G is the graph, V is the set of nodes, and E is the set of edges, as shown in fig. 2. Nodes are sometimes also referred to as vertices. The edge between node n1 and node n2 may be represented as (n1, n2). A graph neural network (GNN) is a neural network that runs directly on a graph data structure. The label of a node n in the node set can be represented by a vector, and the label of an edge (n1, n2) in the edge set can also be represented by a vector. Thus, the characteristics of node n1 and/or n2 may be obtained from the labels of nodes n1 and n2 and the label of the edge (n1, n2). A graph neural network may include an input layer, an output layer, and one or more hidden layers.
The purpose of the graph neural network is to train a state embedding function h_v = f(x_v, x_co[v], h_ne[v], x_ne[v]), where h_v is the state (state) of node v, x_v is the feature representation of node v, x_co[v] is the feature representation of the edges associated with node v, h_ne[v] is the state of the other nodes associated with node v, and x_ne[v] is the feature representation of the other nodes associated with node v. Taking node 1 shown in fig. 2 as an example, edges exist between node 1 and nodes 2, 3, 4, and 6 inside the dotted line, so nodes 2, 3, 4, and 6 are all nodes associated with node 1. When an edge exists between node v and node i, node i is a node associated with node v and may be referred to as a neighbor node of node v.
The output function of the graph neural network model is o_v = g(h_v, x_v), and the neural network is optimized through a loss function loss = Σ_v (t_v − o_v), where the sum is taken over the labeled nodes and t_v is the label of node v.
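As an illustrative sketch of the state embedding and output functions above (Python/NumPy; the simple mean-plus-tanh transition, the placeholder output function g, and the toy graph are assumptions made for illustration, not the trained functions of this embodiment):

```python
import numpy as np

def propagate(x, edges, steps=5):
    """Iteratively update node states from neighbour states; f is a simple
    mean-plus-tanh placeholder rather than a trained transition function."""
    n = len(x)
    neighbours = {v: [u for (a, u) in edges if a == v] for v in range(n)}
    h = np.zeros_like(x)
    for _ in range(steps):
        h_new = np.empty_like(h)
        for v in range(n):
            nb = neighbours[v]
            nb_state = h[nb].mean(axis=0) if nb else np.zeros(x.shape[1])
            h_new[v] = np.tanh(x[v] + nb_state)      # h_v = f(x_v, h_ne[v], ...)
        h = h_new
    return h

x = np.eye(4)                                        # one feature row per node
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
h = propagate(x, edges)
o = h.sum(axis=1)                                    # o_v = g(h_v, x_v), placeholder g
t = np.array([1.0, 0.0, 0.0, 1.0])                   # node labels t_v
print(np.sum(t - o))                                 # loss computed from t_v and o_v
```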
(9) Graph convolution neural network
A graph convolutional neural network (GCN) is a method capable of performing deep learning on graph data, and can be understood as an application of the graph neural network within the convolutional neural network framework. Graph convolutional neural networks are generally divided into two categories: spectral approaches (spectral approaches) and non-spectral approaches (non-spectral approaches). The spectral approach is based on the spectral representation of a graph, and defines the convolution operation in the Fourier domain through the eigendecomposition of the graph Laplacian; this convolution operation requires intensive matrix computation and non-spatially-localized filtering computation. The non-spectral approach performs the convolution directly on the graph rather than on the spectrum of the graph. However, a graph convolutional neural network depends on the structure information of the graph, so a model trained on a specific graph structure cannot be directly used on other graph structures. The graph convolution operator may be:
h_i^(l+1) = σ( Σ_{j∈N_i} (1/c_ij) · W_{R_j}^(l) · h_j^(l) )

where h_i^(l) represents the feature expression of node i at layer l, c_ij represents a normalization factor related to the graph structure, N_i represents the nodes associated with node i (which may include node i itself), R_j indicates the type of node j, and W_{R_j}^(l) is the weight matrix for that type at layer l. The expression capability of the model is enhanced by collecting the feature information of each node and applying a nonlinear transformation.
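A minimal sketch of one step of this graph convolution operator follows (Python/NumPy; the choice of tanh as the nonlinearity, the normalization c_ij = |N_i|, and the toy per-type weight matrices are illustrative assumptions):

```python
import numpy as np

def gcn_layer(h, neighbours, node_type, W, c):
    """One graph-convolution step following the operator above: every node sums the
    transformed features of its associated nodes (itself included), normalised by
    c_ij, then applies a nonlinearity. W maps a node type to its weight matrix."""
    h_next = np.zeros_like(h)
    for i, nb in neighbours.items():
        acc = np.zeros(h.shape[1])
        for j in nb:
            acc += (W[node_type[j]] @ h[j]) / c[(i, j)]
        h_next[i] = np.tanh(acc)
    return h_next

h = np.eye(3)                                  # initial node features
neighbours = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
node_type = {0: "a", 1: "a", 2: "b"}
W = {"a": np.eye(3), "b": 0.5 * np.eye(3)}     # one weight matrix per node type
c = {(i, j): len(neighbours[i]) for i in neighbours for j in neighbours[i]}
print(gcn_layer(h, neighbours, node_type, W, c))
```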
(10) Graph attention neural network
A graph attention network (GAT) contains graph attention layers at its core: through an implicit self-attention layer, attention is distributed over the set of neighbor nodes that have an association relationship with node i, different weights are assigned according to the features of the neighbor nodes, and the features of the neighbor nodes are weighted and summed. Unlike the graph convolutional neural network, the graph attention network does not have to depend on a specific graph structure. The graph attention network distributes attention over each node under the association structure of the graph through a multi-layer, multi-head attention mechanism, so that the information each node obtains from the other nodes associated with it can be calculated. The multi-head attention mechanism is essentially a weighted sum, with the weights derived from the learned attention matrices and the nodes' own information. Therefore, unlike the graph convolutional neural network, the parameters learned by this network do not depend on a specific graph structure.
Referring to fig. 3, this embodiment provides a system architecture 100. In the system architecture 100, the data acquisition device 160 is configured to acquire data to be trained, where the data to be trained in this embodiment includes image data, video data, audio data, text data, and the like, and to store the data to be trained in the database 130. The training device 120 trains based on the data to be trained maintained in the database 130 to obtain the target model/rule 101. How the training device 120 obtains the target model/rule 101 based on the data to be trained is described in more detail in a later embodiment. The target model/rule 101 can be used to implement the method for training the neural network model provided in the embodiments of the present application; that is, the target model/rule 101 may include a first neural network model and a second neural network model, the data to be trained is input into the first neural network model to obtain a plurality of fourth vectors, the plurality of fourth vectors are input into the second neural network model, and the weight parameters of the target model/rule 101 are adjusted through a loss function, so that the trained target model/rule 101 can be obtained. It should be noted that, in practical applications, the data to be trained maintained in the database 130 is not necessarily all acquired by the data acquisition device 160, and may also be received from other devices. It should also be noted that the training device 120 does not necessarily train the target model/rule 101 based only on the data to be trained maintained in the database 130; it may also obtain data to be trained from the cloud or elsewhere for model training. The above description should not be taken as a limitation on the embodiments of the present application.
The target model/rule 101 obtained by training with the training device 120 may be applied to different systems or devices, for example, the execution device 110 shown in fig. 3. The execution device 110 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an AR/VR device, or a vehicle-mounted terminal, and may also be a server or a cloud. In fig. 3, the execution device 110 is configured with an input/output interface 112 for data interaction with external devices, and a user can input data to the input/output interface 112 through the client device 140, where the input data may include the plurality of data to be processed in the embodiments of the present application.
The preprocessing module 113 is configured to perform preprocessing according to input data (such as the image data, the video data, the audio data, the text data, and the like, which may be data to be processed in this embodiment) received by the input/output interface 112, and in this embodiment, the preprocessing module 113 may be configured to, for example, extract features of the input data.
In the process that the execution device 110 preprocesses the input data or in the process that the calculation module 111 of the execution device 110 executes the calculation or other related processes, the execution device 110 may call the data, the code, and the like in the data storage system 150 for corresponding processes, and may store the data, the instruction, and the like obtained by corresponding processes in the data storage system 150.
Finally, the input/output interface 112 returns the processing result to the client device 140, thereby providing it to the user.
It should be noted that the training device 120 may generate corresponding target models/rules 101 based on different data to be trained for different targets or different tasks, and the corresponding target models/rules 101 may be used to achieve the targets or complete the tasks, so as to provide the user with the required results.
In the case shown in fig. 3, the user may manually give the input data, which may be operated through an interface provided by the input/output interface 112. Alternatively, the client device 140 may automatically send the input data to the input/output interface 112; if the client device 140 must obtain authorization from the user before automatically sending the input data, the user may set the corresponding permission in the client device 140. The user can view the result output by the execution device 110 at the client device 140, and the specific presentation form can be display, sound, action, and the like. The client device 140 may also serve as a data collection terminal, collecting the input data of the input/output interface 112 and the output result of the input/output interface 112 as new sample data and storing it in the database 130. Of course, instead of being collected by the client device 140, the input data of the input/output interface 112 and the output result of the input/output interface 112 as shown in the figure may also be stored directly into the database 130 as new sample data by the input/output interface 112.
It should be noted that fig. 3 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the position relationship between the devices, modules, and the like shown in the diagram does not constitute any limitation, for example, in fig. 3, the data storage system 150 is an external memory with respect to the execution device 110, and in other cases, the data storage system 150 may also be disposed in the execution device 110.
As shown in fig. 3, the target model/rule 101 is obtained by training according to the training device 120, and the target model/rule 101 in this embodiment may include a first neural network model and a second neural network model in this embodiment, where the first neural network model may be a convolutional neural network model or a graph neural network model, and the second neural network model may be a graph neural network model.
A hardware structure of a chip provided in an embodiment of the present application is described below.
Fig. 4 is a hardware structure of a chip provided in an embodiment of the present application, where the chip includes a neural network processor 20.
The Neural-Network Processing Unit (NPU) 20 may be mounted as a coprocessor on a Host Central Processing Unit (Host CPU), and tasks are allocated by the Host CPU. The core portion of the NPU is an arithmetic circuit 203, and a controller 204 controls the arithmetic circuit 203 to extract data in a memory (weight memory or input memory) and perform an operation.
In some implementations, the arithmetic circuitry 203 includes a plurality of processing units (PEs) internally. In some implementations, the operational circuitry 203 is a two-dimensional systolic array. The arithmetic circuitry 203 may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuitry 203 is a general-purpose matrix processor.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to the matrix B from the weight memory 202 and buffers it in each PE in the arithmetic circuit. The arithmetic circuit takes the data of the matrix A from the input memory 201 and performs a matrix operation with the matrix B, and the partial or final results of the obtained matrix are stored in an accumulator (accumulator) 208.
The vector calculation unit 207 may further process the output of the operation circuit 203, such as vector multiplication, vector addition, exponential operation, logarithmic operation, magnitude comparison, and the like. For example, the vector calculation unit 207 may be used for network calculations of non-convolution/non-FC layers in a neural network, such as Pooling (Pooling), Batch Normalization (Batch Normalization), Local Response Normalization (Local Response Normalization), and the like.
In some implementations, the vector calculation unit 207 can store the processed output vector to the unified buffer 206. For example, the vector calculation unit 207 may apply a non-linear function to the output of the arithmetic circuit 203, such as a vector of accumulated values, to generate the activation value. In some implementations, the vector calculation unit 207 generates normalized values, combined values, or both. In some implementations, the vector of processed outputs can be used as activation inputs to the arithmetic circuitry 203, for example for use in subsequent layers in a neural network.
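The division of labour described above (the arithmetic circuit accumulating partial matrix products, the vector calculation unit then applying a non-linear function) can be pictured with the following numerical sketch in Python/NumPy; it is only an illustration of the data flow, not NPU code, and the matrices and the ReLU choice are illustrative assumptions.

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)     # input matrix taken from the input memory
B = np.ones((3, 2))                  # weight matrix taken from the weight memory
acc = np.zeros((2, 2))               # accumulator for partial results
for k in range(A.shape[1]):          # partial products accumulated step by step
    acc += np.outer(A[:, k], B[k, :])
activated = np.maximum(acc, 0.0)     # the vector unit applies a non-linear function
print(np.allclose(acc, A @ B))       # True: the accumulated result equals A·B
print(activated)
```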
Some or all of the steps of the methods provided herein may be performed by the arithmetic circuitry 203 or the vector calculation unit 207.
The unified memory 206 is used to store input data as well as output data.
A direct memory access controller (DMAC) 205 is used to transfer the input data in the external memory to the input memory 201 and/or the unified memory 206, to store the weight data in the external memory into the weight memory 202, and to store the data in the unified memory 206 into the external memory.

A bus interface unit (BIU) 210 is used to implement the interaction among the host CPU, the DMAC, and the instruction fetch memory 209 through a bus.

An instruction fetch buffer (instruction fetch buffer) 209 connected to the controller 204 is used to store instructions used by the controller 204.

The controller 204 is used to call the instructions cached in the instruction fetch memory 209 to control the working process of the operation accelerator.

Generally, the unified memory 206, the input memory 201, the weight memory 202, and the instruction fetch memory 209 are all on-chip memories, while the external memory is a memory outside the NPU; the external memory may be a double data rate synchronous dynamic random access memory (DDR SDRAM), a high bandwidth memory (HBM), or another readable and writable memory.
As shown in fig. 5, the present embodiment provides a system architecture 300. The system architecture includes a local device 301, a local device 302, and an execution device 310 and a data storage system 350, wherein the local device 301 and the local device 302 are connected with the execution device 310 through a communication network.
The execution device 310 may be implemented by one or more servers. Optionally, the execution device 310 may cooperate with other computing devices, such as data storage devices, routers, and load balancers. The execution device 310 may be disposed at one physical site or distributed across multiple physical sites. The execution device 310 may use the data in the data storage system 350 or call the program code in the data storage system 350 to implement the method of searching for a neural network structure of the embodiments of the present application.
Specifically, the execution device 310 may build an image recognition neural network, which may be used for image recognition or image processing, etc.
The user may operate respective user devices (e.g., local device 301 and local device 302) to interact with the execution device 310. Each local device may represent any computing device, such as a personal computer, computer workstation, smartphone, tablet, smart camera, smart car or other type of cellular phone, media consumption device, wearable device, set-top box, gaming console, and so forth.
The local device of each user may interact with the execution device 310 via a communication network of any communication mechanism/standard, such as a wide area network, a local area network, a peer-to-peer connection, or any combination thereof.
The execution device 310 may also be referred to as a cloud device, and in this case, the execution device 310 is generally deployed in the cloud.
As described above, a neural network model may depend heavily on its data to be trained. For the data to be trained, the output result of the neural network model is close to the characteristics of the data to be trained, and the accuracy is high; however, when the trained neural network model is put into actual use, the recognition result it outputs may be far from the characteristics of the input data, and the accuracy is low. In order to reduce the degree to which the neural network model depends on the data to be trained, the present application provides a data processing method, so that the trained neural network model can achieve high-accuracy recognition when applied to a certain specific scene.
Fig. 6 is a schematic flow chart of a method for data processing according to an embodiment of the present application. The method 500 may be performed by the performing device 110 as shown in fig. 3. The method 500 may be performed by the neural network processor 20 as shown in fig. 4. The method 500 may be performed by the performing device 310 as shown in fig. 5.
501, acquiring a plurality of data to be processed.
The data to be processed can be understood as data to be input into the neural network model and processed by the neural network model. The data to be processed may be text data, image data, video data, audio data, etc., such as a text file, a segment of words in a text file, a picture file, an image block in a picture file, a frame of picture in a video file, a segment of video in a video file, an audio file, a segment of audio in an audio file. The plurality of data to be processed may be a plurality of text files, a plurality of segments of words in one text file, a plurality of picture files, a plurality of image blocks in one picture file, a plurality of frames of pictures in one video file, a plurality of video files, a plurality of segments of video in one video file, a plurality of audio files, a plurality of segments of audio in one audio file, and the like. The type of data to be processed is not limited in this application.
The manner in which the data to be processed is obtained may be varied in many ways. In one example, the database stores the plurality of data to be processed, so the device executing the method 500 can retrieve the plurality of data to be processed directly from the database. In one example, a camera is provided on the device executing the method 500, and the plurality of data to be processed can be acquired by using a camera shooting method. In one example, the cloud device has the plurality of to-be-processed data stored thereon, so the device executing the method 500 can receive the plurality of to-be-processed data sent by the cloud device through the communication network.
502, processing the multiple data to be processed by using a first neural network model to obtain multiple first vectors corresponding to the multiple data to be processed one by one, wherein the first neural network model is obtained based on general data training.
That is, a plurality of pieces of data to be processed are input to the first neural network model, and the plurality of pieces of data to be processed are subjected to processing operations such as feature screening (useful feature screening), feature fusion (merging of a plurality of features), and the like using the first neural network model, and a plurality of first vectors corresponding one-to-one to the plurality of pieces of data to be processed are output. Taking the convolutional neural network shown in fig. 1 as an example, the processing of the multiple pieces of data to be processed may be inputting the multiple pieces of data to be processed from an input layer, performing data processing through hidden layers such as a convolutional layer and/or a pooling layer, and outputting multiple first vectors corresponding to the multiple pieces of data to be processed one by one from an output layer of the first neural network model. The first vector may be a single number, or may be a vector including a plurality of numbers.
The type of the first neural network model may be a convolutional neural network model, a graph convolutional neural network model, a graph attention neural network model, or the like. The application does not limit the type of the first neural network model.
In particular, the first neural network model may be a conventional convolutional neural network model. The output layer of a conventional convolutional neural network is a fully-connected layer, which is sometimes referred to as a classifier. That is, the conventional convolutional neural network model can directly output the recognition result of the data to be processed. For example, the data to be processed is an image, and the conventional convolutional neural network model can directly output the recognition result of whether a person is present in the image, the person being a male or a female, and the like. The recognition result can only represent the probability that the data to be processed belongs to a certain characteristic.
In particular, the first neural network model may also be a special convolutional neural network model that does not include a fully-connected layer, which may output the calculation results of convolutional layers or pooling layers. That is, the first neural network model may output a processing result belonging to an intermediate calculation result in the conventional convolutional neural network model. For the sake of simplicity of description, the processing result output by this particular convolutional neural network model is referred to as an intermediate calculation result. Typically, the intermediate calculation results can be used to characterize some or all of the information of the data to be processed.
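The difference between the two variants can be sketched as follows (Python/NumPy; the tiny convolution kernel, the pooling choice, and the two-class head are illustrative assumptions): the backbone alone outputs an intermediate result that serves as a first vector, while adding the fully connected head would turn it into a conventional classifier.

```python
import numpy as np

def backbone(image, conv_kernel):
    """Convolution + pooling only: the output is an intermediate result
    (a 'first vector'), not a classification."""
    k = conv_kernel.shape[0]
    H, W = image.shape
    fmap = np.array([[np.sum(image[i:i+k, j:j+k] * conv_kernel)
                      for j in range(W - k + 1)] for i in range(H - k + 1)])
    return np.array([fmap.mean(), fmap.max()])        # pooled feature vector

def classifier_head(feature, W_fc):
    """A conventional CNN would add this fully connected layer and directly
    output class probabilities (the recognition result)."""
    logits = W_fc @ feature
    return np.exp(logits) / np.exp(logits).sum()

img = np.random.default_rng(1).standard_normal((8, 8))
feat = backbone(img, np.ones((3, 3)) / 9.0)
print(feat)                                           # first vector for this image
print(classifier_head(feat, np.array([[1.0, -1.0], [-1.0, 1.0]])))
```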
In particular, the first neural network model may be a graph neural network model.
Optionally, the processing the plurality of data to be processed by using the first neural network model includes: processing the plurality of data to be processed and fifth incidence relation information by using the first neural network model, wherein the fifth incidence relation information is used for indicating at least one data group to be processed, and each data group to be processed comprises two data to be processed which meet the prior assumption.
The data group to be processed comprises two data to be processed with incidence relation. Namely, there is an association relationship between two data to be processed in the data group to be processed, which satisfies the a priori assumption. For example, if the data group to be processed is (data to be processed 1, data to be processed 2), there is an association between the data to be processed 1 and the data to be processed 2 that satisfies the a priori assumption. That is to say, a plurality of pieces of data to be processed and fifth incidence relation information reflecting incidence relations among the plurality of pieces of data to be processed are input into the first neural network model, the first neural network model can determine whether the data and the data have influence according to the fifth incidence relation information, and reflect the degree of influence between the data and the data through the weight parameters in the first neural network model, so as to obtain a plurality of first vectors capable of reflecting data relevance, and the plurality of first vectors correspond to the plurality of pieces of data to be processed one by one.
A hypothesis (hypothesis) means a tentative explanation of a certain phenomenon, that is, a guess at and description of the studied natural phenomenon and its regularity based on known scientific facts and principles, obtained by classifying, summarizing, and analyzing the data in detail, resulting in a provisional but acceptable explanation.
Prior probability (prior probability) occurs in bayesian statistical inference and refers to the prior probability distribution (often referred to as a priori) of a random variable, i.e., the probability distribution that expresses a person's belief in that variable before some evidence is considered.
The prior assumption is a prior probability distribution over all hypotheses in the hypothesis space. Taking text data as an example, the plurality of data to be processed may be a plurality of segments of words, where one segment of words may include a plurality of sentences. Generally, different segments express different topics, so the relevance among the sentences within one segment is high, while the relevance among sentences belonging to different segments is weak or absent. There may then be an a priori assumption such as: an association relationship exists between sentences belonging to the same paragraph.
Taking the picture data as an example, the plurality of data to be processed may be multi-frame pictures. In general, as time shifts, the longer the time interval between two frames of pictures is, the smaller the correlation between the two frames of pictures is; the shorter the time interval between two frames of pictures is, the greater the correlation between the two frames of pictures is. Then there may be an a priori assumption that there is a correlation between two frames whose interval duration is less than a preset threshold. The preset threshold may be 8s, for example.
Taking video data as an example, the data to be processed can be a plurality of sections of videos, wherein along with the time migration, the longer the interval between two sections of videos is, the smaller the relevance between the two sections of videos is; the shorter the time interval between two video segments is, the greater the correlation between the two video segments is. Then there may be an a priori assumption that there is a correlation between two segments of video with a minimum separation duration less than a preset threshold. The preset threshold may be 8s, for example.
Taking audio data as an example, the data to be processed may be multiple segments of audio, wherein, along with the time shift, the longer the interval between two segments of audio, the smaller the relevance between the two segments of audio; the shorter the duration of the two pieces of audio are separated, the greater the correlation between the two pieces of audio. Then there may be an a priori assumption that there is a correlation between two pieces of audio whose minimum separation duration is less than a preset threshold. The preset threshold may be 8s, for example.
The fifth association relation information may be a matrix. Compared with other information types, the matrix operation is more convenient.
Optionally, the fifth incidence relation information includes a first incidence relation matrix, a vector in the first dimension in the first incidence relation matrix includes a plurality of elements in one-to-one correspondence with the plurality of pieces of data to be processed, and a vector in the second dimension in the first incidence relation matrix includes a plurality of elements in one-to-one correspondence with the plurality of pieces of data to be processed, where any element in the first incidence relation matrix is used to indicate whether there is an incidence relation that satisfies the prior assumption between the vector corresponding to the any element in the first dimension and the vector corresponding to the any element in the second dimension.
Assume that the first association relation matrix is P, where P is a k × k matrix whose element in the i-th column and j-th row is p_i,j. The i-th column corresponds to the data to be processed i, the j-th row corresponds to the data to be processed j, and the element p_i,j indicates whether there is an association relationship satisfying the prior assumption between the data to be processed i and the data to be processed j. When there is an association relationship between the data to be processed i and the data to be processed j, the element p_i,j may take the value 1, and when there is no association relationship between them, p_i,j may take the value 0. Alternatively, when there is an association relationship between the data to be processed i and the data to be processed j, p_i,j may take the value 0, and when there is no association relationship, p_i,j may take the value 1.

In one example, the matrix P^T obtained by transposing the matrix P is the same as the matrix P, that is, p_i,j = p_j,i. In this case the association relationship between the data to be processed i and the data to be processed j may be non-directional.

In one example, the matrix P^T obtained by transposing the matrix P is different from the matrix P, that is, p_i,j ≠ p_j,i. In this case the association relationship between the data to be processed i and the data to be processed j is directional. For example, p_i,j indicates that there is an association relationship pointing from the data to be processed i to the data to be processed j, and p_j,i indicates that there is an association relationship pointing from the data to be processed j to the data to be processed i. Alternatively, p_i,j indicates that there is an association relationship pointing from the data to be processed j to the data to be processed i, and p_j,i indicates that there is an association relationship pointing from the data to be processed i to the data to be processed j.
There are at least two cases of the degree of correlation between a plurality of data to be processed.
In one example, the plurality of to-be-processed data is composed of to-be-processed data 1 and several to-be-processed data associated with the to-be-processed data 1. As shown in fig. 2, there are nodes 1, 2, 3, 4, and 6, where there are an edge connecting between nodes 1 and 2, an edge connecting between nodes 1 and 3, an edge connecting between nodes 1 and 4, and an edge connecting between nodes 1 and 6.
In one example, the plurality of to-be-processed data includes to-be-processed data 1, a number of to-be-processed data associated with the to-be-processed data 1, and a number of to-be-processed data not associated with the to-be-processed data 1. As shown in fig. 2, there are nodes 1, 4, 5, and 6, where there are an edge connected between node 1 and node 4 and an edge connected between node 1 and node 6, an edge connected between node 5 and node 4 and an edge connected between node 5 and node 6, and no edge connected between node 1 and node 5.
For the two cases, there may be different ways of acquiring the plurality of data to be processed.
In one example, a plurality of data to be processed is obtained, and whether an association relationship exists between any two data to be processed in the plurality of data to be processed is determined according to an a priori assumption.
In one example, one piece of data to be processed is obtained, and other pieces of data to be processed having an association relation with the one piece of data to be processed are determined according to an a priori assumption.
Optionally, the acquiring a plurality of data to be processed includes: acquiring target data, wherein the target data is one of the plurality of data to be processed; obtaining associated data having a priori assumption with the target data, wherein the plurality of data to be processed comprises the associated data.
That is, the device performing method 500 first obtains the target data and then introduces the associated data related to the target data according to a priori assumptions.
Taking text data as an example, the target data may be sentence 1. When the prior assumption is that an association relationship exists between sentences belonging to the same paragraph, the other sentences in the paragraph where sentence 1 is located, apart from sentence 1 itself, are introduced as the associated data.

Taking picture data as an example, the target data may be picture 1 in a piece of video. When the prior assumption is that an association relationship exists between two frames of pictures whose interval duration is less than 8s, the pictures spaced less than 8s from picture 1 are taken as the associated data.

Taking video data as an example, the target data may be video 1. When the prior assumption is that an association relationship exists between two segments of video whose minimum interval duration is less than 8s, the videos whose minimum interval from video 1 is less than 8s are taken as the associated data.

Taking audio data as an example, the target data may be audio 1. When the prior assumption is that an association relationship exists between two segments of audio whose minimum interval duration is less than 8s, the audio segments whose minimum interval from audio 1 is less than 8s are taken as the associated data.
The time interval 8s is taken as an example to obtain the associated data in the above example, and it can be understood by those skilled in the art that the time interval can be adjusted according to different scenarios.
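A minimal sketch of introducing associated data according to such a time-interval prior assumption is shown below (plain Python; the timestamps and the 8s default threshold are illustrative and, as noted above, the threshold can be adjusted for different scenarios):

```python
def associated_items(items, target_index, threshold_s=8.0):
    """Collect the items whose time gap to the target item is below the threshold;
    these form the associated data introduced for the target data."""
    t0 = items[target_index]["time"]
    return [i for i, it in enumerate(items)
            if i != target_index and abs(it["time"] - t0) < threshold_s]

# Hypothetical frames with capture timestamps in seconds.
frames = [{"time": 0.0}, {"time": 3.5}, {"time": 7.9}, {"time": 12.0}]
print(associated_items(frames, target_index=0))                   # [1, 2]: within 8s
print(associated_items(frames, target_index=0, threshold_s=2.0))  # []: tighter prior
```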
Additionally, to reduce the dependence of the neural network model on the data to be trained, the first neural network model may be trained using generic data. The general-purpose data may be data that is not affected by a scene or data that has low dependency on a scene. For example, a first neural network model is used to identify character features in an image, and its training data set may include various scenes that may occur, such as street scenes, meeting scenes, vehicle-mounted scenes, country scenes, asian scenes, african scenes, european and american scenes, and so on. The plurality of data to be processed may be data applied within a particular scene. That is, the special data may be processed using a first neural network model capable of processing general-purpose data.
The process of training the first neural network model may be as follows: general data are input into the first neural network model, and the first neural network model performs data processing operations such as feature screening and feature fusion on the general data to obtain a feature vector. A matrix operation is performed on the feature vector and a weight matrix containing the weight parameters to obtain a data training result corresponding to the general data. The distance between the data training result and the label of the general data is then calculated, and the weight parameters of the first neural network model are corrected accordingly. The distance between the data training result and the label of the general data can be understood as the degree of similarity between the two. The information distance may be calculated using cross entropy, KL divergence, JS divergence, or the like.
For example, in order to obtain a large amount of picture training data, data is collected in a video mode in the data collection process, and the training data may be labeled, so as to obtain labeled data required in the training process. The specific labeling process and label definitions are common technical contents in the field of deep learning, and are not described in detail in the embodiments of the present application.
When the data training result is the recognition result of the general data, the distance between the data training result and the label of the general data can be obtained according to the recognition result. For example, the recognition result of the general data 1 is: the confidence that generic data 1 belongs to feature 1 is 0.7 and the confidence that generic data 1 belongs to feature 2 is 0.3. The labels of the general data 1 are: tag 1, tag 1 corresponds to feature 1. Then, the identification result of the general data 1 may be represented by (0.7, 0.3), and the label of the general data 1 may be represented by (1, 0). The distance between the data training result and the label of the generic data may be the distance between the vector (0.7, 0.3) and the vector (1, 0).
When the data training result is an intermediate calculation result, the label of the general data may be a vector having the same dimension as the intermediate calculation result, and the distance between the data training result and the label of the general data may be obtained through vector calculation.
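Continuing the numerical example above, the cross-entropy distance between the training result (0.7, 0.3) and the label (1, 0) can be computed as in the following sketch (Python/NumPy; the clipping constant is an implementation detail assumed here to avoid log(0)):

```python
import numpy as np

def cross_entropy(label, prediction, eps=1e-12):
    # Distance between the data training result and the label of the general data.
    prediction = np.clip(prediction, eps, 1.0)
    return -np.sum(label * np.log(prediction))

label = np.array([1.0, 0.0])        # tag 1, corresponding to feature 1
pred = np.array([0.7, 0.3])         # confidences output for features 1 and 2
print(cross_entropy(label, pred))   # ≈ 0.357; a perfect prediction would give 0
```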
First association relation information is obtained 503, where the first association relation information is used to indicate at least one first vector group, and each first vector group includes two first vectors that satisfy an a priori assumption.
That is, the first association relation information reflects whether or not an association relation exists between the plurality of first vectors. The first vector group includes two first vectors having an association relationship. I.e. there is a correlation between two first vectors within the first set of vectors that satisfies the a priori assumption. For example, the first vector group indicates (first vector 1, first vector 2), then there is an association between first vector 1 and first vector 2 that satisfies the a priori assumption. The first association relation information reflects whether the plurality of first vectors have influence on each other, so that a data processing result capable of reflecting data association can be obtained according to the first association relation information. It should be understood that the first vector may have an association with itself.
In one example, since the plurality of first vectors correspond to the plurality of data to be processed one to one, the first association relationship information may be determined according to an association relationship between the plurality of data to be processed. That is, the first association information is the same as or substantially the same as the fifth association information.
In another example, the first association information is different from the fifth association information above. For example, whether an association relationship exists between any two first vectors in the plurality of first vectors may be determined according to the similarity between the any two first vectors. The greater the similarity, the greater the association; the smaller the similarity, the smaller the association. Then, the prior assumption corresponding to the first association relationship information may be that, when the similarity exceeds a preset value, an association relationship exists between any two first vectors; when the similarity does not exceed the preset value, it can be considered that no association exists between any two first vectors.
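A sketch of deriving the first association relation information from pairwise similarity is given below (Python/NumPy; cosine similarity and the preset value of 0.8 are illustrative assumptions; any other similarity measure or threshold could be used):

```python
import numpy as np

def similarity_matrix(vectors, threshold=0.8):
    """Mark an association between two first vectors when their cosine
    similarity exceeds the preset value."""
    v = np.asarray(vectors, dtype=float)
    unit = v / np.linalg.norm(v, axis=1, keepdims=True)
    sim = unit @ unit.T
    return (sim > threshold).astype(int)

first_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(similarity_matrix(first_vectors))
# [[1 1 0]
#  [1 1 0]
#  [0 0 1]]
```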
The first association relation information may be reflected by the graph model. As shown in fig. 2, node 1, node 2, and node 3 may correspond to first vector 1, first vector 2, and first vector 3, respectively. There is an edge connecting between node 1 and node 2, so there is an association between first vector 1 and first vector 2; there is an edge connecting between node 2 and node 3, so there is an association between first vector 2 and first vector 3; there is no edge connecting between node 1 and node 3, so there is no association between the first vector 1 and the first vector 3.
Optionally, the first association relationship information includes a second association relationship matrix, a vector in the second association relationship matrix in the first dimension includes a plurality of elements in one-to-one correspondence with the plurality of first vectors, and a vector in the second association relationship matrix in the second dimension includes a plurality of elements in one-to-one correspondence with the plurality of first vectors, where any element in the second association relationship matrix is used to indicate whether there is an association relationship, which satisfies the prior assumption, between the vector corresponding to the any element in the first dimension and the vector corresponding to the any element in the second dimension.
Assume that the second association relation matrix is Q, where Q is an l × l matrix whose element in the i-th column and j-th row is q_i,j. The i-th column corresponds to the first vector i, the j-th row corresponds to the first vector j, and q_i,j indicates whether there is an association relationship satisfying the prior assumption between the first vector i and the first vector j. When there is an association relationship between the first vector i and the first vector j, q_i,j may take the value 1, and when there is no association relationship, q_i,j may take the value 0. Alternatively, when there is an association relationship between the first vector i and the first vector j, q_i,j may take the value 0, and when there is no association relationship, q_i,j may take the value 1.

In one example, the matrix Q^T obtained by transposing the matrix Q is the same as the matrix Q, that is, q_i,j = q_j,i. In this case the association relationship between the first vector i and the first vector j may be non-directional.

In one example, the matrix Q^T obtained by transposing the matrix Q is different from the matrix Q, that is, q_i,j ≠ q_j,i. In this case the association relationship between the first vector i and the first vector j is directional. For example, q_i,j indicates that there is an association relationship pointing from the first vector i to the first vector j, and q_j,i indicates that there is an association relationship pointing from the first vector j to the first vector i. Alternatively, q_i,j indicates that there is an association relationship pointing from the first vector j to the first vector i, and q_j,i indicates that there is an association relationship pointing from the first vector i to the first vector j.
In order to avoid the computational difficulty caused by an excessively large matrix, the second association relation matrix may be compressed to obtain a matrix with a smaller dimension.
In one example, assume that the second association relation matrix Q is an l × l matrix and that all elements of Q that are more than l′ positions away from the diagonal of Q take the same value (all 0 or all 1), where l′ < l. The second association relation matrix Q may then be divided into a plurality of small matrices, where the maximum number of rows of each small matrix is l′ and the maximum number of columns is l′. This process may also be referred to as sparsifying the second association relation matrix Q.

In one example, assuming that the second association relation matrix Q cannot be sparsified, the second association relation matrix Q may be compressed according to a spectral clustering method.
It should be understood that the prior assumption may indicate a forward correlation as well as a reverse correlation. For example, since the shorter the interval between picture frames, the more relevant the content in the pictures generally is, when the prior assumption indicates that an association relationship exists between picture frames within 8s of each other, the prior assumption can be understood as indicating a forward correlation; when the prior assumption indicates that an association relationship exists between picture frames more than 8s apart, the prior assumption can be understood as indicating a reverse correlation.
And 504, inputting the plurality of first vectors and the first incidence relation information into a second neural network model to obtain a processing result for first to-be-processed data, wherein the first to-be-processed data is any one of the plurality of to-be-processed data.
That is, the output result of the first neural network model and the association relation inside the output result are input to the second neural network model. The plurality of first vectors are input into the second neural network model, which may be understood as inputting the plurality of features of the data to be processed into the second neural network model. The first incidence relation information is input into the second neural network model, and it can be understood that information on whether any two first vectors in the plurality of first vectors have influence on each other is input into the second neural network model. The plurality of first vectors may be understood as nodes in the graph model, and the first association information may be used to indicate whether edges exist between the nodes. Thus, the second neural network model may be a graph neural network model.
The second neural network model processes the plurality of first vectors and the first incidence relation information, and may determine whether any two first vectors have influence and what specific influence degree is according to a weight parameter in the second neural network model, so as to obtain a processing result of the first data to be processed. The processing result of the first data to be processed may be a characteristic representation of the first data to be processed, or may be an identification result of the first data to be processed. The processing result of the first data to be processed may be a vector.
Assume that the plurality of first vectors are denoted x_1, …, x_l, where each first vector x_i (1 ≤ i ≤ l) is an s-dimensional vector. Combining the plurality of first vectors then yields a matrix X, where X = {x_1, …, x_i, …, x_l}. Assume that the first association relation information is the second association relation matrix Q mentioned above.

First, assume h weight matrices to be trained, W_1, W_2, …, W_h, each of dimension s × s_h; that is, each of W_1, W_2, …, W_h contains s × s_h weight parameters. Here s_h = s/h, where h represents the number of heads of the graph attention neural network (the number of heads may also be referred to as the number of slices), and s_h is commonly referred to as the single-head dimension.

Then compute U_1 = X·W_1, U_2 = X·W_2, …, U_h = X·W_h. Clearly, each of U_1, U_2, …, U_h has dimension l × s_h.

Next compute V_i,j = U_i·U_j^T, with i ≠ j, 1 ≤ i ≤ h, 1 ≤ j ≤ h. At this point V_i,j has dimension l × l. Then apply the Softmax function to each row of V_i,j to obtain normalized probabilities, giving R_i,j. R_i,j is still an l × l matrix, and it can be understood as the matrix of the strength of mutual attention between each pair of points.

Then R_i,j is multiplied element-wise by Q to obtain E_i,j after the Q relation mask. E_i,j can be understood as screening out the associated points according to the edge relations and retaining the attention between them, while the attention between unrelated points is not retained. This matrix contains a large amount of information about the mutual correlation of the nodes, so its information content is rich. Then E_i,j·U_i gives U_i^new, the final expression of each point updated with the information of the other points. The dimension of U_i^new is l × s_h.

Finally, U_1^new, …, U_i^new, …, U_h^new are concatenated together to obtain the matrix X′, where X′ = {U_1^new, …, U_i^new, …, U_h^new}, and the dimension of X′ is l × s. It can be seen that X′ contains both the information about the correlation between nodes and the weight parameters.
The above process is the data processing process of one layer of the network. If the depth of the graph attention neural network model is h′, that is, the graph attention neural network model comprises h′ layers, the X′ output by the current layer can be input into the next layer, that is, the X′ output by the current layer is regarded as the X of the next layer, and the same or a similar data processing process as above is performed.
It can be seen that X′ has the same size as X, but each element of X′ contains the information of one or more elements of X. By integrating the data that have an association relationship in this way, the second neural network model can obtain a larger amount of information when identifying a certain feature, which improves the identification accuracy. A matrix operation is then performed on the matrix X′ and the weight parameter matrix to obtain the processing result of the first data to be processed.
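The following is a compact sketch of the X → X′ computation described above (Python/NumPy). For simplicity it computes the attention scores within each head (U_i against itself) rather than between pairs of heads, applies the Softmax before the Q mask as described, and does not re-normalize after masking; the shapes and random values are illustrative assumptions.

```python
import numpy as np

def softmax_rows(m):
    e = np.exp(m - m.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def masked_graph_attention(X, Q, weights):
    """One attention layer over the first vectors: project per head, score all node
    pairs, normalise, keep only the pairs marked in the association matrix Q, then
    update each node from its associated nodes and concatenate the heads."""
    new_heads = []
    for W in weights:                      # one weight matrix per head, shape s x s_h
        U = X @ W                          # l x s_h
        V = U @ U.T                        # l x l attention scores
        R = softmax_rows(V)                # normalised mutual attention strength
        E = R * Q                          # Q relation mask: unrelated pairs dropped
        new_heads.append(E @ U)            # each node updated from associated nodes
    return np.concatenate(new_heads, axis=1)   # X', dimension l x s

l, s, h = 4, 6, 2
rng = np.random.default_rng(0)
X = rng.standard_normal((l, s))
Q = np.array([[1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 1], [0, 0, 1, 1]])
weights = [rng.standard_normal((s, s // h)) for _ in range(h)]
print(masked_graph_attention(X, Q, weights).shape)   # (4, 6)
```

The printed shape (l, s) matches the statement above that X′ has the same size as X.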
In one example, the plurality of data to be processed includes first data to be processed, the first data to be processed may be target data in the foregoing, the plurality of data to be processed further includes one or more associated data associated with the first data to be processed, and the second neural network model may combine an influence of the associated data on the first data to be processed according to the first association relation information, so as to obtain a processing result corresponding to the first data to be processed. In other words, the second neural network model not only extracts the features of the first to-be-processed data, but also extracts the features of other to-be-processed data related to the first to-be-processed data, so that the input data volume in the prediction process is expanded, and the identification accuracy is improved.
In one example, the plurality of data to be processed includes first data to be processed, the first data to be processed may correspond to a target vector, the plurality of first vectors further includes one or more associated vectors associated with the target vector, and the plurality of data to be processed includes data to be processed in one-to-one correspondence with the one or more associated vectors. The second neural network model can obtain a processing result corresponding to the first to-be-processed data by combining the influence of the association vector on the target vector according to the first association relation information. In other words, the second neural network model not only extracts the features of the target vector, but also extracts the features of the associated vector which is associated with the target vector, so that the data processing amount in the prediction process is expanded, and the identification accuracy rate is improved.
In addition, the second neural network model may output a plurality of processing results in one-to-one correspondence with the plurality of data to be processed. That is to say, the second neural network model integrates the plurality of first vectors and the association relationship between the respective first vectors, and outputs a plurality of processing results corresponding to the plurality of data to be processed one by one.
Consider a scenario in which one association relationship exists between first vector A and first vector B, and another association relationship exists between first vector A and first vector C; these two association relationships may be the same or may differ. For example, two sentences spaced farther apart within the same paragraph are associated less closely, while two sentences spaced closer together within the same paragraph are associated more closely. For another example, two frames with a longer interval duration are associated less closely, while two frames with a shorter interval duration are associated more closely. In order to express how close these association relationships are, various expression modes are available.
In one example, the first association relation information is a matrix, and the numerical size of the elements in the matrix is used to indicate how close the association relation is, and the larger the numerical value is, the more close the association relation is. However, determining the specific size of the numerical value often introduces redundant human setting or increases the difficulty of training the neural network model.
In one example, when there are two kinds of first vector groups with close association and distant association in the first association information, second association information may be established, where the second association information is used to indicate the first vector group with close association. That is, the degree of influence between two first vectors having close association can be strengthened by the second association information.
Optionally, the first association relation information is used to indicate N first vector groups, where N is an integer greater than 1, before the inputting the first vectors and the first association relation information into a second neural network model to obtain a processing result for the first to-be-processed data, the method further includes: acquiring second association relation information, wherein the second association relation information is used for indicating N second vector groups, the N second vector groups belong to the N first vector groups, N is smaller than N, and N is a positive integer; inputting the plurality of first vectors and the first incidence relation information into a second neural network model to obtain a processing result for first data to be processed, including: and inputting the plurality of first vectors, the first incidence relation information and the second incidence relation information into the second neural network model to obtain a processing result aiming at the first to-be-processed data.
The information indicated in the second association relation information is contained in the first association relation information. That is, there must be a correlation between two first vectors within each second vector group that satisfies the a priori assumption.
Assuming that the first association relationship information is the same as or substantially the same as the fifth association relationship information, the first association relationship information may reflect an association relationship between a plurality of pieces of data to be processed, and the second association relationship information may reflect whether a close association relationship exists between a plurality of pieces of data to be processed.
Taking the text data as an example, when it is assumed in the first experiment that there is a correlation between sentences belonging to the same paragraph, the first correlation information may indicate that there is a correlation between different sentences in the same paragraph, and the second correlation information may indicate that there is a close correlation between adjacent sentences in the same paragraph.
Taking the picture data as an example, when it is assumed in the first experiment that there is a correlation between two frames of pictures with an interval less than 8s, the first association relationship information may indicate that there is a correlation between two frames of pictures with an interval less than 8s, and the second association relationship information may indicate that there is a close correlation between two frames of pictures with an interval less than 2 s.
Taking video data as an example, when it is assumed in the first experiment that there is an association between two pieces of video with a minimum interval smaller than 8s, the first association relationship information may indicate that there is an association between two pieces of video with a minimum interval smaller than 8s, and the second association relationship information may indicate that there is a close association between two pieces of video with a minimum interval smaller than 2 s.
Taking the audio data as an example, when it is assumed in advance that there is an association between two pieces of audio with the minimum interval smaller than 8s, the first association relationship information may indicate that there is an association between two pieces of audio with the minimum interval smaller than 8s, and the second association relationship information may indicate that there is a close association between two pieces of audio with the minimum interval smaller than 2 s.
Assuming that the first association information is different from the fifth association information in the above, the first association information may reflect a similarity between the plurality of first vectors, and the second association information may reflect two first vectors of the plurality of first vectors having a higher similarity.
For example, when it is assumed in the first experiment that the similarity between two first vectors exceeds a preset value, the first association relationship information may indicate that an association exists between two first vectors whose similarity exceeds the preset value 1, and the second association relationship information may indicate that an association exists between two first vectors whose similarity exceeds a preset value 2, where the preset value 2 is greater than the preset value 1.
It is to be understood that, similar to the first association information, the second association information may contain a matrix for representing n second vector groups.
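Purely as an illustrative sketch (not part of the original description), the following Python code builds such matrices from pairwise cosine similarity between the first vectors, using two assumed thresholds in the role of preset value 1 and preset value 2; the function name and threshold values are hypothetical.

```python
import numpy as np

def build_association_matrices(first_vectors, preset_value_1=0.5, preset_value_2=0.8):
    """Build the first and second association matrices from pairwise cosine
    similarity between first vectors (thresholds are hypothetical).

    first_vectors: array of shape (k, d), one first vector per row.
    Returns two k x k 0/1 matrices; the second indicates close associations only.
    """
    normed = first_vectors / np.linalg.norm(first_vectors, axis=1, keepdims=True)
    similarity = normed @ normed.T                          # pairwise cosine similarity
    first = (similarity > preset_value_1).astype(np.int8)   # first association relation information
    second = (similarity > preset_value_2).astype(np.int8)  # second association relation information
    return first, second

# Usage sketch with random feature vectors
vectors = np.random.default_rng(0).random((6, 16))
first_matrix, second_matrix = build_association_matrices(vectors)
assert np.all(second_matrix <= first_matrix)  # every second vector group is also a first vector group
```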
It should be understood that the first neural network model and the second neural network model may be two submodels in one neural network model.
The method for training the second neural network model and obtaining the weight parameters of the second neural network model is described in detail below with reference to fig. 7. The method 600 may be performed by the training device 120 as shown in fig. 3.
601, obtaining a plurality of data to be trained.
The data to be trained may be understood as data to be input into the neural network model for training the neural network model. Some or all of the plurality of data to be trained have a label. The neural network model processes the data to be trained to obtain a data processing result, and the weight parameters of the neural network model can be corrected by calculating the distance between the label and the data processing result. The distance between the data processing result and the tag can be understood as the similarity between the data processing result and the tag. The specific calculation method of the information distance can be in modes of cross entropy, KL divergence, JS divergence and the like.
The data to be trained may be text data, image data, video data, audio data, etc., such as a text file, a segment of text in a text file, a picture file, an image block in a picture file, a frame of picture in a video file, a segment of video in a video file, an audio file, a segment of audio in an audio file. The plurality of data to be trained may be a plurality of text files, a plurality of segments of words in one text file, a plurality of picture files, a plurality of image blocks in one picture file, a plurality of frames of pictures in one video file, a plurality of video files, a plurality of segments of video in one video file, a plurality of audio files, a plurality of segments of audio in one audio file, and the like. The type of data to be trained is not limited in this application.
The manner of acquiring the data to be trained may be various. In one example, the database stores the plurality of data to be trained, so the device performing the method 600 may retrieve the plurality of data to be trained directly from the database. In one example, a camera is provided on the device executing the method 600, and the plurality of data to be trained can be obtained by using a camera shooting method. In one example, the cloud device has the plurality of data to be trained stored thereon, so the device executing the method 600 can receive the plurality of data to be trained sent by the cloud device through the communication network.
And 602, processing the multiple data to be trained by using a first neural network model to obtain multiple fourth vectors corresponding to the multiple data to be trained one by one.
Wherein, the plurality of data to be trained can be general data.
And inputting the data 1 to be trained into the first neural network model to obtain a fourth vector 1. And inputting the data 2 to be trained into the first neural network model to obtain a fourth vector 2.
The third association information is used to indicate an association between data. Assuming that the third vector group indicated by the third association information includes (fourth vector 1, fourth vector 2), an association exists between the fourth vector 1 and the fourth vector 2.
And inputting the fourth vector 1 and the third correlation information into a second neural network model to obtain a first processing result 1. Therefore, at least the influence and contribution of the data to be trained 2 on the data to be trained 1 can be obtained.
That is, a plurality of data to be trained are input into the first neural network model, and the first neural network model is used to perform processing operations such as feature screening (useful feature screening), feature fusion (combining a plurality of features) and the like on the plurality of data to be trained, and a plurality of fourth vectors corresponding to the plurality of data to be trained one to one are output. Taking the convolutional neural network shown in fig. 1 as an example, the processing on the multiple data to be trained may be inputting the multiple data to be trained from an input layer, performing data processing through hidden layers such as a convolutional layer and/or a pooling layer, and outputting multiple fourth vectors corresponding to the multiple data to be trained one by one from an output layer of the first neural network model. The fourth vector may be a single number, or may be a vector including a plurality of numbers.
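The description above does not prescribe a concrete architecture; the following PyTorch sketch only illustrates the role of the first neural network model in this step, mapping a batch of inputs through convolution and pooling layers to one vector per data item. The module name, layer sizes, and vector dimension are all assumptions.

```python
import torch
import torch.nn as nn

class FirstModelSketch(nn.Module):
    """Hypothetical stand-in for the first neural network model: convolution and
    pooling hidden layers followed by a projection, outputting one fourth vector
    per input picture."""

    def __init__(self, vector_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(32, vector_dim)

    def forward(self, x):                  # x: (batch, 3, H, W)
        h = self.features(x).flatten(1)    # (batch, 32)
        return self.proj(h)                # (batch, vector_dim): one fourth vector per input

# Five data items to be trained in, five fourth vectors out (one-to-one correspondence)
fourth_vectors = FirstModelSketch()(torch.randn(5, 3, 64, 64))
print(fourth_vectors.shape)  # torch.Size([5, 128])
```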
In one example, the first neural network model is a neural network model to be trained. The first neural network model can perform data processing operations such as feature screening and feature fusion on the multiple data to be trained to obtain feature vectors. And performing matrix operation on the characteristic vectors and the weight matrix containing the weight parameters to obtain a plurality of fourth vectors which correspond to the plurality of data to be trained one by one. The plurality of fourth vectors are used for modifying the weight parameters of the first neural network model, for example, the distances between the fourth vectors and the labels of the plurality of data to be trained may be calculated, and the weight parameters of the first neural network model may be modified in combination with the loss function.
In one example, the first neural network model is a trained neural network model.
To reduce the dependence of the neural network model on the data to be trained, the first neural network model may be trained using generic data. The general-purpose data may be data that is not affected by a scene or data that has low dependency on a scene. For example, a first neural network model is used to identify character features in an image, and its training data set may include various scenes that may occur, such as street scenes, meeting scenes, vehicle-mounted scenes, country scenes, asian scenes, african scenes, european and american scenes, and so on. Then the plurality of data to be trained may be data that is applied within a particular scenario. In other words, the first neural network model capable of processing the general data is migrated to a special scene, and the second neural network model capable of processing the special scene is obtained by the method of training the neural network model.
The process of training the first neural network model may be inputting general data into the first neural network model, and the first neural network model may perform data processing operations such as feature screening and feature fusion on the general data to obtain a feature vector. And performing matrix operation on the feature vector and a weight matrix containing the weight parameters to obtain a data training result corresponding to the general data. And then calculating the distance between the data training result and the label of the general data, and correcting the weight parameter of the first neural network model. The distance between the data training result and the label of the general data can be understood as the similarity between the data training result and the label of the general data. The specific calculation method of the information distance can be in modes of cross entropy, KL divergence, JS divergence and the like.
When the data training result is the recognition result of the general data, the distance between the data training result and the label of the general data can be obtained according to the recognition result. For example, the recognition result of the general data 1 is: the confidence that generic data 1 belongs to feature 1 is 0.7 and the confidence that generic data 1 belongs to feature 2 is 0.3. The labels of the general data 1 are: tag 1, tag 1 corresponds to feature 1. Then, the identification result of the general data 1 may be represented by (0.7, 0.3), and the label of the general data 1 may be represented by (1, 0). The distance between the data training result and the label of the generic data may be the distance between the vector (0.7, 0.3) and the vector (1, 0).
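As a minimal sketch of such a distance calculation (using the cross-entropy and KL-divergence options mentioned above; the small epsilon exists only to avoid log(0) and is not part of the original description):

```python
import numpy as np

def cross_entropy(label, result, eps=1e-12):
    # H(label, result) = -sum_i label_i * log(result_i)
    return float(-np.sum(label * np.log(result + eps)))

def kl_divergence(label, result, eps=1e-12):
    # D_KL(label || result) = sum_i label_i * log(label_i / result_i)
    label = np.clip(label, eps, 1.0)
    return float(np.sum(label * np.log(label / (result + eps))))

label = np.array([1.0, 0.0])    # tag 1, corresponding to feature 1
result = np.array([0.7, 0.3])   # recognition result of the general data 1
print(cross_entropy(label, result))  # ~0.357; smaller means the result is closer to the label
print(kl_divergence(label, result))  # ~0.357 here, since the label is one-hot
```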
When the data training result is an intermediate calculation result, the label of the general data may be a vector having the same dimension as the intermediate calculation result, and the distance between the data training result and the label of the general data may be obtained through vector calculation.
The type of the first neural network model may be a convolutional neural network model, a graph convolutional neural network model, a graph attention neural network model, or the like. The application does not limit the type of the first neural network model.
In particular, the first neural network model may be a conventional convolutional neural network model. The output layer of a conventional convolutional neural network is a fully-connected layer, which is sometimes referred to as a classifier. That is, the conventional convolutional neural network model may input the recognition result of the data to be trained into the loss function through the fully-connected layer. For example, if the data to be trained is an image, the fully-connected layer of the conventional convolutional neural network model can directly output the recognition result of whether the image contains a person and whether the person is male or female. The recognition result can only represent the probability that the data to be trained belongs to a certain characteristic.
In particular, the first neural network model may also be a special convolutional neural network model that does not include a fully-connected layer, and the calculation results of the convolutional layer or the pooling layer may be input to the loss function. That is, the first neural network model may input the processing results belonging to the intermediate calculation results in the conventional convolutional neural network model into the loss function. For simplicity of description, the processing result of the input loss function of this particular convolutional neural network model is referred to as an intermediate calculation result. Typically, the intermediate calculation results can be used to characterize some or all of the information of the data to be trained. That is, the intermediate calculation results generally contain more information content than the recognition results.
In particular, the first neural network model may be a graph neural network model.
Optionally, the processing the plurality of data to be trained by using the first neural network model includes: and processing the plurality of data to be trained and sixth incidence relation information by using the first neural network model, wherein the sixth incidence relation information is used for indicating at least one data group to be trained, and each data group to be trained comprises two data to be trained which meet the prior assumption.
The data group to be trained comprises two data to be trained with incidence relation. Namely, there is an association relationship between two data to be trained in the data group to be trained, which satisfies the prior assumption. For example, if the data group to be trained is (data 1 to be trained, data 2 to be trained), there is an association between data 1 to be trained and data 2 to be trained that satisfies the a priori assumption. That is to say, a plurality of data to be trained and sixth incidence relation information reflecting incidence relations among the plurality of data to be trained are input into the first neural network model, the first neural network model can determine whether the data and the data are influenced or not according to the sixth incidence relation information, and the influence degree between the data and the data is reflected through the weight parameters in the first neural network model, so that a plurality of first vectors capable of reflecting data incidence relations are obtained, and the plurality of first vectors correspond to the plurality of data to be trained one to one.
Taking text data as an example, the plurality of data to be trained may be a plurality of segments of words, wherein a segment of words may include a plurality of sentences. Generally, different sections of words express different topics, so that the relevance between a plurality of sentences in one section of words is high, and the relevance between a plurality of sentences belonging to different sections is weak or no relevance exists. Then there may be an a priori assumption such as the existence of a correlation between sentences belonging to the same paragraph.
Taking the picture data as an example, the plurality of data to be trained may be a plurality of frames. In general, as time shifts, the longer the time interval between two frames of pictures is, the smaller the correlation between the two frames of pictures is; the shorter the time interval between two frames of pictures is, the greater the correlation between the two frames of pictures is. Then there may be an a priori assumption that there is a correlation between two frames whose interval duration is less than a preset threshold. The preset threshold may be 8s, for example.
Taking video data as an example, the data to be trained can be a plurality of sections of videos, wherein along with the time migration, the longer the interval between two sections of videos is, the smaller the relevance between the two sections of videos is; the shorter the time interval between two video segments is, the greater the correlation between the two video segments is. Then there may be an a priori assumption that there is a correlation between two segments of video with a minimum separation duration less than a preset threshold. The preset threshold may be 8s, for example.
Taking audio data as an example, the data to be trained can be a plurality of sections of audio, wherein along with the time migration, the longer the interval between two sections of audio is, the smaller the relevance between the two sections of audio is; the shorter the duration of the two pieces of audio are separated, the greater the correlation between the two pieces of audio. Then there may be an a priori assumption that there is a correlation between two pieces of audio whose minimum separation duration is less than a preset threshold. The preset threshold may be 8s, for example.
The sixth association relation information may be a matrix. Compared with other information types, the matrix operation is more convenient.
Optionally, the sixth association relationship information includes a third association relationship matrix, a vector in the third association relationship matrix in the first dimension includes a plurality of elements in one-to-one correspondence with the plurality of data to be trained, and a vector in the third association relationship matrix in the second dimension includes a plurality of elements in one-to-one correspondence with the plurality of data to be trained, where any element in the third association relationship matrix is used to indicate whether there is an association relationship, which satisfies the prior assumption, between a vector corresponding to the any element in the first dimension and a vector corresponding to the any element in the second dimension.
Assuming that the third association relationship matrix is A,

A =
[ a_{1,1}  a_{1,2}  …  a_{1,k} ]
[ a_{2,1}  a_{2,2}  …  a_{2,k} ]
[   ⋮        ⋮      ⋱    ⋮    ]
[ a_{k,1}  a_{k,2}  …  a_{k,k} ]

wherein A is a k × k matrix, the ith column corresponds to a first vector i, the jth row corresponds to a first vector j, and the element a_{i,j} in the ith column and the jth row indicates whether there is an association between the first vector i and the first vector j that satisfies the a priori assumption. When the first vector i and the first vector j have an association relationship, the element a_{i,j} may take the value 1, and when the first vector i and the first vector j have no association relationship, a_{i,j} may take the value 0. Alternatively, when the first vector i and the first vector j have an association relationship, a_{i,j} may take the value 0, and when they have no association relationship, a_{i,j} may take the value 1.
In one example, the matrix A^T obtained by transposing the matrix A is the same as the matrix A. That is, a_{i,j} = a_{j,i}. In this case the association between the first vector i and the first vector j may be non-directional.

In one example, the matrix A^T obtained by transposing the matrix A is different from the matrix A. That is, a_{i,j} ≠ a_{j,i}. In this case the association between the first vector i and the first vector j is directional. For example, a_{i,j} may indicate that an association exists between the first vector i and the first vector j in which the first vector i points to the first vector j, while a_{j,i} indicates an association in which the first vector j points to the first vector i. Alternatively, a_{i,j} may indicate an association in which the first vector j points to the first vector i, while a_{j,i} indicates an association in which the first vector i points to the first vector j.
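As a hedged illustration of how such a matrix might be derived from a prior assumption, the sketch below uses the 8 s interval assumption from the picture-data example; the construction itself and the function name are not specified by the patent.

```python
import numpy as np

def association_matrix_from_timestamps(timestamps, max_interval=8.0):
    """k x k matrix A with a[i, j] = 1 when the prior assumption holds, i.e. the two
    pictures are captured less than max_interval seconds apart (the diagonal is 1,
    since data may have an association with itself)."""
    t = np.asarray(timestamps, dtype=float)
    return (np.abs(t[:, None] - t[None, :]) < max_interval).astype(np.int8)

capture_times = [0.0, 3.0, 5.0, 20.0, 21.0]   # hypothetical capture times in seconds
A = association_matrix_from_timestamps(capture_times)
print(A)
print(np.array_equal(A, A.T))  # True: this particular construction is non-directional
```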
603, third association information is obtained, where the third association information is used to indicate at least one third vector group, and each third vector group includes two fourth vectors that satisfy the prior assumption.
That is, the third association information reflects whether or not an association exists between the plurality of fourth vectors. The third vector group includes two fourth vectors having an association relationship. I.e. there is a correlation between two fourth vectors within the third set of vectors that satisfies the a priori assumption. For example, the third vector group indicates (fourth vector 1, fourth vector 2), then there is a correlation between the fourth vector 1 and the fourth vector 2 that satisfies the a priori assumption. The third association relation information reflects whether the plurality of fourth vectors have influence, so that a data processing result capable of reflecting the data association can be obtained according to the third association relation information. It should be understood that the fourth vector may have an association with itself.
In one example, since the plurality of fourth vectors correspond to the plurality of data to be trained one to one, the third association information may be determined according to an association between the plurality of data to be trained. That is, the third association information is the same as or substantially the same as the sixth association information described above.
In one example, the third association information is different from the sixth association information above. For example, whether there is an association relationship between any two fourth vectors in the plurality of fourth vectors may be determined according to the similarity between the any two fourth vectors. The greater the similarity, the greater the association; the smaller the similarity, the smaller the association. Then, the prior assumption corresponding to the third association information may be that, when the similarity exceeds a preset value, an association relationship exists between any two fourth vectors; when the similarity does not exceed the preset value, it can be considered that no association exists between any two fourth vectors.
The third association information may be reflected by the graph model. As shown in fig. 2, node 1, node 2, and node 3 may correspond to a fourth vector 1, a fourth vector 2, and a fourth vector 3, respectively. There is an edge connecting between node 1 and node 2, so there is an association between the fourth vector 1 and the fourth vector 2; there is an edge connecting between node 2 and node 3, so there is an association between fourth vector 2 and fourth vector 3; there is no edge connecting between node 1 and node 3, so there is no association between the fourth vector 1 and the fourth vector 3.
Optionally, the third association information includes a fourth association matrix, a vector in the fourth association matrix in the first dimension includes a plurality of elements in one-to-one correspondence with the plurality of fourth vectors, and a vector in the fourth association matrix in the second dimension includes a plurality of elements in one-to-one correspondence with the plurality of fourth vectors, where any element in the fourth association matrix is used to indicate whether there is an association that satisfies the prior assumption between the vector corresponding to the any element in the first dimension and the vector corresponding to the any element in the second dimension.
Assuming that the fourth incidence relation matrix is B,
B =
[ b_{1,1}  b_{1,2}  …  b_{1,l} ]
[ b_{2,1}  b_{2,2}  …  b_{2,l} ]
[   ⋮        ⋮      ⋱    ⋮    ]
[ b_{l,1}  b_{l,2}  …  b_{l,l} ]

wherein B is an l × l matrix, the ith column corresponds to a fourth vector i, the jth row corresponds to a fourth vector j, and the element b_{i,j} in the ith column and the jth row indicates whether an association exists between the fourth vector i and the fourth vector j that satisfies the a priori assumption. When the fourth vector i and the fourth vector j have an association relationship, the element b_{i,j} may take the value 1, and when the fourth vector i and the fourth vector j have no association relationship, b_{i,j} may take the value 0. Alternatively, when the fourth vector i and the fourth vector j have an association relationship, b_{i,j} may take the value 0, and when they have no association relationship, b_{i,j} may take the value 1.
In one example, the matrix B^T obtained by transposing the matrix B is the same as the matrix B. That is, b_{i,j} = b_{j,i}. In this case the association between the fourth vector i and the fourth vector j may be non-directional.

In one example, the matrix B^T obtained by transposing the matrix B is different from the matrix B. That is, b_{i,j} ≠ b_{j,i}. In this case the association between the fourth vector i and the fourth vector j is directional. For example, b_{i,j} may indicate that an association exists between the fourth vector i and the fourth vector j in which the fourth vector i points to the fourth vector j, while b_{j,i} indicates an association in which the fourth vector j points to the fourth vector i. Alternatively, b_{i,j} may indicate an association in which the fourth vector j points to the fourth vector i, while b_{j,i} indicates an association in which the fourth vector i points to the fourth vector j.
In order to avoid computational difficulty caused by an excessively large matrix, the fourth incidence relation matrix can be compressed to obtain a matrix with a smaller dimension.

In one example, assume that the fourth incidence relation matrix B is an l × l matrix, that all elements of B spaced more than l′ elements away from its diagonal take the value 0 (or all take the value 1), and that l′ < l. The matrix may then be divided into a plurality of small matrices, where the maximum number of rows of the small matrices is l′ and the maximum number of columns of the small matrices is l′. This process may also be referred to as sparsifying (thinning) the fourth incidence relation matrix B.
In one example, assuming that the fourth incidence relation matrix B cannot be thinned, the fourth incidence relation matrix B may be compressed according to a spectral clustering method.
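For the band-limited case described above, one simple way to exploit the structure (shown here with SciPy's diagonal sparse format rather than the exact block division or the spectral clustering approach, and assuming the out-of-band elements are all 0) is:

```python
import numpy as np
from scipy import sparse

def is_banded(B, band):
    """Check the condition above: every element farther than `band` positions
    from the main diagonal is zero."""
    rows, cols = np.indices(B.shape)
    return bool(np.all(B[np.abs(rows - cols) > band] == 0))

def compress_if_banded(B, band):
    """Keep only the diagonals inside the band in a sparse DIA representation
    instead of the full l x l matrix."""
    if not is_banded(B, band):
        raise ValueError("matrix is not banded with the given bandwidth")
    return sparse.dia_matrix(B)

l, band = 8, 2
B = np.zeros((l, l), dtype=np.int8)
for i in range(l):
    B[i, max(0, i - band):min(l, i + band + 1)] = 1   # a banded 0/1 association matrix
print(is_banded(B, band))                      # True
print(compress_if_banded(B, band).data.shape)  # only 2 * band + 1 diagonals are stored
```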
It should be understood that the a priori assumptions may indicate a forward correlation as well as a reverse correlation. For example, since the shorter the inter-picture frame interval time is, the more relevant the content in the picture is, in general, when the prior hypothesis indicates that there is a correlation between picture frames within 8s, it can be understood that the prior hypothesis indicates a forward correlation; when the a priori assumption indicates that there is a correlation between picture frames other than 8s, it can be understood that the a priori assumption indicates a reverse correlation.
604, inputting the fourth vectors and the third correlation information into the second neural network model to obtain a first processing result for first data to be trained, where the first data to be trained is any one of the data to be trained, and the first processing result is used to correct the weight parameter of the second neural network model.
That is, the output result of the first neural network model and the association relation inside the output result are input to the second neural network model. Inputting a plurality of fourth vectors into the second neural network model, it can be understood that a plurality of features of the data to be trained are input into the second neural network model. Inputting the third correlation information into the second neural network model may be understood as inputting information whether any two fourth vectors of the plurality of fourth vectors have an influence between them into the second neural network model. The plurality of fourth vectors may be understood as nodes in the graph model, and the third association information may be used to indicate whether edges exist between the nodes. Thus, the second neural network model may be a graph neural network model.
The second neural network model processes the plurality of fourth vectors and the third correlation information, and may determine whether any two fourth vectors have influence and what specific influence degree is according to a weight parameter in the second neural network model, so as to obtain a processing result of the first data to be trained. The processing result of the first data to be trained may be a feature representation of the first data to be trained, or may be a recognition result of the first data to be trained. The result of the processing of the first data to be trained may be a vector.
Assume that the plurality of fourth vectors are l fourth vectors, denoted y_1, …, y_l, where 1 ≤ i ≤ l and 1 ≤ t ≤ s,

y_i = (y_{i,1}, …, y_{i,t}, …, y_{i,s}).

Then, combining the plurality of fourth vectors gives a matrix Y, Y = {y_1, …, y_i, …, y_l}. Assume that the third correlation information is the fourth incidence relation matrix mentioned above, denoted here as Q.
First, assume a set of weight matrices to be trained, W_1, W_2, …, W_h, whose dimensions are all s × s_h; that is, W_1, W_2, …, W_h each contain s × s_h weight parameters. Here s_h = s/h, where h represents the number of heads of the graph attention neural network (the number of heads may also be referred to as the number of slices), and s_h is commonly referred to as the single-head dimension.

Then U_1 = Y·W_1, U_2 = Y·W_2, …, U_h = Y·W_h are calculated respectively. Clearly, the dimensions of U_1, U_2, …, U_h are all l × s_h.
Next, V_{i,j} = U_i·U_j^T is calculated, with i ≠ j, 1 ≤ i ≤ h, and 1 ≤ j ≤ h. The dimension of V_{i,j} is l × l. The Softmax function is then applied to each row of V_{i,j} to obtain normalized probabilities, giving R_{i,j}. R_{i,j} is still an l × l matrix, which can be understood as a matrix of the mutual attention strengths between the points.

R_{i,j} is then multiplied element-wise with Q to apply the relation mask, obtaining E_{i,j}. E_{i,j} can be understood as screening out the related points according to the edge relations: the attention between related points is kept, and the attention of unrelated points is discarded. This matrix contains a large amount of mutual correlation information between the nodes, so its information content is rich. Then E_{i,j}·U_i yields the final representation U_{i,new} of each point, updated with the information of the other points. The dimension of U_{i,new} is l × s_h.

Finally, U_{1,new}, …, U_{i,new}, …, U_{h,new} are concatenated to obtain the matrix Y′, Y′ = {U_{1,new}, …, U_{i,new}, …, U_{h,new}}, whose dimension is l × s. It can be seen that Y′ contains both the correlation information between the nodes and the weight parameters.
If the depth of the graph attention neural network model is h′, that is, the model comprises h′ layers, the Y′ output by the current layer can be input into the next layer; in other words, the Y′ output by the current layer is regarded as the Y of the next layer, and the same or a similar data processing process as above is carried out.

It can be seen that Y′ is unchanged in matrix size compared with Y, but each element in Y′ contains information of one or more elements in Y. By integrating the data having the incidence relation, the second neural network model can acquire a larger amount of information when identifying a certain characteristic, and the identification accuracy is improved.
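The NumPy sketch below retraces the computation just described step by step: Y·W_i per head, the pairwise products U_i·U_j^T, a row-wise Softmax, element-wise masking with Q, the update E_{i,j}·U_i, and the final concatenation into Y′. It is an illustrative reading rather than a reference implementation; in particular, the text leaves the choice of j ≠ i for each head open, and the sketch simply pairs head i with head (i + 1) mod h.

```python
import numpy as np

def softmax_rows(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def graph_attention_layer(Y, Q, h, seed=0):
    """One layer following the description above.

    Y: (l, s) matrix of fourth vectors; Q: (l, l) 0/1 relation mask;
    h: number of heads, with single-head dimension s_h = s // h.
    """
    rng = np.random.default_rng(seed)
    l, s = Y.shape
    s_h = s // h
    W = [rng.standard_normal((s, s_h)) / np.sqrt(s) for _ in range(h)]  # W_1 ... W_h to be trained
    U = [Y @ W_i for W_i in W]                       # each U_i has dimension (l, s_h)

    U_new = []
    for i in range(h):
        j = (i + 1) % h                              # one concrete choice of j != i
        V = U[i] @ U[j].T                            # (l, l) raw attention scores
        R = softmax_rows(V)                          # mutual attention strengths
        E = R * Q                                    # keep attention only along edges (relation mask)
        U_new.append(E @ U[i])                       # update each point with related points
    return np.concatenate(U_new, axis=1)             # Y': same (l, s) size as Y

# Usage sketch: 5 nodes, 8-dimensional fourth vectors, 2 heads, chain-shaped associations
Y = np.random.default_rng(1).standard_normal((5, 8))
Q = np.eye(5) + np.eye(5, k=1) + np.eye(5, k=-1)
print(graph_attention_layer(Y, Q, h=2).shape)        # (5, 8)
```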
In one example, the plurality of data to be trained includes first data to be trained, the plurality of data to be trained further includes one or more associated data associated with the first data to be trained, and the second neural network model may combine the influence of the associated data on the first data to be trained according to the third associated relationship information, so as to obtain a processing result corresponding to the first data to be trained. In other words, the second neural network model not only extracts the features of the first data to be trained, but also extracts the features of other data to be trained which have a relationship with the first data to be trained, so that the input data volume in the prediction process is expanded, and the identification accuracy is improved.
In one example, the plurality of data to be trained includes first data to be trained, the first data to be trained may correspond to a target vector, the plurality of fourth vectors further includes one or more associated vectors associated with the target vector, and the plurality of data to be trained includes data to be trained in one-to-one correspondence with the one or more associated vectors. The second neural network model can obtain a processing result corresponding to the first data to be trained by combining the influence of the association vector on the target vector according to the third association relation information. In other words, the second neural network model not only extracts the features of the target vector, but also extracts the features of the associated vector which is associated with the target vector, so that the data processing amount in the prediction process is expanded, and the identification accuracy rate is improved.
In addition, the second neural network model may output a plurality of processing results in one-to-one correspondence with the plurality of data to be trained. That is to say, the second neural network model integrates the plurality of fourth vectors and the incidence relation among the fourth vectors, and outputs a plurality of processing results corresponding to the plurality of data to be trained one by one.
Consider a scenario in which a first association relationship exists between the fourth vector A and the fourth vector B, and a second association relationship exists between the fourth vector A and the fourth vector C; the degrees of closeness of the two associations may be the same or different. For example, two sentences spaced farther apart within the same paragraph are associated less closely, and two sentences spaced closer together within the same paragraph are associated more closely. For another example, two frames with a longer interval duration have a lower degree of closeness of association, and two frames with a shorter interval duration have a higher degree of closeness of association. In order to express the degree of closeness of the two relations, various expression modes are available.
In one example, the third association information is a matrix, and the numerical size of the elements in the matrix is used to indicate how close the association is, and the larger the numerical value is, the more close the association is. However, determining the specific size of the numerical value often introduces redundant human setting or increases the difficulty of training the neural network model.
In one example, in the case where two types of fourth vector groups with close association and distant association exist in the third association information, the fourth association information may be established, and the fourth association information is used to indicate the fourth vector group with close association. That is, the degree of influence between two fourth vectors having close association can be strengthened by the fourth association information.
Optionally, the third correlation information is used to indicate M fourth vector groups, where M is an integer greater than 1, and before the plurality of fourth vectors and the third correlation information are input into the second neural network model to obtain a first processing result for the first data to be trained, the method further includes: acquiring fourth incidence relation information, wherein the fourth incidence relation information is used for indicating M fifth vector groups, the M fifth vector groups belong to the M fourth vector groups, M is smaller than M, and M is a positive integer; inputting the fourth vectors and the third correlation information into the second neural network model to obtain a first processing result for first data to be trained, including: and inputting the fourth vectors, the third association relation information and the fourth association relation information into the second neural network model to obtain the first processing result.
Information indicated in the fourth association information is contained in the third association information. That is, there must be a correlation between two fourth vectors in each fourth vector group that satisfies the a priori assumption.
Assuming that the third association relationship information is the same as or substantially the same as the sixth association relationship information, the third association relationship information may reflect an association relationship between a plurality of data to be trained, and the fourth association relationship information may reflect whether a close association relationship exists between a plurality of data to be trained.
Taking the text data as an example, when it is assumed in the first experiment that there is a correlation between sentences belonging to the same paragraph, the third correlation information may indicate that there is a correlation between different sentences in the same paragraph, and the fourth correlation information may indicate that there is a close correlation between adjacent sentences in the same paragraph.
Taking the picture data as an example, when it is assumed in the first experiment that there is a correlation between two frames of pictures having an interval smaller than 8s, the third correlation information may indicate that there is a correlation between two frames of pictures having an interval smaller than 8s, and the fourth correlation information may indicate that there is a close correlation between two frames of pictures having an interval smaller than 2 s.
Taking video data as an example, when it is assumed in the first experiment that there is an association between two pieces of video with a minimum interval smaller than 8s, the third association relationship information may indicate that there is an association between two pieces of video with a minimum interval smaller than 8s, and the fourth association relationship information may indicate that there is a close association between two pieces of video with a minimum interval smaller than 2 s.
Taking the audio data as an example, when it is assumed in advance that there is an association between two pieces of audio with the minimum interval smaller than 8s, the third association relationship information may indicate that there is an association between two pieces of audio with the minimum interval smaller than 8s, and the fourth association relationship information may indicate that there is a close association between two pieces of audio with the minimum interval smaller than 2 s.
Assuming that the third association information is different from the sixth association information in the above, the third association information may reflect a similarity between the plurality of fourth vectors, and the fourth association information may reflect two fourth vectors of the plurality of fourth vectors, which have a higher similarity.
For example, when it is assumed in the first experiment that the similarity between the two fourth vectors exceeds the preset value, the third association relationship information may indicate that an association exists between the two fourth vectors whose similarity exceeds the preset value 1, and the fourth association relationship information may indicate that an association exists between the two fourth vectors whose similarity exceeds the preset value 2, where the preset value 2 is greater than the preset value 1.
It is to be understood that, similarly to the third correlation information, the fourth correlation information may contain a matrix for representing the m fourth vector groups.
After a first processing result aiming at the first data to be trained is obtained, the weight parameters of the second neural network model can be corrected through a loss function.
In one example, the weight parameters of the second neural network model may be modified using a loss function based on a distance between the label of the first data to be trained and the first processing result. For example, when the distance between the label of the first data to be trained and the first processing result is closer (i.e. the similarity degree is higher), it indicates that the weight parameter is more appropriate, and the correction amplitude of the weight parameter is smaller; when the distance between the label of the first data to be trained and the first processing result is farther (i.e. the similarity degree is lower), it indicates that the weight parameter is not suitable, and the correction amplitude of the weight parameter can be increased.
In one example, the plurality of fourth vectors and the third correlation information are input into the second neural network model, so as to obtain a first processing result for first data to be trained and a second processing result for second data to be trained, where the first data to be trained and the second data to be trained are any two data in the plurality of data to be trained, and a similarity between the first processing result and the second processing result is used to correct a weight parameter of the second neural network model. For example, the similarity between the first processing result and the second processing result is similarity 1, the fourth vector corresponding to the first processing result is fourth vector 1, the fourth vector corresponding to the second processing result is fourth vector 2, and the similarity between the fourth vector 1 and the fourth vector 2 is similarity 2. When the difference between the similarity 1 and the similarity 2 is smaller, the weight parameter is more appropriate, and the correction amplitude of the weight parameter is smaller; when the difference between the similarity 1 and the similarity 2 is larger, the weight parameter is not suitable, and the correction amplitude of the weight parameter can be increased.
Optionally, the obtaining a first processing result for the first data to be trained includes: obtaining the first processing result and a second processing result aiming at second data to be trained, wherein the label of the first data to be trained is a first label, the label of the second data to be trained is a second label, and the first data to be trained and the second data to be trained are any two data in the plurality of data to be trained; the method further comprises the following steps: and matching the similarity between the first label and the second label with the similarity between the first processing result and the second processing result to obtain a matching result, wherein the matching result is used for correcting the weight parameter of the second neural network model.
The sixth association relation information mentioned above may not include information of the similarity between the first tag and the second tag, that is, the association relation between the first to-be-processed data and the second to-be-processed data may be unrelated to the similarity between the first tag and the second tag. The sixth association relation information mentioned above may associate a plurality of data for which there is a possibility of association, increasing the data amount of the second neural network model processing data. The similarity between the first label and the second label is used for evaluating whether the first processing result and the second processing result are accurate or not.
Taking the text data as an example, when the first label is a prose and the second label is a treatise, it means that the similarity between the first processing result and the second processing result should be low. When the first processing result is environmental management, the second processing result is energy supply, and the similarity between the first processing result and the second processing result is high, the weight parameter of the second neural network model is not appropriate, and the loss function can be used for correcting the weight parameter of the second neural network model.
Taking the image data as an example, when the first tag is a rabbit and the second tag is a rabbit, it means that the similarity between the first processing result and the second processing result should be higher. When the first processing result is a long ear and the second processing result is a short ear, the similarity between the first processing result and the second processing result is low, which indicates that the weight parameter of the second neural network model may not be appropriate, and the weight parameter of the second neural network model may be corrected using the loss function.
Taking the video data as an example, when the first tag is a conference and the second tag is a vehicle, it means that the similarity between the first processing result and the second processing result should be low. When the first processing result is project investigation and the second processing result is road traffic, the similarity between the first processing result and the second processing result is low, which indicates that the weight parameters of the second neural network model may be appropriate, and the correction amplitude of the weight parameters of the second neural network model with the loss function is small.
Taking the audio data as an example, when the first tag is an insect sound and the second tag is also an insect sound, the similarity between the first processing result and the second processing result should be higher. When the first processing result is a mosquito and the second processing result is a fly, the similarity between the first processing result and the second processing result is high, which indicates that the weight parameters of the second neural network model may be appropriate, and the correction amplitude of the weight parameters of the second neural network model with the loss function is small.
One possible form of the loss function loss is given below.
Figure PCTCN2019099653-APPB-000020
wherein y_i′ denotes the processing result i for the data i to be trained, y_j′ denotes the processing result j for the data j to be trained, z_i denotes the label i of the data i to be trained, and z_j denotes the label j of the data j to be trained. The function C(y_i′, y_j′) indicates the similarity between the processing result i and the processing result j, and the function C(z_i, z_j) indicates the similarity between the label i and the label j. The matrix D may be a matrix for amplifying the similarity of the processing result i and the processing result j.
For example, suppose there are labels a, b, and c. When the label of the data i to be trained includes the label a but does not include the labels b and c, the label of the data i to be trained can be represented by (1, 0, 0). When the label of the data i to be trained includes the label b but does not include the labels a and c, it can be represented by (0, 1, 0). When the label of the data i to be trained includes the labels a and c but not the label b, it can be represented by (1, 0, 1). When the label of the data i to be trained includes the labels a, b, and c, it can be represented by (1, 1, 1).
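The exact formula of the loss appears only as an image in the original filing, so the sketch below implements one plausible reading of it: the similarity C(·,·) between pairs of processing results is matched against the similarity between the corresponding multi-hot labels, and the squared mismatch is summed. The use of cosine similarity, the squared penalty, and the omission of the amplification matrix D are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def pairwise_cosine(x):
    # Cosine similarity C(., .) between all pairs of rows of x
    x = F.normalize(x.float(), dim=-1)
    return x @ x.T

def similarity_matching_loss(results, labels):
    """results: (n, d) processing results y_i'; labels: (n, c) multi-hot labels z_i.
    Penalizes pairs whose result similarity deviates from the label similarity."""
    return ((pairwise_cosine(results) - pairwise_cosine(labels)) ** 2).sum()

# Labels a, b, c encoded as multi-hot vectors, e.g. (1, 0, 1) means labels a and c
labels = torch.tensor([[1, 0, 0], [0, 1, 0], [1, 0, 1]])
results = torch.randn(3, 16, requires_grad=True)
loss = similarity_matching_loss(results, labels)
loss.backward()   # the gradient is then used to correct the weight parameters
```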
Optionally, the plurality of data to be trained includes one or more target type data, and each target type data has a label for modifying the weight parameter.
That is, the plurality of data to be trained includes first type data and second type data, the data to be trained belonging to the first type data has a label, and the data to be trained belonging to the second type data does not have a label. Therefore, the weight parameter of the second neural network model can be corrected according to the distance between the processing result of the first type of data and the tag of the first type of data. The distance between the processing result of the first type data and the label of the first type data can be understood as the similarity between the processing result of the first type data and the label of the first type data. The specific calculation method of the information distance can be in modes of cross entropy, KL divergence, JS divergence and the like. The second type of data does not have a tag, but may be introduced in the process of obtaining the processing result of the first type of data, since there may be an association between the first type of data and the second type of data. That is, the second neural network model may be a semi-supervised model, i.e., the plurality of data to be trained may include data without tags. In order to ensure the training reliability of the second neural network model, the proportion of the first type data to the plurality of data to be trained is generally not less than 5% -10%.
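One common way to realize such semi-supervised training (a sketch under the assumption of a cross-entropy loss; the masking mechanics are not spelled out in the original description) is to let all data to be trained participate in the forward pass while computing the loss only over the first type data:

```python
import torch
import torch.nn.functional as F

def semi_supervised_loss(processing_results, labels, labeled_mask):
    """processing_results: (n, c) outputs of the second neural network model for all n
    data to be trained; labels: (n,) class indices, meaningful only where labeled;
    labeled_mask: (n,) bool, True for the first type data (the data with tags)."""
    return F.cross_entropy(processing_results[labeled_mask], labels[labeled_mask])

n, c = 10, 4
results = torch.randn(n, c, requires_grad=True)
labels = torch.randint(0, c, (n,))
labeled_mask = torch.zeros(n, dtype=torch.bool)
labeled_mask[:2] = True           # e.g. 20% of the data to be trained carry labels
loss = semi_supervised_loss(results, labels, labeled_mask)
loss.backward()                   # only the labeled subset drives the weight correction
```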
Optionally, the first processing result is further used for modifying the weight parameter of the first neural network model.
That is, the first processing result may be used to modify the weight parameters of the first neural network model in addition to the weight parameters of the second neural network model.
In one example, the first processing result and the label of the first data to be trained may be input into a loss function of the first neural network model, and the weight parameter of the first neural network model may be modified.
Before inputting a plurality of data to be trained into the first neural network model, the first neural network model may be a neural network model that is not limited by a scenario or is less constrained by a scenario. The plurality of data to be trained may be data of a specific scene, and therefore, the weight parameter of the first neural network model may be modified according to the first processing result, so that the first neural network model can adapt to the specific scene.
It should be understood that the first neural network model and the second neural network model may be two submodels in one neural network model.
The effects that the first neural network model and the second neural network model can achieve in training and prediction are described below by specific examples.
Example 1
All pictures shot by all cameras of a certain company in a certain month are obtained, about 100,000 pictures in total. 90,000 of the 100,000 pictures are input into the first neural network model as a plurality of data to be trained, where each picture can be one data item to be trained. The remaining 10,000 pictures may be used as verification data for verifying whether the weight parameters of the second neural network model are appropriate. For convenience of description, the 90,000 pictures constitute a training data set, and the 10,000 pictures constitute a verification data set.

10,000 pictures in the training data set are selected as the first type data with labels, and the remaining 80,000 pictures in the training data set are the second type data without labels. Tags for the first type of data are obtained.

The training data set is processed by using the first neural network model to obtain 90,000 fourth vectors in one-to-one correspondence with the training data set. The first neural network model may be a multiple granularity network (MGN) model, which is a convolutional neural network model. Each fourth vector may comprise 1024 elements, and each fourth vector is a feature representation of one picture.
A priori assumptions are obtained. The a priori assumptions may be, for example, one or more of the following:
(1) an association relationship exists between two pictures within the interval duration of 8 s.
(2) There is an association between two pictures originating from the same camera.
(3) And the two pictures with the image similarity larger than 50% have an association relation.
It should be understood that the specific content of the a priori assumption is related to the scenario to which the first neural network model and the second neural network model are applied, and is not limited herein.
From the a priori assumptions, third association information indicating the associations between the 90,000 fourth vectors may be determined.
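Purely for illustration, the third association information of this example could be assembled from per-picture metadata as follows; the field names, the thresholds, and the logical OR over the three prior assumptions are assumptions of this sketch rather than part of the original description.

```python
import numpy as np

def third_association_matrix(timestamps, camera_ids, image_similarity,
                             max_interval=8.0, similarity_threshold=0.5):
    """timestamps: (k,) capture times in seconds; camera_ids: (k,) camera identifiers;
    image_similarity: (k, k) image similarity in [0, 1]. A pair of pictures is
    associated when any one of the three prior assumptions above holds."""
    t = np.asarray(timestamps, dtype=float)
    cams = np.asarray(camera_ids)
    within_interval = np.abs(t[:, None] - t[None, :]) < max_interval   # assumption (1)
    same_camera = cams[:, None] == cams[None, :]                       # assumption (2)
    similar = np.asarray(image_similarity) > similarity_threshold      # assumption (3)
    return (within_interval | same_camera | similar).astype(np.int8)

# Toy usage with 4 pictures
A = third_association_matrix(
    timestamps=[0.0, 5.0, 100.0, 103.0],
    camera_ids=["cam1", "cam2", "cam1", "cam3"],
    image_similarity=np.eye(4),
)
print(A)
```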
The 90,000 fourth vectors and the third association relation information are input into the second neural network model to obtain processing results for the first type of data. Since the first type data and the second type data can be related, the content of the second type data is considered in the processing results of the first type data.
The parameters of the second neural network model may be modified by matching the results of the processing of the first type of data with the tags of the first type of data.
The data in the verification data set are input into the first neural network model to obtain a plurality of fourth vectors for the verification data set; the plurality of fourth vectors for the verification data set are input into the second neural network model, and the incidence relations among these fourth vectors, determined according to the prior assumptions, are also input into the second neural network model, to obtain data processing results for the verification data set. The data processing results are matched with the labels of the verification data set to obtain the identification capability of the first neural network model and the second neural network model. In practical application, the trained neural network model is scored using mean average precision (mAP). Compared with the traditional neural network model, the score can be improved by 4 to 20 points. That is, the method of training a neural network model provided herein may enhance the neural network model.
Example two
Chinese text questions collected by a certain company's robot in a certain month are obtained, about 15,000 questions in total. 8,000 of the 15,000 Chinese text questions are input into the first neural network model as the plurality of data to be trained, where each Chinese text question may be one piece of data to be trained. The remaining 7,000 Chinese text questions may be used as verification data for verifying whether the weight parameters of the second neural network model are appropriate. For ease of description, the 8,000 Chinese text questions constitute a training data set, and the 7,000 Chinese text questions constitute a verification data set.
2,000 Chinese text questions in the training data set are selected as the first type of data, which carry labels, and the remaining 6,000 Chinese text questions in the training data set are then the second type of data, which carry no labels. The labels of the first type of data are obtained.
The training data set is processed by using the first neural network model to obtain 8,000 fourth vectors in one-to-one correspondence with the questions in the training data set. The first neural network model may be a bidirectional encoder representations from transformers (BERT) model, which is a transformer-based neural network model. Each fourth vector may include 768 elements, and each fourth vector is a feature representation of one Chinese text question.
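A minimal sketch of this step is shown below. It assumes the publicly available bert-base-chinese checkpoint loaded through the Hugging Face transformers library as a stand-in for the BERT model used here, and takes the [CLS] token output as the 768-element fourth vector; the function name encode_question and the maximum sequence length are illustrative.

    import torch
    from transformers import AutoModel, AutoTokenizer

    # Assumption: the public "bert-base-chinese" checkpoint stands in for the
    # BERT model; the [CLS] token output supplies the 768 elements per vector.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
    model = AutoModel.from_pretrained("bert-base-chinese")
    model.eval()

    def encode_question(question: str) -> torch.Tensor:
        """Return a 768-element feature vector (fourth vector) for one text question."""
        inputs = tokenizer(question, return_tensors="pt", truncation=True, max_length=128)
        with torch.no_grad():
            outputs = model(**inputs)
        return outputs.last_hidden_state[0, 0]   # [CLS] token, shape: (768,)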
A priori assumptions are obtained. The a priori assumptions may be, for example, one or more of the following:
(1) An association relationship exists between two Chinese text questions whose texts contain the same keyword.
(2) An association relationship exists between two Chinese text questions whose text similarity is greater than 50%.
It should be understood that the specific content of the a priori assumption is related to the scenario to which the first neural network model and the second neural network model are applied, and is not limited herein.
According to the a priori assumptions, third association relation information may be determined, where the third association relation information indicates the association relationships among the 8,000 fourth vectors.
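Analogously to the first example, the two assumptions above can be turned into a 0/1 association matrix, as sketched below. Keyword extraction is not specified by the text, so keyword_sets and token_sets are hypothetical per-question sets, and Jaccard similarity over token sets is used as one possible text-similarity measure.

    import numpy as np

    def jaccard(a: set, b: set) -> float:
        """One possible text-similarity measure between two questions' token sets."""
        return len(a & b) / max(len(a | b), 1)

    def build_text_association_matrix(token_sets, keyword_sets, sim_threshold=0.5):
        """token_sets / keyword_sets: hypothetical per-question sets of tokens and keywords."""
        n = len(token_sets)
        A = np.eye(n, dtype=np.float32)
        for i in range(n):
            for j in range(i + 1, n):
                shares_keyword = len(keyword_sets[i] & keyword_sets[j]) > 0
                similar = jaccard(token_sets[i], token_sets[j]) > sim_threshold
                if shares_keyword or similar:
                    A[i, j] = A[j, i] = 1.0
        return A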
The 8,000 fourth vectors and the third association relation information are input into the second neural network model to obtain a processing result for the first type of data. In this way the first type of data and the second type of data can be related, so that the content of the second type of data is taken into account in the processing result for the first type of data.
The parameters of the second neural network model may be corrected by matching the processing result for the first type of data with the labels of the first type of data.
The data in the verification data set is input into the first neural network model to obtain a plurality of fourth vectors for the verification data set. The plurality of fourth vectors for the verification data set are input into the second neural network model, and the association relationships among these fourth vectors, determined according to the a priori assumptions, are also input into the second neural network model, so as to obtain a data processing result for the verification data set. The data processing result is matched with the labels of the verification data set to evaluate the recognition capability of the first neural network model and the second neural network model. In practical application, the trained neural network model is scored by mean average precision (mAP); compared with a conventional neural network model, the score can be improved by 10 to 15 points. That is, the method of training a neural network model provided herein can enhance the neural network model.
Fig. 8 is a hardware structure diagram of a data processing device according to an embodiment of the present application. The data processing device 700 shown in fig. 8 (the device 700 may be a computer device) includes a memory 701, a processor 702, a communication interface 703, and a bus 704. The memory 701, the processor 702, and the communication interface 703 are communicatively connected to each other via the bus 704.
The memory 701 may be a Read Only Memory (ROM), a static memory device, a dynamic memory device, or a Random Access Memory (RAM). The memory 701 may store a program, and when the program stored in the memory 701 is executed by the processor 702, the processor 702 is configured to execute the steps of the method for processing data shown in fig. 6 in the embodiment of the present application. Optionally, the processor 702 is further configured to perform the steps of the method for training a neural network model shown in fig. 7 in the embodiment of the present application.
The processor 702 may be a general-purpose Central Processing Unit (CPU), a microprocessor, an Application Specific Integrated Circuit (ASIC), a Graphics Processing Unit (GPU), or one or more integrated circuits, and is configured to execute related programs to implement the data processing method shown in fig. 6 in the embodiment of the present application. Alternatively, the processor 702 may adopt a general-purpose Central Processing Unit (CPU), a microprocessor, an Application Specific Integrated Circuit (ASIC), a Graphics Processing Unit (GPU) or one or more integrated circuits, and is configured to execute a relevant program to implement the method for training a neural network model shown in fig. 7 in this embodiment of the present application.
The processor 702 may also be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the data processing method shown in fig. 6 in the embodiment of the present application may be implemented by integrated logic circuits of hardware in the processor 702 or instructions in the form of software. Alternatively, in the embodiment of the present application, each step of the method for training the neural network model shown in fig. 7 may be performed by an integrated logic circuit of hardware in the processor 702 or an instruction in the form of software.
The processor 702 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in connection with the embodiments of the present application may be directly performed by a hardware decoding processor, or performed by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable read-only memory, or a register. The storage medium is located in the memory 701, and the processor 702 reads the information in the memory 701 and, in combination with its hardware, completes the functions that need to be performed by the units included in the data processing device of the embodiment of the present application, or performs the method of data processing shown in fig. 6 in the embodiment of the present application. Optionally, the processor 702 is further configured to perform the method of training a neural network model shown in fig. 7 in the embodiment of the present application.
Communication interface 703 enables communication between device 700 and other devices or communication networks using transceiver devices, such as, but not limited to, transceivers. For example, information of the neural network to be constructed and data to be processed (such as data to be processed in the embodiment shown in fig. 6) required in the process of constructing the neural network can be acquired through the communication interface 703. Optionally, information of the neural network to be constructed and data to be trained (such as the data to be trained in the embodiment shown in fig. 7) required in the process of constructing the neural network may be acquired through the communication interface 703.
Bus 704 may include a pathway to transfer information between various components of device 700, such as memory 701, processor 702, and communication interface 703.
It should be understood that the obtaining module in the data processing device may correspond to the communication interface 703 in the data processing device 700, and the processing module in the data processing device may correspond to the processor 702.
Fig. 9 is a hardware structural diagram of an apparatus for training a neural network model according to an embodiment of the present application. The apparatus 800 for training a neural network model shown in fig. 9 (the apparatus 800 may be a computer apparatus) includes a memory 801, a processor 802, a communication interface 803, and a bus 804. The memory 801, the processor 802, and the communication interface 803 are communicatively connected to each other via a bus 804.
The memory 801 may be a Read Only Memory (ROM), a static memory device, a dynamic memory device, or a Random Access Memory (RAM). The memory 801 may store a program, and when the program stored in the memory 801 is executed by the processor 802, the processor 802 is configured to perform the steps of the method for training a neural network model shown in fig. 7 in the embodiment of the present application.
The processor 802 may be a general-purpose Central Processing Unit (CPU), a microprocessor, an Application Specific Integrated Circuit (ASIC), a Graphics Processing Unit (GPU), or one or more integrated circuits, and is configured to execute related programs to implement the method for training a neural network model shown in fig. 7 in this embodiment of the present application.
The processor 802 may also be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the method for training a neural network model shown in fig. 7 in the embodiment of the present application may be implemented by integrated logic circuits of hardware in the processor 802 or instructions in the form of software.
The processor 802 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in connection with the embodiments of the present application may be directly performed by a hardware decoding processor, or performed by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable read-only memory, or a register. The storage medium is located in the memory 801, and the processor 802 reads the information in the memory 801 and, in combination with its hardware, completes the functions that need to be performed by the units included in the device for training a neural network model of the embodiment of the present application, or performs the method of training a neural network model shown in fig. 7 in the embodiment of the present application.
The communication interface 803 enables communication between the device 800 and other devices or communication networks using transceiver devices, such as, but not limited to, transceivers. For example, information of the neural network to be constructed and training data (such as the data to be trained in the embodiment shown in fig. 7) required in constructing the neural network can be acquired through the communication interface 803.
Bus 804 may include a pathway to transfer information between various components of device 800, such as memory 801, processor 802, and communication interface 803.
It should be understood that the obtaining module in the neural network model training device may correspond to the communication interface 803 in the neural network model training device 800; the processing module in the neural network model training device may correspond to the processor 802.
It should be noted that although the above-described devices 700, 800 show only memories, processors, and communication interfaces, in particular implementations, those skilled in the art will appreciate that the devices 700, 800 may also include other components necessary to achieve proper operation. Also, those skilled in the art will appreciate that the apparatus 700, 800 may also include hardware components to implement other additional functions, according to particular needs. Furthermore, those skilled in the art will appreciate that the apparatus 700, 800 may also include only those components necessary to implement the embodiments of the present application, and not necessarily all of the components shown in fig. 8, 9.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (30)

  1. A method of data processing, comprising:
    acquiring a plurality of data to be processed;
    processing the plurality of data to be processed by using a first neural network model to obtain a plurality of first vectors which are in one-to-one correspondence with the plurality of data to be processed, wherein the first neural network model is obtained based on general data training;
    acquiring first incidence relation information, wherein the first incidence relation information is used for indicating at least one first vector group, and each first vector group comprises two first vectors meeting a priori assumption;
    and inputting the plurality of first vectors and the first incidence relation information into a second neural network model to obtain a processing result aiming at first data to be processed, wherein the first data to be processed is any one of the plurality of data to be processed.
  2. The method according to claim 1, wherein the first association information is used to indicate N first vector groups, where N is an integer greater than 1, and before the inputting the plurality of first vectors and the first association information into a second neural network model to obtain a processing result for the first data to be processed, the method further comprises:
    acquiring second incidence relation information, wherein the second incidence relation information is used for indicating n second vector groups, the n second vector groups belong to the N first vector groups, n is smaller than N, and n is a positive integer;
    inputting the plurality of first vectors and the first incidence relation information into a second neural network model to obtain a processing result for first data to be processed, including:
    and inputting the plurality of first vectors, the first incidence relation information and the second incidence relation information into the second neural network model to obtain a processing result aiming at the first to-be-processed data.
  3. The method of claim 1 or 2, wherein the obtaining a plurality of data to be processed comprises:
    acquiring target data, wherein the target data is one of the plurality of data to be processed;
    acquiring association data, wherein the association data and the target data have an association relation meeting the prior assumption, and the plurality of data to be processed comprise the association data.
  4. The method according to any one of claims 1 to 3, wherein the first association information includes an association matrix, a vector in a first dimension in the association matrix includes a plurality of elements in one-to-one correspondence with the plurality of first vectors, and a vector in a second dimension in the association matrix includes a plurality of elements in one-to-one correspondence with the plurality of first vectors, wherein any element in the association matrix is used to indicate whether there is an association between the vector corresponding to the any element in the first dimension and the vector corresponding to the any element in the second dimension, which satisfies the prior assumption.
  5. The method of any one of claims 1 to 4, wherein the weight parameters of the second neural network model are obtained by:
    acquiring a plurality of data to be trained;
    processing the multiple data to be trained by using the first neural network model to obtain multiple fourth vectors which correspond to the multiple data to be trained one by one;
    obtaining third association relation information, wherein the third association relation information is used for indicating at least one third vector group, and each third vector group comprises two fourth vectors meeting the prior hypothesis;
    and inputting the fourth vectors and the third correlation information into the second neural network model to obtain a first processing result aiming at first data to be trained, wherein the first data to be trained is any one of the data to be trained, and the first processing result is used for correcting the weight parameters of the second neural network model.
  6. The method of claim 5, wherein obtaining the first processing result for the first data to be trained comprises:
    obtaining the first processing result and a second processing result aiming at second data to be trained, wherein the label of the first data to be trained is a first label, the label of the second data to be trained is a second label, and the first data to be trained and the second data to be trained are any two data in the plurality of data to be trained;
    the method further comprises the following steps:
    and matching the similarity between the first label and the second label with the similarity between the first processing result and the second processing result to obtain a matching result, wherein the matching result is used for correcting the weight parameter of the second neural network model.
  7. The method according to claim 5 or 6, wherein the third correlation information is used to indicate M third vector groups, M being an integer greater than 1, and before the inputting the plurality of fourth vectors and the third correlation information into the second neural network model to obtain the first processing result for the first data to be trained, the method further comprises:
    acquiring fourth incidence relation information, wherein the fourth incidence relation information is used for indicating m fourth vector groups, the m fourth vector groups belong to the M third vector groups, m is smaller than M, and m is a positive integer;
    inputting the fourth vectors and the third correlation information into the second neural network model to obtain a first processing result for first data to be trained, including:
    and inputting the fourth vectors, the third association relation information and the fourth association relation information into the second neural network model to obtain the first processing result.
  8. The method of any of claims 5 to 7, wherein the first processing result is further used to modify a weight parameter of the first neural network model.
  9. The method of any one of claims 5 to 8, wherein the plurality of data to be trained comprises one or more target type data, each target type data having a label for modifying the weight parameter.
  10. A method of training a neural network model, comprising:
    acquiring a plurality of data to be trained;
    processing the multiple data to be trained by using a first neural network model to obtain multiple fourth vectors which correspond to the multiple data to be trained one by one;
    obtaining third association relation information, wherein the third association relation information is used for indicating at least one third vector group, and each third vector group comprises two fourth vectors meeting the prior hypothesis;
    and inputting the fourth vectors and the third correlation information into a second neural network model to obtain a first processing result aiming at first data to be trained, wherein the first data to be trained is any one of the data to be trained, and the first processing result is used for correcting the weight parameters of the second neural network model.
  11. The method of claim 10, wherein obtaining the first processing result for the first data to be trained comprises:
    obtaining the first processing result and a second processing result aiming at second data to be trained, wherein the label of the first data to be trained is a first label, the label of the second data to be trained is a second label, and the first data to be trained and the second data to be trained are any two data in the plurality of data to be trained;
    the method further comprises the following steps:
    and matching the similarity between the first label and the second label with the similarity between the first processing result and the second processing result to obtain a matching result, wherein the matching result is used for correcting the weight parameter of the second neural network model.
  12. The method according to claim 10 or 11, wherein the third correlation information is used to indicate M third vector groups, and before the inputting the plurality of fourth vectors and the third correlation information into the second neural network model to obtain the first processing result for the first data to be trained, the method further comprises:
    acquiring fourth incidence relation information, wherein the fourth incidence relation information is used for indicating m fourth vector groups, the m fourth vector groups belong to the M third vector groups, m is smaller than M, and m is a positive integer;
    inputting the fourth vectors and the third correlation information into the second neural network model to obtain a first processing result for first data to be trained, including:
    and inputting the fourth vectors, the third association relation information and the fourth association relation information into the second neural network model to obtain the first processing result.
  13. The method of any one of claims 10 to 12, wherein the first processing result is further used to modify a weight parameter of the first neural network model.
  14. The method of any one of claims 10 to 13, wherein the plurality of data to be trained comprises one or more target type data, each target type data having a label for modifying the weight parameter.
  15. An apparatus for data processing, comprising:
    the acquisition module is used for acquiring a plurality of data to be processed;
    the processing module is used for processing the data to be processed by using a first neural network model to obtain a plurality of first vectors which are in one-to-one correspondence with the data to be processed, wherein the first neural network model is obtained based on general data training;
    the obtaining module is further configured to obtain first association relationship information, where the first association relationship information is used to indicate at least one first vector group, and each first vector group includes two first vectors that satisfy a priori assumption;
    the processing module is further configured to input the plurality of first vectors and the first incidence relation information into a second neural network model, so as to obtain a processing result for first to-be-processed data, where the first to-be-processed data is any one of the plurality of to-be-processed data.
  16. The apparatus of claim 15, wherein the first association information is used to indicate N first vector groups, N being an integer greater than 1, before the processing module inputs the plurality of first vectors and the first association information into a second neural network model to obtain a processing result for the first data to be processed,
    the obtaining module is further configured to obtain second association relationship information, where the second association relationship information is used to indicate n second vector groups, the n second vector groups belong to the N first vector groups, n is smaller than N, and n is a positive integer;
    the processing module is specifically configured to input the plurality of first vectors, the first incidence relation information, and the second incidence relation information into the second neural network model, so as to obtain a processing result for the first to-be-processed data.
  17. The device according to claim 15 or 16, wherein the obtaining module is specifically configured to:
    acquiring target data, wherein the target data is one of the plurality of data to be processed;
    acquiring association data, wherein the association data and the target data have an association relation meeting the prior assumption, and the plurality of data to be processed comprise the association data.
  18. The apparatus according to any one of claims 15 to 17, wherein the first association information comprises an association matrix, a vector in a first dimension of the association matrix comprises a plurality of elements in one-to-one correspondence with the plurality of first vectors, and a vector in a second dimension of the association matrix comprises a plurality of elements in one-to-one correspondence with the plurality of first vectors, wherein any element of the association matrix is used to indicate whether there is an association between the vector corresponding to the any element in the first dimension and the vector corresponding to the any element in the second dimension, which satisfies the prior assumption.
  19. The apparatus according to any one of claims 15 to 18,
    the acquisition module is further used for acquiring a plurality of data to be trained;
    the processing module is further configured to process the multiple data to be trained by using the first neural network model to obtain multiple fourth vectors corresponding to the multiple data to be trained one by one;
    the obtaining module is further configured to obtain third association relationship information, where the third association relationship information is used to indicate at least one third vector group, and each third vector group includes two fourth vectors that satisfy the prior hypothesis;
    the processing module is further configured to input the plurality of fourth vectors and the third correlation information into the second neural network model to obtain a first processing result for first data to be trained, where the first data to be trained is any one of the plurality of data to be trained, and the first processing result is used to correct a weight parameter of the second neural network model.
  20. The apparatus of claim 19,
    the processing module is specifically configured to obtain the first processing result and a second processing result for second data to be trained, where a label of the first data to be trained is a first label, and a label of the second data to be trained is a second label;
    the processing module is further configured to match the similarity between the first tag and the second tag with the similarity between the first processing result and the second processing result to obtain a matching result, and the matching result is used to correct the weight parameter of the second neural network model.
  21. The apparatus of claim 19 or 20, wherein the third correlation information is used to indicate M third vector groups, M being an integer greater than 1, before the processing module inputs the plurality of fourth vectors and the third correlation information into the second neural network model to obtain the first processing result for the first data to be trained,
    the obtaining module is further configured to obtain fourth association relationship information, where the fourth association relationship information is used to indicate m fourth vector groups, the m fourth vector groups belong to the M third vector groups, m is smaller than M, and m is a positive integer;
    the processing module is specifically configured to input the plurality of fourth vectors, the third association relationship information, and the fourth association relationship information into the second neural network model, so as to obtain the first processing result.
  22. The apparatus of any one of claims 19 to 21, wherein the first processing result is further used to modify a weight parameter of the first neural network model.
  23. The apparatus of any one of claims 19 to 22, wherein the plurality of data to be trained comprises one or more target type data, each target type data having a label for modifying the weight parameter.
  24. An apparatus for training a neural network model, comprising:
    the acquisition module is used for acquiring a plurality of data to be trained;
    the processing module is used for processing the data to be trained by using a first neural network model to obtain a plurality of fourth vectors which are in one-to-one correspondence with the data to be trained;
    the obtaining module is further configured to obtain third association relationship information, where the third association relationship information is used to indicate at least one third vector group, and each third vector group includes two fourth vectors that satisfy the prior hypothesis;
    the processing module is further configured to input the plurality of fourth vectors and the third correlation information into a second neural network model to obtain a first processing result for first data to be trained, where the first data to be trained is any one of the plurality of data to be trained, and the first processing result is used to correct a weight parameter of the second neural network model.
  25. The apparatus according to claim 24, wherein the processing module is specifically configured to obtain the first processing result and a second processing result for second data to be trained, where a label of the first data to be trained is a first label, and a label of the second data to be trained is a second label;
    the processing module is further configured to match the similarity between the first tag and the second tag with the similarity between the first processing result and the second processing result to obtain a matching result, where the matching result is used to correct the weight parameter of the second neural network model.
  26. The apparatus according to claim 24 or 25, wherein the third correlation information is used to indicate M third vector groups, before the processing module is used to input the plurality of fourth vectors and the third correlation information into the second neural network model to obtain the first processing result for the first data to be trained,
    the obtaining module is further configured to obtain fourth association relationship information, where the fourth association relationship information is used to indicate m fourth vector groups, the m fourth vector groups belong to the M third vector groups, m is smaller than M, and m is a positive integer;
    the processing module is specifically configured to input the plurality of fourth vectors, the third association relationship information, and the fourth association relationship information into the second neural network model, so as to obtain the first processing result.
  27. The apparatus of any one of claims 24 to 26, wherein the first processing result is further used to modify a weight parameter of the first neural network model.
  28. The apparatus of any one of claims 24 to 27, wherein the plurality of data to be trained comprises one or more target type data, each target type data having a label for modifying the weight parameter.
  29. A computer-readable storage medium, characterized in that the computer-readable medium stores program code for execution by a device, the program code comprising instructions for performing the method of any of claims 1-14.
  30. A chip comprising a processor and a data interface, the processor reading instructions stored on a memory through the data interface to perform the method of any one of claims 1-14.