WO2021014746A1 - Information processing method, information processing device, and information processing program - Google Patents

Information processing method, information processing device, and information processing program

Info

Publication number
WO2021014746A1
WO2021014746A1 (PCT/JP2020/020612)
Authority
WO
WIPO (PCT)
Prior art keywords
information processing
error correction
vector
neural network
label
Prior art date
Application number
PCT/JP2020/020612
Other languages
English (en)
Japanese (ja)
Inventor
井手 直紀
アンドリュー シン
顕生 早川
Original Assignee
ソニー株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ソニー株式会社
Publication of WO2021014746A1

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Definitions

  • This disclosure relates to information processing methods, information processing devices, and information processing programs.
  • There is known a technique for estimating the category to which data belongs from classification-problem data, using a learning model constructed by machine learning such as supervised learning (see, for example, Patent Document 1).
  • Machine learning that uses a neural network (deep neural network) as a learning model is often called deep learning.
  • Generally, a neural network that handles a classification problem is composed of an operation unit that calculates a feature vector from the data and an operation unit that calculates, from the feature vector, the probability that the data belongs to each category.
  • Therefore, this disclosure proposes an information processing method, an information processing device, and an information processing program that can improve the estimation accuracy of a classification problem while reducing the number of model parameters and the amount of computation.
  • an information processing method uses a neural network to estimate the category to which the input data belongs.
  • The neural network calculates a feature vector from the input data and, based on the feature vector, calculates the probability or score of the category to which the input data belongs by using a decoding operation corresponding to a predetermined error correction coding.
  • In the field of machine learning, deep learning, which learns neural network parameters from a large amount of data, has become dominant. In deep learning, parameter learning is realized by repeatedly updating the parameters using their gradient (the gradient method) so as to reduce an objective function called the loss (loss function).
  • When the label to be derived from the data is the category (class) to which the data belongs, the label is a category ID (a discrete value).
  • A classification problem with a plurality of classes is sometimes called a multi-class classification problem. Multi-class classification problems usually assume that the data belongs to exactly one of a predetermined set of classes.
  • In that case, the neural network calculates a vector representing class-likeness (hereinafter also referred to as a logit vector) from the input data.
  • the parameters are learned so as to reduce the error between the classification executed by the neural network and the correct answer.
  • the class to be classified is determined based on the vector representing the class-likeness.
  • a classification problem in which multiple labels are assigned to one piece of data is called a multi-label problem.
  • the data is allowed to be one or more of the predetermined classes.
  • a vector having the same number of dimensions as the number of classes, where the part represented by the label is 1 and the other parts are 0 is referred to as a label vector.
  • In a normal multi-class problem, the label vector has 1 for exactly one component and 0 for the others; such a vector is called a one-hot vector. In the multi-label problem, on the other hand, the label vector has 1 at a plurality of components; such a vector is called a multi-hot vector.
  • an object of the present disclosure is to provide an information processing method for reducing errors in a classification problem of machine learning.
  • The neural network that estimates the category to which data belongs is composed of an operation unit that calculates the feature vector from the data and an operation unit that calculates, from the feature vector, the degree of belonging or non-belonging (hereinafter referred to as class score or probability) for each category.
  • an object of the present disclosure is to reduce the parameters and operations of the process of calculating the class score from the feature vector in the deep learning neural network.
  • the error correction code is known as a technique for accurately transmitting information in communication technology.
  • In communication, information is binary-encoded, transmitted between remote locations as a physical signal, and the "original information" is restored from the received signal. At this time, an error may occur in the transmitted information due to noise in the transmission of the physical signal.
  • Error correction is one of the techniques for reducing the error of this information.
  • the classification problem is considered to be a problem of restoring a class that is "original information” from "physical signals" such as image data and audio data obtained by converting "information" that represents a class. Then, since the classification error can be considered as an information error, it can be expected that the classification error can be reduced by using the error correction technology. Therefore, the present disclosure provides a method of reducing errors in a classification problem in machine learning by using an error correction technique in the communication field.
  • A related existing technique is ECOC (Error Correcting Output Code).
  • However, because ECOC uses the Hamming distance for decoding, high-performance decoding methods that use probability are not applicable. Therefore, the present disclosure provides a method of reducing errors by using a probability-based error correction technique in a classification problem of deep learning.
  • In the development of deep learning, it is common to use an application called a development framework, in which the function layers required for the neural network to be trained and the optimization solvers to be used during learning are selected and combined.
  • the stacking of layers of a neural net is programmed in a script language (for example, Python), or visually programmed via a graphical user interface (GUI).
  • the present disclosure provides the following method as a solution to a problem of improving performance by using probabilistic error correction in deep learning.
  • the first solution is a method of combining learning by the ECOC method with probability decoding as run-time error correction.
  • In the first solution, LDPC (Low Density Parity Check) coding or turbo coding is used as the coding method so that probability decoding can be applied.
  • As the actual probability decoding, maximum a posteriori decoding, the BCJR (Bahl Cocke Jelinek Raviv) method, or the sum-product method is used. Furthermore, by preparing functions that realize these processes as deep learning layers usable in an execution network, a consistent configuration can be realized as a deep learning neural network. This makes it possible to correct errors using probabilities and to generate the neural network using a framework.
  • The second solution is to provide the error correction layer of the first solution (maximum a posteriori decoding, the BCJR method, sum-product decoding, etc.) as a deep learning layer that supports "error backpropagation", so that it can also be used in the learning network.
  • the computer learns the neural network using error backpropagation.
  • By performing error backpropagation through the error correction layer in this way, a coding method corresponding to the inverse processing of the error correction is embedded in the parameters of the neural network.
  • The error correction layer is realized only by a combination of functions for which error backpropagation is defined, and this combination itself is packaged as an error correction layer.
  • In this disclosure, the problem is solved by adding an "error correction layer" to the framework, as a method of easily using the error correction technique in a deep learning framework.
  • FIG. 1 is an explanatory diagram of a learning neural network according to the first embodiment of the present disclosure.
  • t shown in FIG. 1 is a label of the classification problem.
  • X is the data of the classification problem.
  • Loss is the value of the loss function.
  • the configuration of the learning neural network shown in FIG. 1 is as follows.
  • Label → Binary vectorization (encoding layer) → Parity vectorization (encoding layer) → Encoded label.
  • Data → Feature extraction network (feature extraction layer) → Logit calculation network (fully connected layer) → Logit vector. Loss function (loss layer) → Cross entropy of the logit vector and the vector with parity.
  • As the binary vectorization of labels, for example, one-hot vectorization, multi-hot vectorization, binary-number representation, and the like can be considered.
  • In a normal multi-class problem, one-hot vectorization or binary-number representation can be used. For example, when the number of classes is 10 and the label is 3, the binary vectorization is performed as follows.
  • One-hot vector: (0,0,0,1,0,0,0,0,0,0)
  • Binary-number vector: (0,0,1,1)
  • In a multi-label problem, multi-hot vectorization can be used. For example, when the number of classes is 10 and the labels are 3 and 7, the multi-hot vector is (0,0,0,1,0,0,0,1,0,0). Further, in the case of a problem including an unknown class, the multi-hot vector for unknown data is set to the zero vector: (0,0,0,0,0,0,0,0,0,0). A minimal sketch of these encodings is given below.
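  • The following Python sketch (our illustration; the function names are not from the disclosure) produces the one-hot, binary-number, and multi-hot label vectors described above for 10 classes.

```python
# Illustrative label encodings for 10 classes (assumed example, not code from the disclosure).
import numpy as np

def one_hot(label, num_classes=10):
    v = np.zeros(num_classes, dtype=np.int8)
    v[label] = 1
    return v

def binary_number(label, num_bits=4):
    # label 3 -> (0, 0, 1, 1), most significant bit first
    return np.array([(label >> i) & 1 for i in reversed(range(num_bits))], dtype=np.int8)

def multi_hot(labels, num_classes=10):
    # an empty label set (unknown class) yields the zero vector
    v = np.zeros(num_classes, dtype=np.int8)
    v[list(labels)] = 1
    return v

print(one_hot(3))          # [0 0 0 1 0 0 0 0 0 0]
print(binary_number(3))    # [0 0 1 1]
print(multi_hot([3, 7]))   # [0 0 0 1 0 0 0 1 0 0]
print(multi_hot([]))       # [0 0 0 0 0 0 0 0 0 0]
```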
  • the binary vector of the label is encoded with parity.
  • the parity is an error check code generated from the original information. Parity is usually a binary code sequence, like the original information.
  • For example, a low density parity check code is concatenated with the binary-encoded label, and the result is used as the label vector with parity. Label vectors with parity are usually multi-hot vectors.
  • the error correction code used for the error correction coding is a low density parity check code (hereinafter, referred to as “parity check code”).
  • As a method of appending the parity check code to the binary vector of the label, for example, Hamming coding, LDPC coding, or turbo coding can be considered.
  • A turbo code can also be used as the error correction code for the error correction coding.
  • a memory for storing a pair of a generator matrix and a check matrix is prepared.
  • Let the length of the original code be m, the length of the parity code be k, and the total code length be n = m + k.
  • As the generator matrix, for example, a matrix as shown in FIG. 2, represented by the following equation (2), can be used. The check matrix corresponding to the generator matrix shown in FIG. 2 is the matrix shown in FIG. 3.
  • The parity-coded vector c (channel code) can be generated from the binary vector m (message) by the following equation (3): c = (m^T G) mod 2, where m^T G is a matrix product and mod 2 denotes the remainder after division by two.
  • the vector with parity thus generated is used as the label vector with parity.
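  • As a concrete illustration of equation (3), the following sketch appends parity to a binary label vector using a small systematic generator matrix. The matrix below is an assumed example for illustration only, not the matrix of FIG. 2.

```python
# Appending parity to a binary label vector: c = (m^T G) mod 2 (equation (3)).
import numpy as np

m = np.array([0, 0, 1, 1])                   # binary label vector (message), length m = 4
P = np.array([[1, 1, 0],                     # assumed parity block (k = 3), for illustration only
              [0, 1, 1],
              [1, 1, 1],
              [1, 0, 1]])
G = np.hstack([np.eye(4, dtype=int), P])     # systematic generator matrix, shape (m, m + k)

c = m @ G % 2                                # channel code of length n = m + k = 7
print(c)                                     # [0 0 1 1 0 1 0]: message bits followed by parity bits
```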
  • the parity may be added by the convolutional code or the turbo code.
  • For the feature extraction network that calculates the feature amount from the data, the same network as the feature-vector calculation network of a normal neural network is used.
  • When the input data is an image or audio, a neural network that combines convolution layers and nonlinear activation layers can be considered.
  • When the input data is a symbol string such as language, a neural network that combines an embedding layer or the like can be considered.
  • For the neural network that calculates the logit vector (log odds vector) from the features, for example, a fully connected network (fully connected layer) is used.
  • In a normal fully connected network, the number of dimensions of the input vector is the number of dimensions of the feature vector, and the number of dimensions of the output vector is the number of classes.
  • In the present embodiment, however, the number of dimensions of the output vector of the fully connected network is the number of classes plus the number of parity bits.
  • When the label is a binary-number vector, the number of dimensions of the output vector of the fully connected network is the dimension of the binary representation of the number of classes plus the number of parity bits.
  • That is, the feature vector is a vector whose number of dimensions corresponds to the code length of the error correction code used for the error correction coding (message length (original code length) + parity length (parity code length)).
  • Since this is a classification problem, the standard cross entropy is used as the loss function. Since the encoded vector is a binary vector, the binary cross entropy is calculated.
  • The value obtained by averaging over the samples the per-sample loss represented by the following equation (5) is used as the loss. Equation (5) is the binary (sigmoid) cross entropy between the logit vector with parity h and the label vector with parity t, −Σ_j [ t_j log σ(h_j) + (1 − t_j) log(1 − σ(h_j)) ], where σ(·) is the sigmoid function.
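  • A minimal sketch of this per-sample loss (the standard sigmoid cross entropy; the vectors below are assumed example values) is as follows.

```python
# Per-sample binary (sigmoid) cross entropy between the logit vector with parity h
# and the label vector with parity t; averaged over a mini-batch in practice.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def binary_cross_entropy(h, t):
    p = sigmoid(h)
    eps = 1e-12                                  # numerical guard against log(0)
    return -np.sum(t * np.log(p + eps) + (1 - t) * np.log(1 - p + eps))

h = np.array([2.0, -1.5, 0.3, -0.7, 1.1, -2.2, 0.5])   # logit vector with parity (n = 7)
t = np.array([1.0,  0.0, 1.0,  0.0, 0.0,  0.0, 1.0])   # label vector with parity
print(binary_cross_entropy(h, t))
```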
  • learning parameters are embedded in the feature extraction network.
  • the training parameters are updated to the trained parameters.
  • FIG. 4 is a flowchart showing an example of the learning algorithm according to the first embodiment of the present disclosure.
  • a learning network is configured (step S101). Subsequently, the label and the data are sampled from the training data set to generate a mini-batch (step S102).
  • Next, the mini-batch is input to the learning network, the gradient of the loss function with respect to the parameters is obtained (step S103), and the parameters are updated according to the gradient (step S104). Subsequently, it is determined whether or not a predetermined convergence condition is satisfied (step S105).
  • When the convergence condition is not satisfied (step S105, No), the process returns to step S102, and the processing of steps S102 to S105 is repeated until the convergence condition is satisfied. When the convergence condition is satisfied (step S105, Yes), the process ends.
  • steps S102 to S105 may be repeated up to a predetermined maximum number of repetitions.
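  • A minimal sketch of this training loop is shown below. The network, dataset, and encoding interfaces are assumed placeholders; any Python deep learning framework with automatic differentiation can fill them in, and binary_cross_entropy is the loss sketched above.

```python
# Sketch of the learning algorithm of FIG. 4 (steps S101-S105); interfaces are assumed.
import numpy as np

def train(learning_network, dataset, encode_label, lr=0.01, max_iters=10000, tol=1e-4):
    prev_loss = np.inf
    for step in range(max_iters):                  # loop over steps S102-S105
        x, labels = dataset.sample_minibatch()     # S102: sample label/data mini-batch
        t = encode_label(labels)                   # binary vectorization + parity coding
        h = learning_network.forward(x)            # logit vector with parity
        loss = binary_cross_entropy(h, t)          # loss layer (equation (5))
        grads = learning_network.backward(loss)    # S103: gradient of loss w.r.t. parameters
        learning_network.update(grads, lr)         # S104: gradient-method parameter update
        if abs(prev_loss - loss) < tol:            # S105: convergence check
            break
        prev_loss = loss
```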
  • In this way, the computer executes the learning algorithm shown in FIG. 4 by using the learning neural network shown in FIG. 1. Specifically, the computer converts the label of the classification problem for learning into a binary vector, adds a parity check code to the binary vector, and executes a process of calculating the label vector with parity.
  • the computer executes a process of extracting the feature amount of the data from the data of the classification problem for learning. Subsequently, the computer converts the feature quantity into a logit vector, adds a parity check code to the logit vector, and executes a process of calculating the logit vector with parity.
  • the computer learns so that the value of the loss function between the logit vector with parity and the label vector with parity is minimized, and executes a process of updating the parameters of the neural network used for label estimation.
  • In this way, the label and the data of the classification problem used for learning are vectorized, a parity check code is further added, and the parameters of the neural network are machine-learned.
  • FIG. 5 is an explanatory diagram of an execution neural network according to the first embodiment of the present disclosure.
  • "x” shown in FIG. 5 is the data of the classification problem.
  • “R” is the estimation result of the label of the classification problem.
  • The configuration of the execution neural network is as follows. Data → Feature extraction network (feature extraction layer) → Logit calculation network with parity (fully connected layer) → Error correction network (error correction layer) → Logit vector.
  • the feature extraction network has the same configuration as the feature extraction network of the learning neural network.
  • the logit calculation network with parity has the same configuration as the logit calculation network in the learning neural network.
  • As the parameters of these networks, the parameters learned in the learning network are used.
  • an error correction network (error correction layer) is newly prepared.
  • An error correction network is a network that calculates a logit vector without parity from a logit vector with parity. In communication terms, this corresponds to calculating the log odds of the original code from the log odds generated from the channel signal.
  • As the decoding operation corresponding to this error correction coding, posterior-distribution maximization decoding (maximum likelihood decoding algorithm), sum-product decoding (sum-product algorithm), BCJR decoding (BCJR algorithm), and the like can be considered.
  • Sum-product decoding includes, for example, probability-domain sum-product decoding and log-domain sum-product decoding.
  • Log-domain sum-product decoding is an estimation algorithm that calculates log odds from the input channel code and repeats variable-node processing and check-node processing to obtain the log odds of each bit of the "original" parity-coded string.
  • FIG. 6 is an image diagram of log-domain sum-product decoding according to the first embodiment of the present disclosure.
  • FIG. 7 is an image diagram of a neural network that estimates labels from the classification problem according to the first embodiment of the present disclosure.
  • In FIGS. 6 and 7, each variable node is shown as a rectangle and each check node as a circle. The set of straight lines connecting the variable nodes and the check nodes represents the parity check matrix.
  • Each variable node performs the variable-node processing described later on its input and outputs the result to the check nodes in the subsequent stage.
  • Each check node performs the check-node processing described later, using the check matrix, on its input and outputs the result to the variable nodes in the subsequent stage.
  • First, the log odds are calculated by the following equation (6) using the signal intensity and the noise intensity of the Gaussian channel, where x_j is the intensity of the j-th signal and n is the noise intensity. If the noise intensity is unknown, a value around 1 is used.
  • The variable-node processing updates the log odds based on the following equation (7), in which the incoming check-node messages have an initial value of 0.
  • The check-node processing performs a parity check based on the following equation (8). By repeating these processes an appropriate number of times, the log odds r_j of the posterior probability of the original parity-coded string are obtained.
  • the log odds vector excluding the parity part corresponds to the log odds of the original code string.
  • In the present embodiment, this log-domain sum-product method is used as the error correction layer.
  • the input vector x of this layer is the logarithmic odds vector h with parity obtained by the equation (4).
  • the output is the posterior probability logarithmic odds vector r obtained after repeating the equations (7) and (8).
  • the output may be a vector obtained by cutting out the number of original classes from this vector.
  • The variance and the like may be calculated for each code at the time of learning, and the noise intensity may be corrected based on that variance.
  • From the error-corrected log odds, the probability of each class and the classification result can be estimated.
  • The probability of each class is obtained by passing its log odds through the sigmoid function.
  • The label can then be estimated by discriminating the log odds or the probability for each class with an appropriate threshold value, as in the sketch below.
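  • The following Python sketch illustrates the error correction layer as log-domain sum-product decoding followed by class-probability estimation. It is a generic textbook formulation under assumed inputs (the parity check matrix H, the clipping constants, and the threshold of 0 are our choices), not code taken from the disclosure.

```python
# Log-domain sum-product decoding used as an error correction layer (inference sketch).
import numpy as np

def sum_product_decode(llr, H, num_iters=10):
    """llr: log odds vector with parity (length n); H: parity check matrix (k x n).
    Returns the posterior log odds r of each code bit."""
    k, n = H.shape
    v2c = np.tile(llr, (k, 1)) * H                    # variable-to-check messages
    c2v = np.zeros((k, n))                            # check-to-variable messages
    for _ in range(num_iters):
        # check-node processing (cf. equation (8)): product of tanh over the other edges
        t = np.where(H == 1, np.tanh(v2c / 2.0), 1.0)
        for i in range(k):
            for j in np.nonzero(H[i])[0]:
                prod = np.prod(np.delete(t[i], j))
                c2v[i, j] = 2.0 * np.arctanh(np.clip(prod, -0.999999, 0.999999))
        # variable-node processing (cf. equation (7)): sum of incoming log odds
        total = llr + c2v.sum(axis=0)
        v2c = np.where(H == 1, total[None, :] - c2v, 0.0)
    return llr + c2v.sum(axis=0)                      # posterior log odds r

# Usage as the error correction layer: h is the logit vector with parity from the network.
H = np.array([[1, 1, 0, 1, 1, 0, 0],                  # assumed small check matrix (k = 3, n = 7)
              [0, 1, 1, 0, 1, 1, 0],
              [1, 0, 1, 0, 0, 1, 1]])
h = np.array([2.0, -1.5, 0.3, -0.7, 1.1, -2.2, 0.5])
r = sum_product_decode(h, H)
num_classes = 4                                       # class part precedes the parity part
probs = 1.0 / (1.0 + np.exp(-r[:num_classes]))        # class probabilities via the sigmoid
labels = (r[:num_classes] > 0).astype(int)            # thresholded label estimate
```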
  • The method of estimating the probability and label for each class described above can also be realized by directly calculating the posterior probability, by MAP (Maximum A Posteriori) decoding using that calculation result, or by the BCJR decoding algorithm.
  • FIG. 8 is an explanatory diagram of the turbo code BCJR decoding algorithm according to the present disclosure. As shown in FIG. 8, in the BCJR decoding algorithm, for example, data interleaving, convolutional code propagation, and logarithmic odds calculation are sequentially repeated.
  • At execution time, the computer estimates the label from the data of the classification problem by using the execution neural network shown in FIG. 5. Specifically, the computer executes a process of extracting the feature amount of the data from the data of the classification problem.
  • the computer converts the feature quantity into a logit vector, adds a parity check code to the logit vector, and executes a process of calculating the logit vector with parity. After that, the computer executes a process of performing error correction of the logit vector with parity based on the parity check code.
  • the computer executes a process of estimating the label of the classification problem based on the logit vector obtained by excluding the parity check code from the logit vector with parity after error correction.
  • In this way, the data of the classification problem is vectorized, a parity check code is added, and error correction is performed based on the parity check code to estimate the label. Therefore, classification that is robust to data noise, data diversity, and parameter deviation can be performed. Accordingly, the first embodiment can improve the estimation accuracy of the label of the classification problem.
  • the second embodiment is an example of using the execution neural network of the first embodiment as a learning neural network.
  • FIG. 9 is an explanatory diagram of the learning neural network according to the second embodiment of the present disclosure.
  • x shown in FIG. 9 is the data of the classification problem.
  • T is the label of the classification problem.
  • H is a feature vector extracted from the data of the classification problem.
  • R is the class score of the label after error correction.
  • the configuration of the learning neural network shown in FIG. 9 is as follows.
  • Data → Feature extraction network (feature extraction layer) → Logit calculation network with parity (feature extraction layer) → Error correction network (error correction layer) → Logit vector.
  • Label → Binary vectorization (only for multi-label problems and problems with unknown class data). Loss function (loss layer) → Cross entropy between the logit vector and the label.
  • In the second embodiment, the feature amount is extracted and the parity check code is added in the feature extraction layer. Therefore, a part of the feature vector input to the error correction layer corresponds to the original code, and a part corresponds to the check code (parity).
  • The error correction layer can perform error correction of the original code by, for example, sum-product decoding or BCJR decoding.
  • the difference from the first embodiment is that if the classification problem is a problem other than the multi-label problem or the problem with unknown class data, the parity coding of the label is not performed.
  • the sigmoid cross entropy of Eq. (5) is used in the multi-label problem and the unknown class problem.
  • Otherwise, the softmax cross entropy calculated by the following equation (9) is used.
  • Here, r is the vector obtained by cutting out only the class components of the posterior log odds vector obtained after repeating equations (7) and (8).
  • t is a label that has not been converted into a binary vector.
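  • A minimal sketch of this softmax cross entropy (with assumed example values for r and t) is as follows.

```python
# Softmax cross entropy of equation (9): r is the class part of the error-corrected
# log odds vector, t is the class label that has not been converted to a binary vector.
import numpy as np

def softmax_cross_entropy(r, t):
    z = r - np.max(r)                            # shift for numerical stability
    log_softmax = z - np.log(np.sum(np.exp(z)))
    return -log_softmax[t]

r = np.array([0.2, 1.5, -0.3, 2.1])              # class log odds after error correction
print(softmax_cross_entropy(r, t=3))
```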
  • In this case, parity coding of the label is not performed; instead, an equivalent of the coding is realized because the gradient is transmitted to the parameters by error backpropagation through equations (7) and (8).
  • Equations (7) and (8) are both functions through which the gradient can be backpropagated, that is, functions for which error backpropagation is defined.
  • the decoding operation corresponding to the error correction coding is composed of a combination of operations capable of error back propagation.
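  • To illustrate this point, the sketch below writes one iteration of the variable-node and check-node updates using only differentiable tensor operations (here PyTorch; the check matrix, the product-division trick for excluding the target edge, and the clipping constants are our assumptions), so that gradients flow from the decoded log odds back to the input logits.

```python
# One differentiable decoding iteration built only from operations with defined gradients.
import torch

def decode_step(llr, c2v, H):
    # variable-node processing (cf. equation (7)): sum of incoming log odds
    total = llr + c2v.sum(dim=0)
    v2c = torch.where(H.bool(), total.unsqueeze(0) - c2v, torch.zeros_like(c2v))
    # check-node processing (cf. equation (8)) via tanh / atanh, both differentiable
    t = torch.where(H.bool(), torch.tanh(v2c / 2.0), torch.ones_like(v2c))
    t = t + (t == 0).float() * 1e-6                     # avoid division by exactly zero
    prod_excl = t.prod(dim=1, keepdim=True) / t         # product over the other edges
    c2v_new = 2.0 * torch.atanh(torch.clamp(prod_excl, -0.999999, 0.999999)) * H
    return total, c2v_new

H = torch.tensor([[1., 1., 0., 1., 0.],
                  [0., 1., 1., 0., 1.]])
llr = torch.randn(5, requires_grad=True)                # stands in for the network's logits
c2v = torch.zeros_like(H)
r = llr
for _ in range(5):
    r, c2v = decode_step(llr, c2v, H)
r.sum().backward()                                      # gradients reach the input logits
print(llr.grad)
```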
  • As the execution neural network, one having the same configuration as the execution neural network of the first embodiment may be used.
  • In the second embodiment, the computer learns the parameters using the learning neural network shown in FIG. 9. Specifically, the computer executes a process of extracting the feature amount of the data from the data of the classification problem for learning.
  • Subsequently, the computer converts the feature amount into a logit vector, adds a parity check code to the logit vector to calculate the logit vector with parity, and executes a process of correcting errors in the logit vector with parity based on the parity check code.
  • After that, the computer executes a process of learning so that the value of the loss function between the error-corrected logit vector with parity and the label of the classification problem for learning is minimized, and updating the parameters of the neural network used for label estimation.
  • According to the second embodiment, it is possible to construct a neural network capable of robust classification against data noise, diversity, and parameter deviation, without performing binary vectorization of the labels used for learning.
  • FIG. 10 is an explanatory diagram showing a correspondence relationship between the execution procedure of the classification problem according to the present disclosure and the information communication procedure.
  • the original code before transmission in information communication can be considered as a class in the classification problem.
  • the procedure of coding and modulating the original code in the information communication and transmitting it via the physical layer serving as the transmission path corresponds to the procedure of extracting the feature amount from the data of the classification problem by the feature amount extraction layer.
  • noise may be mixed in the received signal at the physical layer.
  • This corresponds to the phenomenon in which noise is mixed into the feature amount.
  • the received signal is demodulated and decoded to correct errors, and the code probability is calculated to eliminate the influence of noise.
  • Similarly, in the classification problem, the feature amount extracted from the data is vectorized and a parity check code is added as in information communication, and the parity layer, which is the error correction layer described above, performs error correction of the feature vector and calculates the class probabilities.
  • a computer estimates the category to which the input data belongs by using a neural network.
  • the neural network calculates a feature vector from the input data, and based on the feature vector, calculates the probability or score of the category to which the input data belongs by using a decoding operation corresponding to a predetermined error correction coding.
  • the third embodiment describes a method of realizing unsupervised learning in combination with the first embodiment and the second embodiment. Since the execution network is the same as that of the first embodiment and the second embodiment, the description thereof is omitted here.
  • the learning network is a combination of the first embodiment and the second embodiment as follows.
  • FIG. 11 is an explanatory diagram of the learning neural network according to the third embodiment of the present disclosure.
  • “x” shown in FIG. 11 is the data of the classification problem.
  • “Loss” is the value of the loss function.
  • The configuration of the learning network shown in FIG. 11 is as follows. Data → Feature extraction network (feature extraction layer) → Logit calculation network with parity (fully connected layer) → Error correction network (error correction layer) → Logit vector (encoding layer) → Prediction label.
  • Prediction label → Binary vectorization (encoding layer) → Label vectorization with parity (encoding layer). Loss function (loss layer) → Cross entropy of the logit vector with parity and the label vector with parity.
  • the correct label is not necessary, and instead the label estimated from the data through error correction is used. Then, a label vector with parity created based on this estimated label is used for loss calculation.
  • Alternatively, the prediction label may be calculated in the error correction layer as a prediction label with parity. In this case, there is no need to convert the predicted label into the label vector with parity.
  • FIG. 12 is a flowchart showing an example of the learning algorithm according to the third embodiment of the present disclosure. As shown in FIG. 12, in the learning algorithm according to the third embodiment, first, a learning network is configured (step S201).
  • a mini-batch is sampled from the unlabeled data set (step S202). After that, the mini-batch is input to the learning network, and the logit vector with parity and the prediction label are estimated (step S203).
  • the label vector with parity is calculated from the predicted label (step S204).
  • the logit vector with parity and the label vector with parity are input to the loss function (step S205).
  • the error is back-propagated from the loss to each parameter to update the parameter (step S206).
  • In step S207, it is determined whether or not a predetermined convergence condition is satisfied. When the convergence condition is not satisfied (step S207, No), the process returns to step S202, and the processing of steps S202 to S206 is repeated until the convergence condition is satisfied. When the convergence condition is satisfied (step S207, Yes), the process ends.
  • steps S202 to S206 may be repeated up to a predetermined maximum number of repetitions.
  • In this way, the computer executes the learning algorithm shown in FIG. 12 by using the learning neural network shown in FIG. 11. Specifically, the computer executes a process of extracting the feature amount of the data from the data of the classification problem for learning.
  • the computer converts the feature quantity into a logit vector, adds a parity check code to the logit vector, and executes a process of calculating the logit vector with parity. After that, the computer executes a process of performing error correction of the logit vector with parity based on the parity check code.
  • the computer converts the logit vector with parity after error correction into a binary vector, adds a parity check code to the binary vector, and calculates the label vector with parity.
  • the computer updates the parameters of the neural network used for estimating the label by learning so that the value of the loss function of the logit vector with parity and the label vector with parity after error correction is minimized.
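  • A minimal sketch of this unsupervised (pseudo-label) loop is shown below; the network, the error correction layer, the encoding function, and the dataset interface are assumed placeholders, and binary_cross_entropy is the loss sketched earlier.

```python
# Sketch of the unsupervised learning algorithm of FIG. 12 (steps S201-S207); interfaces assumed.
def train_unsupervised(network, error_correct, encode_with_parity,
                       unlabeled_dataset, num_classes, lr=0.01, max_iters=10000):
    for step in range(max_iters):
        x = unlabeled_dataset.sample_minibatch()            # S202: unlabeled mini-batch
        h = network.forward(x)                               # logit vector with parity
        r = error_correct(h)                                 # error correction layer
        pred = (r[..., :num_classes] > 0).astype(int)        # S203: predicted label (thresholded)
        t = encode_with_parity(pred)                         # S204: label vector with parity
        loss = binary_cross_entropy(h, t)                    # S205: loss between h and t
        network.update(network.backward(loss), lr)           # S206: backpropagate and update
        # S207: stop when a predetermined convergence condition is met (omitted here)
```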
  • According to the third embodiment, it is possible to perform unsupervised learning and to construct a neural network capable of robust classification against data noise, diversity, and parameter deviation.
  • In the fourth embodiment, semi-supervised learning is performed by combining supervised learning and unsupervised learning.
  • The supervised learning part is performed by the machine learning described in the first embodiment or the second embodiment, and the unsupervised learning part is performed by the machine learning described in the third embodiment.
  • As the unsupervised learning, self-learning can be realized, in which a neural network that has learned to classify correctly by supervised learning is used to generate a label from the data, and the neural network is then trained according to that label.
  • FIG. 13 is a flowchart showing an example of the learning algorithm according to the fourth embodiment of the present disclosure. As shown in FIG. 13, in the learning algorithm according to the fourth embodiment, first, the learning network A of the first or second embodiment is configured (step S301).
  • the learning network B of the third embodiment that shares the parameters with the learning network A is configured (step S302).
  • the mini-batch is then sampled from the labeled dataset (step S303).
  • the mini-batch is input to the learning network A to update the parameters of the learning network (step S304).
  • the mini-batch is then sampled from the unlabeled dataset (step S305).
  • the mini-batch is input to the learning network B to update the parameters of the learning network (step S306). Then, it is determined whether or not the predetermined convergence condition is satisfied (step S307).
  • When the convergence condition is not satisfied (step S307, No), the process returns to step S303, and the processing of steps S303 to S306 is repeated until the convergence condition is satisfied. When the convergence condition is satisfied (step S307, Yes), the process ends.
  • steps S303 to S306 may be repeated up to a predetermined maximum number of repetitions.
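  • The alternating procedure of FIG. 13 can be sketched as follows; the two update functions are assumed placeholders standing for the supervised update of the first or second embodiment and the unsupervised update of the third embodiment, acting on shared parameters.

```python
# Sketch of the semi-supervised learning algorithm of FIG. 13 (steps S301-S307); interfaces assumed.
def train_semi_supervised(net_a, net_b, supervised_update, unsupervised_update,
                          labeled_set, unlabeled_set, max_iters=10000):
    # net_a and net_b share their parameters (steps S301-S302)
    for step in range(max_iters):
        x, t = labeled_set.sample_minibatch()        # S303: labeled mini-batch
        supervised_update(net_a, x, t)               # S304: update shared parameters via network A
        x_u = unlabeled_set.sample_minibatch()       # S305: unlabeled mini-batch
        unsupervised_update(net_b, x_u)              # S306: update shared parameters via network B
        # S307: stop when a predetermined convergence condition is met (omitted here)
```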
  • According to the fourth embodiment, it is possible to perform semi-supervised learning and to construct a neural network capable of robust classification against data noise, diversity, and parameter deviation.
  • Multi-label learning enables weak supervised learning with multiple instances.
  • As an application of weakly supervised learning by multiple instances, for example, position estimation of a specified object in an image can be considered.
  • the learning network for object position estimation by normal multiple instance learning has the following configuration.
  • Data → Feature vector calculation network → Map calculation network by class → Global max pooling by map → Logit vector by class.
  • a multi-label (multi-hot vector) is used as the label.
  • the loss function uses the sigmoid cross entropy of the class-specific logit vector and the multi-label vector.
  • the feature vector calculation network uses a network composed of convolution layers as described above.
  • the convolution network is composed of a convolution process in which filters are swept vertically and horizontally in a map (RGB for input data) and a process in which weighted additions are performed between maps.
  • In the convolution, the output is h_{m,u,v} = Σ_{k,p,q} W_{m,k,p,q} f_{k,u+p,v+q}, where f_{k,u,v} is the input variable (a tensor of size K (number of input maps) × height × width), h_{m,u,v} is the output variable (a tensor of size M (number of output maps) × height × width), and W_{m,k,p,q} are filters of size P × Q, one for each pair of input and output maps.
  • the class-specific map calculation network is a network that uses this convolution layer to perform convolution with the same number of maps as the number of classes. Maps for each class are expected to be learned to represent the object-likeness of each corresponding object location.
  • This information is information indicating whether or not an object of the specified class exists and where it is.
  • However, the multi-label supervision provides only information on whether or not each object exists. Therefore, the information on where the object is located is removed before comparing with the label.
  • Global max pooling is a process that takes the maximum value within each map. Applying it to the output of the class-specific maps (object-likeness maps) calculates, for each class, the score of the most object-like location.
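  • A minimal sketch of global max pooling over per-class maps (the shapes below are assumed example values) is as follows.

```python
# Global max pooling: for each class map, take the maximum over the spatial dimensions.
import numpy as np

maps = np.random.randn(12, 32, 32)      # (number of classes + parity) x height x width, assumed shape
logits = maps.max(axis=(1, 2))          # per-class logit vector with parity, shape (12,)
# the location of that maximum is the most object-like position in each map
locations = [np.unravel_index(m.argmax(), m.shape) for m in maps]
```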
  • the learning network will be changed as follows.
  • Data → Feature vector calculation network → Map calculation network by class with parity → Global max pooling by map → Logit vector by class with parity → Error correction layer → Logit vector by class.
  • the label uses a multi-label (multi-hot vector).
  • the loss function uses the sigmoid cross entropy of the class-specific logit vector and the multi-label vector.
  • In the fifth embodiment, the computer extracts the feature amount of the data from the data of the classification problem for learning with multiple labels, converts the feature amount, adds a parity check code, and executes a process of calculating the map for each class with parity for each class to which the label belongs.
  • After that, the computer performs global max pooling on the per-class maps with parity to calculate the class-specific logit vector with parity, and executes a process of correcting errors in the class-specific logit vector with parity based on the parity check code to calculate the class-specific logit vector.
  • Subsequently, the computer executes a process of learning so that the value of the loss function between the error-corrected class-specific logit vector and the vector of the plurality of labels is minimized, and updating the parameters of the neural network used for label estimation. By doing so, the information on whether or not the object exists is expected to become more robust, and object position estimation is realized with high accuracy. In this way, by performing weakly supervised learning with multiple instances, the computer can estimate the positions of a plurality of specified objects in an image with high accuracy.
  • The deep learning framework according to the present disclosure provides, in its lineup of layers, a coding layer and an error correction layer that can be added to a commonly used neural network for classification problems.
  • As the coding layer, a Hamming layer, an LDPC layer, a turbo layer, and the like can be considered.
  • The Hamming layer and the LDPC layer should accept a generator matrix as input.
  • The turbo layer accepts the Viterbi encoder configuration and the interleave matrix as input.
  • As the error correction layer, a posterior probability layer, a BCJR layer, a sum-product decoding layer, and the like can be considered. Of these, the BCJR layer and the sum-product layer involve iterative processing, so the maximum number of iterations can be specified.
  • The posterior probability layer should accept a generator matrix as input.
  • The BCJR layer accepts the Viterbi encoder configuration and the interleave matrix as input.
  • The sum-product layer accepts the parity check matrix corresponding to the generator matrix as input.
  • FIG. 14 is a schematic explanatory view of the configuration of the information processing apparatus according to the present embodiment.
  • the information processing device 1 shown in FIG. 14 is realized by, for example, a computer such as a GPU (Graphics Processing Unit) or a CPU (Central Processing Unit).
  • the information processing device 1 includes an information processing unit 2 that estimates the label of the classification problem using the neural network 3 whose parameters are machine-learned.
  • the neural network 3 includes a feature extraction layer 31, an LDPC / turbo decoding unit 32, a loss layer 33, and an LDPC / turbo coding unit 34.
  • To the feature extraction layer 31, the data of the classification problem is input, and the feature amount of the data is extracted from that data.
  • The LDPC/turbo decoding unit 32 functions as a layer that converts the feature amount into a logit vector, adds a parity check code to the logit vector to calculate the logit vector with parity, and corrects errors in the logit vector with parity based on the parity check code.
  • the loss layer estimates the label of the classification problem based on the logit vector with the parity check code removed from the error-corrected logit vector with parity.
  • the label of the classification problem is input from the outside.
  • The LDPC/turbo coding unit 34 functions as the coding layer shown in FIG. 1 when, for example, the information processing unit 2 performs the supervised learning of the first embodiment: the label of the classification problem is input from the outside, and the unit encodes the label and performs related processing.
  • Turbo code and LDPC code encoders and decoders are often already equipped with high-speed hardware for use in communication technology.
  • The LDPC/turbo decoding unit 32 and the LDPC/turbo coding unit 34 are not limited to a deep learning execution environment; for example, hardware such as an ASIC (Application Specific Integrated Circuit) may be adopted to perform the computation directly.
  • the computer executes a process of extracting the feature amount of the data from the data of the classification problem.
  • the computer converts the feature quantity into a logit vector, adds a parity check code to the logit vector, and executes a process of calculating the logit vector with parity.
  • the computer executes an error correction process of the logit vector with parity based on the parity check code.
  • the computer performs a process of estimating the label of the classification problem based on the logit vector obtained by excluding the parity check code from the error-corrected logit vector with parity.
  • In this way, the computer vectorizes the data of the classification problem, adds a parity check code, performs error correction based on the parity check code, and estimates the label. Classification that is robust to data noise, diversity, and parameter deviation is therefore possible, and the computer can improve the estimation accuracy of the label of the classification problem.
  • the computer converts the label of the classification problem for learning into a binary vector, adds a parity check code to the binary vector, and executes a process of calculating the label vector with parity.
  • the computer executes a process of extracting data features from the data of the classification problem for learning.
  • the computer converts the feature quantity into a logit vector, adds a parity check code to the logit vector, and executes a process of calculating the logit vector with parity.
  • the computer performs a process of learning to minimize the value of the loss function between the logit vector with parity and the label vector with parity and updating the parameters of the neural network used for label estimation.
  • the computer executes a process of extracting the feature amount of the data from the data of the classification problem for learning.
  • the computer converts the feature quantity into a logit vector, adds a parity check code to the logit vector, and executes a process of calculating the logit vector with parity.
  • the computer executes an error correction process of the logit vector with parity based on the parity check code.
  • Subsequently, the computer executes a process of learning so that the value of the loss function between the error-corrected logit vector with parity and the label of the classification problem for learning is minimized, and updating the parameters of the neural network used to estimate the label.
  • the computer can construct a neural network that can perform robust classification against data noise, diversity, and parameter deviation without performing binary vectorization of labels used for learning.
  • the computer executes a process of extracting the feature amount of the data from the data of the classification problem for learning.
  • the computer converts the feature quantity into a logit vector, adds a parity check code to the logit vector, and executes a process of calculating the logit vector with parity.
  • the computer executes an error correction process of the logit vector with parity based on the parity check code.
  • the computer converts the logit vector with parity after error correction into a binary vector, adds a parity check code to the binary vector, and executes a process of calculating a label vector with parity.
  • the computer performs a process of learning to minimize the value of the loss function between the error-corrected logit vector with parity and the label vector with parity, and updating the parameters of the neural network used for label estimation.
  • the computer can perform unsupervised learning and can construct a neural network that can perform robust classification against data noise, diversity, and parameter deviation.
  • In the information processing method according to the fourth embodiment, the computer updates the parameters of the neural network by the information processing method according to the first embodiment or the second embodiment, and further executes a process of updating the updated parameters according to the information processing method according to the third embodiment.
  • the computer can perform semi-supervised learning and can construct a neural network that can perform robust classification against data noise, diversity, and parameter deviation.
  • the computer executes a process of extracting the feature amount of the data from the data of the classification problem for learning having a plurality of labels.
  • the computer converts the feature quantity into a logit vector, adds a parity check code to the logit vector, and executes a process of calculating a class-based map with parity for each class to which the label belongs.
  • the computer performs global max pooling on the class-based map with parity to calculate the logit vector for each class with parity.
  • Subsequently, the computer executes a process of correcting errors in the class-specific logit vector with parity based on the parity check code and calculating the class-specific logit vector.
  • the computer performs a process of learning to minimize the value of the loss function between the error-corrected class-based logit vector and the vectors of a plurality of labels and updating the parameters of the neural network used for label estimation.
  • the computer can estimate the positions of a plurality of specified objects on the image with high accuracy by performing weak supervised learning using multiple instances.
  • the computer corrects errors by using a function in which error back propagation is defined.
  • In this way, by using a function for which error backpropagation is defined to perform error correction on the data of the classification problem, the computer can embed, in the parameters of the neural network, a coding method corresponding to the inverse processing of the error correction.
  • the information processing program causes a computer to execute a process of extracting data features from the data of the classification problem.
  • a computer is made to perform a process of converting a feature quantity into a logit vector, adding a parity check code to the logit vector, and calculating a logit vector with parity.
  • In this way, the program causes the computer to vectorize the data of the classification problem, add a parity check code, perform error correction based on the parity check code, and estimate the label. Classification that is robust to data noise, diversity, and parameter deviation is therefore possible, and the computer can improve the estimation accuracy of the label of the classification problem.
  • the information processing device 1 has an information processing unit 2 that estimates the label of the classification problem by using the machine-learned neural network 3.
  • The neural network 3 includes a layer that extracts the feature amount of the data from the data of the classification problem, a layer that converts the feature amount into a logit vector, adds a parity check code to the logit vector, and calculates a logit vector with parity, and a layer that performs error correction of the logit vector with parity based on the parity check code.
  • the information processing unit estimates the label of the classification problem based on the logit vector obtained by excluding the parity check code from the logit vector with parity after error correction.
  • In this way, the information processing device vectorizes the data of the classification problem, adds a parity check code, corrects errors based on the parity check code, and estimates the label. Classification that is robust to data noise, diversity, and parameter deviation is therefore possible, and the information processing device can improve the estimation accuracy of the label of the classification problem.
  • Each component of each device shown in the figures is a functional concept and does not necessarily have to be physically configured as shown. That is, the specific form of distribution and integration of each device is not limited to the illustrated one, and all or part of each device can be functionally or physically distributed or integrated in arbitrary units according to various loads and usage conditions.
  • FIG. 15 is a hardware configuration diagram showing an example of a computer 1000 that realizes the functions of the information processing device 1.
  • the computer 1000 includes a CPU 1100, a RAM 1200, a ROM (Read Only Memory) 1300, an HDD (Hard Disk Drive) 1400, a communication interface 1500, and an input / output interface 1600.
  • Each part of the computer 1000 is connected by a bus 1050.
  • the CPU 1100 operates based on the program stored in the ROM 1300 or the HDD 1400, and controls each part. For example, the CPU 1100 expands the program stored in the ROM 1300 or the HDD 1400 into the RAM 1200 and executes processing corresponding to various programs.
  • the ROM 1300 stores a boot program such as a BIOS (Basic Input Output System) executed by the CPU 1100 when the computer 1000 is started, a program that depends on the hardware of the computer 1000, and the like.
  • the HDD 1400 is a computer-readable recording medium that non-temporarily records a program executed by the CPU 1100 and data used by the program.
  • the HDD 1400 is a recording medium for recording an information processing program according to the present disclosure, which is an example of program data 1450.
  • the communication interface 1500 is an interface for the computer 1000 to connect to an external network 1550 (for example, the Internet).
  • the CPU 1100 receives data from another device or transmits data generated by the CPU 1100 to another device via the communication interface 1500.
  • the input / output interface 1600 is an interface for connecting the input / output device 1650 and the computer 1000.
  • the CPU 1100 receives data from an input device such as a keyboard or mouse via the input / output interface 1600. Further, the CPU 1100 transmits data to an output device such as a display, a speaker, or a printer via the input / output interface 1600. Further, the input / output interface 1600 may function as a media interface for reading a program or the like recorded on a predetermined recording medium (media).
  • the media is, for example, an optical recording medium such as a DVD (Digital Versatile Disc) or PD (Phase change rewritable disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, or a semiconductor memory.
  • the CPU 1100 of the computer 1000 realizes the functions of the information processing unit 2 and the like by executing the information processing program loaded on the RAM 1200.
  • the information processing program according to the present disclosure and the data in the storage unit 120 are stored in the HDD 1400.
  • the CPU 1100 reads the program data 1450 from the HDD 1400 and executes the program, but as another example, these programs may be acquired from another device via the external network 1550.
  • the present technology can also have the following configurations.
  • the neural network calculates a feature vector from the input data and, based on the feature vector, calculates the probability or score of the category to which the input data belongs using a decoding operation corresponding to a predetermined error correction coding.
  • the feature vector is a vector having a number of dimensions corresponding to the code length of the error correction code used for the predetermined error correction coding.
  • (3) The decoding operation corresponding to the predetermined error correction coding is composed of a combination of operations capable of error back propagation.
  • a computer learns the neural network using the backpropagation method.
  • the error correction code used for the predetermined error correction coding is a low density parity check code.
  • the error correction code used for the predetermined error correction coding is a turbo code.
  • the decoding operation corresponding to the predetermined error correction coding is the maximum likelihood decoding algorithm.
  • the decoding operation corresponding to the predetermined error correction coding is the BCJR algorithm.
  • the decoding operation corresponding to the predetermined error correction coding is a sum-product algorithm.
  • the information processing method according to any one of (1) to (6) above.
  • An information processing device that estimates the category to which input data belongs using a neural network.
  • the neural network calculates a feature vector from the input data and, based on the feature vector, calculates the probability or score of the category to which the input data belongs using a decoding operation corresponding to a predetermined error correction coding.
  • the feature vector is a vector having a number of dimensions corresponding to the code length of the error correction code used for the predetermined error correction coding.
  • the information processing device according to (10) above.
  • the decoding operation corresponding to the predetermined error correction coding is composed of a combination of operations capable of error back propagation.
  • the neural network is learned by using the error back propagation method.
  • the error correction code used for the predetermined error correction coding is a low density parity check code.
  • the error correction code used for the predetermined error correction coding is a turbo code.
  • the decoding operation corresponding to the predetermined error correction coding is the maximum likelihood decoding algorithm.
  • the decoding operation corresponding to the predetermined error correction coding is the BCJR algorithm.
  • the decoding operation corresponding to the predetermined error correction coding is a sum-product algorithm.
  • the neural network calculates a feature vector from the input data and, based on the feature vector, calculates the probability or score of the category to which the input data belongs using a decoding operation corresponding to a predetermined error correction coding.
  • 1 Information processing device
  • 2 Information processing unit
  • 3 Neural network
  • 31 Feature extraction layer
  • 32 LDPC/turbo decoding unit
  • 33 Loss layer
  • 34 LDPC/turbo coding unit

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

An information processing method, an information processing device, and an information processing program are provided that make it possible to improve the inference accuracy for a classification problem and to reduce the model parameters and the amount of computation. In the information processing method according to the present invention, a computer (information processing device 1) uses a neural network (3) to infer the category to which input data belongs. The neural network (3) calculates a feature vector from the input data and, based on the feature vector, calculates a probability or score for the category to which the input data belongs, using a decoding operation corresponding to a prescribed error correction coding.
PCT/JP2020/020612 2019-07-23 2020-05-25 Information processing method, information processing device, and information processing program WO2021014746A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2019-135619 2019-07-23
JP2019135619 2019-07-23

Publications (1)

Publication Number Publication Date
WO2021014746A1 (fr)

Family

ID=74194135

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/020612 WO2021014746A1 (fr) 2019-07-23 2020-05-25 Information processing method, information processing device, and information processing program

Country Status (1)

Country Link
WO (1) WO2021014746A1 (fr)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180357530A1 (en) * 2017-06-13 2018-12-13 Ramot At Tel-Aviv University Ltd. Deep learning decoding of error correcting codes
WO2019034589A1 (fr) * 2017-08-15 2019-02-21 Norwegian University Of Science And Technology Système cryptographique biométrique

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FUKUI, HIROSHI ET AL.: "Random Dropout and Ensemble Inference Networks for Pedestrian Detection and Traffic Sign Recognition", RANDOM DROPOUT ENSEMBLE INFERENCE NETWORKS, vol. 57, 15 March 2016 (2016-03-15), pages 910 - 921, XP009515166, ISSN: 1882-7764 *

Similar Documents

Publication Publication Date Title
Liang et al. An iterative BP-CNN architecture for channel decoding
Cammerer et al. Scaling deep learning-based decoding of polar codes via partitioning
Lugosch et al. Neural offset min-sum decoding
Be’Ery et al. Active deep decoding of linear codes
US20210383220A1 (en) Deep neural network ensembles for decoding error correction codes
  • KR102136428B1 (ko) Method of decoding a correction code, for example a turbo code, by analyzing the extended spectrum of the code words
Ye et al. Circular convolutional auto-encoder for channel coding
  • CN109361404A LDPC decoding system and decoding method based on a semi-supervised deep learning network
  • TWI669917B Two-dimensional code error correction decoding method and apparatus, electronic device, and computer-readable medium
  • CN111200441B Polar code decoding method, apparatus, and device, and readable storage medium
  • CN107451106A Text correction method and apparatus, and electronic device
US11182665B2 (en) Recurrent neural network processing pooling operation
  • WO2019041085A1 (fr) Signal decoding method and apparatus, and storage device
Raviv et al. perm2vec: Graph permutation selection for decoding of error correction codes using self-attention
Teng et al. Convolutional neural network-aided bit-flipping for belief propagation decoding of polar codes
US20240039559A1 (en) Decoding of error correction codes based on reverse diffusion
US9432054B2 (en) Method for decoding a correcting code with message passing, in particular for decoding LDPC codes or turbo codes
  • WO2021014746A1 (fr) Information processing method, information processing device, and information processing program
  • CN116155453B Decoding method for dynamic signal-to-noise ratios and related device
  • KR20190134608A Generalized polar codes
US11574181B2 (en) Fusion of neural networks
  • KR102494627B1 Speech recognition system and method for automatically correcting data labels
  • CN112953565B Zero-terminated convolutional code decoding method and system based on a convolutional neural network
Doan Low-complexity decoding of short linear block codes with machine learning
Ji et al. Fault-tolerant quaternary belief propagation decoding based on a neural network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20843636

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20843636

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP