CN108631787A - Data encoding method, device, computer equipment and storage medium - Google Patents
- Publication number: CN108631787A
- Application number: CN201810439264.5A
- Authority: CN (China)
- Prior art keywords: channel, vector, data, training, encoded
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/3082—Vector coding
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
This application relates to a data encoding method. The method includes: obtaining data to be encoded; obtaining a hash function and a channel vector matrix corresponding to each channel, and calculating, according to the hash function corresponding to each channel, a channel hash value of the data to be encoded for each channel; obtaining, from the corresponding channel vector matrix according to each channel hash value, a channel subvector of the data to be encoded for each channel; and obtaining a coding vector corresponding to the data to be encoded according to the obtained channel subvectors. This data encoding method not only reduces space occupation but also reduces the collision rate between vectors. A data encoding device, a computer device and a storage medium are also provided.
Description
Technical field
This application involves computer processing technical fields, are set more particularly to a kind of data-encoding scheme, device, computer
Standby and storage medium.
Background technology
Data encoding refers to the representation for converting data to vector.With the development in deep learning field, using big
Data establish model becomes theme to carry out prediction to unknown result.In order to according to given data come to unknown number
It according to being predicted, needs given data being converted to the language that computer can identify, that is, needs given data using number
The vector of change indicates.Traditional is converted to given data vector, and generally use word is embedded in the methods of (embedding).But
It is had the following problems using embedding:If the number of vectors setting of embedding is larger, then memory headroom occupancy is more,
If the number of vectors setting of embedding is smaller, although memory headroom occupies less, collision rate is high, i.e., information loss is more.
Invention content
Based on this, in view of the above problems, it is necessary to propose a data encoding method, device, computer device and storage medium that occupy little space and have a low collision rate.
A data encoding method, the method including:
obtaining data to be encoded;
obtaining a hash function and a channel vector matrix corresponding to each channel, and calculating, according to the hash function corresponding to each channel, a channel hash value of the data to be encoded for each channel;
obtaining, from the corresponding channel vector matrix according to each channel hash value, a channel subvector of the data to be encoded for each channel;
obtaining a coding vector corresponding to the data to be encoded according to the obtained channel subvectors.
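The four steps above can be sketched in Python. This is a minimal illustration rather than the patent's implementation: the seeded MD5 hash, the matrix sizes, and the 5-dimensional subvectors are all assumptions made for the example.

```python
import hashlib
import random

def channel_hash(value: str, seed: int) -> int:
    """Seeded hash: prefixing the seed yields a distinct hash function per channel."""
    digest = hashlib.md5(f"{seed}:{value}".encode()).hexdigest()
    return int(digest, 16)

def encode(value: str, matrices: list) -> list:
    """Look up one channel subvector per channel and splice them into a coding vector."""
    coding_vector = []
    for seed, matrix in enumerate(matrices):
        row = channel_hash(value, seed) % len(matrix)  # channel hash value -> row index
        coding_vector.extend(matrix[row])              # channel subvector of this channel
    return coding_vector

# Two channels, each with a small randomly initialized channel vector matrix (rows x dims).
random.seed(0)
matrices = [[[random.uniform(-1, 1) for _ in range(5)] for _ in range(1000)]
            for _ in range(2)]
vec = encode("user_42", matrices)
print(len(vec))  # 10: two 5-dimensional channel subvectors spliced together
```

The same input always selects the same rows, so the encoding is deterministic; two inputs only collide if they hit the same row in every channel at once.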
A data encoding device, the device including:
a data acquisition module, configured to obtain data to be encoded;
a computing module, configured to obtain a hash function and a channel vector matrix corresponding to each channel, and to calculate, according to the hash function corresponding to each channel, a channel hash value of the data to be encoded for each channel;
a subvector acquisition module, configured to obtain, from the corresponding channel vector matrix according to each channel hash value, a channel subvector of the data to be encoded for each channel;
a vector generation module, configured to obtain a coding vector corresponding to the data to be encoded according to the obtained channel subvectors.
A computer device, including a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following steps:
obtaining data to be encoded;
obtaining a hash function and a channel vector matrix corresponding to each channel, and calculating, according to the hash function corresponding to each channel, a channel hash value of the data to be encoded for each channel;
obtaining, from the corresponding channel vector matrix according to each channel hash value, a channel subvector of the data to be encoded for each channel;
obtaining a coding vector corresponding to the data to be encoded according to the obtained channel subvectors.
A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the following steps:
obtaining data to be encoded;
obtaining a hash function and a channel vector matrix corresponding to each channel, and calculating, according to the hash function corresponding to each channel, a channel hash value of the data to be encoded for each channel;
obtaining, from the corresponding channel vector matrix according to each channel hash value, a channel subvector of the data to be encoded for each channel;
obtaining a coding vector corresponding to the data to be encoded according to the obtained channel subvectors.
With the above data encoding method, device, computer device and storage medium, data to be encoded is obtained; the hash function corresponding to each channel is used to calculate a channel hash value of the data for each channel; the corresponding channel subvector is then obtained from the corresponding channel vector matrix according to each channel hash value; and finally the coding vector corresponding to the data to be encoded is obtained from the channel subvectors. Because the channel subvectors from the individual channel vector matrices are combined, many more vectors can be represented than with the traditional approach of reading the coding vector corresponding to the data directly from a single vector matrix, which greatly reduces space occupation and also reduces the collision rate between vectors.
Description of the drawings
Fig. 1 is an application environment diagram of a data encoding method in one embodiment;
Fig. 2 is a flowchart of a data encoding method in one embodiment;
Fig. 3 is a schematic diagram of compressing three-dimensional features into a two-dimensional space in one embodiment;
Fig. 4 is a schematic comparison of a single vector matrix with two vector matrices in one embodiment;
Fig. 5 is a schematic diagram of extracting the channel subvectors to obtain a coding vector in one embodiment;
Fig. 6 is a flowchart of calculating channel hash values in one embodiment;
Fig. 7 is a flowchart of training channel vector matrices in one embodiment;
Fig. 8 is a schematic diagram of training multiple channel vector matrices using deep learning in one embodiment;
Fig. 9 is a flowchart of training channel vector matrices in another embodiment;
Fig. 10 is a schematic comparison of a single channel with two channels in one embodiment;
Fig. 11 is a flowchart of a data encoding method in another embodiment;
Fig. 12 is a structural diagram of a data encoding device in one embodiment;
Fig. 13 is a structural diagram of a data encoding device in another embodiment;
Fig. 14 is a structural diagram of a data encoding device in yet another embodiment;
Fig. 15 is a structural diagram of a data encoding device in a further embodiment;
Fig. 16 is a structural diagram of a computer device in one embodiment.
Specific implementation mode
To make the objectives, technical solutions and advantages of this application clearer, the application is further described below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only serve to explain the application and are not intended to limit it.
Fig. 1 is an application environment diagram of the data encoding method in one embodiment. Referring to Fig. 1, the data encoding method is applied in a data encoding system. The system includes a terminal 110 and a server 120 connected through a network. The terminal 110 may specifically be a desktop terminal or a mobile terminal, and the mobile terminal may specifically be at least one of a mobile phone, a tablet computer, a laptop, and the like. The server 120 may be implemented as an independent server or as a server cluster composed of multiple servers. The terminal 110 sends data to be encoded to the server 120. The server 120 obtains the data to be encoded, obtains the hash function and channel vector matrix corresponding to each channel, calculates, according to the hash function corresponding to each channel, the channel hash value of the data to be encoded for each channel, obtains, from the corresponding channel vector matrix according to each channel hash value, the channel subvector of the data to be encoded for each channel, obtains a coding vector corresponding to the data to be encoded according to the obtained channel subvectors, and returns the coding vector to the terminal 110.
In another embodiment, the above data encoding method may be applied directly on the terminal 110. The terminal 110 obtains the data to be encoded and the hash function and channel vector matrix corresponding to each channel, calculates the channel hash value of the data to be encoded for each channel according to the corresponding hash function, obtains the channel subvector of the data to be encoded for each channel from the corresponding channel vector matrix according to each channel hash value, and obtains a coding vector corresponding to the data to be encoded according to the obtained channel subvectors.
As shown in Fig. 2, in one embodiment a data encoding method is provided. It can be applied to a server as well as to a terminal; this embodiment is described as applied to a terminal. The data encoding method specifically includes the following steps:
Step S202: obtain data to be encoded.
The data to be encoded is data that needs to be vectorized; it can be text, numbers, pictures, and so on. In the field of machine learning, data must be converted into a digitized representation the computer can work with before it can be processed, that is, it must be encoded into a vector representation.
Traditionally, data is encoded using word embedding. But because an embedding matrix generally occupies one contiguous block of memory, a sufficiently large memory region has to be reserved in advance, which leads to the following problem: if the number of embedding vectors is set large, memory usage is high; if it is set small, memory usage is low but the collision rate is high, i.e. more information is lost. To solve this problem, this embodiment proposes a data encoding method that both reduces space occupation and reduces the collision rate. The method is suitable for sparse features as well as dense features. A sparse feature is a kind of categorical feature whose value range is large (for example, more than 10,000 values) but whose individual values occur with low frequency; the user id or advertisement id in an advertising system is a typical sparse feature. A categorical feature is a feature used in machine learning whose values are enumerable, for example a gender feature with the two values male and female. A dense feature, by contrast, has a smaller value range than a sparse feature.
Word embedding is a method of mapping categorical features into a low-dimensional vector space. For example, with one-hot coding the dimension of each word vector is the number of all possible words; exactly one component of the vector is 1 and all other components are 0. The vectors of the following words would be represented as: Beijing: [1,0,0,0,...,0], Shanghai: [0,1,0,0,...,0], Shenzhen: [0,0,1,0,...,0]. As can be seen, the dimension of one-hot coded vectors is very high, and the similarity between two words cannot be computed. Compared with one-hot coding, which merely symbolizes a feature, embedding gives the feature deeper information: representing each element of the vector as a real number increases its expressive range. The original high-dimensional sparse features are compressed into a low-dimensional space whose dimension is far below the number of feature values. As shown in Fig. 3, the embedding method compresses the three-dimensional features w1, w2, w3 (that is, the representation of the feature values in the vector space produced by one-hot coding) into a two-dimensional space. The embedding representations of the features in the example above would then be: Beijing: [0.122,0.401,0.312,0.112,...,-0.32], Shanghai: [0.101,0.362,0.290,0.124,...,-0.41], Shenzhen: [0.421,0.121,0.501,0.139,...,0.823]. As with one-hot coding, each word has a corresponding vector representation, but each component of the vector is a real number with latent meaning, so correlations between words can be captured; the dimension is specified manually and is typically far below the dimension after one-hot coding. But precisely because the embedding dimension is preset by a person, the problem arises that it may be set too large or too small. The embodiments of this application are a further improvement built on the word embedding method.
Step S204: obtain the hash function and channel vector matrix corresponding to each channel, and calculate, according to the hash function corresponding to each channel, the channel hash value of the data to be encoded for each channel.
The role of a hash function is to transform an input of arbitrary length into a fixed-length output of a single type via a hashing algorithm. For example, suppose the inputs are the character string "abc" and the integer value "1". Mapping them with the same hash function can be expressed as: hash(key="abc", seed=0)=1222323233, hash(key="1", seed=0)=9384398434. When performing hash operations we do not care about the specific values obtained; it is only necessary that different inputs yield fixed-length outputs of the same type and that different inputs yield different output values, such as "1222323233" and "9384398434" above. The "seed" above is the hash seed, a parameter of the hash operation that can be any integer. Different seeds represent different hash functions: hashing the same data with different seeds yields different results. For example, hashing the string "abc" with different seeds gives: hash(key="abc", seed=0)=1222323233; hash(key="abc", seed=1)=323762398; hash(key="abc", seed=2)=7927986737. In other words, different seeds correspond to different hash operations.
A channel vector matrix is a matrix composed of multiple vectors, trained in advance as an independent matrix. Each channel corresponds to one hash function and one channel vector matrix, and there are at least 2 channels. Each channel corresponds to one channel hash value. The hash values obtained for the same data with different hash functions differ; the hash functions of different channels may be the same or different. The specifications (i.e. sizes) of the channel vector matrices of different channels may also be the same or different. Calculating the channel hash value of the data to be encoded according to the hash function of each channel means mapping the data to be encoded with the hash function corresponding to each channel to obtain the channel hash value of that channel.
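The seed behavior described above can be checked with a small sketch. The seed-prefixing trick and the use of MD5 are assumptions made for illustration; any hash family with a seed parameter would serve.

```python
import hashlib

def h(key: str, seed: int) -> int:
    # Mixing the seed into the input yields an independent hash function per seed.
    return int(hashlib.md5(f"{seed}:{key}".encode()).hexdigest(), 16)

print(h("abc", 0) != h("abc", 1))  # True: same data, different seeds -> different hashes
print(h("abc", 0) == h("abc", 0))  # True: a hash function is deterministic
print(h("abc", 0) != h("1", 0))    # True: different inputs map to different outputs
```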
Step S206: obtain, from the corresponding channel vector matrix according to each channel hash value, the channel subvector of the data to be encoded for each channel.
The channel hash value is used to look up the channel subvector corresponding to the data to be encoded in the channel vector matrix. In one embodiment, the channel hash value is used directly as the row number of the channel vector matrix, and the corresponding channel subvector is read from that matrix. In another embodiment, the channel hash value must first be processed further to obtain the row number; for example, a modulo operation can be applied to the channel hash value, and the resulting value is used as the row number to read the corresponding channel subvector from the channel vector matrix. For example, suppose the channel hash value obtained is 102336785. Taking it modulo 1000 gives the remainder 785, i.e. the last 3 digits, and the vector in row 785 of the corresponding channel vector matrix is taken as the channel subvector.
Step S208: obtain the coding vector corresponding to the data to be encoded according to the obtained channel subvectors.
After the channel subvector of each channel has been obtained, the multiple channel subvectors can be merged into the coding vector corresponding to the data to be encoded by a user-defined pooled function. In one embodiment, the pooled function is a simple merge: the channel subvectors are directly spliced together to obtain the coding vector. For example, splicing three 5-dimensional channel subvectors yields a 15-dimensional coding vector. In another embodiment, the pooled function can also be a neural-network-based merge, in which the obtained channel subvectors are processed further to obtain the final coding vector.
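Two illustrative pooled functions are sketched below: splicing, which grows the coding vector's dimension, and element-wise summation, which keeps it at the subvector dimension. Both are hypothetical implementations, since the patent leaves the pooled function user-defined.

```python
def concat_pool(subvectors):
    """Splicing: three 5-dimensional channel subvectors give one 15-dimensional vector."""
    out = []
    for v in subvectors:
        out.extend(v)
    return out

def sum_pool(subvectors):
    """Element-wise sum: the coding vector keeps the subvector dimension."""
    return [sum(parts) for parts in zip(*subvectors)]

subs = [[1.0] * 5, [2.0] * 5, [3.0] * 5]
print(len(concat_pool(subs)))  # 15
print(sum_pool(subs)[0])       # 6.0
```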
With the above data encoding method, multiple independent channel vector matrices are used; one channel subvector is taken from each independent channel vector matrix, and the coding vector corresponding to the data to be encoded is obtained by combination. Because the vectors from the individual channel vector matrices are combined, more vectors can be represented, which saves space and helps reduce the collision rate.
For example, as shown in Fig. 4, in one embodiment, suppose there is a vector matrix with N rows and M columns, i.e. the matrix contains N M-dimensional vectors. If this matrix is split into 2 matrices of N/2 rows and M columns, the two matrices together can represent (N/2)^2 vectors, each of dimension 2M. That is, for the same space occupied, the number of vectors that can be represented with two vector matrices is larger, and the vector dimension is 2M. The larger the vector dimension, the lower the corresponding collision rate: a longer vector can represent more features, which not only helps reduce the collision rate but also helps express deeper features. Therefore, when representing the same number of vectors, the above data encoding method both saves space and reduces the collision rate.
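The arithmetic behind the Fig. 4 comparison can be written out directly; N and M below are assumed values chosen only for illustration.

```python
N, M = 1_000_000, 16                        # assumed matrix sizes
single_storage = N * M                      # floats stored by one N x M matrix
split_storage = 2 * (N // 2) * M            # two (N/2 x M) matrices: same storage
single_capacity = N                         # N distinct M-dimensional vectors
split_capacity = (N // 2) ** 2              # one row from each half, combined

print(split_storage == single_storage)      # True: identical space occupied
print(split_capacity)                       # 250000000000 vectors of dimension 2*M
print(split_capacity > single_capacity)     # True: far more representable vectors
```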
In the above data encoding method, data to be encoded is obtained; the channel hash value of the data for each channel is calculated with the hash function corresponding to that channel; the corresponding channel subvector is then obtained from the corresponding channel vector matrix according to the channel hash value; and the coding vector corresponding to the data to be encoded is obtained from the channel subvectors. Because the channel subvectors from the individual channel vector matrices are combined, more coding vectors can be represented; compared with the traditional approach of reading the coding vector corresponding to the data directly from a single vector matrix, this greatly reduces the space occupied and reduces the collision rate between vectors.
As shown in Fig. 5, in one embodiment the above data encoding method includes multiple channels. The different channels perform hash operations with different hash seeds to obtain the channel hash value of each channel; the corresponding channel subvector is then obtained from the corresponding channel vector matrix according to the channel hash value, and the coding vector corresponding to the data to be encoded is obtained from the channel subvectors. In one embodiment, for a specific categorical feature, suppose the total number of categories is n and the feature value of each category (i.e. the data to be encoded) is f_i (i=0,1,...,n-1). Suppose the total number of channels is m, the hash function of each channel (also called the "bucket hash") is denoted h^(j)(f), where j is the channel number (j=0,1,...,m-1) and f is a specific feature value, and the maximum value of the channel hash value (the bucket size) is p. Then, for each channel, the mapping from the feature value f_i to the channel hash value (also called the "bucket id") can be expressed by the following formula:

b_i^(j) = h^(j)(f_i) mod p

where b_i^(j) is the channel hash value obtained for the feature value f_i on channel j, and b_i^(j) satisfies 0 <= b_i^(j) < p. On the other hand, an independent embedding matrix E^(j) with p rows is established for each channel, and the channel subvector of channel j is the row E^(j)[b_i^(j)]. The final coding vector can then be expressed by the following formula:

e_i = F(E^(0)[b_i^(0)], E^(1)[b_i^(1)], ..., E^(m-1)[b_i^(m-1)])

where F denotes an arbitrary pooled function; the most common choice is direct splicing (concatenation). Fig. 5 illustrates extracting the corresponding channel subvector from each channel vector matrix and then splicing them to obtain the final coding vector. The initial values of the channel vector matrices (embedding matrices) of all channels may use a random initialization strategy, and they are updated together with the model parameters by gradient descent.
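The two formulas above can be mirrored directly in code. The following is a toy sketch with assumed values m=3 channels, bucket size p=8 and subvector dimension d=4, a seeded MD5 standing in for h^(j), and concatenation as the pooled function F.

```python
import hashlib

m, p, d = 3, 8, 4  # channels, bucket size, subvector dimension (all assumed)

def h(j: int, f: str) -> int:
    """h^(j): the hash function of channel j, seeded here by the channel number."""
    return int(hashlib.md5(f"{j}:{f}".encode()).hexdigest(), 16)

# E^(j): one independent p x d embedding matrix per channel (toy deterministic init).
E = [[[0.01 * (j + 1) * (r + 1)] * d for r in range(p)] for j in range(m)]

def encode(f: str):
    subvectors = []
    for j in range(m):
        b = h(j, f) % p              # bucket id b_i^(j), guaranteed 0 <= b < p
        subvectors.append(E[j][b])   # channel subvector E^(j)[b_i^(j)]
    return [x for sub in subvectors for x in sub]  # pooled function F = concatenation

e = encode("f_0")
print(len(e))  # m * d = 12
```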
As shown in Fig. 6, in one embodiment, obtaining the hash function and channel vector matrix corresponding to each channel and calculating the channel hash value of the data to be encoded for each channel according to the corresponding hash function includes:
Step S204A: calculate the original hash value of the data to be encoded for the current channel according to the hash function corresponding to the current channel.
The original hash value is the hash value computed directly with the hash function. Each channel corresponds to one hash function, and the current channel is the channel currently being calculated. The hash function corresponding to the current channel is used to calculate the original hash value of the data to be encoded for the current channel.
Step S204B: obtain the row dimension of the channel vector matrix corresponding to the current channel.
The row dimension of a channel vector matrix is its number of rows. The channel vector matrices of different channels may have the same or different sizes, so the row dimensions of different channel vector matrices may be the same or different. Obtaining the row dimension makes the subsequent modulo operation possible.
Step S204C: apply a modulo operation to the original hash value of the current channel according to the row dimension of the current channel's vector matrix, obtaining the channel hash value of the data to be encoded for the current channel.
The modulo operation takes the remainder of dividing the target value by the modulus (for example, modulus=1000). The modulus is determined from the row dimension of the current channel's vector matrix. In one embodiment, the row dimension can be used directly as the modulus; for example, if the row dimension is 1000, 1000 is used as the modulus. In another embodiment, if the row dimension is not a multiple of 10, the multiple of 10 closest to and greater than the row dimension can be used as the modulus; for example, if the row dimension is 998, 1000 is used as the modulus. Applying the modulo operation to the original hash value of the current channel yields the corresponding channel hash value: if the original hash value is a and the modulus is b, the remainder c of a modulo b is the channel hash value. For example, if the computed original hash value is 12368259 and the modulus is 1000, the remainder obtained is 259. By determining the modulus from the row dimension and then performing the modulo operation, a more accurate channel hash value is obtained.
In one embodiment, obtaining the coding vector corresponding to the data to be encoded according to the obtained channel subvectors includes: splicing the obtained channel subvectors in channel order to obtain the coding vector corresponding to the data to be encoded.
After the channel subvector of each channel has been obtained, the subvectors are spliced in the order of the channels to obtain the coding vector corresponding to the data to be encoded. The channels are ordered in advance, for example as first channel, second channel, and so on, so that splicing can proceed in channel order. Because different splicing orders give different coding vectors, the splicing must be ordered.
As shown in Fig. 7, in one embodiment, the channel vector matrices are trained in the following way:
Step S702: obtain training data and the labels corresponding to the training data.
To train the model, the training data and its corresponding labels are obtained first. A label is the annotation of the result corresponding to a training datum, i.e. the desired output of the model training.
Step S704: use the training data as the input of the vector layer in the model to be trained. The vector layer includes multiple channels, each channel having a corresponding hash function and training channel vector matrix. Each channel calculates, according to its hash function, the training channel hash value of the training data for that channel; the channel training subvector of the training data for each channel is obtained from the corresponding training channel vector matrix according to the training channel hash value; and the vector layer obtains the training coding vector corresponding to the training data from the obtained channel training subvectors.
The vector layer encodes the training data into a vector representation, i.e. it produces the training coding vector corresponding to the training data. The vector layer includes multiple channels; each channel has a corresponding hash function and training channel vector matrix; each channel calculates the training channel hash value of the training data according to its hash function, then obtains from the training channel vector matrix, according to the training channel hash value, the channel training subvector of the training data for that channel; the vector layer then obtains the training coding vector of the training data from the channel training subvectors. Before model training, the number of channels and the corresponding hash functions are set in advance, and each training channel vector matrix is initialized.
Step S706: use the training coding vector as the input of the next layer of the model to be trained, use the labels corresponding to the training data as the desired output of the model to be trained, and train the model to obtain a target training model. The target training model includes a target vector layer, and the target vector layer includes the target channel vector matrix corresponding to each channel.
The model to be trained contains other layers besides the vector layer; in one embodiment, for example, it also contains a convolutional layer and a fully connected layer, which can be configured according to the actual situation. The vector layer is the first layer: the training data is used as the input of the vector layer, yielding the training coding vector output by the vector layer; the training coding vector is then used as the input of the layer after the vector layer, after which the output of each layer is used as the input of the layer below it, i.e. the output of the previous layer serves as the input of the following layer, until the output of the last layer is obtained. The labels corresponding to the training data are used as the desired output of the model to be trained, and the parameters of the model are adjusted according to the gap between the actual output and the desired output until training is complete and the target training model is obtained. The target training model includes a target vector layer, and the target vector layer includes the target channel vector matrix corresponding to each channel; the target channel vector matrices are the trained channel vector matrices.
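A deliberately minimal single-channel sketch of this training procedure follows. It assumes a squared-error loss, a trivial "model" that simply sums the selected row, and plain gradient descent on that row; a real embodiment would backpropagate through the additional layers as well.

```python
import hashlib

p, d, lr = 4, 3, 0.1
E = [[0.0] * d for _ in range(p)]  # one training channel vector matrix, zero-initialized

def row(f: str) -> int:
    """Map a training datum to a row of the matrix (hash then modulo the row dimension)."""
    return int(hashlib.md5(f.encode()).hexdigest(), 16) % p

data = [("a", 1.0), ("b", -1.0)]   # (training datum, label)
for _ in range(200):
    for f, y in data:
        r = row(f)
        pred = sum(E[r])                       # stand-in for the layers above the vector layer
        grad = 2.0 * (pred - y)                # d(loss)/d(pred) for squared-error loss
        E[r] = [w - lr * grad for w in E[r]]   # gradient-descent update of the selected row

print(round(sum(E[row("a")]), 3))  # 1.0 once the loss has converged
```

The loop mirrors the described procedure: forward pass, loss from actual versus desired output, parameter adjustment, repeat until convergence.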
In one embodiment, taking the training data as the input to the vector layer of the model to be trained, taking the label corresponding to the training data as the expected output of the model to be trained, and training the model to obtain a target training model that includes a target vector layer with the target channel vector matrix corresponding to each channel, includes: taking the training data as the input to the vector layer of the model to be trained and obtaining the model's actual output; computing a loss value from the actual output and the expected output; adjusting, according to the loss value, the parameters in each channel's training channel vector matrix in the vector layer as well as the parameters in the other layers; and updating the training data and its corresponding label, returning to the step of taking the training data as the input to the vector layer, and looping in this way until the loss value meets a preset convergence condition, which yields the target training model including the target vector layer.
The training data is fed to the vector layer of the model to be trained to obtain the model's actual output, and a loss value is computed from the actual output and the expected output using a loss function. The loss function can be chosen according to actual needs: for example, a mean squared error loss, a cross-entropy loss, or a log-likelihood loss can be used. The parameters of a training channel vector matrix are the values of its elements; the parameters of the other layers are their corresponding weight and bias parameters. According to the computed loss value, the parameters in each channel's training channel vector matrix in the vector layer and the parameters of the other layers are adjusted; the next training sample or batch and its corresponding label are then obtained, and the procedure returns to the step of taking the training data as the input to the vector layer. Training proceeds in this way until the loss value meets a preset convergence condition. The convergence condition can be a preset loss range: if the loss value falls within this range, the model is considered trained and the target training model with its target vector layer is obtained. The parameters can be adjusted with gradient descent, updating the parameters in the model one by one from back to front.
The model to be trained can be defined according to the actual situation: it can be a text classification model, a click probability prediction model, a conversion-rate prediction model, an information recommendation model, and so on. The model to be trained is not limited here; any model that contains a vector (embedding) layer is suited to the above training method. In this channel-vector-matrix training method, the channel vector matrices are trained as part of the model's vector layer, so only the model's training data and the corresponding labels are needed to train each channel vector matrix, which is simple and convenient.
As shown in Figure 8, in one embodiment multiple channel vector matrices are trained with deep learning. Figure 8 illustrates, for the case where the model is a neural network, how forward and backward propagation simultaneously update the model parameters and each channel's training channel vector matrix. The training data is hashed with each channel's hash function to obtain the training channel hash values, the corresponding channel training sub-vectors are retrieved from the training channel vector matrices according to the training channel hash values, and the training encoding vector is obtained from the channel training sub-vectors. The training encoding vector is fed to the next layer to obtain the model's actual output, and the parameters in the model and in the training channel vector matrices are adjusted in the backward pass according to the actual output and the expected output.
In one embodiment, the above data encoding method further includes: taking the encoding vector corresponding to the data to be encoded as the input to the layer following the vector layer in the target training model; and obtaining the output result, corresponding to the data to be encoded, produced by the target training model.

Specifically, in the prediction stage, which uses the target training model obtained by training, once the encoding vector corresponding to the data to be encoded has been obtained, the encoding vector is passed to the layer after the vector layer in the target training model, and the output result of the target training model corresponding to the data to be encoded is then obtained.
In one embodiment, the data to be encoded includes information-click-influencing data corresponding to target information, and the target training model is an information click prediction model. The information-click-influencing data is taken as the input to the vector layer of the information click prediction model; the vector layer determines the encoding vector corresponding to the information-click-influencing data, and the encoding vector is passed to the layer after the vector layer in the information click prediction model; the predicted information click probability corresponding to the target information, output by the information click prediction model, is then obtained.

In an information click prediction scenario, the data to be encoded is the information-click-influencing data corresponding to the target information, that is, data that influences the probability of the information being clicked. The information-click-influencing data corresponding to the target information is fed to the vector layer of the information click prediction model, which determines the corresponding encoding vector; the encoding vector is then passed to the layer after the vector layer, and the predicted click probability corresponding to the target information output by the information click prediction model is finally obtained. The target information can be an advertisement to be recommended, news to be recommended, an article to be recommended, and so on.
In a text classification scenario, the data to be encoded is the text to be classified, and the target training model is a text classification model. The text to be classified is fed to the vector layer of the text classification model, which determines the text encoding vector corresponding to the text; the text encoding vector is passed to the layer after the vector layer, and the text classification model then outputs the class corresponding to the text to be classified.
In a game recommendation scenario, the data to be encoded is the historical game-play behavior data of the user to be recommended to, and the target training model is a game recommendation model. The historical game-play behavior data is fed to the vector layer of the game recommendation model, which determines the game behavior encoding vector corresponding to the historical data; the encoding vector is passed to the layer after the vector layer, and the game recommendation model then outputs the target game to recommend.
As shown in Figure 9, in one embodiment the channel vector matrices are trained as follows:

Step S902: obtain training data and the standard encoding vector corresponding to the training data.

The standard encoding vector is the correct encoding vector corresponding to the training data, and the training data is data whose standard encoding vector is known. Each channel vector matrix is trained with data whose standard encoding vector is known, so that the trained channel vector matrices can subsequently produce the standard encoding vector for data whose standard encoding vector is unknown.

Step S904: take the training data as the input to each channel, each channel having a corresponding hash function and training channel vector matrix; each channel computes, with its hash function, the training channel hash value of the training data for that channel, and the channel training sub-vector of the training data for each channel is retrieved from the corresponding training channel vector matrix according to the training channel hash value.

Specifically, the training data is fed to each channel, each channel having its own hash function and training channel vector matrix. The training channel hash value of the training data for each channel is computed with that channel's hash function, and the corresponding channel training sub-vector is then retrieved from the corresponding training channel vector matrix according to the training channel hash value.
Step S906: obtain the training encoding vector corresponding to the training data from the channel training sub-vectors.

After the channel training sub-vectors have been retrieved, they are combined with a pooling function into the training encoding vector; the training encoding vector is the encoding vector actually produced.

Step S908: compute a coding loss value from the training encoding vector and the standard encoding vector.

A preset loss function, for example a mean squared error loss, is used to compute the coding loss value between the training encoding vector and the standard encoding vector; the coding loss value measures the gap between the training encoding vector and the standard encoding vector.

Step S910: adjust the parameters in the training channel vector matrices according to the coding loss value.

After the coding loss value has been obtained, the parameters in each channel's training channel vector matrix are adjusted according to it, so as to minimize the coding loss value.
Step S912: update the training data and its corresponding standard encoding vector, return to the step of taking the training data as the input to each channel, and loop in this way until the coding loss value meets a preset convergence condition, which yields the target channel vector matrices.

Updating the training data and its corresponding standard encoding vector means obtaining the next training sample and its standard encoding vector from the training set and continuing to train with the new training data and its standard encoding vector; the loop repeats until the computed coding loss value meets the preset convergence condition, at which point training finishes and the target channel vector matrices are obtained.
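Steps S902 to S912 can be sketched in numpy: each channel matrix is adjusted by gradient descent until the concatenated sub-vectors reproduce a known standard encoding vector under a mean-squared-error coding loss. The sizes, the CRC32-based channel hash, the learning rate, and the fixed iteration count standing in for the convergence check are illustrative assumptions.

```python
import zlib

import numpy as np

rng = np.random.default_rng(2)
CHANNELS, ROWS, DIM = 2, 25, 3
mats = [rng.normal(scale=0.1, size=(ROWS, DIM)) for _ in range(CHANNELS)]

def indices(x):
    """Per-channel training channel hash value, modulo the row count."""
    return [zlib.crc32(f"{c}:{x}".encode()) % ROWS for c in range(CHANNELS)]

def encode(x):
    """Concatenate the channel training sub-vectors into the encoding vector."""
    return np.concatenate([mats[c][i] for c, i in enumerate(indices(x))])

# one sample with a known standard encoding vector (ground truth)
standard = {"cat": np.arange(CHANNELS * DIM, dtype=float)}

for _ in range(500):                         # stand-in for the convergence loop
    for x, target in standard.items():
        err = encode(x) - target             # training vector vs standard vector
        for c, i in enumerate(indices(x)):   # gradient of the squared coding loss
            mats[c][i] -= 0.1 * 2 * err[c * DIM:(c + 1) * DIM]
```

After the loop, `encode("cat")` approximates the stored standard vector, which is the sense in which the coding loss value has been minimized.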
As shown in Figure 10, a set of simulation results illustrates, at equal memory occupancy, how well single-channel hashing and two-channel hashing restore the original feature information. In this experiment, standard encoding vectors (ground truth) were generated at random for categorical features of different scales (the abscissa in the figure is the number of features). For the single channel, the bucket size (i.e. space occupied) was set to 50; for the two-channel case, each channel's bucket size was 25, so the memory occupancy of the single channel and the two channels is identical. In addition, the pooling function for the two channels is simple concatenation, ensuring the pooling function introduces no additional variables that would bias the comparison, and the channel vector matrices of both the single-channel and two-channel setups were trained with the same mean squared error loss. The ordinate in the figure is the MSE (mean squared error) after convergence; the lower the MSE, the less information is lost. As can be seen, for every number of feature values, the error of the two-channel encoding vector is smaller than that of the single-channel encoding vector, showing that two channels lose less information and represent the features better. In other words, multi-channel encoding effectively reduces the collision rate, reduces information loss, and is more fault tolerant.
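A back-of-the-envelope check of why this holds: with equal memory, two channels of 25 buckets yield 25 × 25 = 625 distinct index pairs, versus only 50 distinct buckets for a single channel, so far fewer features share exactly the same code. The sketch below counts distinct codes under a CRC32-based channel hash (an illustrative assumption).

```python
import zlib

def code(x, channels, rows):
    """The (channel hash mod rows) tuple that identifies a feature's rows."""
    return tuple(zlib.crc32(f"{c}:{x}".encode()) % rows for c in range(channels))

features = [f"feat_{i}" for i in range(200)]          # 200 synthetic features
single = len({code(x, 1, 50) for x in features})      # distinct codes, 1 channel
double = len({code(x, 2, 25) for x in features})      # distinct codes, 2 channels
# single is capped at 50; double can use up to 625 combinations
```

With 200 features, the single channel can distinguish at most 50 of them, while the two-channel code space keeps many more features separate at the same memory cost.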
In one embodiment, to facilitate storage and transmission, the channel vector matrices of the multiple channels can be stored in a distributed manner, and to improve speed, the multiple channels can be computed in parallel.
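Because the per-channel lookups are independent of one another, the parallel computation mentioned above is straightforward; a sketch using a thread pool follows (the two channels, matrix sizes, and CRC32-based hash are illustrative assumptions).

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

import numpy as np

rng = np.random.default_rng(3)
mats = [rng.normal(size=(25, 4)) for _ in range(2)]   # one matrix per channel

def channel_lookup(args):
    """One channel's work: hash, take modulo the row count, fetch the row."""
    c, x = args
    return mats[c][zlib.crc32(f"{c}:{x}".encode()) % mats[c].shape[0]]

def encode_parallel(x):
    """Run every channel lookup in parallel, then splice in channel order."""
    with ThreadPoolExecutor(max_workers=len(mats)) as pool:
        subs = pool.map(channel_lookup, [(c, x) for c in range(len(mats))])
        return np.concatenate(list(subs))
```

`pool.map` preserves input order, so the sub-vectors are still spliced in channel order even though the lookups run concurrently.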
As shown in Figure 11, in one embodiment a data encoding method is proposed that includes the following steps:

Step S1101: obtain training data and the label corresponding to the training data.

Step S1102: take the training data as the input to the vector layer of the model to be trained. The vector layer includes multiple channels, each channel having a corresponding hash function and training channel vector matrix; each channel computes, with its hash function, the training channel hash value of the training data for that channel, the channel training sub-vector of the training data for each channel is retrieved from the corresponding training channel vector matrix according to the training channel hash value, and the vector layer obtains the training encoding vector corresponding to the training data from the retrieved channel training sub-vectors.

Step S1103: take the training encoding vector as the input to the next layer of the model to be trained and the label corresponding to the training data as the expected output of the model to be trained; train the model to obtain a target training model that includes a target vector layer, the target vector layer including the target channel vector matrix corresponding to each channel.
Step S1104: obtain the data to be encoded.

Step S1105: compute, with the hash function corresponding to the current channel, the original hash value of the data to be encoded for the current channel.

Step S1106: obtain the number of row dimensions of the channel vector matrix corresponding to the current channel.

Step S1107: take the original hash value of the current channel modulo the number of row dimensions of the current channel vector matrix to obtain the channel hash value of the data to be encoded for the current channel.

Step S1108: retrieve, from the corresponding channel vector matrices according to the channel hash values, the channel sub-vector of the data to be encoded for each channel.

Step S1109: splice the channel sub-vectors corresponding to the channels, in channel order, to obtain the encoding vector corresponding to the data to be encoded.
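Steps S1104 to S1109 can be sketched as follows. The two channels, matrix sizes, and the CRC32-based per-channel hash are illustrative assumptions; the embodiment only requires that each channel has some hash function and channel vector matrix.

```python
import zlib

import numpy as np

rng = np.random.default_rng(0)
NUM_CHANNELS = 2
ROWS, DIM = 25, 8          # row dimension per channel matrix; sub-vector width
channel_matrices = [rng.normal(size=(ROWS, DIM)) for _ in range(NUM_CHANNELS)]

def channel_hash(data: str, channel: int) -> int:
    """S1105-S1107: original hash for the channel, then modulo the row count."""
    raw = zlib.crc32(f"{channel}:{data}".encode())   # per-channel hash function
    return raw % channel_matrices[channel].shape[0]  # modulo the row dimension

def encode(data: str) -> np.ndarray:
    """S1108-S1109: look up each channel sub-vector, splice in channel order."""
    subvectors = [channel_matrices[c][channel_hash(data, c)]
                  for c in range(NUM_CHANNELS)]
    return np.concatenate(subvectors)

vec = encode("feature_42")   # a 16-dimensional encoding vector
```

The modulo in `channel_hash` is what bounds each channel's space occupancy: however large the raw hash, only `ROWS` rows per channel are ever stored.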
As shown in Figure 12, in one embodiment a data encoding apparatus is proposed, including:

a data acquisition module 1202, configured to obtain the data to be encoded;

a computation module 1204, configured to obtain the hash function and channel vector matrix corresponding to each channel and compute, with each channel's hash function, the channel hash value of the data to be encoded for each channel;

a sub-vector acquisition module 1206, configured to retrieve, from the corresponding channel vector matrices according to the channel hash values, the channel sub-vector of the data to be encoded for each channel; and

a vector generation module 1208, configured to obtain the encoding vector corresponding to the data to be encoded from the retrieved channel sub-vectors.
In one embodiment, the computation module 1204 is further configured to compute, with the hash function corresponding to the current channel, the original hash value of the data to be encoded for the current channel; obtain the number of row dimensions of the channel vector matrix corresponding to the current channel; and take the original hash value modulo the number of row dimensions of the current channel vector matrix to obtain the channel hash value of the data to be encoded for the current channel.
In one embodiment, the vector generation module 1208 is further configured to splice the channel sub-vectors corresponding to the channels, in channel order, to obtain the encoding vector corresponding to the data to be encoded.
As shown in Figure 13, in one embodiment the above data encoding apparatus further includes:

a first channel vector matrix training module 1201, configured to obtain training data and the label corresponding to the training data; take the training data as the input to the vector layer of the model to be trained, the vector layer including multiple channels, each channel having a corresponding hash function and training channel vector matrix, each channel computing, with its hash function, the training channel hash value of the training data for that channel, the channel training sub-vector of the training data for each channel being retrieved from the corresponding training channel vector matrix according to the training channel hash value, and the vector layer obtaining the training encoding vector corresponding to the training data from the retrieved channel training sub-vectors; and take the training encoding vector as the input to the next layer of the model to be trained and the label corresponding to the training data as the expected output of the model to be trained, and train the model to obtain a target training model that includes a target vector layer, the target vector layer including the target channel vector matrix corresponding to each channel.
In one embodiment, the first channel vector matrix training module is further configured to take the training data as the input to the vector layer of the model to be trained and obtain the model's actual output; compute a loss value from the actual output and the expected output; adjust, according to the loss value, the parameters in each channel's training channel vector matrix in the vector layer and the parameters in the other layers; and update the training data and its corresponding label, return to the step of taking the training data as the input to the vector layer, and loop in this way until the loss value meets a preset convergence condition, yielding the target training model including the target vector layer.
As shown in Figure 14, in one embodiment the above data encoding apparatus further includes:

an output module 1210, configured to take the encoding vector corresponding to the data to be encoded as the input to the layer following the vector layer in the target training model, and obtain the output result, corresponding to the data to be encoded, produced by the target training model.
In one embodiment, the data to be encoded includes information-click-influencing data corresponding to target information, and the target training model is an information click prediction model. The above data encoding apparatus further includes a click probability prediction module, configured to take the information-click-influencing data as the input to the vector layer of the information click prediction model, the vector layer determining the encoding vector corresponding to the information-click-influencing data; take the encoding vector as the input to the layer following the vector layer in the information click prediction model; and obtain the predicted information click probability, corresponding to the target information, output by the information click prediction model.
As shown in Figure 15, in one embodiment the above data encoding apparatus further includes:

a second channel vector matrix training module 1212, configured to obtain training data and the standard encoding vector corresponding to the training data; take the training data as the input to each channel, each channel having a corresponding hash function and training channel vector matrix, each channel computing, with its hash function, the training channel hash value of the training data for that channel, the channel training sub-vector of the training data for each channel being retrieved from the corresponding training channel vector matrix according to the training channel hash value; obtain the training encoding vector corresponding to the training data from the channel training sub-vectors; compute a coding loss value from the training encoding vector and the standard encoding vector; adjust the parameters in the training channel vector matrices according to the coding loss value; and update the training data and its corresponding standard encoding vector, return to the step of taking the training data as the input to each channel, and loop in this way until the coding loss value meets a preset convergence condition, yielding the target channel vector matrices.
Figure 16 shows the internal structure of a computer device in one embodiment. The computer device can specifically be a terminal or a server. As shown in Figure 16, the computer device includes a processor, a memory, and a network interface connected by a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and can also store a computer program which, when executed by the processor, can cause the processor to carry out the data encoding method. The internal memory can also store a computer program which, when executed by the processor, can cause the processor to perform the data encoding method. Those skilled in the art will understand that the structure shown in Figure 16 is merely a block diagram of the part of the structure relevant to the solution of this application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or arrange the components differently.
In one embodiment, the data encoding method provided by this application can be implemented in the form of a computer program that can run on a computer device such as the one shown in Figure 16. The memory of the computer device can store the program modules that make up the data encoding apparatus, for example the data acquisition module 1202, the computation module 1204, the sub-vector acquisition module 1206, and the vector generation module 1208 of Figure 12. The computer program formed by these program modules causes the processor to execute the steps of the data encoding method of each embodiment of this application described in this specification. For example, the computer device shown in Figure 16 can obtain the data to be encoded through the data acquisition module 1202 of the data encoding apparatus shown in Figure 12; obtain, through the computation module 1204, the hash function and channel vector matrix corresponding to each channel and compute, with each channel's hash function, the channel hash value of the data to be encoded for each channel; retrieve, through the sub-vector acquisition module 1206, the channel sub-vector of the data to be encoded for each channel from the corresponding channel vector matrices according to the channel hash values; and obtain, through the vector generation module 1208, the encoding vector corresponding to the data to be encoded from the retrieved channel sub-vectors.
In one embodiment, a computer device is proposed, including a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following steps: obtain the data to be encoded; obtain the hash function and channel vector matrix corresponding to each channel, and compute, with each channel's hash function, the channel hash value of the data to be encoded for each channel; retrieve, from the corresponding channel vector matrices according to the channel hash values, the channel sub-vector of the data to be encoded for each channel; and obtain the encoding vector corresponding to the data to be encoded from the retrieved channel sub-vectors.
In one embodiment, obtaining the hash function and channel vector matrix corresponding to each channel and computing, with each channel's hash function, the channel hash value of the data to be encoded for each channel includes: computing, with the hash function corresponding to the current channel, the original hash value of the data to be encoded for the current channel; obtaining the number of row dimensions of the channel vector matrix corresponding to the current channel; and taking the original hash value modulo the number of row dimensions of the current channel vector matrix to obtain the channel hash value of the data to be encoded for the current channel.

In one embodiment, obtaining the encoding vector corresponding to the data to be encoded from the retrieved channel sub-vectors includes: splicing the channel sub-vectors corresponding to the channels, in channel order, to obtain the encoding vector corresponding to the data to be encoded.
In one embodiment, the computer program further causes the processor to perform the following steps: obtain training data and the label corresponding to the training data; take the training data as the input to the vector layer of the model to be trained, the vector layer including multiple channels, each channel having a corresponding hash function and training channel vector matrix, each channel computing, with its hash function, the training channel hash value of the training data for that channel, the channel training sub-vector of the training data for each channel being retrieved from the corresponding training channel vector matrix according to the training channel hash value, and the vector layer obtaining the training encoding vector corresponding to the training data from the retrieved channel training sub-vectors; and take the training encoding vector as the input to the next layer of the model to be trained and the label corresponding to the training data as the expected output of the model to be trained, and train the model to obtain a target training model that includes a target vector layer, the target vector layer including the target channel vector matrix corresponding to each channel.
In one embodiment, taking the training data as the input to the vector layer of the model to be trained, taking the label corresponding to the training data as the expected output of the model to be trained, and training the model to obtain a target training model that includes a target vector layer with the target channel vector matrix corresponding to each channel includes: taking the training data as the input to the vector layer of the model to be trained and obtaining the model's actual output; computing a loss value from the actual output and the expected output; adjusting, according to the loss value, the parameters in each channel's training channel vector matrix in the vector layer and the parameters in the other layers; and updating the training data and its corresponding label, returning to the step of taking the training data as the input to the vector layer, and looping in this way until the loss value meets a preset convergence condition, yielding the target training model including the target vector layer.
In one embodiment, the computer program further causes the processor to perform the following steps: take the encoding vector corresponding to the data to be encoded as the input to the layer following the vector layer in the target training model; and obtain the output result, corresponding to the data to be encoded, produced by the target training model.
In one embodiment, the data to be encoded includes information-click-influencing data corresponding to target information, and the target training model is an information click prediction model.

The computer program further causes the processor to perform the following steps: take the information-click-influencing data as the input to the vector layer of the information click prediction model, the vector layer determining the encoding vector corresponding to the information-click-influencing data; take the encoding vector as the input to the layer following the vector layer in the information click prediction model; and obtain the predicted information click probability, corresponding to the target information, output by the information click prediction model.
In one embodiment, the computer program further causes the processor to perform the following steps: obtain training data and the standard encoding vector corresponding to the training data; take the training data as the input to each channel, each channel having a corresponding hash function and training channel vector matrix, each channel computing, with its hash function, the training channel hash value of the training data for that channel, the channel training sub-vector of the training data for each channel being retrieved from the corresponding training channel vector matrix according to the training channel hash value; obtain the training encoding vector corresponding to the training data from the channel training sub-vectors; compute a coding loss value from the training encoding vector and the standard encoding vector; adjust the parameters in the training channel vector matrices according to the coding loss value; and update the training data and its corresponding standard encoding vector, return to the step of taking the training data as the input to each channel, and loop in this way until the coding loss value meets a preset convergence condition, yielding the target channel vector matrices.
In one embodiment, a computer-readable storage medium is proposed, storing a computer program which, when executed by a processor, causes the processor to perform the following steps: obtain the data to be encoded; obtain the hash function and channel vector matrix corresponding to each channel, and compute, with each channel's hash function, the channel hash value of the data to be encoded for each channel; retrieve, from the corresponding channel vector matrices according to the channel hash values, the channel sub-vector of the data to be encoded for each channel; and obtain the encoding vector corresponding to the data to be encoded from the retrieved channel sub-vectors.
In one embodiment, obtaining the hash function and channel vector matrix corresponding to each channel and computing the channel hash value of the data to be encoded for each channel according to the corresponding hash function includes: computing the original hash value of the data to be encoded for the current channel according to the hash function corresponding to the current channel; obtaining the number of rows of the channel vector matrix corresponding to the current channel; and performing a modulo operation on the original hash value of the current channel with the number of rows of the current channel vector matrix to obtain the channel hash value of the data to be encoded for the current channel.
In one embodiment, obtaining the coding vector corresponding to the data to be encoded from the obtained channel subvectors includes: concatenating the channel subvectors corresponding to the channels in channel order to obtain the coding vector corresponding to the data to be encoded.
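The per-channel hashing, modulo, lookup, and concatenation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the channel count, the use of salted MD5 digests as the per-channel hash functions, and the toy matrix values are all assumptions, since the embodiment does not fix concrete choices.

```python
import hashlib

# Assumed setup: 4 channels, each with its own channel vector matrix.
NUM_CHANNELS = 4
ROWS_PER_CHANNEL = 8      # number of rows in each channel vector matrix
SUBVECTOR_DIM = 3         # width of each channel subvector

def channel_hash(data: str, channel: int) -> int:
    """Original hash value of the data for one channel (salted MD5 as a stand-in)."""
    digest = hashlib.md5(f"{channel}:{data}".encode()).hexdigest()
    return int(digest, 16)

# Channel vector matrices (deterministic toy values here; trained in practice).
matrices = [
    [[(c + r + d) * 0.1 for d in range(SUBVECTOR_DIM)]
     for r in range(ROWS_PER_CHANNEL)]
    for c in range(NUM_CHANNELS)
]

def encode(data: str) -> list:
    coding_vector = []
    for c in range(NUM_CHANNELS):
        original = channel_hash(data, c)          # per-channel original hash value
        row = original % ROWS_PER_CHANNEL         # modulo the row count
        coding_vector.extend(matrices[c][row])    # look up the channel subvector
    return coding_vector                          # concatenated in channel order

vec = encode("user_12345")
print(len(vec))  # 4 channels x 3 dimensions = 12
```

Because each channel hashes the same input with a different salt, two inputs only produce identical coding vectors if they collide in every channel, which is what lowers the collision rate relative to a single hash table.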
In one embodiment, the computer program further causes the processor to perform the following steps: obtaining training data and a label corresponding to the training data; using the training data as the input of the vector layer in a model to be trained, wherein the vector layer includes multiple channels, each channel has a corresponding hash function and a training channel vector matrix, each channel computes the training channel hash value of the training data according to its corresponding hash function, the channel training subvector of the training data for each channel is obtained from the corresponding training channel vector matrix according to the training channel hash value, and the vector layer obtains the training coding vector corresponding to the training data from the obtained channel training subvectors; and using the training coding vector as the input of the next layer of the model to be trained and the label corresponding to the training data as the expected output of the model to be trained, training the model to be trained to obtain a target trained model, wherein the target trained model includes a target vector layer, and the target vector layer includes the target channel vector matrix corresponding to each channel.
In one embodiment, using the training data as the input of the vector layer in the model to be trained, using the label corresponding to the training data as the expected output of the model to be trained, and training the model to be trained to obtain the target trained model including the target vector layer with the target channel vector matrix corresponding to each channel includes: using the training data as the input of the vector layer in the model to be trained, and obtaining the actual output of the model to be trained; computing a loss value from the actual output and the expected output; adjusting, according to the loss value, the parameters in the training channel vector matrices of the channels in the vector layer and the parameters in the other layers of the model to be trained; and updating the training data and the corresponding label, returning to the step of using the training data as the input of the vector layer in the model to be trained, and iterating until the loss value satisfies a preset convergence condition to obtain the target trained model including the target vector layer.
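The training loop in this embodiment can be sketched as below. This is a deliberately simplified toy, not the patented training procedure: it assumes a single channel, hypothetical fixed row indices in place of the hash-and-modulo step, a one-neuron linear layer as the "other layers", and a squared-error loss with manual gradient updates.

```python
import random

random.seed(0)

ROWS, DIM, LR = 8, 4, 0.05
# Training channel vector matrix of the vector layer (a single channel here).
matrix = [[random.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(ROWS)]
# Parameters of the "other layers": a one-neuron linear output layer.
weights = [random.uniform(-0.5, 0.5) for _ in range(DIM)]

# Hypothetical row indices standing in for "channel hash value modulo row count".
row_of = {"item_a": 0, "item_b": 3, "item_c": 5}

def forward(sample):
    sub = matrix[row_of[sample]]                      # channel training subvector
    return sum(w * x for w, x in zip(weights, sub))   # actual output of the model

samples = [("item_a", 1.0), ("item_b", 0.0), ("item_c", 1.0)]  # label = expected output
history = []

for epoch in range(500):
    loss = 0.0
    for sample, label in samples:
        pred = forward(sample)
        err = pred - label                            # squared-error loss value
        loss += err * err
        r = row_of[sample]
        sub_before = matrix[r][:]
        # Adjust both the vector-layer matrix rows and the other layers' parameters.
        for d in range(DIM):
            matrix[r][d] -= LR * 2 * err * weights[d]
            weights[d] -= LR * 2 * err * sub_before[d]
    history.append(loss)
    if loss < 1e-6:                                   # preset convergence condition
        break
```

The point of the sketch is that only the matrix rows actually looked up for a sample receive gradient, so the channel vector matrices are learned jointly with the downstream layers.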
In one embodiment, the computer program further causes the processor to perform the following steps: using the coding vector corresponding to the data to be encoded as the input of the next layer after the vector layer in the target trained model; and obtaining the output result corresponding to the data to be encoded output by the target trained model.
In one embodiment, the data to be encoded includes information click influence data corresponding to target information, and the target trained model is an information click prediction model. The computer program further causes the processor to perform the following steps: using the information click influence data as the input of the vector layer in the information click prediction model, the vector layer being configured to determine the coding vector corresponding to the information click influence data and to use the coding vector as the input of the next layer after the vector layer in the information click prediction model; and obtaining the predicted information click probability corresponding to the target information output by the information click prediction model.
In one embodiment, the computer program further causes the processor to perform the following steps: obtaining training data and a standard coding vector corresponding to the training data; using the training data as the input of each channel, wherein each channel has a corresponding hash function and a training channel vector matrix; computing, for each channel according to its corresponding hash function, the training channel hash value of the training data for that channel; obtaining, according to the training channel hash values, the channel training subvector of the training data for each channel from the corresponding training channel vector matrix; obtaining the training coding vector corresponding to the training data from the channel training subvectors; computing a coding loss value from the training coding vector and the standard coding vector; adjusting the parameters in the training channel vector matrices according to the coding loss value; and updating the training data and the corresponding standard coding vector, returning to the step of using the training data as the input of each channel, and iterating until the coding loss value satisfies a preset convergence condition to obtain the target channel vector matrices.
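The standard-coding-vector variant of training can be sketched as follows; the channel count, the hypothetical fixed row indices (standing in for the per-channel hash-and-modulo), and the example standard vector are assumptions for illustration only. The coding loss here is the squared distance between the concatenated training coding vector and the standard coding vector.

```python
# Two channels, each with a training channel vector matrix.
NUM_CHANNELS, ROWS, DIM, LR = 2, 4, 3, 0.5
matrices = [[[0.0] * DIM for _ in range(ROWS)] for _ in range(NUM_CHANNELS)]

# Hypothetical per-channel row indices (channel hash value modulo ROWS).
rows_of = {"query_x": [1, 2]}

# Standard coding vector for the training data (assumed given), NUM_CHANNELS * DIM wide.
standard = [0.2, -0.4, 0.9, 0.5, 0.0, -0.1]

def encode(sample):
    vec = []
    for c in range(NUM_CHANNELS):
        vec.extend(matrices[c][rows_of[sample][c]])   # channel training subvector
    return vec

loss = None
for step in range(100):
    coded = encode("query_x")                         # training coding vector
    diff = [a - b for a, b in zip(coded, standard)]
    loss = sum(d * d for d in diff)                   # coding loss value
    if loss < 1e-8:                                   # preset convergence condition
        break
    for c in range(NUM_CHANNELS):                     # adjust matrix parameters
        r = rows_of["query_x"][c]
        for d in range(DIM):
            matrices[c][r][d] -= LR * 2 * diff[c * DIM + d]
```

Because the loss is quadratic in the looked-up rows, each update simply moves the selected row of each channel matrix toward the corresponding slice of the standard coding vector.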
Those of ordinary skill in the art will appreciate that all or part of the flows of the above method embodiments may be implemented by a computer program instructing related hardware. The program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the flows of the above method embodiments. Any reference to memory, storage, a database, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their descriptions are relatively specific and detailed, but they should not therefore be construed as limiting the scope of the claims. It should be noted that those of ordinary skill in the art may make various modifications and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent application shall be determined by the appended claims.
Claims (15)
1. A data encoding method, the method comprising:
obtaining data to be encoded;
obtaining a hash function and a channel vector matrix corresponding to each channel, and computing, according to the hash function corresponding to each channel, a channel hash value of the data to be encoded for that channel;
obtaining, according to the channel hash values, a channel subvector of the data to be encoded for each channel from the corresponding channel vector matrix; and
obtaining a coding vector corresponding to the data to be encoded from the obtained channel subvectors.
2. The method according to claim 1, wherein obtaining the hash function and the channel vector matrix corresponding to each channel and computing the channel hash value of the data to be encoded for each channel according to the corresponding hash function comprises:
computing an original hash value of the data to be encoded for a current channel according to the hash function corresponding to the current channel;
obtaining the number of rows of the channel vector matrix corresponding to the current channel; and
performing a modulo operation on the original hash value of the current channel with the number of rows of the current channel vector matrix to obtain the channel hash value of the data to be encoded for the current channel.
3. The method according to claim 1, wherein obtaining the coding vector corresponding to the data to be encoded from the obtained channel subvectors comprises:
concatenating the channel subvectors corresponding to the channels in channel order to obtain the coding vector corresponding to the data to be encoded.
4. The method according to claim 1, wherein the channel vector matrices are trained in the following manner:
obtaining training data and a label corresponding to the training data;
using the training data as an input of a vector layer in a model to be trained, wherein the vector layer comprises multiple channels, each channel has a corresponding hash function and a training channel vector matrix, each channel computes a training channel hash value of the training data according to its corresponding hash function, a channel training subvector of the training data for each channel is obtained from the corresponding training channel vector matrix according to the training channel hash value, and the vector layer obtains a training coding vector corresponding to the training data from the obtained channel training subvectors; and
using the training coding vector as an input of a next layer of the model to be trained and the label corresponding to the training data as an expected output of the model to be trained, and training the model to be trained to obtain a target trained model, wherein the target trained model comprises a target vector layer, and the target vector layer comprises a target channel vector matrix corresponding to each channel.
5. The method according to claim 4, wherein using the training data as the input of the vector layer in the model to be trained, using the label corresponding to the training data as the expected output of the model to be trained, and training the model to be trained to obtain the target trained model comprising the target vector layer with the target channel vector matrix corresponding to each channel comprises:
using the training data as the input of the vector layer in the model to be trained, and obtaining an actual output of the model to be trained;
computing a loss value from the actual output and the expected output;
adjusting, according to the loss value, the parameters in the training channel vector matrices of the channels in the vector layer and the parameters in the other layers of the model to be trained; and
updating the training data and the corresponding label, returning to the step of using the training data as the input of the vector layer in the model to be trained, and iterating until the loss value satisfies a preset convergence condition to obtain the target trained model comprising the target vector layer.
6. The method according to claim 4, wherein the method further comprises:
using the coding vector corresponding to the data to be encoded as the input of the next layer after the vector layer in the target trained model; and
obtaining an output result corresponding to the data to be encoded output by the target trained model.
7. The method according to claim 4, wherein the data to be encoded comprises information click influence data corresponding to target information, and the target trained model is an information click prediction model; the method further comprises:
using the information click influence data as the input of the vector layer in the information click prediction model, the vector layer being configured to determine the coding vector corresponding to the information click influence data and to use the coding vector as the input of the next layer after the vector layer in the information click prediction model; and
obtaining a predicted information click probability corresponding to the target information output by the information click prediction model.
8. The method according to claim 1, wherein the channel vector matrices are trained in the following manner:
obtaining training data and a standard coding vector corresponding to the training data;
using the training data as the input of each channel, wherein each channel has a corresponding hash function and a training channel vector matrix, each channel computes a training channel hash value of the training data according to its corresponding hash function, and a channel training subvector of the training data for each channel is obtained from the corresponding training channel vector matrix according to the training channel hash value;
obtaining a training coding vector corresponding to the training data from the channel training subvectors;
computing a coding loss value from the training coding vector and the standard coding vector;
adjusting the parameters in the training channel vector matrices according to the coding loss value; and
updating the training data and the corresponding standard coding vector, returning to the step of using the training data as the input of each channel, and iterating until the coding loss value satisfies a preset convergence condition to obtain the target channel vector matrices.
9. A data encoding apparatus, the apparatus comprising:
a data acquisition module, configured to obtain data to be encoded;
a computing module, configured to obtain a hash function and a channel vector matrix corresponding to each channel, and to compute, according to the hash function corresponding to each channel, a channel hash value of the data to be encoded for that channel;
a subvector acquisition module, configured to obtain, according to the channel hash values, a channel subvector of the data to be encoded for each channel from the corresponding channel vector matrix; and
a vector generation module, configured to obtain a coding vector corresponding to the data to be encoded from the obtained channel subvectors.
10. The apparatus according to claim 9, wherein the computing module is further configured to compute an original hash value of the data to be encoded for a current channel according to the hash function corresponding to the current channel, obtain the number of rows of the channel vector matrix corresponding to the current channel, and perform a modulo operation on the original hash value of the current channel with the number of rows of the current channel vector matrix to obtain the channel hash value of the data to be encoded for the current channel.
11. The apparatus according to claim 9, wherein the vector generation module is further configured to concatenate the channel subvectors corresponding to the channels in channel order to obtain the coding vector corresponding to the data to be encoded.
12. The apparatus according to claim 9, wherein the apparatus further comprises:
a first channel vector matrix training module, configured to obtain training data and a label corresponding to the training data; use the training data as the input of a vector layer in a model to be trained, wherein the vector layer comprises multiple channels, each channel has a corresponding hash function and a training channel vector matrix, each channel computes a training channel hash value of the training data according to its corresponding hash function, a channel training subvector of the training data for each channel is obtained from the corresponding training channel vector matrix according to the training channel hash value, and the vector layer obtains a training coding vector corresponding to the training data from the obtained channel training subvectors; and use the training coding vector as the input of the next layer of the model to be trained and the label corresponding to the training data as the expected output of the model to be trained, and train the model to be trained to obtain a target trained model, wherein the target trained model comprises a target vector layer, and the target vector layer comprises a target channel vector matrix corresponding to each channel.
13. The apparatus according to claim 12, wherein the first channel vector matrix training module is further configured to use the training data as the input of the vector layer in the model to be trained and obtain an actual output of the model to be trained; compute a loss value from the actual output and the expected output; adjust, according to the loss value, the parameters in the training channel vector matrices of the channels in the vector layer and the parameters in the other layers of the model to be trained; and update the training data and the corresponding label, return to the step of using the training data as the input of the vector layer in the model to be trained, and iterate until the loss value satisfies a preset convergence condition to obtain the target trained model comprising the target vector layer.
14. A computer-readable storage medium storing a computer program that, when executed by a processor, causes the processor to perform the steps of the method according to any one of claims 1 to 8.
15. A computer device, comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810439264.5A CN108631787B (en) | 2018-05-09 | 2018-05-09 | Data encoding method, data encoding device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108631787A true CN108631787A (en) | 2018-10-09 |
CN108631787B CN108631787B (en) | 2020-04-03 |
Family
ID=63692246
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810439264.5A Active CN108631787B (en) | 2018-05-09 | 2018-05-09 | Data encoding method, data encoding device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108631787B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120263048A1 (en) * | 2011-04-14 | 2012-10-18 | Cisco Technology, Inc. | Methods for Even Hash Distribution for Port Channel with a Large Number of Ports |
CN102915740A (en) * | 2012-10-24 | 2013-02-06 | 兰州理工大学 | Phonetic empathy Hash content authentication method capable of implementing tamper localization |
CN107220614A (en) * | 2017-05-24 | 2017-09-29 | 北京小米移动软件有限公司 | Image-recognizing method, device and computer-readable recording medium |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200104700A1 (en) * | 2018-09-28 | 2020-04-02 | Wipro Limited | Method and system for improving performance of an artificial neural network |
US11544551B2 (en) * | 2018-09-28 | 2023-01-03 | Wipro Limited | Method and system for improving performance of an artificial neural network |
CN111612079A (en) * | 2020-05-22 | 2020-09-01 | 深圳前海微众银行股份有限公司 | Data right confirming method, equipment and readable storage medium |
CN111612079B (en) * | 2020-05-22 | 2021-07-20 | 深圳前海微众银行股份有限公司 | Data right confirming method, equipment and readable storage medium |
CN111666442A (en) * | 2020-06-02 | 2020-09-15 | 腾讯科技(深圳)有限公司 | Image retrieval method and device and computer equipment |
CN111666442B (en) * | 2020-06-02 | 2023-04-18 | 腾讯科技(深圳)有限公司 | Image retrieval method and device and computer equipment |
CN113792816A (en) * | 2021-09-27 | 2021-12-14 | 重庆紫光华山智安科技有限公司 | Data encoding method, data encoding device, computer equipment and storage medium |
CN113792816B (en) * | 2021-09-27 | 2022-11-01 | 重庆紫光华山智安科技有限公司 | Data encoding method, data encoding device, computer equipment and storage medium |
CN116192402A (en) * | 2023-01-18 | 2023-05-30 | 南阳理工学院 | Communication method with information coding |
CN116192402B (en) * | 2023-01-18 | 2023-10-10 | 南阳理工学院 | Communication method with information coding |
Also Published As
Publication number | Publication date |
---|---|
CN108631787B (en) | 2020-04-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108631787A (en) | Data-encoding scheme, device, computer equipment and storage medium | |
CN109816615B (en) | Image restoration method, device, equipment and storage medium | |
CN107563567A (en) | Core extreme learning machine Flood Forecasting Method based on sparse own coding | |
CN109271933A (en) | The method for carrying out 3 D human body Attitude estimation based on video flowing | |
CN110287335A (en) | The personalized recommending scenery spot method and device of knowledge based map and user's shot and long term preference | |
CN110197307B (en) | Regional sea surface temperature prediction method combined with attention mechanism | |
CN109299685A (en) | Deduction network and its method for the estimation of human synovial 3D coordinate | |
CN108985899B (en) | Recommendation method, system and storage medium based on CNN-LFM model | |
CN107562787A (en) | A kind of POI coding methods and device, POI recommend method, electronic equipment | |
CN116128158B (en) | Oil well efficiency prediction method of mixed sampling attention mechanism | |
CN110442846A (en) | A kind of sequence data forecasting system of New Multi-scale attention mechanism | |
CN115307780B (en) | Sea surface temperature prediction method, system and application based on time-space information interaction fusion | |
CN107003834A (en) | Pedestrian detection apparatus and method | |
CN109919670A (en) | Prediction technique, device, server and the storage medium of ad click probability | |
CN113255908A (en) | Method, neural network model and device for service prediction based on event sequence | |
CN111861046A (en) | Intelligent patent value evaluation system based on big data and deep learning | |
CN115049919A (en) | Attention regulation based remote sensing image semantic segmentation method and system | |
CN117576402A (en) | Deep learning-based multi-scale aggregation transducer remote sensing image semantic segmentation method | |
CN115222947B (en) | Rock joint segmentation method and device based on global self-attention transformation network | |
CN101572693A (en) | Equipment and method for parallel mode matching | |
CN109670057A (en) | A kind of gradual end-to-end depth characteristic quantization system and method | |
CN112927810B (en) | Smart medical response method based on big data and smart medical cloud computing system | |
CN112734519B (en) | Commodity recommendation method based on convolution self-encoder network | |
CN114398980A (en) | Cross-modal Hash model training method, encoding method, device and electronic equipment | |
Wong et al. | Addressing Deep Learning Model Uncertainty in Long-Range Climate Forecasting with Late Fusion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||