CN108631787A - Data encoding method, apparatus, computer device and storage medium - Google Patents

Data encoding method, apparatus, computer device and storage medium

Info

Publication number
CN108631787A
CN108631787A (application CN201810439264.5A; granted as CN108631787B)
Authority
CN
China
Prior art keywords
channel
vector
data
training
encoded
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810439264.5A
Other languages
Chinese (zh)
Other versions
CN108631787B (en)
Inventor
唐溪柳
汪伟
姚伶伶
屈伟
潘颖吉
谭骜
王军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology (Shenzhen) Co Ltd
Priority to CN201810439264.5A
Publication of CN108631787A
Application granted
Publication of CN108631787B
Legal status: Active
Anticipated expiration

Classifications

    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3082Vector coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

This application relates to a data encoding method. The method includes: obtaining data to be encoded; obtaining a hash function and a channel vector matrix corresponding to each channel, and computing, according to each channel's hash function, the channel hash value of the data to be encoded for that channel; obtaining, according to the channel hash values, the channel subvector of the data to be encoded for each channel from the corresponding channel vector matrix; and obtaining an encoding vector corresponding to the data to be encoded from the retrieved channel subvectors. The method reduces both space usage and the collision rate between vectors. A data encoding apparatus, a computer device, and a storage medium are also provided.

Description

Data encoding method, apparatus, computer device and storage medium
Technical field
This application relates to the field of computer processing technologies, and more particularly to a data encoding method and apparatus, a computer device, and a storage medium.
Background technology
Data encoding refers to the representation for converting data to vector.With the development in deep learning field, using big Data establish model becomes theme to carry out prediction to unknown result.In order to according to given data come to unknown number It according to being predicted, needs given data being converted to the language that computer can identify, that is, needs given data using number The vector of change indicates.Traditional is converted to given data vector, and generally use word is embedded in the methods of (embedding).But It is had the following problems using embedding:If the number of vectors setting of embedding is larger, then memory headroom occupancy is more, If the number of vectors setting of embedding is smaller, although memory headroom occupies less, collision rate is high, i.e., information loss is more.
Summary
In view of the above problems, it is necessary to provide a data encoding method, apparatus, computer device, and storage medium with low space usage and a low collision rate.
A data encoding method, the method including:
Obtaining data to be encoded;
Obtaining a hash function and a channel vector matrix corresponding to each channel, and computing, according to each channel's hash function, the channel hash value of the data to be encoded for that channel;
Obtaining, according to the channel hash values, the channel subvector of the data to be encoded for each channel from the corresponding channel vector matrix;
Obtaining an encoding vector corresponding to the data to be encoded according to the retrieved channel subvectors.
A data encoding apparatus, the apparatus including:
A data acquisition module, configured to obtain data to be encoded;
A computing module, configured to obtain the hash function and channel vector matrix corresponding to each channel, and to compute, according to each channel's hash function, the channel hash value of the data to be encoded for that channel;
A subvector acquisition module, configured to obtain, according to the channel hash values, the channel subvector of the data to be encoded for each channel from the corresponding channel vector matrix;
A vector generation module, configured to obtain an encoding vector corresponding to the data to be encoded according to the retrieved channel subvectors.
A computer device, including a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the following steps:
Obtaining data to be encoded;
Obtaining a hash function and a channel vector matrix corresponding to each channel, and computing, according to each channel's hash function, the channel hash value of the data to be encoded for that channel;
Obtaining, according to the channel hash values, the channel subvector of the data to be encoded for each channel from the corresponding channel vector matrix;
Obtaining an encoding vector corresponding to the data to be encoded according to the retrieved channel subvectors.
A computer-readable storage medium storing a computer program that, when executed by a processor, causes the processor to perform the following steps:
Obtaining data to be encoded;
Obtaining a hash function and a channel vector matrix corresponding to each channel, and computing, according to each channel's hash function, the channel hash value of the data to be encoded for that channel;
Obtaining, according to the channel hash values, the channel subvector of the data to be encoded for each channel from the corresponding channel vector matrix;
Obtaining an encoding vector corresponding to the data to be encoded according to the retrieved channel subvectors.
According to the above data encoding method, apparatus, computer device, and storage medium, data to be encoded is obtained; the channel hash value of the data for each channel is computed with that channel's hash function; the corresponding channel subvector is then obtained from the corresponding channel vector matrix according to the channel hash value; and an encoding vector corresponding to the data to be encoded is obtained from the channel subvectors. Because the channel subvectors of the individual channel vector matrices are combined, many more vectors can be represented than with the traditional approach of obtaining the encoding vector directly from a single vector matrix, which greatly reduces space usage and reduces the collision rate between vectors.
Description of the drawings
Fig. 1 is an application environment diagram of the data encoding method in one embodiment;
Fig. 2 is a flowchart of the data encoding method in one embodiment;
Fig. 3 is a schematic diagram of compressing three-dimensional features into a two-dimensional space in one embodiment;
Fig. 4 is a schematic comparison of a single vector matrix and two vector matrices in one embodiment;
Fig. 5 is a schematic diagram of extracting each channel subvector to obtain the encoding vector in one embodiment;
Fig. 6 is a flowchart of computing the channel hash value in one embodiment;
Fig. 7 is a flowchart of training the channel vector matrices in one embodiment;
Fig. 8 is a schematic diagram of training multiple channel vector matrices using deep learning in one embodiment;
Fig. 9 is a flowchart of training the channel vector matrices in another embodiment;
Fig. 10 is a schematic comparison of a single channel and two channels in one embodiment;
Fig. 11 is a flowchart of the data encoding method in another embodiment;
Fig. 12 is a structural diagram of the data encoding apparatus in one embodiment;
Fig. 13 is a structural diagram of the data encoding apparatus in another embodiment;
Fig. 14 is a structural diagram of the data encoding apparatus in yet another embodiment;
Fig. 15 is a structural diagram of the data encoding apparatus in a further embodiment;
Fig. 16 is a structural diagram of a computer device in one embodiment.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of this application clearer, the application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely intended to explain the application and are not intended to limit it.
Fig. 1 is an application environment diagram of the data encoding method in one embodiment. Referring to Fig. 1, the data encoding method is applied to a data encoding system. The system includes a terminal 110 and a server 120 connected through a network. The terminal 110 may specifically be a desktop terminal or a mobile terminal, and the mobile terminal may specifically be at least one of a mobile phone, a tablet computer, a laptop, and the like. The server 120 may be implemented as an independent server or as a server cluster composed of multiple servers. The terminal 110 sends data to be encoded to the server 120. The server 120 obtains the data to be encoded, obtains the hash function and channel vector matrix corresponding to each channel, computes the channel hash value of the data to be encoded for each channel according to that channel's hash function, obtains the channel subvector of the data to be encoded for each channel from the corresponding channel vector matrix according to the channel hash value, obtains an encoding vector corresponding to the data to be encoded according to the retrieved channel subvectors, and returns the encoding vector to the terminal 110.
In another embodiment, the data encoding method may be applied directly to the terminal 110. The terminal 110 obtains the data to be encoded, then obtains the hash function and channel vector matrix corresponding to each channel, computes the channel hash value of the data to be encoded for each channel according to that channel's hash function, then obtains the channel subvector of the data to be encoded for each channel from the corresponding channel vector matrix according to the channel hash value, and obtains an encoding vector corresponding to the data to be encoded according to the retrieved channel subvectors.
As shown in Fig. 2, in one embodiment, a data encoding method is provided. It can be applied either to a server or to a terminal; this embodiment is illustrated as applied to a terminal. The data encoding method specifically includes the following steps:
Step S202: obtain data to be encoded.
Here, data to be encoded refers to data that needs to be vectorized; it may be text, numbers, pictures, and so on. In machine learning, in order to process data, the data must be converted into a numerical representation that a computer can recognize, that is, the data must be encoded to obtain a vectorized representation.
Traditionally, data is encoded using methods such as word embedding. Because an embedding matrix generally occupies one contiguous block of memory, a sufficiently large memory space must be allocated in advance. This leads to the following problem: if the number of embedding vectors is set large, memory usage is high; if the number of embedding vectors is set small, memory usage is low but the collision rate is high, i.e., more information is lost. To solve this problem, this embodiment proposes a data encoding method that both reduces space usage and reduces the collision rate. The method is applicable to sparse features as well as dense features. A sparse feature is a kind of categorical feature: its range of feature values is large (for example, 10,000 or more), but each value occurs infrequently; user ids and advertisement ids in an advertising system are typical sparse features. A categorical feature is a feature whose values are enumerable in machine learning, such as a gender feature with the two values male and female. A dense feature, relative to a sparse feature, has a smaller range of feature values.
Word embedding refers to the method of mapping categorical features into a low-dimensional vector space. For example, with one-hot encoding of words, the vector dimension equals the number of all words that may occur; only one component of the vector has value 1 and the other components have value 0. For example, the vectors corresponding to the following words are expressed as Beijing: [1,0,0,0,...,0], Shanghai: [0,1,0,0,...,0], Shenzhen: [0,0,1,0,...,0]. The vector dimension of one-hot encoding is thus very high, and the similarity between two words cannot be computed. Compared with one-hot encoding, which merely symbolizes features, embedding gives features deeper information: representing each element of the vector with a real number increases its representational range, and compresses the original high-dimensional sparse features into a low-dimensional space whose dimension is far below the number of feature values. As shown in Fig. 3, the embedding method compresses the three-dimensional features w1, w2, w3 (that is, the representations of feature values in the vector space obtained by one-hot encoding) into a two-dimensional space. The embeddings corresponding to the features in the one-hot example above might then be expressed as follows: Beijing: [0.122, 0.401, 0.312, 0.112, ..., -0.32], Shanghai: [0.101, 0.362, 0.290, 0.124, ..., -0.41], Shenzhen: [0.421, 0.121, 0.501, 0.139, ..., 0.823]. Similar to one-hot encoding, each word has a corresponding vector expression, but each component of the vector is a real number with latent meaning that can capture correlations between words, and the dimension is specified manually, typically far below the dimension after one-hot encoding. But precisely because the embedding dimension is preset by a person, there is the problem that the dimension may be set too large or too small. The embodiments of this application are a further improvement on the basis of the word embedding method.
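The contrast between one-hot encoding and embedding described above can be sketched as follows (a minimal illustration; the vocabulary and the embedding values are made up, not taken from a trained model):

```python
# One-hot: vector dimension equals vocabulary size, exactly one component is 1
vocab = ["Beijing", "Shanghai", "Shenzhen"]

def one_hot(word):
    return [1 if w == word else 0 for w in vocab]

assert one_hot("Shanghai") == [0, 1, 0]

# Embedding: a fixed low dimension, real-valued components (values illustrative)
embedding = {
    "Beijing":  [0.122, 0.401],
    "Shanghai": [0.101, 0.362],
    "Shenzhen": [0.421, 0.121],
}
assert len(embedding["Beijing"]) < len(one_hot("Beijing"))  # compressed representation
```

Unlike the one-hot vectors, the embedding components are real numbers, so similarities between words can be computed; the trade-off, as noted above, is that the dimension must be chosen in advance.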
Step S204: obtain the hash function and channel vector matrix corresponding to each channel, and compute, according to each channel's hash function, the channel hash value of the data to be encoded for that channel.
Here, the role of a hash function is to transform input of arbitrary length into fixed-length output of the same type through a hashing algorithm. For example, suppose the inputs include the string "abc" and the integer value "1"; mapping them with the same hash function can be expressed as follows: hash(key="abc", seed=0) = 1222323233, hash(key="1", seed=0) = 9384398434. That is, the specific values obtained by the hash operation do not matter; it is only necessary to ensure that different inputs yield fixed-length outputs of the same type, and that different inputs yield different output values, as with "1222323233" and "9384398434" above. The "seed" above is a hash seed, a parameter of the hash operation that can be any integer. Different seeds represent different hash functions: hashing the same data with different seeds gives different results. For example, hashing the string "abc" with different seeds gives the following results: hash(key="abc", seed=0) = 1222323233; hash(key="abc", seed=1) = 323762398; hash(key="abc", seed=2) = 7927986737. This can be understood as different seeds corresponding to different hash operations.
A channel vector matrix is a matrix composed of multiple vectors and is an independent, pre-trained matrix. Each channel corresponds to one hash function and one channel vector matrix, and there are at least 2 channels. Each channel corresponds to one channel hash value. The hash values obtained for the same data with different hash functions differ; the hash functions of different channels may be the same or different. Each channel corresponds to one channel vector matrix, and the specifications (i.e., sizes) of the channel vector matrices of different channels may be the same or different. The channel hash value corresponding to the data to be encoded is computed according to each channel's hash function; that is, the data to be encoded is mapped by each channel's hash function to obtain that channel's hash value.
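The seed mechanism described above can be sketched in a few lines (an illustrative construction: the patent does not prescribe a particular hash algorithm, so the SHA-256-based seeding here is an assumption):

```python
import hashlib

def channel_hash(value: str, seed: int) -> int:
    """A seeded hash built from SHA-256; each seed behaves as a distinct hash function."""
    digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).hexdigest()
    return int(digest, 16)

# Different seeds map the same input to different values,
# while the same (input, seed) pair always maps to the same value.
assert channel_hash("abc", seed=0) != channel_hash("abc", seed=1)
assert channel_hash("abc", seed=0) == channel_hash("abc", seed=0)
```

Any family of independent hash functions would serve the same purpose; the only requirements stated above are fixed-length output and distinct results per seed.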
Step S206: obtain, according to the channel hash values, the channel subvector of the data to be encoded for each channel from the corresponding channel vector matrix.
Here, the channel hash value is used to look up the channel subvector corresponding to the data to be encoded in the channel vector matrix. In one embodiment, the channel hash value is used directly as the row number of the channel vector matrix to obtain the corresponding channel subvector. In another embodiment, the channel hash value must be further processed to obtain the corresponding row number; for example, a modulo operation can be applied to the channel hash value, and the resulting value used as the row number to obtain the corresponding channel subvector from the channel vector matrix. For example, suppose the channel hash value obtained is 102336785; taking it modulo 1000 gives as the remainder the last three digits, "785", and the vector in row 785 of the corresponding channel vector matrix is used as the channel subvector.
Step S208: obtain an encoding vector corresponding to the data to be encoded according to the retrieved channel subvectors.
Here, after the channel subvector of each channel is retrieved, the multiple channel subvectors can be merged by a user-defined pooling function into the encoding vector corresponding to the data to be encoded. In one embodiment, the pooling function uses simple concatenation, directly splicing the channel subvectors into the encoding vector: for example, three 5-dimensional channel subvectors are concatenated into one 15-dimensional encoding vector. In another embodiment, the pooling function may be based on a neural network, i.e., the retrieved channel subvectors are further processed to obtain the final encoding vector.
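Steps S202 through S208 can be sketched end to end as follows (a minimal illustration under assumptions: the channel index doubles as the hash seed, the matrices are random stand-ins for trained channel vector matrices, and concatenation is used as the pooling function):

```python
import hashlib
import random

def channel_hash(value: str, seed: int) -> int:
    # Seeded hash standing in for each channel's hash function (an assumption;
    # the method only requires one independent hash function per channel).
    return int(hashlib.sha256(f"{seed}:{value}".encode()).hexdigest(), 16)

def encode(value, channel_matrices):
    """Hash per channel, take the row count as modulus to get a row number,
    look up the channel subvector, and concatenate the subvectors."""
    parts = []
    for seed, matrix in enumerate(channel_matrices):   # channel index as seed
        row = channel_hash(value, seed) % len(matrix)  # channel hash value -> row
        parts.extend(matrix[row])                      # channel subvector
    return parts                                       # concatenation as pooling

random.seed(0)
# Two channels, each a 4-row matrix of 3-dimensional subvectors
matrices = [[[random.random() for _ in range(3)] for _ in range(4)]
            for _ in range(2)]
vector = encode("user_42", matrices)
assert len(vector) == 6  # 2 channels x 3 dimensions
```

The same input always selects the same rows, so the encoding is deterministic; distinct inputs only collide if they hash to the same row in every channel simultaneously.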
In the above data encoding method, multiple independent channel vector matrices are used: one channel subvector is obtained from each independent channel vector matrix, and the encoding vector corresponding to the data to be encoded is obtained by combining them. Because the vectors in the individual channel vector matrices are combined, more vectors can be represented, which helps save space and reduce the collision rate.
For example, as shown in Fig. 4, in one embodiment, suppose there is a vector matrix with N rows and M columns, i.e., a matrix containing N M-dimensional vectors. If this vector matrix is split into two M-dimensional matrices with N/2 rows each, the two vector matrices can represent (N/2)^2 vectors, each of dimension 2M. That is, with the same space usage, the number of vectors that can be represented with two vector matrices is far greater, and the vector dimension is 2M. The larger the vector dimension, the lower the corresponding collision rate: a longer vector can represent more features, which not only helps reduce the collision rate but also helps represent deeper features. Therefore, when representing the same number of vectors, the above data encoding method both saves space and reduces the collision rate.
In the above data encoding method, data to be encoded is obtained; the channel hash value of the data for each channel is computed with that channel's hash function; the corresponding channel subvector is then obtained from the corresponding channel vector matrix according to the channel hash value; and the encoding vector corresponding to the data to be encoded is obtained from the channel subvectors. Because the channel subvectors of the individual channel vector matrices are combined, more encoding vectors can be represented; compared with the traditional approach of obtaining the encoding vector corresponding to the data directly from a single vector matrix, this greatly reduces space usage and reduces the collision rate between vectors.
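The space and collision argument above can be checked with a line of arithmetic (illustrative numbers for N and M):

```python
# One N x M matrix versus two (N/2) x M channel matrices with the same storage
N, M = 1000, 5
single_matrix_vectors = N              # N distinct M-dimensional vectors
split_vectors = (N // 2) ** 2          # (N/2)^2 distinct 2M-dimensional combinations

assert N * M == 2 * (N // 2) * M       # identical total element count
assert split_vectors == 250_000        # versus only 1000 before the split
assert split_vectors > single_matrix_vectors
```

With identical storage, the split representation covers 250 times as many distinct vectors in this example, and each combined vector is twice as long.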
As shown in Fig. 5, in one embodiment, the above data encoding method includes multiple channels. Different channels perform the hash operation with different hash seeds to obtain each channel's hash value; the corresponding channel subvector is then obtained from the corresponding channel vector matrix according to the channel hash value, and the encoding vector corresponding to the data to be encoded is then obtained from the channel subvectors. In one embodiment, for a specific categorical feature, suppose the total number of categories is n, and the feature value of each category (i.e., the data to be encoded) is f_i (i = 0, 1, ..., n-1). Suppose the total number of channels is m, and the hash function of each channel (also called the "bucket hash") is denoted h^(j)(f), where j is the channel number (j = 0, 1, ..., m-1) and f is a specific feature value. Then, for each channel, the mapping from the feature value f_i to the channel hash value (also called the "bucket id") can be expressed by the following formula:
$$b_i^{(j)} = h^{(j)}(f_i)$$

In the above formula, $b_i^{(j)}$ is the channel hash value obtained for the feature value $f_i$ on channel $j$. Further, if the maximum channel hash value (bucket size) is $p$, then $b_i^{(j)}$ satisfies $0 \le b_i^{(j)} \le p-1$. On the other hand, an independent embedding matrix $E^{(j)}$ is established for each channel.

The final encoding vector can then be expressed by the following formula:

$$e_i = F\big(E^{(0)}[b_i^{(0)}],\; E^{(1)}[b_i^{(1)}],\; \dots,\; E^{(m-1)}[b_i^{(m-1)}]\big)$$

where $F$ in the above formula denotes an arbitrary pooling function, most commonly chosen as concatenation (direct splicing). Fig. 5 illustrates extracting the corresponding channel subvector from each channel vector matrix and then concatenating the subvectors to obtain the final encoding vector. The initial values of the channel vector matrices (embedding matrices) of all the above channels may adopt a random initialization strategy and are updated together with the specific model parameters by gradient descent.
As shown in Fig. 6, in one embodiment, obtaining the hash function and channel vector matrix corresponding to each channel and computing, according to each channel's hash function, the channel hash value of the data to be encoded for each channel includes:
Step S204A: compute, according to the current channel's hash function, the original hash value of the data to be encoded for the current channel.
Here, the original hash value refers to the hash value computed directly by the hash function. Each channel corresponds to one hash function, and the current channel is the channel currently being computed. The original hash value of the data to be encoded for the current channel is computed with the current channel's hash function.
Step S204B: obtain the row dimension of the channel vector matrix corresponding to the current channel.
Here, the row dimension of a channel vector matrix refers to its number of rows. The sizes of the channel vector matrices corresponding to different channels may be the same or different, so the row dimensions of different channel vector matrices may also be the same or different. Obtaining the row dimension facilitates the subsequent modulo operation.
Step S204C: apply a modulo operation to the original hash value of the current channel according to the row dimension of the current channel vector matrix to obtain the channel hash value of the data to be encoded for the current channel.
Here, the modulo operation takes the remainder of the target value divided by the modulus (for example, modulus = 1000). The corresponding modulus is determined by the row dimension of the current channel vector matrix. In one embodiment, the row dimension can be used directly as the modulus in the modulo operation: for example, if the row dimension is 1000, 1000 is used directly as the modulus. In another embodiment, if the row dimension is not a multiple of 10, the multiple of 10 nearest to and greater than the row dimension can be used as the modulus: for example, if the row dimension is 998, 1000 is used as the modulus. The modulo operation is applied to the original hash value to obtain the corresponding channel hash value: if the original hash value obtained is a and the modulus is b, the remainder c of the modulo operation on a is the channel hash value. For example, if the computed original hash value is 12368259 and the modulus is 1000, the remainder obtained is 259. By determining the corresponding modulus from the row dimension and then performing the modulo operation, a more accurate channel hash value can be obtained.
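The modulo step can be verified directly against the numeric examples in the text (using the row dimension as the modulus, the simplest variant described above):

```python
def to_channel_hash(original_hash: int, row_dim: int) -> int:
    """Map an original hash value to a channel hash value (row index),
    using the row dimension of the channel vector matrix as the modulus."""
    return original_hash % row_dim

# Numeric examples from steps S204C and S206 above
assert to_channel_hash(12368259, 1000) == 259
assert to_channel_hash(102336785, 1000) == 785
```

Any result of this mapping is guaranteed to be a valid row index of a matrix with `row_dim` rows, since the remainder always lies in 0..row_dim-1.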
In one embodiment, obtaining the encoding vector corresponding to the data to be encoded according to the retrieved channel subvectors includes: concatenating the retrieved channel subvectors in channel order to obtain the encoding vector corresponding to the data to be encoded.
Here, after the channel subvector corresponding to each channel is retrieved, the subvectors are concatenated in channel order to obtain the encoding vector corresponding to the data to be encoded. The channels are ordered in advance, as a first channel, a second channel, and so on, so that concatenation can follow the channel order. Because different concatenation orders yield different encoding vectors, the concatenation must be performed in order.
As shown in Fig. 7, in one embodiment, the channel vector matrices are obtained by training in the following way:
Step S702: obtain training data and labels corresponding to the training data.
Here, in order to train the model, the training data and the labels corresponding to the training data are first obtained. A label refers to the annotation of the result corresponding to the training data and is the desired output of model training.
Step S704: use the training data as the input of the vector layer in the model to be trained. The vector layer includes multiple channels, each channel having a corresponding hash function and training channel vector matrix. Each channel computes, according to its corresponding hash function, the training channel hash value of the training data for that channel; the channel training subvector of the training data for each channel is obtained from the corresponding training channel vector matrix according to the training channel hash value; and the vector layer obtains the training encoding vector corresponding to the training data according to the retrieved channel training subvectors.
Here, the vector layer encodes the training data into a vector representation, that is, it is used to obtain the training encoding vector corresponding to the training data. The vector layer includes multiple channels, each having a corresponding hash function and training channel vector matrix. Each channel computes, according to its corresponding hash function, the training channel hash value of the training data for that channel; the channel training subvector of the training data for each channel is then obtained from the training channel vector matrix according to the training channel hash value; and the vector layer then obtains the training encoding vector corresponding to the training data according to the channel training subvectors. Before model training, the number of channels and the corresponding hash functions are set in advance, and each training channel vector matrix is initialized.
Step S706: use the training encoding vector as the input of the layer following the vector layer in the model to be trained, use the label corresponding to the training data as the desired output of the model to be trained, and train the model to obtain a target training model, the target training model including a target vector layer, and the target vector layer including the target channel vector matrix corresponding to each channel.
Here, the model to be trained includes other layers in addition to the vector layer; for example, in one embodiment it also includes a convolutional layer and a fully connected layer, which can be configured according to the actual situation. The vector layer is the first layer: the training data is used as the input of the vector layer in the model to be trained, and the training encoding vector output by the vector layer is obtained; the training encoding vector is then used as the input of the layer following the vector layer, and thereafter each layer's output is used as the next layer's input, until the output of the last layer is obtained. The label corresponding to the training data is used as the desired output of the model to be trained, and the parameters of the model to be trained are adjusted according to the gap between the actual output and the desired output, until model training is complete and the target training model is obtained. The target training model includes a target vector layer, and the target vector layer includes the target channel vector matrix corresponding to each channel; the obtained target channel vector matrices are the trained channel vector matrices.
In one embodiment, taking the training data as the input of the vector layer in the model to be trained and the label corresponding to the training data as the expected output, training the model to obtain a target training model that includes a target vector layer with the target channel vector matrix for each channel comprises: taking the training data as the input of the vector layer to obtain the actual output of the model to be trained; computing a loss value from the actual output and the expected output; adjusting, according to the loss value, the parameters in the training channel vector matrix of each channel in the vector layer and the parameters of the other layers; updating the training data and its corresponding label, and returning to the step of taking the training data as the input of the vector layer; and repeating this cycle until the loss value meets a preset convergence condition, obtaining the target training model including the target vector layer.

The training data is fed to the vector layer of the model to be trained to obtain the model's actual output, and a loss value is computed from the actual output and the expected output using a loss function. The loss function can be chosen according to actual needs: for example, a mean squared error loss, a cross-entropy loss, or a log-likelihood loss. The parameters of a training channel vector matrix are the values of its elements; the parameters of the other layers are their weight and bias parameters. According to the computed loss value, the parameters in the training channel vector matrix of each channel in the vector layer and the parameters of the other layers are adjusted; then the next training sample or batch and its label are obtained, and the procedure returns to the step of feeding the training data to the vector layer. Training proceeds in this way until the loss value meets a preset convergence condition. The convergence condition may be a preset loss range: if the loss value falls within that range, training is considered complete, and the target training model with the target vector layer is obtained. The model parameters may be adjusted one by one from back to front using gradient descent.
The model to be trained can be configured according to actual requirements: it may be a text classification model, a click-probability prediction model, a conversion-rate prediction model, an information recommendation model, and so on. The model to be trained is not limited here; any model that includes a vector (embedding) layer is suitable for the above training method. With this method of training the channel vector matrices, the matrices are trained as part of the vector layer of the model; during this process, only the training data of the model and its corresponding labels are needed to train each channel vector matrix, which is simple and convenient.
Figure 8 is a schematic diagram of training multiple channel vector matrices with deep learning in one embodiment. It illustrates, for the case where the model is a neural network, updating the model parameters and the training channel vector matrix of each channel simultaneously by forward and backward propagation. The hash function of each channel computes the training channel hash value of the training data; the corresponding channel training subvector is then fetched from that channel's training channel vector matrix according to the hash value; the training coding vector is obtained from the channel training subvectors and passed as input to the next layer, producing the model's actual output; finally, the model parameters and the parameters in the training channel vector matrices are adjusted backward according to the actual output and the expected output.
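The forward/backward flow of Figure 8 can be sketched numerically. A minimal NumPy illustration, assuming a toy model whose only layer after the vector layer is a single linear unit, fixed hash lookups (rows 3 and 1), and a squared loss; all sizes and values are illustrative assumptions, not the patent's configuration:

```python
import numpy as np

# Two training channel vector matrices (5 buckets, sub-vector length 2)
# and the parameters of one linear "next layer". Deterministic toy init.
mats = [np.full((5, 2), 0.5), np.full((5, 2), -0.5)]
W = np.array([1.0, 0.5, -0.5, 0.25])
rows = [3, 1]   # rows selected by each channel's training hash value
y = 1.0         # label, i.e. the expected output
lr = 0.05

for _ in range(300):
    # Forward: splice channel training subvectors into the coding vector.
    emb = np.concatenate([m[r] for m, r in zip(mats, rows)])
    pred = float(W @ emb)                  # actual output
    g = 2.0 * (pred - y)                   # d(loss)/d(pred) for (pred-y)^2
    grad_W, grad_emb = g * emb, g * W
    W -= lr * grad_W                       # backward: adjust model params...
    mats[0][rows[0]] -= lr * grad_emb[:2]  # ...and the channel-0 row
    mats[1][rows[1]] -= lr * grad_emb[2:]  # ...and the channel-1 row

print(round(pred, 6))  # → 1.0 (the actual output approaches the label)
```

Only the rows actually selected by the hash values receive gradient, which is why the matrices can be trained with nothing but (data, label) pairs.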
In one embodiment, the data encoding method further includes: taking the coding vector corresponding to the data to be encoded as the input of the layer after the vector layer in the target training model, and obtaining the output result of the target training model corresponding to the data to be encoded.

That is, in the prediction stage using the trained target training model, after the coding vector corresponding to the data to be encoded is obtained, the coding vector is fed to the layer after the vector layer in the target training model, and the output result of the target training model corresponding to the data to be encoded is obtained.
In one embodiment, the data to be encoded is information-click influence data corresponding to target information, and the target training model is an information-click prediction model. The influence data is taken as the input of the vector layer in the information-click prediction model; the vector layer determines the coding vector corresponding to the influence data; the coding vector is taken as the input of the layer after the vector layer; and the predicted information-click probability corresponding to the target information output by the model is obtained.

In an information-click prediction scenario, the data to be encoded is the information-click influence data corresponding to the target information, that is, the data that influences the click probability of the information. The influence data corresponding to the target information is fed to the vector layer of the information-click prediction model, the vector layer determines the corresponding coding vector, the coding vector is passed to the layer after the vector layer, and the predicted click probability for the target information is finally obtained from the model's output. The target information may be an advertisement, a news item, or an article to be recommended.
In a text classification scenario, the data to be encoded is text to be classified and the target training model is a text classification model. The text to be classified is taken as the input of the vector layer in the text classification model; the vector layer determines the text coding vector corresponding to the text; the text coding vector is taken as the input of the layer after the vector layer; and the text classification model outputs the category corresponding to the text to be classified.

In a game recommendation scenario, the data to be encoded is the historical game-play behavior data of the user to receive recommendations, and the target training model is a game recommendation model. The historical behavior data is taken as the input of the vector layer in the game recommendation model; the vector layer determines the corresponding game behavior coding vector; the coding vector is taken as the input of the layer after the vector layer; and the model outputs the target recommended game.
As shown in Figure 9, in one embodiment, the channel vector matrices are obtained through training as follows:

Step S902: obtain training data and the standard coding vector corresponding to the training data.

The standard coding vector is the correct coding vector corresponding to the training data, and the training data is data whose standard coding vector is known. Each channel vector matrix is trained with data whose standard coding vectors are known, so that the trained channel vector matrices can subsequently be used to encode data whose standard coding vectors are unknown.
Step S904: take the training data as the input of each channel, each channel having a corresponding hash function and training channel vector matrix; each channel computes the training channel hash value of the training data for that channel according to its hash function, and obtains the channel training subvector of the training data for that channel from the corresponding training channel vector matrix according to the training channel hash value.

That is, the training data is fed to each channel; the hash function of each channel computes the training channel hash value for that channel, and the channel training subvector is then fetched from the corresponding training channel vector matrix according to that hash value.

Step S906: obtain the training coding vector corresponding to the training data from the channel training subvectors.

After the channel training subvectors are obtained, they are combined with a pooling function, for example by splicing, to obtain the training coding vector, i.e., the coding vector actually produced.
Step S908: compute a coding loss value from the training coding vector and the standard coding vector.

A preset loss function, for example a mean squared error loss, is used to compute the coding loss value between the training coding vector and the standard coding vector; the coding loss value measures the gap between them.

Step S910: adjust the parameters in the training channel vector matrices according to the coding loss value.

After the coding loss value is obtained, the parameters in the training channel vector matrix of each channel are adjusted according to the coding loss value so as to minimize it.

Step S912: update the training data and its corresponding standard coding vector, return to taking the training data as the input of each channel, and repeat until the coding loss value meets a preset convergence condition, obtaining the target channel vector matrices.

Updating the training data and its standard coding vector means obtaining the next training sample and its standard coding vector from the training data set and continuing training with them; this cycle repeats until the computed coding loss value meets the preset convergence condition, at which point training is finished and the target channel vector matrices are obtained.
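The loop of steps S902–S912 can be sketched directly. A minimal NumPy illustration assuming a tiny two-channel configuration, MSE as the loss, splicing as the pooling function, plain gradient descent on only the looked-up rows, and made-up hash functions and training pairs:

```python
import numpy as np

# Two channels, 4 buckets each, sub-vector length 2; splicing as pooling.
mats = [np.zeros((4, 2)), np.zeros((4, 2))]
lr = 0.5

def rows_for(sample):
    # Stand-in for the per-channel hash values (illustrative, not real hashes).
    return [h % 4 for h in (sample, sample * 7 + 3)]

# Training set: (sample, standard coding vector) pairs. Illustrative data.
dataset = [(1, np.array([1.0, 0.0, 0.0, 1.0])),
           (2, np.array([0.0, 1.0, 1.0, 0.0]))]

for _ in range(200):                                 # S912: repeat the cycle
    for sample, standard in dataset:
        r = rows_for(sample)
        train_vec = np.concatenate([mats[c][r[c]] for c in range(2)])  # S904-S906
        loss = float(np.mean((train_vec - standard) ** 2))             # S908
        grad = 2.0 * (train_vec - standard) / len(standard)
        mats[0][r[0]] -= lr * grad[:2]                                 # S910
        mats[1][r[1]] -= lr * grad[2:]

print(round(loss, 8))  # → 0.0 (coding loss value after convergence)
```

Because the two samples land in different rows here, both standard coding vectors are recovered exactly; with more samples than buckets, collisions would force the matrices toward a compromise, which is the loss that Figure 10 measures.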
Figure 10 shows one group of simulation results: under the same memory occupancy, the ability of single-channel hashing and two-channel hashing to recover the original feature information. In this experiment, categorical features of different scales (the abscissa indicates the number of feature values) and their corresponding standard coding vectors (ground truth) are generated at random. For single-channel hashing, the bucket size (i.e., space occupied) is set to 50; for two channels, the bucket size of each channel is 25, so the memory occupancy of the single-channel and two-channel settings is identical. In addition, the pooling function for the two channels is simple splicing, so that the pooling function introduces no extra variables that would bias the comparison; the same mean squared error loss function is used to train the corresponding channel vector matrices in both the single-channel and two-channel cases. The ordinate in the figure is the MSE (mean squared error) after convergence; a lower MSE means less information is lost. It can be seen that, for every number of feature values, the error of the two-channel coding vectors is smaller than that of the single-channel coding vectors, showing that two channels cause less information loss and can represent the features better. In other words, the multi-channel coding method can effectively reduce the collision rate, reduce information loss, and achieve better fault tolerance.
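The intuition behind Figure 10 — the same memory yields far more joint codes with two channels — can be checked with a quick collision count. A sketch assuming MD5 as an interchangeable stand-in hash function and the bucket sizes from the experiment (50 vs. 2 × 25):

```python
import hashlib

def h(x: str, salt: int) -> int:
    # Per-channel hash: salt the input with the channel index.
    return int(hashlib.md5(f"{salt}:{x}".encode()).hexdigest(), 16)

features = [f"feat_{i}" for i in range(200)]

# Single channel: 50 buckets. Two channels: 25 buckets each, so the
# memory footprint is the same but 25 * 25 = 625 joint codes exist.
single = [h(f, 0) % 50 for f in features]
double = [(h(f, 0) % 25, h(f, 1) % 25) for f in features]

def collisions(codes):
    # A feature collides when an earlier feature already holds its code.
    return len(codes) - len(set(codes))

print(collisions(single) > collisions(double))  # two channels collide less
```

With 200 features, the single channel can produce at most 50 distinct codes (at least 150 collisions), while the two-channel pairs spread over 625 joint codes, matching the lower MSE observed in the figure.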
In one embodiment, to facilitate storage and transmission of data, the channel vector matrices of the multiple channels can be stored in a distributed manner, and to improve speed, the multiple channels can be computed in parallel.
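The parallel-computation remark can be sketched with a thread pool: each channel's hash-and-lookup is independent of the others, so the channels can be evaluated concurrently and their subvectors spliced in channel order afterward. A minimal illustration with toy matrices (all names and values are assumptions):

```python
from concurrent.futures import ThreadPoolExecutor
import hashlib

MATRICES = {0: [[0.1, 0.2]] * 16, 1: [[0.3, 0.4]] * 16}  # toy matrices

def channel_subvector(args):
    channel, data = args
    # Independent per-channel work: hash, modulo, row lookup.
    digest = int(hashlib.md5(f"{channel}:{data}".encode()).hexdigest(), 16)
    return MATRICES[channel][digest % len(MATRICES[channel])]

def encode_parallel(data: str):
    with ThreadPoolExecutor() as pool:
        # map() preserves input order, so splicing stays in channel order.
        subs = pool.map(channel_subvector, [(c, data) for c in MATRICES])
    return [x for sub in subs for x in sub]

print(encode_parallel("feature_7"))  # → [0.1, 0.2, 0.3, 0.4]
```

The same independence is what allows the matrices themselves to be sharded across machines for distributed storage.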
As shown in Figure 11, in one embodiment a data encoding method is proposed that includes the following steps:

Step S1101: obtain training data and the label corresponding to the training data.

Step S1102: take the training data as the input of the vector layer in the model to be trained. The vector layer includes multiple channels, each channel having a corresponding hash function and training channel vector matrix. Each channel computes the training channel hash value of the training data for that channel according to its hash function, and obtains the channel training subvector of the training data for that channel from the corresponding training channel vector matrix according to the training channel hash value. The vector layer obtains the training coding vector corresponding to the training data from the obtained channel training subvectors.

Step S1103: take the training coding vector as the input of the next layer of the model to be trained and the label corresponding to the training data as the expected output of the model to be trained, and train the model to obtain a target training model. The target training model includes a target vector layer, which includes the target channel vector matrix corresponding to each channel.
Step S1104: obtain data to be encoded.

Step S1105: compute the original hash value of the data to be encoded for the current channel according to the hash function corresponding to the current channel.

Step S1106: obtain the number of row dimensions of the channel vector matrix corresponding to the current channel.

Step S1107: take the original hash value of the current channel modulo the number of row dimensions of the current channel vector matrix to obtain the channel hash value of the data to be encoded for the current channel.

Step S1108: obtain the channel subvector of the data to be encoded for each channel from the corresponding channel vector matrix according to the channel hash value.

Step S1109: splice the channel subvectors corresponding to the channels in channel order to obtain the coding vector corresponding to the data to be encoded.
As shown in Figure 12, in one embodiment a data encoding apparatus is proposed that includes:

a data acquisition module 1202, configured to obtain data to be encoded;

a computing module 1204, configured to obtain the hash function and channel vector matrix corresponding to each channel, and to compute the channel hash value of the data to be encoded for each channel according to each channel's hash function;

a subvector acquisition module 1206, configured to obtain the channel subvector of the data to be encoded for each channel from the corresponding channel vector matrix according to the channel hash value;

a vector generation module 1208, configured to obtain the coding vector corresponding to the data to be encoded from the obtained channel subvectors.

In one embodiment, the computing module 1204 is further configured to compute the original hash value of the data to be encoded for the current channel according to the current channel's hash function, obtain the number of row dimensions of the channel vector matrix corresponding to the current channel, and take the original hash value of the current channel modulo the number of row dimensions to obtain the channel hash value of the data to be encoded for the current channel.

In one embodiment, the vector generation module 1208 is further configured to splice the obtained channel subvectors in channel order to obtain the coding vector corresponding to the data to be encoded.
As shown in Figure 13, in one embodiment, the data encoding apparatus further includes:

a first channel vector matrix training module 1201, configured to: obtain training data and the label corresponding to the training data; take the training data as the input of the vector layer in the model to be trained, the vector layer including multiple channels, each channel having a corresponding hash function and training channel vector matrix, each channel computing the training channel hash value of the training data for that channel according to its hash function and obtaining the channel training subvector for that channel from the corresponding training channel vector matrix according to the training channel hash value, the vector layer obtaining the training coding vector corresponding to the training data from the obtained channel training subvectors; and take the training coding vector as the input of the next layer of the model to be trained and the label as the expected output, and train the model to obtain a target training model including a target vector layer with the target channel vector matrix corresponding to each channel.

In one embodiment, the first channel vector matrix training module is further configured to: take the training data as the input of the vector layer in the model to be trained and obtain the model's actual output; compute a loss value from the actual output and the expected output; adjust, according to the loss value, the parameters in the training channel vector matrix of each channel in the vector layer and the parameters of the other layers; and update the training data and its corresponding label, return to taking the training data as the input of the vector layer, and repeat until the loss value meets a preset convergence condition, obtaining the target training model including the target vector layer.
As shown in Figure 14, in one embodiment, the data encoding apparatus further includes:

an output module 1210, configured to take the coding vector corresponding to the data to be encoded as the input of the layer after the vector layer in the target training model, and to obtain the output result of the target training model corresponding to the data to be encoded.

In one embodiment, the data to be encoded is information-click influence data corresponding to target information, and the target training model is an information-click prediction model. The data encoding apparatus further includes a click probability prediction module, configured to: take the influence data as the input of the vector layer in the information-click prediction model, the vector layer determining the coding vector corresponding to the influence data; take the coding vector as the input of the layer after the vector layer; and obtain the predicted information-click probability corresponding to the target information output by the model.
As shown in Figure 15, in one embodiment, the data encoding apparatus further includes:

a second channel vector matrix training module 1212, configured to: obtain training data and the standard coding vector corresponding to the training data; take the training data as the input of each channel, each channel having a corresponding hash function and training channel vector matrix, each channel computing the training channel hash value of the training data for that channel according to its hash function and obtaining the channel training subvector for that channel from the corresponding training channel vector matrix according to the training channel hash value; obtain the training coding vector corresponding to the training data from the channel training subvectors; compute a coding loss value from the training coding vector and the standard coding vector; adjust the parameters in the training channel vector matrices according to the coding loss value; and update the training data and its standard coding vector, return to taking the training data as the input of each channel, and repeat until the coding loss value meets a preset convergence condition, obtaining the target channel vector matrices.
Figure 16 shows the internal structure of a computer device in one embodiment. The computer device may be a terminal or a server. As shown in Figure 16, the computer device includes a processor, a memory, and a network interface connected by a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program which, when executed by the processor, causes the processor to implement the data encoding method. A computer program may also be stored in the internal memory which, when executed by the processor, causes the processor to perform the data encoding method. Those skilled in the art will understand that the structure shown in Figure 16 is only a block diagram of the parts relevant to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or arrange the components differently.

In one embodiment, the data encoding method provided by the present application can be implemented in the form of a computer program that runs on a computer device as shown in Figure 16. The program modules composing the data encoding apparatus, for example the data acquisition module 1202, computing module 1204, subvector acquisition module 1206, and vector generation module 1208 of Figure 12, can be stored in the memory of the computer device. The computer program constituted by these program modules causes the processor to perform the steps of the data encoding method of each embodiment of the present application described in this specification. For example, the computer device of Figure 16 can obtain data to be encoded through the data acquisition module 1202 of the apparatus of Figure 12; obtain the hash function and channel vector matrix corresponding to each channel and compute the channel hash value of the data to be encoded for each channel through the computing module 1204; obtain the channel subvector of the data to be encoded for each channel from the corresponding channel vector matrix according to the channel hash value through the subvector acquisition module 1206; and obtain the coding vector corresponding to the data to be encoded from the obtained channel subvectors through the vector generation module 1208.
In one embodiment, a computer device is proposed that includes a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following steps: obtain data to be encoded; obtain the hash function and channel vector matrix corresponding to each channel, and compute the channel hash value of the data to be encoded for each channel according to each channel's hash function; obtain the channel subvector of the data to be encoded for each channel from the corresponding channel vector matrix according to the channel hash value; and obtain the coding vector corresponding to the data to be encoded from the obtained channel subvectors.

In one embodiment, obtaining the hash function and channel vector matrix corresponding to each channel and computing the channel hash value of the data to be encoded for each channel includes: computing the original hash value of the data to be encoded for the current channel according to the current channel's hash function; obtaining the number of row dimensions of the channel vector matrix corresponding to the current channel; and taking the original hash value of the current channel modulo the number of row dimensions to obtain the channel hash value of the data to be encoded for the current channel.

In one embodiment, obtaining the coding vector corresponding to the data to be encoded from the obtained channel subvectors includes: splicing the obtained channel subvectors in channel order to obtain the coding vector corresponding to the data to be encoded.

In one embodiment, the computer program further causes the processor to perform the following steps: obtain training data and the label corresponding to the training data; take the training data as the input of the vector layer in the model to be trained, the vector layer including multiple channels, each channel having a corresponding hash function and training channel vector matrix, each channel computing the training channel hash value of the training data according to its hash function and obtaining the channel training subvector from the corresponding training channel vector matrix according to that hash value, the vector layer obtaining the training coding vector corresponding to the training data from the channel training subvectors; and take the training coding vector as the input of the next layer of the model to be trained and the label as the expected output, and train the model to obtain a target training model including a target vector layer with the target channel vector matrix corresponding to each channel.

In one embodiment, training the model to be trained as described above includes: taking the training data as the input of the vector layer to obtain the actual output of the model; computing a loss value from the actual output and the expected output; adjusting, according to the loss value, the parameters in the training channel vector matrix of each channel in the vector layer and the parameters of the other layers; and updating the training data and its corresponding label, returning to taking the training data as the input of the vector layer, and repeating until the loss value meets a preset convergence condition, obtaining the target training model including the target vector layer.
In one embodiment, the computer program further causes the processor to perform the following steps: take the coding vector corresponding to the data to be encoded as the input of the layer after the vector layer in the target training model;

and obtain the output result of the target training model corresponding to the data to be encoded.

In one embodiment, the data to be encoded is information-click influence data corresponding to target information, and the target training model is an information-click prediction model;

the computer program further causes the processor to perform the following steps: take the influence data as the input of the vector layer in the information-click prediction model, the vector layer determining the coding vector corresponding to the influence data; take the coding vector as the input of the layer after the vector layer; and obtain the predicted information-click probability corresponding to the target information output by the information-click prediction model.
In one embodiment, the computer program further causes the processor to perform the following steps: obtain training data and the standard coding vector corresponding to the training data; take the training data as the input of each channel, each channel having a corresponding hash function and training channel vector matrix, each channel computing the training channel hash value of the training data according to its hash function and obtaining the channel training subvector from the corresponding training channel vector matrix according to that hash value; obtain the training coding vector corresponding to the training data from the channel training subvectors; compute a coding loss value from the training coding vector and the standard coding vector; adjust the parameters in the training channel vector matrices according to the coding loss value; and update the training data and its standard coding vector, return to taking the training data as the input of each channel, and repeat until the coding loss value meets a preset convergence condition, obtaining the target channel vector matrices.
In one embodiment, a computer-readable storage medium is provided, storing a computer program that, when executed by a processor, causes the processor to perform the following steps: obtaining data to be encoded; obtaining a hash function and a channel vector matrix corresponding to each channel, and computing, according to the hash function corresponding to each channel, a channel hash value of the data to be encoded corresponding to each channel; obtaining, according to the channel hash value, a channel subvector of the data to be encoded corresponding to each channel from the corresponding channel vector matrix; and obtaining a coding vector corresponding to the data to be encoded according to the obtained channel subvectors.
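The four steps above (per-channel hashing, modulo reduction, row lookup, concatenation in channel order) can be sketched as follows. All concrete choices here, such as MD5 salted with the channel index, the matrix sizes, and the names `channel_hash` and `encode`, are assumptions for illustration and are not specified by the patent.

```python
import hashlib

import numpy as np

NUM_CHANNELS = 4      # number of channels (illustrative)
ROWS_PER_CHANNEL = 8  # rows of each channel vector matrix
SUBVECTOR_DIM = 3     # columns: dimensionality of each channel subvector

rng = np.random.default_rng(0)
# One (trainable) channel vector matrix per channel, randomly initialized here.
channel_matrices = [rng.normal(size=(ROWS_PER_CHANNEL, SUBVECTOR_DIM))
                    for _ in range(NUM_CHANNELS)]

def channel_hash(data: str, channel: int, num_rows: int) -> int:
    """Per-channel hash function: salt the input with the channel index,
    then reduce the original hash value modulo the matrix's row count."""
    digest = hashlib.md5(f"{channel}:{data}".encode("utf-8")).hexdigest()
    return int(digest, 16) % num_rows

def encode(data: str) -> np.ndarray:
    """Look up one channel subvector per channel and concatenate them
    in channel order to form the coding vector."""
    subvectors = [channel_matrices[c][channel_hash(data, c, ROWS_PER_CHANNEL)]
                  for c in range(NUM_CHANNELS)]
    return np.concatenate(subvectors)

coding_vector = encode("user_42|ad_7")  # a 12-dimensional coding vector
```

Because the hash functions are deterministic, equal inputs always encode identically, while the number of inputs the scheme can distinguish grows with the product of the per-channel row counts rather than requiring one huge lookup table.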
In one embodiment, obtaining the hash function and the channel vector matrix corresponding to each channel, and computing, according to the hash function corresponding to each channel, the channel hash value of the data to be encoded corresponding to each channel, includes: computing, according to the hash function corresponding to a current channel, an original hash value of the data to be encoded corresponding to the current channel; obtaining the number of rows of the current channel vector matrix corresponding to the current channel; and taking the original hash value of the current channel modulo the number of rows of the current channel vector matrix, to obtain the channel hash value of the data to be encoded corresponding to the current channel.
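A minimal sketch of just this step: the original hash value is reduced modulo the row count of the channel's vector matrix, so the result is always a valid row index. MD5 with a per-channel salt stands in for the unspecified per-channel hash functions; all names are illustrative.

```python
import hashlib

def channel_hash_value(data: str, channel_salt: str, num_rows: int) -> int:
    """Compute the original hash value of the data, then take it modulo
    the row count of the channel vector matrix."""
    digest = hashlib.md5((channel_salt + data).encode("utf-8")).hexdigest()
    original_hash = int(digest, 16)
    return original_hash % num_rows

# The modulo guarantees the index lies in [0, num_rows).
row_index = channel_hash_value("user_42", "channel-0:", 1024)
```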
In one embodiment, obtaining the coding vector corresponding to the data to be encoded according to the obtained channel subvectors includes: concatenating the obtained channel subvectors corresponding to the channels in channel order, to obtain the coding vector corresponding to the data to be encoded.
In one embodiment, the computer program further causes the processor to perform the following steps: obtaining training data and a label corresponding to the training data; using the training data as the input of a vector layer in a model to be trained, the vector layer comprising multiple channels, each channel having a corresponding hash function and training channel vector matrix; for each channel, computing, according to the corresponding hash function, a training channel hash value of the training data corresponding to the channel, and obtaining, according to the training channel hash value, a channel training subvector of the training data corresponding to the channel from the corresponding training channel vector matrix; in the vector layer, obtaining a training coding vector corresponding to the training data according to the obtained channel training subvectors; and using the training coding vector as the input of the layer following the vector layer in the model to be trained, using the label corresponding to the training data as the expected output of the model to be trained, and training the model to be trained to obtain a target training model, the target training model comprising a target vector layer, and the target vector layer comprising a target channel vector matrix corresponding to each channel.
In one embodiment, using the training data as the input of the vector layer in the model to be trained, using the label corresponding to the training data as the expected output of the model to be trained, and training the model to be trained to obtain the target training model, includes: using the training data as the input of the vector layer in the model to be trained, and obtaining the actual output of the model to be trained; computing a loss value according to the actual output and the expected output; adjusting, according to the loss value, the parameters of the training channel vector matrices corresponding to the channels in the vector layer and the parameters of the other layers of the model to be trained; and updating the training data and the label corresponding to the training data, returning to using the training data as the input of the vector layer in the model to be trained, and repeating in this way until the loss value satisfies a preset convergence condition, to obtain the target training model comprising the target vector layer.
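The loop just described can be sketched end to end under strong simplifying assumptions: the "other layers" collapse to a single logistic unit, the per-channel hash functions are replaced by a toy index function, and the loss is squared error. Every name and hyperparameter below is an assumption for illustration, not a detail from the patent.

```python
import numpy as np

rng = np.random.default_rng(1)
NUM_CHANNELS, ROWS, DIM = 2, 5, 3
matrices = [rng.normal(scale=0.1, size=(ROWS, DIM)) for _ in range(NUM_CHANNELS)]
w = rng.normal(scale=0.1, size=NUM_CHANNELS * DIM)  # "other layer": one logistic unit
b = 0.0
LR = 0.5

def rows_for(sample_id: int) -> list:
    # Toy stand-in for the per-channel hash functions: (id + channel) mod ROWS.
    return [(sample_id + c) % ROWS for c in range(NUM_CHANNELS)]

samples = [(0, 1.0), (1, 0.0), (2, 1.0), (3, 0.0)]  # (training datum, label)
losses = []

for epoch in range(200):
    total_loss = 0.0
    for sample_id, label in samples:
        idx = rows_for(sample_id)
        # Vector layer: per-channel row lookup, then concatenation in channel order.
        coding = np.concatenate([matrices[c][idx[c]] for c in range(NUM_CHANNELS)])
        pred = 1.0 / (1.0 + np.exp(-(coding @ w + b)))  # actual output
        total_loss += (pred - label) ** 2               # loss vs. expected output
        g = 2.0 * (pred - label) * pred * (1.0 - pred)  # d(loss)/d(score)
        grad_coding = g * w                             # gradient w.r.t. coding vector
        w -= LR * g * coding                            # adjust the other layer
        b -= LR * g
        for c in range(NUM_CHANNELS):                   # adjust only the selected rows
            matrices[c][idx[c]] -= LR * grad_coding[c * DIM:(c + 1) * DIM]
    losses.append(total_loss)
    if total_loss < 1e-3:                               # preset convergence condition
        break
```

Note that the gradient only flows into the rows of the training channel vector matrices that the hash functions actually selected for each datum; the rest of each matrix is untouched, which is what makes the scheme cheap for large vocabularies.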
In one embodiment, the computer program further causes the processor to perform the following steps: using the coding vector corresponding to the data to be encoded as the input of the layer following the vector layer in the target training model; and obtaining an output result, corresponding to the data to be encoded, output by the target training model.
In one embodiment, the data to be encoded includes information click influence data corresponding to target information, and the target training model is an information click prediction model. The computer program further causes the processor to perform the following steps: using the information click influence data as the input of the vector layer in the information click prediction model, the vector layer being configured to determine a coding vector corresponding to the information click influence data; using the coding vector as the input of the layer following the vector layer in the information click prediction model; and obtaining a predicted information click probability, corresponding to the target information, output by the information click prediction model.
In one embodiment, the computer program further causes the processor to perform the following steps: obtaining training data and a standard coding vector corresponding to the training data; using the training data as the input of each channel, each channel having a corresponding hash function and training channel vector matrix; for each channel, computing, according to the corresponding hash function, a training channel hash value of the training data corresponding to the channel, and obtaining, according to the training channel hash value, a channel training subvector of the training data corresponding to the channel from the corresponding training channel vector matrix; obtaining a training coding vector corresponding to the training data according to the channel training subvectors; computing a coding loss value according to the training coding vector and the standard coding vector; adjusting the parameters of the training channel vector matrices according to the coding loss value; and updating the training data and the standard coding vector corresponding to the training data, returning to using the training data as the input of each channel, and repeating in this way until the coding loss value satisfies a preset convergence condition, to obtain a target channel vector matrix.
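This variant can also be sketched directly: the training coding vector is pulled toward a given standard coding vector by gradient steps on the selected matrix rows, with squared error as the coding loss value. The toy index function, the step size, and the target vector are assumptions for the example, not details from the patent.

```python
import numpy as np

rng = np.random.default_rng(2)
NUM_CHANNELS, ROWS, DIM = 2, 4, 3
matrices = [rng.normal(size=(ROWS, DIM)) for _ in range(NUM_CHANNELS)]

def rows_for(sample_id: int) -> list:
    # Toy stand-in for the per-channel hash functions.
    return [(sample_id * 7 + c) % ROWS for c in range(NUM_CHANNELS)]

sample_id = 3
standard = np.arange(NUM_CHANNELS * DIM, dtype=float)  # standard coding vector (target)
loss = np.inf

for step in range(500):
    idx = rows_for(sample_id)
    coding = np.concatenate([matrices[c][idx[c]] for c in range(NUM_CHANNELS)])
    diff = coding - standard
    loss = float(diff @ diff)          # coding loss value (squared error)
    if loss < 1e-6:                    # preset convergence condition
        break
    for c in range(NUM_CHANNELS):      # adjust only the rows selected by the hash
        matrices[c][idx[c]] -= 0.1 * 2.0 * diff[c * DIM:(c + 1) * DIM]
```

Because each update moves the selected rows a fixed fraction of the way toward the target, the coding loss value shrinks geometrically and the loop reaches the convergence condition quickly.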
A person of ordinary skill in the art will understand that all or part of the procedures in the methods of the foregoing embodiments may be implemented by a computer program instructing related hardware. The program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the procedures of the foregoing method embodiments. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or an external cache. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the foregoing embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the foregoing embodiments are described; however, as long as a combination of these technical features involves no contradiction, it shall be considered within the scope of this specification.
The foregoing embodiments express only several implementations of this application, and their descriptions are specific and detailed, but they shall not therefore be construed as limiting the scope of the claims of this application. It should be noted that a person of ordinary skill in the art may make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this application patent shall be subject to the appended claims.

Claims (15)

1. A data encoding method, the method comprising:
obtaining data to be encoded;
obtaining a hash function and a channel vector matrix corresponding to each channel, and computing, according to the hash function corresponding to each channel, a channel hash value of the data to be encoded corresponding to each channel;
obtaining, according to the channel hash value, a channel subvector of the data to be encoded corresponding to each channel from the corresponding channel vector matrix; and
obtaining a coding vector corresponding to the data to be encoded according to the obtained channel subvectors.
2. The method according to claim 1, wherein obtaining the hash function and the channel vector matrix corresponding to each channel, and computing, according to the hash function corresponding to each channel, the channel hash value of the data to be encoded corresponding to each channel, comprises:
computing, according to the hash function corresponding to a current channel, an original hash value of the data to be encoded corresponding to the current channel;
obtaining the number of rows of the current channel vector matrix corresponding to the current channel; and
taking the original hash value of the current channel modulo the number of rows of the current channel vector matrix, to obtain the channel hash value of the data to be encoded corresponding to the current channel.
3. The method according to claim 1, wherein obtaining the coding vector corresponding to the data to be encoded according to the obtained channel subvectors comprises:
concatenating the obtained channel subvectors corresponding to the channels in channel order, to obtain the coding vector corresponding to the data to be encoded.
4. The method according to claim 1, wherein the channel vector matrix is obtained through training in the following manner:
obtaining training data and a label corresponding to the training data;
using the training data as the input of a vector layer in a model to be trained, the vector layer comprising multiple channels, each channel having a corresponding hash function and training channel vector matrix, each channel computing, according to the corresponding hash function, a training channel hash value of the training data corresponding to the channel, and obtaining, according to the training channel hash value, a channel training subvector of the training data corresponding to the channel from the corresponding training channel vector matrix, the vector layer obtaining a training coding vector corresponding to the training data according to the obtained channel training subvectors; and
using the training coding vector as the input of the layer following the vector layer in the model to be trained, using the label corresponding to the training data as the expected output of the model to be trained, and training the model to be trained to obtain a target training model, the target training model comprising a target vector layer, and the target vector layer comprising a target channel vector matrix corresponding to each channel.
5. The method according to claim 4, wherein using the training data as the input of the vector layer in the model to be trained, using the label corresponding to the training data as the expected output of the model to be trained, and training the model to be trained to obtain the target training model, comprises:
using the training data as the input of the vector layer in the model to be trained, and obtaining the actual output of the model to be trained;
computing a loss value according to the actual output and the expected output;
adjusting, according to the loss value, the parameters of the training channel vector matrices corresponding to the channels in the vector layer and the parameters of the other layers of the model to be trained; and
updating the training data and the label corresponding to the training data, returning to using the training data as the input of the vector layer in the model to be trained, and repeating in this way until the loss value satisfies a preset convergence condition, to obtain the target training model comprising the target vector layer.
6. The method according to claim 4, wherein the method further comprises:
using the coding vector corresponding to the data to be encoded as the input of the layer following the vector layer in the target training model; and
obtaining an output result, corresponding to the data to be encoded, output by the target training model.
7. The method according to claim 4, wherein the data to be encoded comprises information click influence data corresponding to target information, and the target training model is an information click prediction model;
using the information click influence data as the input of the vector layer in the information click prediction model, the vector layer being configured to determine a coding vector corresponding to the information click influence data, and using the coding vector as the input of the layer following the vector layer in the information click prediction model; and
obtaining a predicted information click probability, corresponding to the target information, output by the information click prediction model.
8. The method according to claim 1, wherein the channel vector matrix is obtained through training in the following manner:
obtaining training data and a standard coding vector corresponding to the training data;
using the training data as the input of each channel, each channel having a corresponding hash function and training channel vector matrix, each channel computing, according to the corresponding hash function, a training channel hash value of the training data corresponding to the channel, and obtaining, according to the training channel hash value, a channel training subvector of the training data corresponding to the channel from the corresponding training channel vector matrix;
obtaining a training coding vector corresponding to the training data according to the channel training subvectors;
computing a coding loss value according to the training coding vector and the standard coding vector;
adjusting the parameters of the training channel vector matrices according to the coding loss value; and
updating the training data and the standard coding vector corresponding to the training data, returning to using the training data as the input of each channel, and repeating in this way until the coding loss value satisfies a preset convergence condition, to obtain a target channel vector matrix.
9. A data encoding apparatus, the apparatus comprising:
a data acquisition module, configured to obtain data to be encoded;
a computation module, configured to obtain a hash function and a channel vector matrix corresponding to each channel, and compute, according to the hash function corresponding to each channel, a channel hash value of the data to be encoded corresponding to each channel;
a subvector acquisition module, configured to obtain, according to the channel hash value, a channel subvector of the data to be encoded corresponding to each channel from the corresponding channel vector matrix; and
a vector generation module, configured to obtain a coding vector corresponding to the data to be encoded according to the obtained channel subvectors.
10. The apparatus according to claim 9, wherein the computation module is further configured to compute, according to the hash function corresponding to a current channel, an original hash value of the data to be encoded corresponding to the current channel; obtain the number of rows of the current channel vector matrix corresponding to the current channel; and take the original hash value of the current channel modulo the number of rows of the current channel vector matrix, to obtain the channel hash value of the data to be encoded corresponding to the current channel.
11. The apparatus according to claim 9, wherein the vector generation module is further configured to concatenate the obtained channel subvectors corresponding to the channels in channel order, to obtain the coding vector corresponding to the data to be encoded.
12. The apparatus according to claim 9, wherein the apparatus further comprises:
a first channel vector matrix training module, configured to obtain training data and a label corresponding to the training data; use the training data as the input of a vector layer in a model to be trained, the vector layer comprising multiple channels, each channel having a corresponding hash function and training channel vector matrix, each channel computing, according to the corresponding hash function, a training channel hash value of the training data corresponding to the channel, and obtaining, according to the training channel hash value, a channel training subvector of the training data corresponding to the channel from the corresponding training channel vector matrix, the vector layer obtaining a training coding vector corresponding to the training data according to the obtained channel training subvectors; and use the training coding vector as the input of the layer following the vector layer in the model to be trained, use the label corresponding to the training data as the expected output of the model to be trained, and train the model to be trained to obtain a target training model, the target training model comprising a target vector layer, and the target vector layer comprising a target channel vector matrix corresponding to each channel.
13. The apparatus according to claim 12, wherein the first channel vector matrix training module is further configured to use the training data as the input of the vector layer in the model to be trained, and obtain the actual output of the model to be trained; compute a loss value according to the actual output and the expected output; adjust, according to the loss value, the parameters of the training channel vector matrices corresponding to the channels in the vector layer and the parameters of the other layers of the model to be trained; and update the training data and the label corresponding to the training data, return to using the training data as the input of the vector layer in the model to be trained, and repeat in this way until the loss value satisfies a preset convergence condition, to obtain the target training model comprising the target vector layer.
14. A computer-readable storage medium, storing a computer program that, when executed by a processor, causes the processor to perform the steps of the method according to any one of claims 1 to 8.
15. A computer device, comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 8.
CN201810439264.5A 2018-05-09 2018-05-09 Data encoding method, data encoding device, computer equipment and storage medium Active CN108631787B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810439264.5A CN108631787B (en) 2018-05-09 2018-05-09 Data encoding method, data encoding device, computer equipment and storage medium


Publications (2)

Publication Number Publication Date
CN108631787A true CN108631787A (en) 2018-10-09
CN108631787B CN108631787B (en) 2020-04-03

Family

ID=63692246

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810439264.5A Active CN108631787B (en) 2018-05-09 2018-05-09 Data encoding method, data encoding device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN108631787B (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120263048A1 (en) * 2011-04-14 2012-10-18 Cisco Technology, Inc. Methods for Even Hash Distribution for Port Channel with a Large Number of Ports
CN102915740A (en) * 2012-10-24 2013-02-06 兰州理工大学 Phonetic empathy Hash content authentication method capable of implementing tamper localization
CN107220614A (en) * 2017-05-24 2017-09-29 北京小米移动软件有限公司 Image-recognizing method, device and computer-readable recording medium


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200104700A1 (en) * 2018-09-28 2020-04-02 Wipro Limited Method and system for improving performance of an artificial neural network
US11544551B2 (en) * 2018-09-28 2023-01-03 Wipro Limited Method and system for improving performance of an artificial neural network
CN111612079A (en) * 2020-05-22 2020-09-01 深圳前海微众银行股份有限公司 Data right confirming method, equipment and readable storage medium
CN111612079B (en) * 2020-05-22 2021-07-20 深圳前海微众银行股份有限公司 Data right confirming method, equipment and readable storage medium
CN111666442A (en) * 2020-06-02 2020-09-15 腾讯科技(深圳)有限公司 Image retrieval method and device and computer equipment
CN111666442B (en) * 2020-06-02 2023-04-18 腾讯科技(深圳)有限公司 Image retrieval method and device and computer equipment
CN113792816A (en) * 2021-09-27 2021-12-14 重庆紫光华山智安科技有限公司 Data encoding method, data encoding device, computer equipment and storage medium
CN113792816B (en) * 2021-09-27 2022-11-01 重庆紫光华山智安科技有限公司 Data encoding method, data encoding device, computer equipment and storage medium
CN116192402A (en) * 2023-01-18 2023-05-30 南阳理工学院 Communication method with information coding
CN116192402B (en) * 2023-01-18 2023-10-10 南阳理工学院 Communication method with information coding

Also Published As

Publication number Publication date
CN108631787B (en) 2020-04-03

Similar Documents

Publication Publication Date Title
CN108631787A (en) Data-encoding scheme, device, computer equipment and storage medium
CN109816615B (en) Image restoration method, device, equipment and storage medium
CN107563567A (en) Core extreme learning machine Flood Forecasting Method based on sparse own coding
CN109271933A (en) The method for carrying out 3 D human body Attitude estimation based on video flowing
CN110287335A (en) The personalized recommending scenery spot method and device of knowledge based map and user's shot and long term preference
CN110197307B (en) Regional sea surface temperature prediction method combined with attention mechanism
CN109299685A (en) Deduction network and its method for the estimation of human synovial 3D coordinate
CN108985899B (en) Recommendation method, system and storage medium based on CNN-LFM model
CN107562787A (en) A kind of POI coding methods and device, POI recommend method, electronic equipment
CN116128158B (en) Oil well efficiency prediction method of mixed sampling attention mechanism
CN110442846A (en) A kind of sequence data forecasting system of New Multi-scale attention mechanism
CN115307780B (en) Sea surface temperature prediction method, system and application based on time-space information interaction fusion
CN107003834A (en) Pedestrian detection apparatus and method
CN109919670A (en) Prediction technique, device, server and the storage medium of ad click probability
CN113255908A (en) Method, neural network model and device for service prediction based on event sequence
CN111861046A (en) Intelligent patent value evaluation system based on big data and deep learning
CN115049919A (en) Attention regulation based remote sensing image semantic segmentation method and system
CN117576402A (en) Deep learning-based multi-scale aggregation transducer remote sensing image semantic segmentation method
CN115222947B (en) Rock joint segmentation method and device based on global self-attention transformation network
CN101572693A (en) Equipment and method for parallel mode matching
CN109670057A (en) A kind of gradual end-to-end depth characteristic quantization system and method
CN112927810B (en) Smart medical response method based on big data and smart medical cloud computing system
CN112734519B (en) Commodity recommendation method based on convolution self-encoder network
CN114398980A (en) Cross-modal Hash model training method, encoding method, device and electronic equipment
Wong et al. Addressing Deep Learning Model Uncertainty in Long-Range Climate Forecasting with Late Fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant