CN110147444B - Text prediction method and device based on neural network language model and storage medium

Text prediction method and device based on neural network language model and storage medium

Info

Publication number
CN110147444B
Authority
CN
China
Prior art keywords: hidden layer, text, cluster, expression, hidden
Legal status
Active
Application number
CN201811435778.XA
Other languages
Chinese (zh)
Other versions
CN110147444A (en)
Inventor
陈强
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201811435778.XA priority Critical patent/CN110147444B/en
Priority to CN201910745810.2A priority patent/CN110442721B/en
Publication of CN110147444A publication Critical patent/CN110147444A/en
Application granted granted Critical
Publication of CN110147444B publication Critical patent/CN110147444B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Abstract

The embodiment of the invention discloses a text prediction method, a text prediction device, and a storage medium based on a neural network language model. The method comprises the following steps: mapping the input text into a corresponding feature vector through an input layer of the model; calling an activation function through a hidden layer of the model and outputting a first hidden layer expression of the feature vector to an output layer; decomposing the first hidden layer expression through the output layer of the model to obtain second hidden layer expressions corresponding to the first hidden layer expression under different hidden subjects; respectively determining the cluster category corresponding to each second hidden layer expression and calling the normalization index function corresponding to that cluster category to obtain the probability distribution corresponding to the second hidden layer expression, where the cluster categories include a head cluster and a tail cluster and the output probability of the text classifications in the head cluster differs from that of the text classifications in the tail cluster; and fusing the probability distributions corresponding to the second hidden layer expressions, and outputting the target text corresponding to the input text based on the fused probability distribution.

Description

Text prediction method and device based on neural network language model and storage medium
Technical Field
The present invention relates to natural language processing technologies, and in particular, to a text prediction method, device, and storage medium based on a neural network language model.
Background
With the development of natural language processing technology, language models based on the Recurrent Neural Network (RNN) architecture are increasingly applied to multi-category problems. However, when the number of categories to be processed is huge (such as 100K or even 1B), the training efficiency of language models in the related art is low, and training may even be impossible due to limited computing resources.
Disclosure of Invention
The embodiment of the invention provides a text prediction method, a text prediction device and a storage medium based on a neural network language model, which can improve the representation capability of the language model and improve the training efficiency of the language model.
The technical scheme of the embodiment of the invention is realized as follows:
in a first aspect, an embodiment of the present invention provides a text prediction apparatus based on a neural network language model, where the neural network language model includes an input layer, a hidden layer, and an output layer, the apparatus including:
the mapping module is used for mapping the input text into corresponding feature vectors through the input layer;
the hidden layer expression module is used for calling an activation function through the hidden layer to obtain a first hidden layer expression corresponding to the characteristic vector;
the output module is used for decomposing the first hidden layer expression through the output layer to obtain second hidden layer expressions corresponding to the first hidden layer expression under different hidden subjects respectively;
respectively determining a cluster type corresponding to each second hidden layer expression, and calling a normalization index function corresponding to the cluster type to obtain probability distribution corresponding to the second hidden layer expressions; the cluster category comprises a head cluster and a tail cluster, and the output probability of the text classification in the head cluster is different from the output probability of the text classification in the tail cluster;
and fusing the probability distribution corresponding to each second hidden layer expression, and outputting the target text corresponding to the text based on the probability distribution obtained after fusion.
In a second aspect, an embodiment of the present invention provides a text prediction method based on a neural network language model, including:
inputting a text to an input layer of the neural network language model to map the text into corresponding feature vectors;
calling an activation function through a hidden layer of the neural network language model to obtain a first hidden layer expression corresponding to the feature vector;
decomposing the first hidden layer expression through an output layer of the neural network language model to obtain second hidden layer expressions corresponding to the first hidden layer expression under different hidden subjects respectively;
respectively determining a cluster type corresponding to each second hidden layer expression, and calling a normalization index function corresponding to the cluster type to obtain probability distribution corresponding to the second hidden layer expressions; the cluster category comprises a head cluster and a tail cluster, and the output probability of the text classification in the head cluster is different from the output probability of the text classification in the tail cluster;
and fusing the probability distribution corresponding to each second hidden layer expression, and outputting the target text corresponding to the text based on the probability distribution obtained after fusion.
In a third aspect, an embodiment of the present invention provides a text prediction apparatus based on a neural network language model, where the apparatus includes:
a memory for storing an executable program;
and the processor is used for realizing the text prediction method based on the neural network language model when executing the executable program stored in the memory.
In a fourth aspect, an embodiment of the present invention provides a storage medium, which stores an executable program, and when the executable program is executed by a processor, the method for text prediction based on a neural network language model is implemented.
The application of the embodiment of the invention has the following beneficial effects:
1) The first hidden layer expression of the text is decomposed by the output layer of the neural network language model to obtain second hidden layer expressions corresponding to the first hidden layer expression under different hidden subjects; the actual expression dimension of the model is thereby expanded, and the overall representation capability of the model is improved.
2) A plurality of text classifications are clustered to form a plurality of cluster categories including a head cluster and tail clusters, each cluster category corresponds to its own normalization index function, and different cluster categories correspond to different normalization index functions. Because the output probability of the text classifications in the head cluster differs from that of the text classifications in the tail clusters, the normalization index functions of different cluster categories receive unequal amounts of training during the training of the neural network language model: the parameters of the normalization index function of the cluster category whose text classifications have high output probability are updated frequently, while, when the number of text classifications is large, the parameters of the normalization index functions of the cluster categories with low output probability are prevented from being updated frequently during model training, which improves model training efficiency and saves hardware resources.
Drawings
FIG. 1 is a schematic diagram of a neural network language model according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a neural network language model according to an embodiment of the present invention;
FIG. 3 is a functional diagram of a softmax layer provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of an architecture of a neural network language model according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an architecture of a neural network language model according to an embodiment of the present invention;
fig. 6 is a schematic flowchart of a text prediction method based on a neural network language model according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a text prediction apparatus based on a neural network language model according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail below with reference to the accompanying drawings and examples. It should be understood that the examples provided herein are merely illustrative of the present invention and are not intended to limit the present invention. In addition, the following embodiments are provided as partial embodiments for implementing the present invention, not all embodiments for implementing the present invention, and the technical solutions described in the embodiments of the present invention may be implemented in any combination without conflict.
It should be noted that, in the embodiments of the present invention, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, so that a method or apparatus including a series of elements includes not only the explicitly recited elements but also other elements not explicitly listed or inherent to the method or apparatus. Without further limitation, an element defined by the phrase "comprising a/an ..." does not exclude the presence of additional related elements in the method or apparatus that comprises the element (e.g., steps in a method or units in an apparatus, such as units that may be part of a circuit, part of a processor, part of a program or software, etc.).
The neural network language model provided by the embodiment of the invention is used for predicting the probability distribution of the nth word through the input n-1 words, namely predicting the probability of a word appearing at the next position through the neural network language model when the previous words are known.
As an embodiment of a neural network language model, fig. 1 is a schematic structural diagram of the neural network language model provided in the embodiment of the present invention, and referring to fig. 1, the neural network language model includes an input layer, a hidden layer, and an output layer;
an input layer: by a mapping matrix C (the size of the matrix is | V | m, where | V | is the vocabulary size and V = { w = { 1, w 2,… w |V| M is the dimension of a word vector), the first n-1 discrete words are mapped into n-1 m-dimensional vectors, namely, the words are changed into the word vectors in a table look-up mode, and then the n-1 m-dimensional vectors are connected end to form an m (n-1) vector, which is the input vector x of the neural network.
Hidden layer: the number of hidden layer nodes is h. To convert the m(n-1)-dimensional vector x output by the input layer into the hidden layer input (of dimension h), a parameter matrix H (of size h x m(n-1)) and a bias d are required between the input layer and the hidden layer; this change can be expressed as f(x) = Hx + d, which is a linear transformation. The hidden layer output then applies a non-linear transformation to the linearly transformed vector; in an embodiment, the selected activation function 1 is tanh (hyperbolic tangent), and the corresponding hidden layer output is tanh(Hx + d).
Output layer: the transfer from the hidden layer to the output layer also requires a linear transformation and a non-linear transformation. First, the dimension of the hidden layer output vector is converted by a linear transformation into the number of nodes of the output layer; then, in order to represent the output as a probability distribution (the values over all dimensions sum to 1), a non-linear transformation is applied to the input of the output layer (i.e., the output of the hidden layer). In one embodiment, the activation function 2 used is softmax (the normalized exponential function), which outputs the probability distribution p.
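For illustration, the following is a minimal numpy sketch of the forward pass described above; the sizes and the random parameter values are assumptions chosen for the example rather than values from the embodiment:

    import numpy as np

    rng = np.random.default_rng(0)
    V, m, n, h = 1000, 64, 4, 128          # vocabulary size, word-vector dimension m, context length n, hidden nodes h

    C = rng.normal(size=(V, m))            # input-layer mapping matrix: word id -> m-dimensional word vector
    H = rng.normal(size=(h, m * (n - 1)))  # parameter matrix between input layer and hidden layer
    d = np.zeros(h)                        # hidden-layer bias
    U = rng.normal(size=(V, h))            # hidden-to-output linear transformation
    b = np.zeros(V)                        # output-layer bias

    def forward(prev_word_ids):
        # predict the probability distribution of the n-th word from the previous n-1 word ids
        x = C[prev_word_ids].reshape(-1)   # table lookup, then end-to-end concatenation -> m(n-1) vector
        hidden = np.tanh(H @ x + d)        # hidden layer output: tanh(Hx + d)
        logits = U @ hidden + b            # linear transformation to |V| scores
        e = np.exp(logits - logits.max())  # softmax (activation function 2), stabilised
        return e / e.sum()                 # probability distribution p over the vocabulary

    p = forward([3, 17, 42])               # any n-1 = 3 previous word ids
    print(p.shape, round(float(p.sum()), 6))   # (1000,) 1.0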
In an embodiment, the neural network language model has two hidden layers, each serving as a feature layer. Fig. 2 is a schematic architecture diagram of the neural network language model provided in the embodiment of the present invention; referring to fig. 2, a softmax layer serves as the output layer, data is processed by the input layer and the two feature layers, and the softmax layer finally yields the probability values for the categories y = 0, y = 1, and y = 2.
With continuing reference to fig. 3, fig. 3 is a functional schematic diagram of the softmax layer according to the embodiment of the present invention, where 1, 2, and 3 denote three inputs; after passing through softmax, the three inputs yield the array [0.88, 0.12, 0], whose entries are the output probabilities of the corresponding categories.
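As a small numeric illustration of the normalization performed by softmax, the sketch below uses three raw scores (an assumption chosen here) whose softmax reproduces an array of the form [0.88, 0.12, 0]:

    import numpy as np

    def softmax(z):
        e = np.exp(z - np.max(z))   # subtract the maximum for numerical stability
        return e / e.sum()

    # the raw scores are an assumption; the point is that softmax turns them into probabilities summing to 1
    print(np.round(softmax(np.array([3.0, 1.0, -3.0])), 2))   # [0.88 0.12 0.  ]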
Fig. 4 is a schematic diagram of an architecture of a neural network language model according to an embodiment of the present invention, and referring to fig. 4, the neural network language model includes an input layer, a hidden layer, and an output layer; wherein, the hidden layer is implemented by a Recurrent Neural Network (RNN), specifically a Long Short-Term Memory Network (LSTM) in fig. 4; the activation function model corresponding to the output layer is a mixed Softmax model (MoS, mixture of Softmax).
With a neural network language model based on the hybrid Softmax model, before the output of the neural network language model reaches the Softmax layer, the hidden layer expression output by the hidden layer (hidden, a vector or matrix, i.e., the hidden state output by the hidden layer) is decomposed into N hidden layer expressions, each with the same dimension and data type as the source hidden expression. Weights are then assigned to the N new hidden layer expressions (the weights sum to 1), an independent Softmax computation is performed on each new hidden layer expression obtained by the decomposition to output a classification probability distribution, and finally all output classification probability distributions are weighted and summed according to the computed weights to obtain the final classification probability distribution, on which the target loss is then computed. Referring to fig. 4, w_t denotes the t-th word in the word sequence w, and g_t the hidden layer expression output for it by the LSTM layer; g_t is decomposed into multiple hidden layer expressions h_{t,1}, ..., h_{t,N}, each of which yields a corresponding multi-classification probability distribution after a Softmax operation; pi_{t,1}, ..., pi_{t,N} denote the weights corresponding to the hidden layer expressions, and the probability distributions of all hidden layer expressions are weighted and summed according to these weights to obtain the final probability distribution, from which the next word w_{t+1} is predicted.
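A minimal sketch of the mixture computation just described, under stated assumptions: each decomposed hidden layer expression is produced here by its own tanh projection of the source hidden state, the weights come from a softmax over another projection, and one shared output matrix is reused by every Softmax; all parameter names and shapes are illustrative rather than taken from the embodiment:

    import numpy as np

    rng = np.random.default_rng(1)
    V, d, N = 1000, 128, 3                     # vocabulary size, hidden size, number of decomposed expressions

    W_h = rng.normal(size=(N, d, d)) * 0.02    # per-subject projections producing the N hidden expressions
    W_pi = rng.normal(size=(N, d)) * 0.02      # projection producing the mixture weights
    W_out = rng.normal(size=(V, d)) * 0.02     # shared output matrix used by every Softmax

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def mos(g):
        # g: source hidden state output by the LSTM layer
        h = np.tanh(W_h @ g)                                       # N decomposed hidden expressions
        pi = softmax(W_pi @ g)                                     # weights, summing to 1
        p_k = np.stack([softmax(W_out @ h[k]) for k in range(N)])  # one distribution per expression
        return pi @ p_k                                            # weighted sum: final distribution over V words

    p = mos(rng.normal(size=d))
    print(p.shape, round(float(p.sum()), 6))                       # (1000,) 1.0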
Applying the neural network language model based on the hybrid Softmax model, the actual expression dimension of Softmax is expanded by decomposing the hidden layer expression and computing several Softmax operations, which ultimately improves the overall representation capability of the model; and because information is decomposed at the hidden layer and then fused, it remains essentially complete during model processing. However, Softmax must perform an exponential operation over all classifications at every computation. When the vocabulary is large, this consumes a great deal of computing resources and requires high-performance hardware (for example, most operations in a neural network are matrix operations, so an expensive Graphics Processing Unit (GPU) may need to be configured); meanwhile, a large number of intermediate values must be stored during the computation, occupying storage and requiring large memory or a hard-disk swap area, which makes model training costly and constrains the hardware environment required for training. Moreover, because Softmax performs an exponential operation over all classifications at every computation, while most classifications (in an embodiment, each word can be regarded as one classification) are rarely involved in a given training example or training batch, such computation not only wastes computing resources but also greatly increases training time, substantially reducing training speed and seriously affecting the training efficiency of the language model.
In a multi-classification task with a very large number of classes, the traditional Softmax method occupies a large amount of memory and can cause out-of-memory (OOM) errors, so that training ultimately cannot proceed under limited hardware storage. To solve this problem, an Adaptive Softmax model can be adopted. First, the classes are arranged in descending order of their frequency of occurrence in the training data (in one embodiment, each word can be regarded as one class, and different words are different classes). The classes are then traversed in order while their frequencies are accumulated, and the classes are clustered according to a preset statistical strategy so that the total frequencies of the clusters differ greatly; a class identifier (ID) is assigned, and an independent Softmax model is designed for each cluster in model training. When the target output of a training sample belongs to a certain cluster, the parameters of the Softmax model of that cluster are trained and updated, and multiple rounds of training are performed on the training data set until training converges.
In the Adaptive Softmax model, the first cluster is called the Head class (i.e., the head cluster) because its total word frequency accounts for the largest probability of occurrence, meaning that it is updated most frequently during training, while the subsequent clusters are called Tail classes (i.e., tail clusters) because the categories they contain occur less frequently in the data. In practical application, the scale of the head cluster is kept below 10K, so that it occupies few hardware resources despite being accessed frequently and can be computed quickly, and the parameters of the large Softmax models holding the non-high-frequency classes are not updated frequently during training; hardware resources are thus saved while training efficiency is guaranteed. To ensure that the Softmax models of the clusters of all categories can be updated, Adaptive Softmax appends the IDs of all Tail classes to the end of the first cluster; when a category in a training sample does not appear in the Head cluster, the Tail class it belongs to can be found from the Tail class ID, and the Softmax model corresponding to that Tail class is then trained.
After the Adaptive Softmax model is applied and the classification targets are clustered according to a certain strategy, only part of the classifications are guaranteed to be involved in each computation, which avoids resource exhaustion caused by wasted computation over classifications that are not involved; the adaptivity of the Adaptive Softmax model lies in that, for different training samples, only the relevant part of the classifications is used in the computation according to each sample's own situation.
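The following simplified sketch shows the head/tail factorisation that gives Adaptive Softmax its saving; the tiny vocabulary split, the single tail cluster, and the parameter shapes are assumptions made only for illustration:

    import numpy as np

    rng = np.random.default_rng(2)
    d = 64
    head_words, tail_words = 8, 12                        # ids 0..7 live in the head, ids 8..19 in one tail cluster
    W_head = rng.normal(size=(head_words + 1, d)) * 0.1   # +1 row: the tail-cluster ID appended to the head
    W_tail = rng.normal(size=(tail_words, d)) * 0.1       # independent Softmax for the tail cluster

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def word_prob(word_id, g):
        # P(word | hidden state g) under the head/tail factorisation
        p_head = softmax(W_head @ g)
        if word_id < head_words:                          # high-frequency word: head Softmax only
            return p_head[word_id]
        p_tail = softmax(W_tail @ g)                      # rare word: P(tail cluster) from the head, P(word | cluster) from the tail
        return p_head[head_words] * p_tail[word_id - head_words]

    g = rng.normal(size=d)
    total = sum(word_prob(w, g) for w in range(head_words + tail_words))
    print(round(float(total), 6))                         # 1.0 -- the factorisation is still a valid distribution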
In an embodiment, the Adaptive Softmax method is used to replace the traditional Softmax method in the hybrid Softmax model, that is, the Adaptive Softmax is introduced into the MoS to form a hybrid Adaptive Softmax model (MoAS), and the respective advantages of the MoS and the Adaptive Softmax are combined to ensure that any multi-class model can be trained normally while improving the model performance.
As an embodiment of the neural network language model, fig. 5 is a schematic structural diagram of the neural network language model provided in the embodiment of the present invention, and referring to fig. 5, the neural network language model provided in the embodiment of the present invention includes: an input layer, a hidden layer and an output layer; wherein,
the input layer is used for mapping the input text into corresponding characteristic vectors and inputting the hidden layer;
the hidden layer is used for calling an activation function based on the input feature vector and outputting a first hidden layer expression corresponding to the feature vector to an output layer;
the output layer is used for decomposing the first hidden layer expression to obtain second hidden layer expressions corresponding to the first hidden layer expression under different hidden subjects respectively;
respectively determining a cluster type corresponding to each second hidden layer expression, and calling a normalization index function corresponding to the cluster type to obtain probability distribution corresponding to the second hidden layer expressions; the cluster category comprises a head cluster and a tail cluster, and the output probability of the text classification in the head cluster is different from the output probability of the text classification in the tail cluster;
and fusing the probability distribution corresponding to each second hidden layer expression, and outputting the target text corresponding to the text based on the probability distribution obtained after fusion.
In one embodiment, the input text is a word; after being input to the input layer of the language model, the word is mapped into a corresponding input vector, which is then processed by the hidden layer to obtain the hidden layer expression g, i.e., the first hidden layer expression.
Referring to fig. 5, the output layer in the neural network language model according to the embodiment of the present invention is the MoAS, that is, the traditional Softmax method in the hybrid Softmax model is replaced with the Adaptive Softmax method; specifically, the Softmax operations shown in fig. 4 are replaced with the Adaptive Softmax operations shown in fig. 5.
In one embodiment, the output layer uses multiple sets of parameters to construct N independent fully-connected networks f_1, ..., f_N, and applies the hidden layer expression g to the fully-connected network model corresponding to each hidden subject to obtain the hidden layer expressions h_1, ..., h_N of g under the multiple hidden subjects, i.e., the second hidden layer expressions; specifically, h_k = f_k(g), k = 1, ..., N.
In an embodiment, the output layer is further configured to determine the weight pi_i of each second hidden layer expression under its corresponding hidden subject according to formula (2), where pi_i denotes the weight of the i-th second hidden layer expression obtained by the decomposition under its corresponding hidden subject, and the weights sum to 1. A note on hidden subjects: in practical application, a sentence or a document usually belongs to a subject; if a sentence about sports suddenly appears in a document on a technical subject, it certainly feels out of place, which is to say that subject consistency has been broken.
In an embodiment, the output layer is further configured to cluster a plurality of text classifications according to frequencies of occurrence of the text classifications in training data to obtain one head cluster and at least one tail cluster.
Specifically, the output layer sorts the plurality of text classifications in the order from high frequency to low frequency to obtain a text classification sequence; traversing the text classification sequence, and accumulating the frequency of text classification; stopping the traversal when the accumulated frequency of the text classification meets a preset condition, and taking a set formed by all the text classifications traversed in the text classification sequence as a head cluster; in practical application, the cumulative frequency of text classification meeting the preset condition may be: the percentage of the accumulated frequency of the text classification to the total frequency reaches a preset percentage threshold, such as 80%;
The output layer then continues traversing the remaining, not-yet-traversed portion of the text classification sequence while accumulating the frequencies of the text classifications; when the accumulated frequency meets the preset condition, the traversal stops, and the set formed by all the text classifications traversed this time is taken as a tail cluster. One or more tail clusters are obtained in this way; if the current number of tail clusters has not reached the preset number (which can be set according to actual needs), the output layer repeatedly performs the following operations until the number of tail clusters equals the preset number:
continue traversing the remaining, not-yet-traversed portion of the text classification sequence and accumulate the frequencies of the text classifications; when the accumulated frequency meets the preset condition, stop the traversal and take the set formed by all the text classifications traversed this time as a tail cluster.
In practical applications, the cluster categories in the normal case include head clusters and tail clusters, and the special case may include only head clusters.
In an embodiment, the output layer is further configured to assign a class ID to each tail cluster; correspondingly, the head cluster also includes the class IDs of the tail clusters of the preset number.
In actual implementation, the number of cluster categories is set to M (1 Head class and M-1 Tail classes). Frequency statistics are performed on all classifications in the training data, and the classifications are arranged in descending order of frequency to obtain an ordered classification sequence. The sequence is then traversed from high frequency to low frequency while the frequencies are accumulated; when the traversal reaches the current classification w_p (in practical implementation, the p-th word in the vocabulary V) and the accumulated frequency reaches 80% of the total frequency, the traversal stops, and the classifications from the beginning of the ordered sequence up to the current position, w_1 through w_p, are taken as the Head cluster (Head class). At the same time, the IDs of all Tail clusters (Tail classes) are appended to the Head cluster, so that the Head cluster contains w_1, ..., w_p together with the IDs of the M-1 Tail clusters. The ordered classification sequence is then reset to the remaining classifications, and the Tail clusters Tail_1, ..., Tail_{M-1} are obtained in turn in the same manner as the Head cluster.
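A sketch of this clustering procedure, assuming the 80% cumulative-frequency threshold mentioned above; the toy word-frequency list, and the choice of applying the threshold to the remaining (not yet clustered) frequency when building each Tail cluster, are assumptions:

    from collections import Counter

    def build_clusters(freqs, num_clusters, threshold=0.80):
        # freqs: {classification: frequency}; returns [head, tail_1, ..., tail_{num_clusters-1}]
        ordered = sorted(freqs, key=freqs.get, reverse=True)      # descending order of frequency
        clusters, start = [], 0
        for c in range(num_clusters):
            remaining = sum(freqs[w] for w in ordered[start:])    # assumption: threshold over the remaining frequency
            acc, end = 0, start
            while end < len(ordered):
                acc += freqs[ordered[end]]
                end += 1
                if c < num_clusters - 1 and acc >= threshold * remaining:
                    break                                         # cumulative frequency meets the preset condition
            clusters.append(ordered[start:end])
            start = end
        # append the class IDs of all Tail clusters to the end of the Head cluster, as described above
        clusters[0] = clusters[0] + [f"<TAIL_{i}>" for i in range(1, num_clusters)]
        return clusters

    freqs = Counter({"the": 50, "of": 30, "model": 10, "layer": 5, "cluster": 3, "topic": 1, "rare": 1})
    for i, cl in enumerate(build_clusters(freqs, num_clusters=3)):
        print("head" if i == 0 else f"tail_{i}", cl)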
Based on the above description of clustering, the following description will discuss the training of the output layer MoAS model.
In an embodiment, the output layer is further configured to determine the cluster category corresponding to a second hidden layer expression of the training data, and then to train the normalized exponential function corresponding to that cluster category, taking the second hidden layer expression of the training data as input and the target data corresponding to the training data as output, so that it learns to predict the corresponding target data from the second hidden layer expression of the training data. In practical implementation, after the vocabulary is clustered, each cluster corresponds to its own Softmax model, and once the cluster to which the input training data belongs is determined, only the Softmax model parameters of that cluster are updated.
In practical implementation, the output layer applies a normalized exponential function (Softmax) corresponding to the head cluster to the second hidden layer expression to obtain a probability distribution corresponding to the second hidden layer expression; determining a text (word) corresponding to the maximum value of the probability distribution corresponding to the second hidden layer expression; and determining a cluster corresponding to the second hidden layer expression according to the determined text.
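A short sketch of this cluster lookup, in which the head Softmax size, the entry-to-cluster table, and the random parameters are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(3)
    d, head_size = 32, 6                       # head Softmax over 5 frequent words plus 1 tail-cluster ID
    W_head = rng.normal(size=(head_size, d))
    entry_to_cluster = {0: "head", 1: "head", 2: "head", 3: "head", 4: "head", 5: "tail_1"}

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def cluster_of(h_k):
        # apply the head Softmax to a second hidden layer expression and map the argmax entry to its cluster
        p = softmax(W_head @ h_k)              # probability distribution under the head Softmax
        best = int(np.argmax(p))               # entry with the maximum probability (a word or a tail-cluster ID)
        return entry_to_cluster[best]

    print(cluster_of(rng.normal(size=d)))      # "head" or "tail_1", depending on the expression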
Specifically, with continued reference to fig. 5, in actual practice the training data are first mapped to their corresponding clusters. For example, a training data batch is taken with its target classifications y_1, ..., y_bs, where bs is the batch size; each data item is mapped to the cluster its target classification belongs to. Assuming the hidden layer expressions obtained for the batch by the LSTM computation are g_1, ..., g_bs, the mapping result partitions the batch items, together with their hidden layer expressions, by cluster. Then, for the data items mapped to each cluster, the Softmax loss corresponding to each hidden layer expression obtained after the hidden layer expression decomposition is calculated; with k indexing the k-th decomposed hidden layer expression, the corresponding loss is given by equation (3), a cross-entropy between the Softmax output of the k-th decomposed hidden layer expression (computed with theta_k, the Softmax model parameters for the k-th decomposed hidden layer expression) and the target classification. The loss over the whole training data batch is then given by equation (4).
still taking the training data as batch
Figure 996145DEST_PATH_IMAGE026
The training of the output layer MoAS model of the language model according to the embodiment of the present invention is described as an example.
batch
Figure 528758DEST_PATH_IMAGE026
The input layer passing through the language model is mapped into corresponding characteristic vectors, and the hidden layer expression is output through the hidden layer
Figure 993237DEST_PATH_IMAGE030
Then, through the hidden layer expression decomposition of the output layer, will
Figure 927695DEST_PATH_IMAGE030
Is decomposed into
Figure DEST_PATH_IMAGE041
(ii) a Wherein,
Figure DEST_PATH_IMAGE042
for training samples
Figure DEST_PATH_IMAGE043
To (1) a
Figure DEST_PATH_IMAGE044
A decomposition of the hidden-layer expression vector,
Figure DEST_PATH_IMAGE045
to decompose the number of hidden layers; at the same time by the formula
Figure DEST_PATH_IMAGE046
Calculating weights corresponding to the hidden subjects
Figure DEST_PATH_IMAGE047
Wherein
Figure DEST_PATH_IMAGE048
Figure DEST_PATH_IMAGE049
Is a scalar quantity.
The data items of the training data batch are mapped under each cluster, and the batch data items obtained by each mask are reset according to the number of hidden subjects. Specifically, the data items are first mapped to their corresponding clusters. Then, for each sub-batch block, a new batch data block is obtained according to formula (5) by cascading, for each training example in the cluster, its decomposed hidden layer expressions and the corresponding weights; here, h_{c,j,k} and pi_{c,j,k} denote the k-th decomposed hidden layer expression and the corresponding weight of the j-th training example in cluster c, and y_{c,j} denotes the target classification of the j-th training example in cluster c. Then, the class probability distribution of each Softmax on the corresponding reset batch data items is calculated according to equation (6). Next, based on the weights pi_{c,j,k}, the class probability distributions are weighted and summed according to equation (7) to obtain the loss of each batch data item. Finally, the loss of the entire batch is calculated according to equation (8).
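As a simplified sketch of this loss computation (equations (3) to (8)), under stated assumptions: each batch item is routed only to the Softmax of the cluster its target classification belongs to, the per-subject distributions are mixed with the weights pi before taking the negative log-likelihood, and the head-cluster factor for tail targets is omitted for brevity; all names and shapes are illustrative:

    import numpy as np

    rng = np.random.default_rng(4)
    d, N = 32, 3                                            # hidden size, number of decomposed expressions
    clusters = {"head": 6, "tail_1": 10}                    # illustrative cluster sizes (classifications per cluster)
    W = {c: rng.normal(size=(size, d)) * 0.1 for c, size in clusters.items()}   # one Softmax per cluster

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def item_loss(h, pi, cluster, target_idx):
        # h: (N, d) decomposed hidden expressions; pi: (N,) weights; target_idx: index inside its cluster
        p = sum(pi[k] * softmax(W[cluster] @ h[k]) for k in range(N))   # weighted sum of per-subject distributions
        return -np.log(p[target_idx])                                   # cross-entropy with the target classification

    # toy batch: (decomposed hidden expressions, weights, target cluster, target index within that cluster)
    batch = [(rng.normal(size=(N, d)), softmax(rng.normal(size=N)), "head", 2),
             (rng.normal(size=(N, d)), softmax(rng.normal(size=N)), "tail_1", 7)]

    batch_loss = np.mean([item_loss(*item) for item in batch])          # loss of the entire batch
    print(round(float(batch_loss), 4))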
the model training adopts a feedforward neural network (BP) mode, and in practical application, the training of the neural network language model provided by the embodiment of the invention can adopt one machine with multiple cards or multiple machines with multiple cards for training; here, the multi-card refers to a device having a plurality of GPUs/Field Programmable Gate Arrays (FPGAs)/Application Specific Integrated Circuits (ASICs) for model parameter calculation, and the multi-card refers to a cluster of devices having the multi-card.
In an embodiment, Class-based Softmax may also be introduced into MoS; since Class-based Softmax, like Adaptive Softmax, is designed to solve the training problem caused by a huge number of classes, it may be used in place of Adaptive Softmax in the embodiment of the present invention.
In an embodiment, Noise Contrastive Estimation (NCE) can also be introduced into MoS; NCE adopts a negative sampling method and trains the model by comparing the losses of positive and negative samples, which helps increase the model training speed.
Next, an application scenario of the neural network language model provided in the embodiment of the present invention is explained.
In natural language processing and many scenarios in the speech field, language models play an important role; for example, a language model is used to optimize translation results in machine translation, and in speech recognition a language model is decoded together with the acoustic model results to improve the recognition effect. For example, for the input pinyin string nixianzaiganshenme, the corresponding output can take various forms, such as "what are you doing now" or "what are you rushing to again in Xi'an", and the question is which one is the correct conversion result; using a neural network language model, we know that the probability of the former is greater than that of the latter, so converting to the former is more reasonable in most cases. As another example, in machine translation, given a Chinese sentence meaning that Li Ming is watching TV at home, it can be translated as "Li Ming is watching TV at home", "Li Ming at home watching TV", and so on; again according to the language model, the probability of the former is greater than that of the latter, so translating it as the former is more reasonable.
Language modeling based on the RNN framework is a typical multi-classification problem with a huge number of categories: the vocabulary size is the number of categories, and vocabularies in natural language often reach a magnitude of 100K or even 1B, which means the model may well be untrainable due to limited computing resources. The neural network language model provided by the embodiment of the invention is well suited to such large-vocabulary language modeling problems.
Fig. 6 is a schematic flow diagram of a text prediction method based on a neural network language model according to an embodiment of the present invention, and referring to fig. 6, a text prediction method based on a neural network language model according to an embodiment of the present invention includes:
step 101: inputting text into an input layer of the neural network language model to map the text into corresponding feature vectors.
Here, in practical applications, the input text may be a word sequence, and the word sequence maps discrete words into corresponding m-dimensional vectors through the mapping matrix C of the input layer, as the input of the hidden layer.
Step 102: and calling an activation function through a hidden layer of the neural network language model to obtain a first hidden layer expression corresponding to the feature vector.
In one embodiment, the activation function called by the hidden layer is a tanh function, and the input vector passes through the hidden layer and outputs a first hidden layer expression (hidden, vector or matrix) corresponding to the input vector.
Step 103: and decomposing the first hidden layer expression through an output layer of the neural network language model to obtain second hidden layer expressions corresponding to the first hidden layer expression under different hidden subjects respectively.
Here, in actual implementation, the output layer uses multiple sets of parameters to construct N independent fully-connected networks f_1, ..., f_N, and applies the first hidden layer expression g to the fully-connected network corresponding to each hidden subject to obtain the hidden layer expressions h_1, ..., h_N of g under the multiple hidden topics, i.e., the second hidden layer expressions; the dimensions and data types of the second hidden layer expressions are the same as those of the first hidden layer expression.
In an embodiment, after the output layer performs hidden layer expression decomposition, a weight of each second hidden layer expression under the corresponding hidden topic is further determined, which can be specifically implemented according to formula (2).
Step 104: respectively determining a cluster type corresponding to each second hidden layer expression, and calling a normalization index function corresponding to the cluster type to obtain probability distribution corresponding to the second hidden layer expression; the cluster category comprises a head cluster and a tail cluster, and the output probability of the text classification in the head cluster is different from the output probability of the text classification in the tail cluster.
Here, in actual implementation, the output layer clusters a plurality of text classifications according to the frequency of occurrence of the text classifications in the training data to obtain at least one head cluster and at least one tail cluster. Each cluster corresponds to a respective normalized exponential function (Softmax), in particular:
sequencing the plurality of text classifications according to the sequence of the frequency from high to low to obtain a text classification sequence; traversing the text classification sequence, and accumulating the frequency of text classification; when the accumulated frequency of the text classifications meets a preset condition, stopping the traversal, and taking a set formed by all the text classifications traversed in the text classification sequence as the head cluster;
traversing the remaining, not-yet-traversed portion of the text classification sequence, and accumulating the frequency of text classification; stopping the traversal when the accumulated frequency of the text classification meets a preset condition, and taking a set formed by all the text classifications traversed this time in the text classification sequence as a tail cluster; and repeatedly executing the operations of traversing, frequency accumulation, and preset condition judgment until the number of the acquired tail clusters is Q, wherein Q is a preset positive integer.
In an embodiment, the method further comprises: assigning a class ID to the tail cluster; correspondingly, the head cluster may further include the class IDs of the Q tail clusters.
In an embodiment, the cluster category corresponding to each of the second hidden layer expressions may be determined separately as follows:
applying a normalization index function corresponding to the head cluster to the second hidden layer expression to obtain probability distribution corresponding to the second hidden layer expression; determining a text corresponding to the maximum value of the probability distribution corresponding to the second hidden layer expression; and determining the cluster category corresponding to the second hidden layer expression according to the determined text.
Step 105: and fusing the probability distribution corresponding to each second hidden layer expression, and outputting the target text corresponding to the text based on the probability distribution obtained after fusion.
In one embodiment, the corresponding probability distribution of each of the second hidden layer expressions may be fused as follows:
determining a weight of each of the second hidden layer representations under the respective hidden topic; and based on the weight of each second hidden layer expression under the corresponding hidden subject, carrying out weighted summation on the probability distribution corresponding to each second hidden layer expression to obtain the fused probability distribution.
The embodiment of the invention also provides a text prediction device based on the neural network language model, the neural network language model comprises an input layer, a hidden layer and an output layer, and the device comprises:
the mapping module is used for mapping the input text into corresponding feature vectors through the input layer;
the hidden layer expression module is used for calling an activation function through the hidden layer to obtain a first hidden layer expression corresponding to the characteristic vector;
the output module is used for decomposing the first hidden layer expression through the output layer to obtain second hidden layer expressions corresponding to the first hidden layer expression under different hidden subjects respectively;
respectively determining a cluster type corresponding to each second hidden layer expression, and calling a normalization index function corresponding to the cluster type to obtain probability distribution corresponding to the second hidden layer expressions; the cluster category comprises a head cluster and a tail cluster, and the output probability of the text classification in the head cluster is different from the output probability of the text classification in the tail cluster;
and fusing the probability distribution corresponding to each second hidden layer expression, and outputting the target text corresponding to the text based on the probability distribution obtained after fusion.
In some embodiments, the output module is further configured to cluster the plurality of text classifications according to a frequency of occurrence of the text classifications in the training data to obtain at least one of the head clusters and at least one of the tail clusters.
In some embodiments, the output module is further configured to apply a normalized exponential function corresponding to the head cluster to the second hidden layer expression to obtain a probability distribution corresponding to the second hidden layer expression;
determining a text corresponding to the maximum value of the probability distribution corresponding to the second hidden layer expression;
and determining a cluster corresponding to the second hidden layer expression according to the determined text.
In some embodiments, the output module is further configured to determine a cluster category corresponding to a second hidden layer expression of the training data;
and taking the second hidden layer expression of the training data as input and the target data corresponding to the training data as output, training the normalized exponential function corresponding to the cluster category to predict the corresponding target data according to the second hidden layer expression of the training data.
Fig. 7 is a schematic structural diagram of a text prediction apparatus based on a neural network language model according to an embodiment of the present invention, and referring to fig. 7, the text prediction apparatus based on the neural network language model according to the embodiment of the present invention includes: at least one processor 210, memory 240, at least one network interface 220, and a user interface 230. The various components in the device are coupled together by a bus system 250. It is understood that the bus system 250 is used to enable connected communication between these components. The bus system 250 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are designated as bus system 250 in FIG. 7.
The user interface 230 may include a display, keyboard, mouse, trackball, click wheel, keys, buttons, touch pad, touch screen, or the like.
The memory 240 may be either volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The nonvolatile Memory may be a Read Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), a Flash Memory (Flash Memory), or the like. Volatile Memory can be Random Access Memory (RAM), which acts as external cache Memory.
The Processor 210 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.
The memory 240 is capable of storing executable instructions 2401 to support the operation of the text prediction apparatus, examples of which include various forms of software modules such as programs, plug-ins, and scripts for running on the text prediction apparatus; for example, they may include an operating system and application programs, where the operating system contains various system programs, such as a framework layer, a core library layer, and a driver layer, for implementing various underlying services and handling hardware-based tasks.
In one embodiment, a memory for storing an executable program;
a processor, configured to implement, when executing the executable program stored in the memory:
inputting a text to an input layer of the neural network language model to map the text into corresponding feature vectors;
calling an activation function through a hidden layer of the neural network language model to obtain a first hidden layer expression corresponding to the feature vector;
decomposing the first hidden layer expression through an output layer of the neural network language model to obtain second hidden layer expressions corresponding to the first hidden layer expression under different hidden subjects respectively;
respectively determining a cluster type corresponding to each second hidden layer expression, and calling a normalization index function corresponding to the cluster type to obtain probability distribution corresponding to the second hidden layer expressions; the cluster category comprises a head cluster and a tail cluster, and the output probability of the text classification in the head cluster is different from the output probability of the text classification in the tail cluster;
and fusing the probability distribution corresponding to each second hidden layer expression, and outputting the target text corresponding to the text based on the probability distribution obtained after fusion.
In an embodiment, the processor is further configured to cluster the plurality of text classifications according to frequencies of occurrence of the text classifications in the training data to obtain at least one head cluster and at least one tail cluster.
In an embodiment, the processor is further configured to sort the text classifications in an order from high to low according to the frequency to obtain a text classification sequence;
traversing the text classification sequence, and accumulating the frequency of text classification;
and when the accumulated frequency of the text classification meets a preset condition, stopping the traversal, and taking a set formed by all the text classifications traversed in the text classification sequence as the head cluster.
In an embodiment, the processor is further configured to repeatedly perform the following operations until a predetermined number of tail clusters are obtained:
traversing the remaining, not-yet-traversed portion of the text classification sequence, and accumulating the frequency of text classification;
and when the accumulated frequency of the text classification meets a preset condition, stopping the traversal, and taking a set formed by all the text classifications traversed this time in the text classification sequence as a tail cluster.
In an embodiment, the processor is further configured to assign a class ID to each of the tail clusters;
correspondingly, the class IDs of the predetermined number of tail clusters are also included in the head cluster.
In an embodiment, the processor is further configured to apply a normalized exponential function corresponding to the head cluster to the second hidden layer expression to obtain a probability distribution corresponding to the second hidden layer expression;
determining a text corresponding to the maximum value of the probability distribution corresponding to the second hidden layer expression;
and determining the cluster category corresponding to the second hidden layer expression according to the determined text.
In an embodiment, the processor is further configured to determine a cluster category corresponding to a second hidden layer expression of the training data;
and taking the second hidden layer expression of the training data as input and the target data corresponding to the training data as output, training the normalized exponential function corresponding to the cluster category to predict the corresponding target data according to the second hidden layer expression of the training data.
In an embodiment, the processor is further configured to determine a weight of each of the second hidden layers expressed under the corresponding hidden subject;
and based on the weight of each second hidden layer expression under the corresponding hidden subject, carrying out weighted summation on the probability distribution corresponding to each second hidden layer expression to obtain fused probability distribution.
In an embodiment, the processor is further configured to apply the first hidden layer expression to fully-connected network models corresponding to different hidden topics, and call an activation function to output second hidden layer expressions corresponding to the first hidden layer expression and the second hidden layer expressions corresponding to different hidden topics, respectively.
The embodiment of the invention also provides a storage medium which stores an executable program, and when the executable program is executed by a processor, the text prediction method based on the neural network language model is realized.
Here, it should be noted that: the description of the text prediction device related to the neural network language model is similar to the description of the method, and the description of the beneficial effects of the method is not repeated. For technical details not disclosed in the embodiment of the text prediction apparatus of the neural network language model of the present invention, please refer to the description of the embodiment of the method of the present invention.
All or part of the steps of the embodiments may be implemented by hardware associated with program instructions, and the program may be stored in a computer-readable storage medium, and when executed, performs the steps including the method embodiments; and the aforementioned storage medium includes: various media that can store program codes, such as a removable Memory device, a Random Access Memory (RAM), a Read-Only Memory (ROM), a magnetic disk, or an optical disk.
Alternatively, the integrated unit of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present invention or portions thereof contributing to the related art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, a RAM, a ROM, a magnetic or optical disk, or various other media that can store program code.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (15)

1. A text prediction method based on a neural network language model is characterized by comprising the following steps:
inputting a text to an input layer of the neural network language model to map the text into corresponding feature vectors;
calling an activation function through a hidden layer of the neural network language model to obtain a first hidden layer expression corresponding to the feature vector;
decomposing the first hidden layer expression through an output layer of the neural network language model to obtain second hidden layer expressions corresponding to the first hidden layer expression under different hidden subjects respectively;
respectively determining a cluster category corresponding to each second hidden layer expression, and calling a normalization index function corresponding to the cluster category to obtain a probability distribution corresponding to the second hidden layer expression; the cluster category comprises a head cluster and a tail cluster, and the output probability of the text classifications in the head cluster is different from the output probability of the text classifications in the tail cluster;
and fusing the probability distribution corresponding to each second hidden layer expression, and outputting the target text corresponding to the text based on the probability distribution obtained after fusion.
2. The method of claim 1, wherein the method further comprises:
and clustering a plurality of text classifications according to the frequency of the text classifications appearing in the training data to obtain at least one head cluster and at least one tail cluster.
3. The method of claim 2, wherein the clustering the plurality of text classifications according to the frequency of the text classifications appearing in the training data comprises:
sorting the plurality of text classifications in descending order of frequency to obtain a text classification sequence;
traversing the text classification sequence, and accumulating the frequencies of the traversed text classifications;
and when the accumulated frequency of the text classifications meets a preset condition, stopping the traversal, and taking a set formed by all the text classifications traversed in the text classification sequence as the head cluster.
4. The method of claim 3, wherein the method further comprises:
the following operations are repeatedly performed until a predetermined number of tail clusters are obtained:
traversing the text classifications that have not yet been traversed in the text classification sequence, and accumulating the frequencies of the traversed text classifications;
and when the accumulated frequency of the text classification meets a preset condition, stopping the traversal, and taking a set formed by all the text classifications traversed this time in the text classification sequence as a tail cluster.
5. The method of claim 4, wherein the method further comprises:
assigning a class identification ID to each tail cluster respectively;
correspondingly, the class IDs of the predetermined number of tail clusters are also included in the head cluster.
6. The method of claim 1, wherein the respectively determining a cluster category corresponding to each second hidden layer expression comprises:
applying a normalization index function corresponding to the head cluster to the second hidden layer expression to obtain probability distribution corresponding to the second hidden layer expression;
determining a text corresponding to the maximum value of the probability distribution corresponding to the second hidden layer expression;
and determining the cluster category corresponding to the second hidden layer expression according to the determined text.
7. The method of claim 1, wherein the method further comprises:
determining a cluster category corresponding to a second hidden layer expression of the training data;
and taking the second hidden layer expression of the training data as input, taking the target data corresponding to the training data as output, and training the normalization index function corresponding to the cluster category so that it can predict the corresponding target data according to the second hidden layer expression of the training data.
8. The method of claim 1, wherein the fusing the probability distribution corresponding to each second hidden layer expression comprises:
determining a weight of each of the second hidden layer expressions under the respective hidden subject;
and based on the weight of each second hidden layer expression under the corresponding hidden subject, carrying out weighted summation on the probability distribution corresponding to each second hidden layer expression to obtain fused probability distribution.
9. The method of claim 1, wherein the decomposing the first hidden layer expression to obtain second hidden layer expressions corresponding to the first hidden layer expression under different hidden subjects respectively comprises:
and applying the first hidden layer expression to fully-connected network models corresponding to the different hidden subjects, and calling an activation function to output the second hidden layer expressions corresponding to the first hidden layer expression under the different hidden subjects, respectively.
10. An apparatus for text prediction based on a neural network language model, wherein the neural network language model comprises an input layer, a hidden layer and an output layer, the apparatus comprising:
the mapping module is used for mapping the input text into corresponding feature vectors through the input layer;
the hidden layer expression module is used for calling an activation function through the hidden layer to obtain a first hidden layer expression corresponding to the feature vector;
the output module is used for decomposing the first hidden layer expression through the output layer to obtain second hidden layer expressions corresponding to the first hidden layer expression under different hidden subjects respectively;
respectively determining a cluster category corresponding to each second hidden layer expression, and calling a normalization index function corresponding to the cluster category to obtain a probability distribution corresponding to the second hidden layer expression; the cluster category comprises a head cluster and a tail cluster, and the output probability of the text classifications in the head cluster is different from the output probability of the text classifications in the tail cluster;
and fusing the probability distribution corresponding to each second hidden layer expression, and outputting the target text corresponding to the text based on the probability distribution obtained after fusion.
11. The apparatus of claim 10,
the output module is further configured to cluster the plurality of text classifications according to the frequency of occurrence of the text classifications in the training data to obtain at least one head cluster and at least one tail cluster.
12. The apparatus of claim 10,
the output module is further configured to apply a normalization index function corresponding to the head cluster to the second hidden layer expression to obtain a probability distribution corresponding to the second hidden layer expression;
determining a text corresponding to the maximum value of the probability distribution corresponding to the second hidden layer expression;
and determining the cluster category corresponding to the second hidden layer expression according to the determined text.
13. The apparatus of claim 10,
the output module is further used for determining a cluster category corresponding to a second hidden layer expression of the training data;
and taking the second hidden layer expression of the training data as input, taking the target data corresponding to the training data as output, and training the normalization index function corresponding to the cluster category so that it can predict the corresponding target data according to the second hidden layer expression of the training data.
14. A text prediction apparatus based on a neural network language model, comprising:
a memory for storing an executable program;
a processor for implementing the neural network language model-based text prediction method of any one of claims 1 to 9 when executing the executable program stored in the memory.
15. A storage medium storing an executable program which, when executed by a processor, implements the neural network language model-based text prediction method according to any one of claims 1 to 9.
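For illustration only (not part of the claims), a minimal Python sketch of the frequency-based construction of the head cluster and tail clusters described in claims 2 to 5; the concrete frequency shares used as the "preset condition" and the placeholder tail-cluster IDs are assumptions:

```python
def build_clusters(freqs, shares=(0.8, 0.1, 0.1)):
    """Split text classifications into a head cluster and tail clusters by
    cumulative frequency. `shares` is an illustrative stand-in for the
    "preset condition": the first share bounds the head cluster, the rest
    bound the tail clusters, and leftovers go to the last tail cluster.

    freqs: dict mapping text classification -> frequency in the training data
    """
    total = sum(freqs.values())
    ordered = sorted(freqs, key=freqs.get, reverse=True)  # high-to-low frequency

    clusters, current, accumulated = [], [], 0.0
    for cls in ordered:
        current.append(cls)
        accumulated += freqs[cls] / total
        # Close the current cluster once its share of frequency is reached,
        # keeping the last cluster open for all remaining classifications.
        if len(clusters) < len(shares) - 1 and accumulated >= shares[len(clusters)]:
            clusters.append(current)
            current, accumulated = [], 0.0
    clusters.append(current)

    head, tails = clusters[0], clusters[1:]
    # In the spirit of claim 5, the head cluster also carries one class ID per tail cluster.
    head = head + [f"<tail_{i}>" for i in range(len(tails))]
    return head, tails
```

For example, given word frequencies counted from a training corpus, build_clusters returns the head cluster (including one placeholder class ID per tail cluster) together with the list of tail clusters.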
CN201811435778.XA 2018-11-28 2018-11-28 Text prediction method and device based on neural network language model and storage medium Active CN110147444B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811435778.XA CN110147444B (en) 2018-11-28 2018-11-28 Text prediction method and device based on neural network language model and storage medium
CN201910745810.2A CN110442721B (en) 2018-11-28 2018-11-28 Neural network language model, training method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811435778.XA CN110147444B (en) 2018-11-28 2018-11-28 Text prediction method and device based on neural network language model and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN201910745810.2A Division CN110442721B (en) 2018-11-28 2018-11-28 Neural network language model, training method, device and storage medium

Publications (2)

Publication Number Publication Date
CN110147444A CN110147444A (en) 2019-08-20
CN110147444B true CN110147444B (en) 2022-11-04

Family

ID=67589307

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201811435778.XA Active CN110147444B (en) 2018-11-28 2018-11-28 Text prediction method and device based on neural network language model and storage medium
CN201910745810.2A Active CN110442721B (en) 2018-11-28 2018-11-28 Neural network language model, training method, device and storage medium

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201910745810.2A Active CN110442721B (en) 2018-11-28 2018-11-28 Neural network language model, training method, device and storage medium

Country Status (1)

Country Link
CN (2) CN110147444B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110880040A (en) * 2019-11-08 2020-03-13 支付宝(杭州)信息技术有限公司 Method and system for automatically generating cumulative features
CN113159080A (en) * 2020-01-22 2021-07-23 株式会社东芝 Information processing apparatus, information processing method, and storage medium
CN111667069B (en) * 2020-06-10 2023-08-04 中国工商银行股份有限公司 Pre-training model compression method and device and electronic equipment
CN111898145B (en) * 2020-07-22 2022-11-25 苏州浪潮智能科技有限公司 Neural network model training method, device, equipment and medium
CN113243018A (en) * 2020-08-01 2021-08-10 商汤国际私人有限公司 Target object identification method and device
CN115243270B (en) * 2021-04-07 2023-09-22 ***通信集团设计院有限公司 5G network planning method, device, computing equipment and storage medium


Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101546389A (en) * 2008-03-26 2009-09-30 中国科学院半导体研究所 Primary direction neural network system
US9235799B2 (en) * 2011-11-26 2016-01-12 Microsoft Technology Licensing, Llc Discriminative pretraining of deep neural networks
US20140156575A1 (en) * 2012-11-30 2014-06-05 Nuance Communications, Inc. Method and Apparatus of Processing Data Using Deep Belief Networks Employing Low-Rank Matrix Factorization
CN103823845B (en) * 2014-01-28 2017-01-18 浙江大学 Method for automatically annotating remote sensing images on basis of deep learning
CN104572504B (en) * 2015-02-02 2017-11-03 浪潮(北京)电子信息产业有限公司 A kind of method and device for realizing data pre-head
CN105760507B (en) * 2016-02-23 2019-05-03 复旦大学 Cross-module state topic relativity modeling method based on deep learning
CN107609055B (en) * 2017-08-25 2019-10-11 西安电子科技大学 Text image multi-modal retrieval method based on deep layer topic model
CN108563639B (en) * 2018-04-17 2021-09-17 内蒙古工业大学 Mongolian language model based on recurrent neural network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107578106A (en) * 2017-09-18 2018-01-12 中国科学技术大学 A kind of neutral net natural language inference method for merging semanteme of word knowledge
CN108197109A (en) * 2017-12-29 2018-06-22 北京百分点信息科技有限公司 A kind of multilingual analysis method and device based on natural language processing
CN108595632A (en) * 2018-04-24 2018-09-28 福州大学 A kind of hybrid neural networks file classification method of fusion abstract and body feature

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Chinese part-of-speech tagging model based on attention long short-term memory network; Si Nianwen et al.; Computer Science; 2018-04-15 (Issue 04); full text *
Text topic classification fusing latent topic information and convolutional semantic features; Chen Peixin et al.; Journal of Signal Processing; 2017-08-25 (Issue 08); full text *

Also Published As

Publication number Publication date
CN110442721B (en) 2023-01-06
CN110147444A (en) 2019-08-20
CN110442721A (en) 2019-11-12

Similar Documents

Publication Publication Date Title
CN110147444B (en) Text prediction method and device based on neural network language model and storage medium
Chen et al. Supervised feature selection with a stratified feature weighting method
Luo et al. Online learning of interpretable word embeddings
JP6212217B2 (en) Weight generation in machine learning
CN116861995A (en) Training of multi-mode pre-training model and multi-mode data processing method and device
CN111160000B (en) Composition automatic scoring method, device terminal equipment and storage medium
CN112749274A (en) Chinese text classification method based on attention mechanism and interference word deletion
CN111858898A (en) Text processing method and device based on artificial intelligence and electronic equipment
Dupre et al. Improving dataset volumes and model accuracy with semi-supervised iterative self-learning
CN117150026B (en) Text content multi-label classification method and device
CN112988548A (en) Improved Elman neural network prediction method based on noise reduction algorithm
CN113971733A (en) Model training method, classification method and device based on hypergraph structure
CN113934851A (en) Data enhancement method and device for text classification and electronic equipment
CN114444476A (en) Information processing method, apparatus and computer readable storage medium
CN111783688B (en) Remote sensing image scene classification method based on convolutional neural network
CN116910549A (en) Model training method, device, computer equipment and storage medium
CN115617971B (en) Dialog text generation method based on ALBERT-Coref model
CN112463964B (en) Text classification and model training method, device, equipment and storage medium
Maleki et al. Improvement of credit scoring by lstm autoencoder model
He et al. An incremental kernel density estimator for data stream computation
CN115438755A (en) Incremental training method and device of classification model and computer equipment
CN113128544B (en) Method and device for training artificial intelligent model
CN116306612A (en) Word and sentence generation method and related equipment
CN114282058A (en) Method, device and equipment for model training and video theme prediction
CN114254106A (en) Text classification method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant