CN110147444B - Text prediction method and device based on neural network language model and storage medium

Text prediction method and device based on neural network language model and storage medium

Info

Publication number
CN110147444B
Authority
CN
China
Prior art keywords: hidden layer, text, cluster, expression, hidden
Legal status
Active
Application number
CN201811435778.XA
Other languages
Chinese (zh)
Other versions
CN110147444A (en)
Inventor
陈强
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201811435778.XA priority Critical patent/CN110147444B/en
Priority to CN201910745810.2A priority patent/CN110442721B/en
Publication of CN110147444A publication Critical patent/CN110147444A/en
Application granted granted Critical
Publication of CN110147444B publication Critical patent/CN110147444B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Abstract

The embodiment of the invention discloses a text prediction method, a text prediction device, and a storage medium based on a neural network language model. The method comprises the following steps: mapping the input text into a corresponding feature vector through an input layer of the model; calling an activation function through a hidden layer of the model and outputting a first hidden layer expression of the feature vector to an output layer; decomposing the first hidden layer expression through the output layer of the model to obtain second hidden layer expressions corresponding to the first hidden layer expression under different hidden subjects; respectively determining the cluster category corresponding to each second hidden layer expression and calling the normalization index function corresponding to that cluster category to obtain the probability distribution corresponding to the second hidden layer expression, where the cluster categories include a head cluster and a tail cluster and the output probability of the text classifications in the head cluster differs from that of the text classifications in the tail cluster; and fusing the probability distributions corresponding to the second hidden layer expressions, and outputting the target text corresponding to the input text based on the fused probability distribution.

Description

Text prediction method and device based on neural network language model and storage medium
Technical Field
The present invention relates to natural language processing technologies, and in particular, to a text prediction method, device, and storage medium based on a neural network language model.
Background
With the development of natural language processing technology, language models based on the Recurrent Neural Network (RNN) architecture are increasingly applied to multi-category problems. However, when the number of categories to be processed is huge (such as 100K or even 1B), the training efficiency of language models in the related art is low, and training may even be impossible due to limited computing resources.
Disclosure of Invention
The embodiment of the invention provides a text prediction method, a text prediction device and a storage medium based on a neural network language model, which can improve the representation capability of the language model and improve the training efficiency of the language model.
The technical scheme of the embodiment of the invention is realized as follows:
in a first aspect, an embodiment of the present invention provides a text prediction apparatus based on a neural network language model, where the neural network language model includes an input layer, a hidden layer, and an output layer, the apparatus including:
the mapping module is used for mapping the input text into corresponding feature vectors through the input layer;
the hidden layer expression module is used for calling an activation function through the hidden layer to obtain a first hidden layer expression corresponding to the characteristic vector;
the output module is used for decomposing the first hidden layer expression through the output layer to obtain second hidden layer expressions corresponding to the first hidden layer expression under different hidden subjects respectively;
respectively determining a cluster type corresponding to each second hidden layer expression, and calling a normalization index function corresponding to the cluster type to obtain probability distribution corresponding to the second hidden layer expressions; the cluster category comprises a head cluster and a tail cluster, and the output probability of the text classification in the head cluster is different from the output probability of the text classification in the tail cluster;
and fusing the probability distribution corresponding to each second hidden layer expression, and outputting the target text corresponding to the text based on the probability distribution obtained after fusion.
In a second aspect, an embodiment of the present invention provides a text prediction method based on a neural network language model, including:
inputting a text to an input layer of the neural network language model to map the text into corresponding feature vectors;
calling an activation function through a hidden layer of the neural network language model to obtain a first hidden layer expression corresponding to the feature vector;
decomposing the first hidden layer expression through an output layer of the neural network language model to obtain second hidden layer expressions corresponding to the first hidden layer expression under different hidden subjects respectively;
respectively determining a cluster type corresponding to each second hidden layer expression, and calling a normalization index function corresponding to the cluster type to obtain probability distribution corresponding to the second hidden layer expressions; the cluster category comprises a head cluster and a tail cluster, and the output probability of the text classification in the head cluster is different from the output probability of the text classification in the tail cluster;
and fusing the probability distribution corresponding to each second hidden layer expression, and outputting the target text corresponding to the text based on the probability distribution obtained after fusion.
In a third aspect, an embodiment of the present invention provides a text prediction apparatus based on a neural network language model, where the apparatus includes:
a memory for storing an executable program;
and the processor is used for realizing the text prediction method based on the neural network language model when executing the executable program stored in the memory.
In a fourth aspect, an embodiment of the present invention provides a storage medium, which stores an executable program, and when the executable program is executed by a processor, the method for text prediction based on a neural network language model is implemented.
The application of the embodiment of the invention has the following beneficial effects:
1) The first hidden layer expression of the text is decomposed by the output layer of the neural network language model to obtain second hidden layer expressions corresponding to the first hidden layer expression under different hidden subjects; the actual expression dimension of the model is thereby expanded, and the overall representation capability of the model is improved.
2) A plurality of text classifications are clustered to form a plurality of cluster categories including a head cluster and tail clusters, each cluster category corresponds to its own normalization index function, and different cluster categories correspond to different normalization index functions. Because the output probability of the text classifications in the head cluster differs from that of the text classifications in the tail clusters, the normalization index functions of different cluster categories receive unequal amounts of training during the training of the neural network language model: the parameters of the normalization index function of the cluster category whose text classifications have high output probability are updated frequently, while, when the number of text classifications is large, the parameters of the normalization index functions of the cluster categories with low output probability are prevented from being updated frequently during model training, which improves model training efficiency and saves hardware resources.
Drawings
FIG. 1 is a schematic diagram of a neural network language model according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a neural network language model according to an embodiment of the present invention;
FIG. 3 is a functional diagram of a softmax layer provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of an architecture of a neural network language model according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an architecture of a neural network language model according to an embodiment of the present invention;
fig. 6 is a schematic flowchart of a text prediction method based on a neural network language model according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a text prediction apparatus based on a neural network language model according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail below with reference to the accompanying drawings and examples. It should be understood that the examples provided herein are merely illustrative of the present invention and are not intended to limit the present invention. In addition, the following embodiments are provided as partial embodiments for implementing the present invention, not all embodiments for implementing the present invention, and the technical solutions described in the embodiments of the present invention may be implemented in any combination without conflict.
It should be noted that, in the embodiments of the present invention, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, so that a method or apparatus including a series of elements includes not only the explicitly recited elements but also other elements not explicitly listed or inherent to the method or apparatus. Without further limitation, an element defined by the phrase "comprising a/an ..." does not exclude the presence of additional related elements in the method or apparatus that comprises the element (e.g., steps in a method or units in an apparatus, such as units that may be part of a circuit, part of a processor, part of a program or software, etc.).
The neural network language model provided by the embodiment of the invention is used for predicting the probability distribution of the nth word through the input n-1 words, namely predicting the probability of a word appearing at the next position through the neural network language model when the previous words are known.
As an embodiment of a neural network language model, fig. 1 is a schematic structural diagram of the neural network language model provided in the embodiment of the present invention, and referring to fig. 1, the neural network language model includes an input layer, a hidden layer, and an output layer;
an input layer: by a mapping matrix C (the size of the matrix is | V | m, where | V | is the vocabulary size and V = { w = { 1, w 2,… w |V| M is the dimension of a word vector), the first n-1 discrete words are mapped into n-1 m-dimensional vectors, namely, the words are changed into the word vectors in a table look-up mode, and then the n-1 m-dimensional vectors are connected end to form an m (n-1) vector, which is the input vector x of the neural network.
Hidden layer: the number of hidden layer nodes is h. To convert the m(n-1)-dimensional vector x output by the input layer into the hidden layer input (of dimension h), a parameter matrix H (of size h x m(n-1)) and a bias d are required between the input layer and the hidden layer; this change can be expressed as f(x) = Hx + d, which is a linear transformation. The hidden layer output then applies a non-linear transformation to the linearly transformed vector; in an embodiment, the selected activation function 1 is tanh (hyperbolic tangent), and the corresponding hidden layer output is tanh(Hx + d).
Output layer: the transfer from the hidden layer to the output layer also requires a linear transformation and a non-linear transformation. First, the dimension of the hidden layer output vector is converted by a linear transformation into the number of nodes of the output layer; then, in order to represent the output as a probability distribution (the values over all dimensions sum to 1), a non-linear transformation is applied to the input of the output layer (i.e., the output of the hidden layer). In one embodiment, the activation function 2 used is softmax (the normalized exponential function), which outputs the probability distribution p.
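For illustration, the following is a minimal numpy sketch of the forward pass described above; the sizes and the random parameter values are assumptions chosen for the example rather than values from the embodiment:

    import numpy as np

    rng = np.random.default_rng(0)
    V, m, n, h = 1000, 64, 4, 128          # vocabulary size, word-vector dimension m, context length n, hidden nodes h

    C = rng.normal(size=(V, m))            # input-layer mapping matrix: word id -> m-dimensional word vector
    H = rng.normal(size=(h, m * (n - 1)))  # parameter matrix between input layer and hidden layer
    d = np.zeros(h)                        # hidden-layer bias
    U = rng.normal(size=(V, h))            # hidden-to-output linear transformation
    b = np.zeros(V)                        # output-layer bias

    def forward(prev_word_ids):
        # predict the probability distribution of the n-th word from the previous n-1 word ids
        x = C[prev_word_ids].reshape(-1)   # table lookup, then end-to-end concatenation -> m(n-1) vector
        hidden = np.tanh(H @ x + d)        # hidden layer output: tanh(Hx + d)
        logits = U @ hidden + b            # linear transformation to |V| scores
        e = np.exp(logits - logits.max())  # softmax (activation function 2), stabilised
        return e / e.sum()                 # probability distribution p over the vocabulary

    p = forward([3, 17, 42])               # any n-1 = 3 previous word ids
    print(p.shape, round(float(p.sum()), 6))   # (1000,) 1.0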
In an embodiment, the neural network language model has two hidden layers, each serving as a feature layer. Fig. 2 is a schematic architecture diagram of the neural network language model provided in the embodiment of the present invention; referring to fig. 2, a softmax layer serves as the output layer, data is processed by the input layer and the two feature layers, and the softmax layer finally yields the probability values for the categories y = 0, y = 1, and y = 2.
With continuing reference to fig. 3, fig. 3 is a functional schematic diagram of the softmax layer according to the embodiment of the present invention, where 1, 2, and 3 denote three inputs; after passing through softmax, the three inputs yield the array [0.88, 0.12, 0], whose entries are the output probabilities of the corresponding categories.
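As a small numeric illustration of the normalization performed by softmax, the sketch below uses three raw scores (an assumption chosen here) whose softmax reproduces an array of the form [0.88, 0.12, 0]:

    import numpy as np

    def softmax(z):
        e = np.exp(z - np.max(z))   # subtract the maximum for numerical stability
        return e / e.sum()

    # the raw scores are an assumption; the point is that softmax turns them into probabilities summing to 1
    print(np.round(softmax(np.array([3.0, 1.0, -3.0])), 2))   # [0.88 0.12 0.  ]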
Fig. 4 is a schematic diagram of an architecture of a neural network language model according to an embodiment of the present invention, and referring to fig. 4, the neural network language model includes an input layer, a hidden layer, and an output layer; wherein, the hidden layer is implemented by a Recurrent Neural Network (RNN), specifically a Long Short-Term Memory Network (LSTM) in fig. 4; the activation function model corresponding to the output layer is a mixed Softmax model (MoS, mixture of Softmax).
With a neural network language model based on the hybrid Softmax model, before the output of the neural network language model reaches the Softmax layer, the hidden layer expression output by the hidden layer (hidden, a vector or matrix, i.e., the hidden state output by the hidden layer) is decomposed into N hidden layer expressions, each with the same dimension and data type as the source hidden expression. Weights are then assigned to the N new hidden layer expressions (the weights sum to 1), an independent Softmax computation is performed on each new hidden layer expression obtained by the decomposition to output a classification probability distribution, and finally all output classification probability distributions are weighted and summed according to the computed weights to obtain the final classification probability distribution, on which the target loss is then computed. Referring to fig. 4, w_t denotes the t-th word in the word sequence w, and g_t the hidden layer expression output for it by the LSTM layer; g_t is decomposed into multiple hidden layer expressions h_{t,1}, ..., h_{t,N}, each of which yields a corresponding multi-classification probability distribution after a Softmax operation; pi_{t,1}, ..., pi_{t,N} denote the weights corresponding to the hidden layer expressions, and the probability distributions of all hidden layer expressions are weighted and summed according to these weights to obtain the final probability distribution, from which the next word w_{t+1} is predicted.
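A minimal sketch of the mixture computation just described, under stated assumptions: each decomposed hidden layer expression is produced here by its own tanh projection of the source hidden state, the weights come from a softmax over another projection, and one shared output matrix is reused by every Softmax; all parameter names and shapes are illustrative rather than taken from the embodiment:

    import numpy as np

    rng = np.random.default_rng(1)
    V, d, N = 1000, 128, 3                     # vocabulary size, hidden size, number of decomposed expressions

    W_h = rng.normal(size=(N, d, d)) * 0.02    # per-subject projections producing the N hidden expressions
    W_pi = rng.normal(size=(N, d)) * 0.02      # projection producing the mixture weights
    W_out = rng.normal(size=(V, d)) * 0.02     # shared output matrix used by every Softmax

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def mos(g):
        # g: source hidden state output by the LSTM layer
        h = np.tanh(W_h @ g)                                       # N decomposed hidden expressions
        pi = softmax(W_pi @ g)                                     # weights, summing to 1
        p_k = np.stack([softmax(W_out @ h[k]) for k in range(N)])  # one distribution per expression
        return pi @ p_k                                            # weighted sum: final distribution over V words

    p = mos(rng.normal(size=d))
    print(p.shape, round(float(p.sum()), 6))                       # (1000,) 1.0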
Applying the neural network language model based on the hybrid Softmax model, the actual expression dimension of Softmax is expanded by decomposing the hidden layer expression and computing several Softmax operations, which ultimately improves the overall representation capability of the model; and because information is decomposed at the hidden layer and then fused, it remains essentially complete during model processing. However, Softmax must perform an exponential operation over all classifications at every computation. When the vocabulary is large, this consumes a great deal of computing resources and requires high-performance hardware (for example, most operations in a neural network are matrix operations, so an expensive Graphics Processing Unit (GPU) may need to be configured); meanwhile, a large number of intermediate values must be stored during the computation, occupying storage and requiring large memory or a hard-disk swap area, which makes model training costly and constrains the hardware environment required for training. Moreover, because Softmax performs an exponential operation over all classifications at every computation, while most classifications (in an embodiment, each word can be regarded as one classification) are rarely involved in a given training example or training batch, such computation not only wastes computing resources but also greatly increases training time, substantially reducing training speed and seriously affecting the training efficiency of the language model.
In a multi-classification task with a very large number of classes, the traditional Softmax method occupies a large amount of memory and can cause out-of-memory (OOM) errors, so that training ultimately cannot proceed under limited hardware storage. To solve this problem, an Adaptive Softmax model can be adopted. First, the classes are arranged in descending order of their frequency of occurrence in the training data (in one embodiment, each word can be regarded as one class, and different words are different classes). The classes are then traversed in order while their frequencies are accumulated, and the classes are clustered according to a preset statistical strategy so that the total frequencies of the clusters differ greatly; a class identifier (ID) is assigned, and an independent Softmax model is designed for each cluster in model training. When the target output of a training sample belongs to a certain cluster, the parameters of the Softmax model of that cluster are trained and updated, and multiple rounds of training are performed on the training data set until training converges.
In the Adaptive Softmax model, the first cluster is called the Head class (i.e., the head cluster) because its total word frequency accounts for the largest probability of occurrence, meaning that it is updated most frequently during training, while the subsequent clusters are called Tail classes (i.e., tail clusters) because the categories they contain occur less frequently in the data. In practical application, the scale of the head cluster is kept below 10K, so that it occupies few hardware resources despite being accessed frequently and can be computed quickly, and the parameters of the large Softmax models holding the non-high-frequency classes are not updated frequently during training; hardware resources are thus saved while training efficiency is guaranteed. To ensure that the Softmax models of the clusters of all categories can be updated, Adaptive Softmax appends the IDs of all Tail classes to the end of the first cluster; when a category in a training sample does not appear in the Head cluster, the Tail class it belongs to can be found from the Tail class ID, and the Softmax model corresponding to that Tail class is then trained.
After the Adaptive Softmax model is applied and the classification targets are clustered according to a certain strategy, only part of the classifications are guaranteed to be involved in each computation, which avoids resource exhaustion caused by wasted computation over classifications that are not involved; the adaptivity of the Adaptive Softmax model lies in that, for different training samples, only the relevant part of the classifications is used in the computation according to each sample's own situation.
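The following simplified sketch shows the head/tail factorisation that gives Adaptive Softmax its saving; the tiny vocabulary split, the single tail cluster, and the parameter shapes are assumptions made only for illustration:

    import numpy as np

    rng = np.random.default_rng(2)
    d = 64
    head_words, tail_words = 8, 12                        # ids 0..7 live in the head, ids 8..19 in one tail cluster
    W_head = rng.normal(size=(head_words + 1, d)) * 0.1   # +1 row: the tail-cluster ID appended to the head
    W_tail = rng.normal(size=(tail_words, d)) * 0.1       # independent Softmax for the tail cluster

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def word_prob(word_id, g):
        # P(word | hidden state g) under the head/tail factorisation
        p_head = softmax(W_head @ g)
        if word_id < head_words:                          # high-frequency word: head Softmax only
            return p_head[word_id]
        p_tail = softmax(W_tail @ g)                      # rare word: P(tail cluster) from the head, P(word | cluster) from the tail
        return p_head[head_words] * p_tail[word_id - head_words]

    g = rng.normal(size=d)
    total = sum(word_prob(w, g) for w in range(head_words + tail_words))
    print(round(float(total), 6))                         # 1.0 -- the factorisation is still a valid distribution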
In an embodiment, the Adaptive Softmax method is used to replace the traditional Softmax method in the hybrid Softmax model, that is, the Adaptive Softmax is introduced into the MoS to form a hybrid Adaptive Softmax model (MoAS), and the respective advantages of the MoS and the Adaptive Softmax are combined to ensure that any multi-class model can be trained normally while improving the model performance.
As an embodiment of the neural network language model, fig. 5 is a schematic structural diagram of the neural network language model provided in the embodiment of the present invention, and referring to fig. 5, the neural network language model provided in the embodiment of the present invention includes: an input layer, a hidden layer and an output layer; wherein,
the input layer is used for mapping the input text into corresponding characteristic vectors and inputting the hidden layer;
the hidden layer is used for calling an activation function based on the input feature vector and outputting a first hidden layer expression corresponding to the feature vector to an output layer;
the output layer is used for decomposing the first hidden layer expression to obtain second hidden layer expressions corresponding to the first hidden layer expression under different hidden subjects respectively;
respectively determining a cluster type corresponding to each second hidden layer expression, and calling a normalization index function corresponding to the cluster type to obtain probability distribution corresponding to the second hidden layer expressions; the cluster category comprises a head cluster and a tail cluster, and the output probability of the text classification in the head cluster is different from the output probability of the text classification in the tail cluster;
and fusing the probability distribution corresponding to each second hidden layer expression, and outputting the target text corresponding to the text based on the probability distribution obtained after fusion.
In one embodiment, the input text is a word; after being input to the input layer of the language model, the word is mapped into a corresponding input vector, which is then processed by the hidden layer to obtain the hidden layer expression g, i.e., the first hidden layer expression.
Referring to fig. 5, the output layer in the neural network language model according to the embodiment of the present invention is the MoAS, that is, the traditional Softmax method in the hybrid Softmax model is replaced with the Adaptive Softmax method; specifically, the Softmax operations shown in fig. 4 are replaced with the Adaptive Softmax operations shown in fig. 5.
In one embodiment, the output layer uses multiple sets of parameters to construct N independent fully-connected networks f_1, ..., f_N, and applies the hidden layer expression g to the fully-connected network model corresponding to each hidden subject to obtain the hidden layer expressions h_1, ..., h_N of g under the multiple hidden subjects, i.e., the second hidden layer expressions; specifically, h_k = f_k(g), k = 1, ..., N.
In an embodiment, the output layer is further configured to determine the weight pi_i of each second hidden layer expression under its corresponding hidden subject according to formula (2), where pi_i denotes the weight of the i-th second hidden layer expression obtained by the decomposition under its corresponding hidden subject, and the weights sum to 1. A note on hidden subjects: in practical application, a sentence or a document usually belongs to a subject; if a sentence about sports suddenly appears in a document on a technical subject, it certainly feels out of place, which is to say that subject consistency has been broken.
In an embodiment, the output layer is further configured to cluster a plurality of text classifications according to frequencies of occurrence of the text classifications in training data to obtain one head cluster and at least one tail cluster.
Specifically, the output layer sorts the plurality of text classifications in the order from high frequency to low frequency to obtain a text classification sequence; traversing the text classification sequence, and accumulating the frequency of text classification; stopping the traversal when the accumulated frequency of the text classification meets a preset condition, and taking a set formed by all the text classifications traversed in the text classification sequence as a head cluster; in practical application, the cumulative frequency of text classification meeting the preset condition may be: the percentage of the accumulated frequency of the text classification to the total frequency reaches a preset percentage threshold, such as 80%;
The output layer then continues traversing the remaining, not-yet-traversed portion of the text classification sequence while accumulating the frequencies of the text classifications; when the accumulated frequency meets the preset condition, the traversal stops, and the set formed by all the text classifications traversed this time is taken as a tail cluster. One or more tail clusters are obtained in this way; if the current number of tail clusters has not reached the preset number (which can be set according to actual needs), the output layer repeatedly performs the following operations until the number of tail clusters equals the preset number:
continue traversing the remaining, not-yet-traversed portion of the text classification sequence and accumulate the frequencies of the text classifications; when the accumulated frequency meets the preset condition, stop the traversal and take the set formed by all the text classifications traversed this time as a tail cluster.
In practical applications, the cluster categories in the normal case include head clusters and tail clusters, and the special case may include only head clusters.
In an embodiment, the output layer is further configured to assign a class ID to each tail cluster; correspondingly, the head cluster also includes the class IDs of the tail clusters of the preset number.
In actual implementation, the number of cluster categories is set to M (1 Head class and M-1 Tail classes). Frequency statistics are performed on all classifications in the training data, and the classifications are arranged in descending order of frequency to obtain an ordered classification sequence. The sequence is then traversed from high frequency to low frequency while the frequencies are accumulated; when the traversal reaches the current classification w_p (in practical implementation, the p-th word in the vocabulary V) and the accumulated frequency reaches 80% of the total frequency, the traversal stops, and the classifications from the beginning of the ordered sequence up to the current position, w_1 through w_p, are taken as the Head cluster (Head class). At the same time, the IDs of all Tail clusters (Tail classes) are appended to the Head cluster, so that the Head cluster contains w_1, ..., w_p together with the IDs of the M-1 Tail clusters. The ordered classification sequence is then reset to the remaining classifications, and the Tail clusters Tail_1, ..., Tail_{M-1} are obtained in turn in the same manner as the Head cluster.
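A sketch of this clustering procedure, assuming the 80% cumulative-frequency threshold mentioned above; the toy word-frequency list, and the choice of applying the threshold to the remaining (not yet clustered) frequency when building each Tail cluster, are assumptions:

    from collections import Counter

    def build_clusters(freqs, num_clusters, threshold=0.80):
        # freqs: {classification: frequency}; returns [head, tail_1, ..., tail_{num_clusters-1}]
        ordered = sorted(freqs, key=freqs.get, reverse=True)      # descending order of frequency
        clusters, start = [], 0
        for c in range(num_clusters):
            remaining = sum(freqs[w] for w in ordered[start:])    # assumption: threshold over the remaining frequency
            acc, end = 0, start
            while end < len(ordered):
                acc += freqs[ordered[end]]
                end += 1
                if c < num_clusters - 1 and acc >= threshold * remaining:
                    break                                         # cumulative frequency meets the preset condition
            clusters.append(ordered[start:end])
            start = end
        # append the class IDs of all Tail clusters to the end of the Head cluster, as described above
        clusters[0] = clusters[0] + [f"<TAIL_{i}>" for i in range(1, num_clusters)]
        return clusters

    freqs = Counter({"the": 50, "of": 30, "model": 10, "layer": 5, "cluster": 3, "topic": 1, "rare": 1})
    for i, cl in enumerate(build_clusters(freqs, num_clusters=3)):
        print("head" if i == 0 else f"tail_{i}", cl)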
Based on the above description of clustering, the following description will discuss the training of the output layer MoAS model.
In an embodiment, the output layer is further configured to determine the cluster category corresponding to a second hidden layer expression of the training data, and then to train the normalized exponential function corresponding to that cluster category, taking the second hidden layer expression of the training data as input and the target data corresponding to the training data as output, so that it learns to predict the corresponding target data from the second hidden layer expression of the training data. In practical implementation, after the vocabulary is clustered, each cluster corresponds to its own Softmax model, and once the cluster to which the input training data belongs is determined, only the Softmax model parameters of that cluster are updated.
In practical implementation, the output layer applies a normalized exponential function (Softmax) corresponding to the head cluster to the second hidden layer expression to obtain a probability distribution corresponding to the second hidden layer expression; determining a text (word) corresponding to the maximum value of the probability distribution corresponding to the second hidden layer expression; and determining a cluster corresponding to the second hidden layer expression according to the determined text.
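A short sketch of this cluster lookup, in which the head Softmax size, the entry-to-cluster table, and the random parameters are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(3)
    d, head_size = 32, 6                       # head Softmax over 5 frequent words plus 1 tail-cluster ID
    W_head = rng.normal(size=(head_size, d))
    entry_to_cluster = {0: "head", 1: "head", 2: "head", 3: "head", 4: "head", 5: "tail_1"}

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def cluster_of(h_k):
        # apply the head Softmax to a second hidden layer expression and map the argmax entry to its cluster
        p = softmax(W_head @ h_k)              # probability distribution under the head Softmax
        best = int(np.argmax(p))               # entry with the maximum probability (a word or a tail-cluster ID)
        return entry_to_cluster[best]

    print(cluster_of(rng.normal(size=d)))      # "head" or "tail_1", depending on the expression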
Specifically, with continued reference to fig. 5, in actual practice the training data are first mapped to their corresponding clusters. For example, a training data batch is taken with its target classifications y_1, ..., y_bs, where bs is the batch size; each data item is mapped to the cluster its target classification belongs to. Assuming the hidden layer expressions obtained for the batch by the LSTM computation are g_1, ..., g_bs, the mapping result partitions the batch items, together with their hidden layer expressions, by cluster. Then, for the data items mapped to each cluster, the Softmax loss corresponding to each hidden layer expression obtained after the hidden layer expression decomposition is calculated; with k indexing the k-th decomposed hidden layer expression, the corresponding loss is given by equation (3), a cross-entropy between the Softmax output of the k-th decomposed hidden layer expression (computed with theta_k, the Softmax model parameters for the k-th decomposed hidden layer expression) and the target classification. The loss over the whole training data batch is then given by equation (4).
still taking the training data as batch
Figure 996145DEST_PATH_IMAGE026
The training of the output layer MoAS model of the language model according to the embodiment of the present invention is described as an example.
batch
Figure 528758DEST_PATH_IMAGE026
The input layer passing through the language model is mapped into corresponding characteristic vectors, and the hidden layer expression is output through the hidden layer
Figure 993237DEST_PATH_IMAGE030
Then, through the hidden layer expression decomposition of the output layer, will
Figure 927695DEST_PATH_IMAGE030
Is decomposed into
Figure DEST_PATH_IMAGE041
(ii) a Wherein,
Figure DEST_PATH_IMAGE042
for training samples
Figure DEST_PATH_IMAGE043
To (1) a
Figure DEST_PATH_IMAGE044
A decomposition of the hidden-layer expression vector,
Figure DEST_PATH_IMAGE045
to decompose the number of hidden layers; at the same time by the formula
Figure DEST_PATH_IMAGE046
Calculating weights corresponding to the hidden subjects
Figure DEST_PATH_IMAGE047
Wherein
Figure DEST_PATH_IMAGE048
Figure DEST_PATH_IMAGE049
Is a scalar quantity.
The data items of the training data batch are mapped under each cluster, and the batch data items obtained by each mask are reset according to the number of hidden subjects. Specifically, the data items are first mapped to their corresponding clusters. Then, for each sub-batch block, a new batch data block is obtained according to formula (5) by cascading, for each training example in the cluster, its decomposed hidden layer expressions and the corresponding weights; here, h_{c,j,k} and pi_{c,j,k} denote the k-th decomposed hidden layer expression and the corresponding weight of the j-th training example in cluster c, and y_{c,j} denotes the target classification of the j-th training example in cluster c. Then, the class probability distribution of each Softmax on the corresponding reset batch data items is calculated according to equation (6). Next, based on the weights pi_{c,j,k}, the class probability distributions are weighted and summed according to equation (7) to obtain the loss of each batch data item. Finally, the loss of the entire batch is calculated according to equation (8).
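As a simplified sketch of this loss computation (equations (3) to (8)), under stated assumptions: each batch item is routed only to the Softmax of the cluster its target classification belongs to, the per-subject distributions are mixed with the weights pi before taking the negative log-likelihood, and the head-cluster factor for tail targets is omitted for brevity; all names and shapes are illustrative:

    import numpy as np

    rng = np.random.default_rng(4)
    d, N = 32, 3                                            # hidden size, number of decomposed expressions
    clusters = {"head": 6, "tail_1": 10}                    # illustrative cluster sizes (classifications per cluster)
    W = {c: rng.normal(size=(size, d)) * 0.1 for c, size in clusters.items()}   # one Softmax per cluster

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def item_loss(h, pi, cluster, target_idx):
        # h: (N, d) decomposed hidden expressions; pi: (N,) weights; target_idx: index inside its cluster
        p = sum(pi[k] * softmax(W[cluster] @ h[k]) for k in range(N))   # weighted sum of per-subject distributions
        return -np.log(p[target_idx])                                   # cross-entropy with the target classification

    # toy batch: (decomposed hidden expressions, weights, target cluster, target index within that cluster)
    batch = [(rng.normal(size=(N, d)), softmax(rng.normal(size=N)), "head", 2),
             (rng.normal(size=(N, d)), softmax(rng.normal(size=N)), "tail_1", 7)]

    batch_loss = np.mean([item_loss(*item) for item in batch])          # loss of the entire batch
    print(round(float(batch_loss), 4))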
the model training adopts a feedforward neural network (BP) mode, and in practical application, the training of the neural network language model provided by the embodiment of the invention can adopt one machine with multiple cards or multiple machines with multiple cards for training; here, the multi-card refers to a device having a plurality of GPUs/Field Programmable Gate Arrays (FPGAs)/Application Specific Integrated Circuits (ASICs) for model parameter calculation, and the multi-card refers to a cluster of devices having the multi-card.
In an embodiment, Class-based Softmax may also be introduced into MoS; since Class-based Softmax, like Adaptive Softmax, is designed to solve the training problem caused by a huge number of classes, it may be used in place of Adaptive Softmax in the embodiment of the present invention.
In an embodiment, Noise Contrastive Estimation (NCE) can also be introduced into MoS; NCE adopts a negative sampling method and trains the model by comparing the losses of positive and negative samples, which helps increase the model training speed.
Next, an application scenario of the neural network language model provided in the embodiment of the present invention is explained.
In natural language processing and many scenarios in the speech field, language models play an important role; for example, a language model is used to optimize translation results in machine translation, and in speech recognition a language model is decoded together with the acoustic model results to improve the recognition effect. For example, for the input pinyin string nixianzaiganshenme, the corresponding output can take various forms, such as "what are you doing now" or "what are you rushing to again in Xi'an", and the question is which one is the correct conversion result; using a neural network language model, we know that the probability of the former is greater than that of the latter, so converting to the former is more reasonable in most cases. As another example, in machine translation, given a Chinese sentence meaning that Li Ming is watching TV at home, it can be translated as "Li Ming is watching TV at home", "Li Ming at home watching TV", and so on; again according to the language model, the probability of the former is greater than that of the latter, so translating it as the former is more reasonable.
Language modeling based on the RNN framework is a typical multi-classification problem with a huge number of categories: the vocabulary size is the number of categories, and vocabularies in natural language often reach a magnitude of 100K or even 1B, which means the model may well be untrainable due to limited computing resources. The neural network language model provided by the embodiment of the invention is well suited to such large-vocabulary language modeling problems.
Fig. 6 is a schematic flow diagram of a text prediction method based on a neural network language model according to an embodiment of the present invention, and referring to fig. 6, a text prediction method based on a neural network language model according to an embodiment of the present invention includes:
step 101: inputting text into an input layer of the neural network language model to map the text into corresponding feature vectors.
Here, in practical applications, the input text may be a word sequence, and the word sequence maps discrete words into corresponding m-dimensional vectors through the mapping matrix C of the input layer, as the input of the hidden layer.
Step 102: and calling an activation function through a hidden layer of the neural network language model to obtain a first hidden layer expression corresponding to the feature vector.
In one embodiment, the activation function called by the hidden layer is a tanh function, and the input vector passes through the hidden layer and outputs a first hidden layer expression (hidden, vector or matrix) corresponding to the input vector.
Step 103: and decomposing the first hidden layer expression through an output layer of the neural network language model to obtain second hidden layer expressions corresponding to the first hidden layer expression under different hidden subjects respectively.
Here, in actual implementation, the output layer uses multiple sets of parameters to construct N independent fully-connected networks f_1, ..., f_N, and applies the first hidden layer expression g to the fully-connected network corresponding to each hidden subject to obtain the hidden layer expressions h_1, ..., h_N of g under the multiple hidden topics, i.e., the second hidden layer expressions; the dimensions and data types of the second hidden layer expressions are the same as those of the first hidden layer expression.
In an embodiment, after the output layer performs hidden layer expression decomposition, a weight of each second hidden layer expression under the corresponding hidden topic is further determined, which can be specifically implemented according to formula (2).
Step 104: respectively determining a cluster type corresponding to each second hidden layer expression, and calling a normalization index function corresponding to the cluster type to obtain probability distribution corresponding to the second hidden layer expression; the cluster category comprises a head cluster and a tail cluster, and the output probability of the text classification in the head cluster is different from the output probability of the text classification in the tail cluster.
Here, in actual implementation, the output layer clusters a plurality of text classifications according to the frequency of occurrence of the text classifications in the training data to obtain at least one head cluster and at least one tail cluster. Each cluster corresponds to a respective normalized exponential function (Softmax), in particular:
sequencing the plurality of text classifications according to the sequence of the frequency from high to low to obtain a text classification sequence; traversing the text classification sequence, and accumulating the frequency of text classification; when the accumulated frequency of the text classifications meets a preset condition, stopping the traversal, and taking a set formed by all the text classifications traversed in the text classification sequence as the head cluster;
traversing the remaining, not-yet-traversed portion of the text classification sequence, and accumulating the frequency of text classification; stopping the traversal when the accumulated frequency of the text classification meets a preset condition, and taking a set formed by all the text classifications traversed this time in the text classification sequence as a tail cluster; and repeatedly executing the operations of traversing, frequency accumulation, and preset condition judgment until the number of the acquired tail clusters is Q, wherein Q is a preset positive integer.
In an embodiment, the method further comprises: assigning a class ID to the tail cluster; correspondingly, the head cluster may further include the class IDs of the Q tail clusters.
In an embodiment, the cluster category corresponding to each of the second hidden layer expressions may be determined separately as follows:
applying a normalization index function corresponding to the head cluster to the second hidden layer expression to obtain probability distribution corresponding to the second hidden layer expression; determining a text corresponding to the maximum value of the probability distribution corresponding to the second hidden layer expression; and determining the cluster category corresponding to the second hidden layer expression according to the determined text.
Step 105: and fusing the probability distribution corresponding to each second hidden layer expression, and outputting the target text corresponding to the text based on the probability distribution obtained after fusion.
In one embodiment, the corresponding probability distribution of each of the second hidden layer expressions may be fused as follows:
determining a weight of each of the second hidden layer representations under the respective hidden topic; and based on the weight of each second hidden layer expression under the corresponding hidden subject, carrying out weighted summation on the probability distribution corresponding to each second hidden layer expression to obtain the fused probability distribution.
The embodiment of the invention also provides a text prediction device based on the neural network language model, the neural network language model comprises an input layer, a hidden layer and an output layer, and the device comprises:
the mapping module is used for mapping the input text into corresponding feature vectors through the input layer;
the hidden layer expression module is used for calling an activation function through the hidden layer to obtain a first hidden layer expression corresponding to the characteristic vector;
the output module is used for decomposing the first hidden layer expression through the output layer to obtain second hidden layer expressions corresponding to the first hidden layer expression under different hidden subjects respectively;
respectively determining a cluster type corresponding to each second hidden layer expression, and calling a normalization index function corresponding to the cluster type to obtain probability distribution corresponding to the second hidden layer expressions; the cluster category comprises a head cluster and a tail cluster, and the output probability of the text classification in the head cluster is different from the output probability of the text classification in the tail cluster;
and fusing the probability distribution corresponding to each second hidden layer expression, and outputting the target text corresponding to the text based on the probability distribution obtained after fusion.
In some embodiments, the output module is further configured to cluster the plurality of text classifications according to a frequency of occurrence of the text classifications in the training data to obtain at least one of the head clusters and at least one of the tail clusters.
In some embodiments, the output module is further configured to apply a normalized exponential function corresponding to the head cluster to the second hidden layer expression to obtain a probability distribution corresponding to the second hidden layer expression;
determining a text corresponding to the maximum value of the probability distribution corresponding to the second hidden layer expression;
and determining a cluster corresponding to the second hidden layer expression according to the determined text.
In some embodiments, the output module is further configured to determine a cluster category corresponding to a second hidden layer expression of the training data;
and taking the second hidden layer expression of the training data as input and the target data corresponding to the training data as output, training the normalized exponential function corresponding to the cluster category to predict the corresponding target data according to the second hidden layer expression of the training data.
Fig. 7 is a schematic structural diagram of a text prediction apparatus based on a neural network language model according to an embodiment of the present invention, and referring to fig. 7, the text prediction apparatus based on the neural network language model according to the embodiment of the present invention includes: at least one processor 210, memory 240, at least one network interface 220, and a user interface 230. The various components in the device are coupled together by a bus system 250. It is understood that the bus system 250 is used to enable connected communication between these components. The bus system 250 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are designated as bus system 250 in FIG. 7.
The user interface 230 may include a display, keyboard, mouse, trackball, click wheel, keys, buttons, touch pad, touch screen, or the like.
The memory 240 may be either volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The nonvolatile Memory may be a Read Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), a Flash Memory (Flash Memory), or the like. Volatile Memory can be Random Access Memory (RAM), which acts as external cache Memory.
The Processor 210 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.
The memory 240 is capable of storing executable instructions 2401 to support the operation of the text prediction apparatus, examples of which include various forms of software modules such as programs, plug-ins, and scripts for running on the text prediction apparatus; for example, they may include an operating system and application programs, where the operating system contains various system programs, such as a framework layer, a core library layer, and a driver layer, for implementing various underlying services and handling hardware-based tasks.
In one embodiment, a memory for storing an executable program;
a processor, configured to implement, when executing the executable program stored in the memory:
inputting a text to an input layer of the neural network language model to map the text into corresponding feature vectors;
calling an activation function through a hidden layer of the neural network language model to obtain a first hidden layer expression corresponding to the feature vector;
decomposing the first hidden layer expression through an output layer of the neural network language model to obtain second hidden layer expressions corresponding to the first hidden layer expression under different hidden subjects respectively;
respectively determining a cluster type corresponding to each second hidden layer expression, and calling a normalization index function corresponding to the cluster type to obtain probability distribution corresponding to the second hidden layer expressions; the cluster category comprises a head cluster and a tail cluster, and the output probability of the text classification in the head cluster is different from the output probability of the text classification in the tail cluster;
and fusing the probability distribution corresponding to each second hidden layer expression, and outputting the target text corresponding to the text based on the probability distribution obtained after fusion.
In an embodiment, the processor is further configured to cluster the plurality of text classifications according to frequencies of occurrence of the text classifications in the training data to obtain at least one head cluster and at least one tail cluster.
In an embodiment, the processor is further configured to sort the text classifications in an order from high to low according to the frequency to obtain a text classification sequence;
traversing the text classification sequence, and accumulating the frequency of text classification;
and when the accumulated frequency of the text classification meets a preset condition, stopping the traversal, and taking a set formed by all the text classifications traversed in the text classification sequence as the head cluster.
In an embodiment, the processor is further configured to repeatedly perform the following operations until a predetermined number of tail clusters are obtained:
traversing the remaining, not-yet-traversed portion of the text classification sequence, and accumulating the frequency of text classification;
and when the accumulated frequency of the text classification meets a preset condition, stopping the traversal, and taking a set formed by all the text classifications traversed this time in the text classification sequence as a tail cluster.
In an embodiment, the processor is further configured to assign a class ID to each of the tail clusters;
correspondingly, the class IDs of the predetermined number of tail clusters are also included in the head cluster.
In an embodiment, the processor is further configured to apply a normalized exponential function corresponding to the head cluster to the second hidden layer expression to obtain a probability distribution corresponding to the second hidden layer expression;
determining a text corresponding to the maximum value of the probability distribution corresponding to the second hidden layer expression;
and determining the cluster category corresponding to the second hidden layer expression according to the determined text.
In an embodiment, the processor is further configured to determine a cluster category corresponding to a second hidden layer expression of the training data;
and taking the second hidden layer expression of the training data as input and the target data corresponding to the training data as output, training the normalized exponential function corresponding to the cluster category to predict the corresponding target data according to the second hidden layer expression of the training data.
In an embodiment, the processor is further configured to determine a weight of each of the second hidden layers expressed under the corresponding hidden subject;
and based on the weight of each second hidden layer expression under the corresponding hidden subject, carrying out weighted summation on the probability distribution corresponding to each second hidden layer expression to obtain fused probability distribution.
In an embodiment, the processor is further configured to apply the first hidden layer expression to fully-connected network models corresponding to different hidden topics, and call an activation function to output second hidden layer expressions corresponding to the first hidden layer expression and the second hidden layer expressions corresponding to different hidden topics, respectively.
The embodiment of the invention also provides a storage medium which stores an executable program, and when the executable program is executed by a processor, the text prediction method based on the neural network language model is realized.
Here, it should be noted that: the description of the text prediction device related to the neural network language model is similar to the description of the method, and the description of the beneficial effects of the method is not repeated. For technical details not disclosed in the embodiment of the text prediction apparatus of the neural network language model of the present invention, please refer to the description of the embodiment of the method of the present invention.
All or part of the steps of the embodiments may be implemented by hardware associated with program instructions, and the program may be stored in a computer-readable storage medium, and when executed, performs the steps including the method embodiments; and the aforementioned storage medium includes: various media that can store program codes, such as a removable Memory device, a Random Access Memory (RAM), a Read-Only Memory (ROM), a magnetic disk, or an optical disk.
Alternatively, the integrated unit of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present invention or portions thereof contributing to the related art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, a RAM, a ROM, a magnetic or optical disk, or various other media that can store program code.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (15)

1. A text prediction method based on a neural network language model is characterized by comprising the following steps:
inputting a text to an input layer of the neural network language model to map the text into corresponding feature vectors;
calling an activation function through a hidden layer of the neural network language model to obtain a first hidden layer expression corresponding to the feature vector;
decomposing the first hidden layer expression through an output layer of the neural network language model to obtain second hidden layer expressions corresponding to the first hidden layer expression under different hidden subjects respectively;
respectively determining a cluster category corresponding to each second hidden layer expression, and calling a normalization index function corresponding to the cluster category to obtain a probability distribution corresponding to the second hidden layer expression; the cluster category comprises a head cluster and a tail cluster, and the output probability of the text classifications in the head cluster is different from the output probability of the text classifications in the tail cluster;
and fusing the probability distribution corresponding to each second hidden layer expression, and outputting the target text corresponding to the text based on the probability distribution obtained after fusion.
2. The method of claim 1, wherein the method further comprises:
and clustering a plurality of text classifications according to the frequency of the text classifications appearing in the training data to obtain at least one head cluster and at least one tail cluster.
3. The method of claim 2, wherein the clustering the plurality of text classifications according to the frequency of the text classifications appearing in the training data comprises:
sorting the plurality of text classifications in descending order of frequency to obtain a text classification sequence;
traversing the text classification sequence, and accumulating the frequencies of the traversed text classifications;
and when the accumulated frequency of the text classifications meets a preset condition, stopping the traversal, and taking a set formed by all the text classifications traversed in the text classification sequence as the head cluster.
4. The method of claim 3, wherein the method further comprises:
the following operations are repeatedly performed until a predetermined number of tail clusters are obtained:
traversing the text classifications that have not yet been traversed in the text classification sequence, and accumulating the frequencies of the traversed text classifications;
and when the accumulated frequency of the text classification meets a preset condition, stopping the traversal, and taking a set formed by all the text classifications traversed this time in the text classification sequence as a tail cluster.
5. The method of claim 4, wherein the method further comprises:
assigning a class identification ID to each tail cluster respectively;
correspondingly, the class IDs of the predetermined number of tail clusters are also included in the head cluster.
6. The method of claim 1, wherein the respectively determining a cluster category corresponding to each second hidden layer expression comprises:
applying a normalization index function corresponding to the head cluster to the second hidden layer expression to obtain probability distribution corresponding to the second hidden layer expression;
determining a text corresponding to the maximum value of the probability distribution corresponding to the second hidden layer expression;
and determining the cluster category corresponding to the second hidden layer expression according to the determined text.
7. The method of claim 1, wherein the method further comprises:
determining a cluster category corresponding to a second hidden layer expression of the training data;
and taking the second hidden layer expression of the training data as input, taking the target data corresponding to the training data as output, and training the normalization index function corresponding to the cluster category so that it can predict the corresponding target data according to the second hidden layer expression of the training data.
8. The method of claim 1, wherein the fusing the probability distribution corresponding to each second hidden layer expression comprises:
determining a weight of each of the second hidden layer expressions under the respective hidden subject;
and based on the weight of each second hidden layer expression under the corresponding hidden subject, carrying out weighted summation on the probability distribution corresponding to each second hidden layer expression to obtain fused probability distribution.
9. The method of claim 1, wherein the decomposing the first hidden layer expression to obtain second hidden layer expressions corresponding to the first hidden layer expression under different hidden subjects respectively comprises:
and applying the first hidden layer expression to fully-connected network models corresponding to the different hidden subjects, and calling an activation function to output the second hidden layer expressions corresponding to the first hidden layer expression under the different hidden subjects, respectively.
10. An apparatus for text prediction based on a neural network language model, wherein the neural network language model comprises an input layer, a hidden layer and an output layer, the apparatus comprising:
the mapping module is used for mapping the input text into corresponding feature vectors through the input layer;
the hidden layer expression module is used for calling an activation function through the hidden layer to obtain a first hidden layer expression corresponding to the feature vector;
the output module is used for decomposing the first hidden layer expression through the output layer to obtain second hidden layer expressions corresponding to the first hidden layer expression under different hidden subjects respectively;
respectively determining a cluster category corresponding to each second hidden layer expression, and calling a normalization index function corresponding to the cluster category to obtain a probability distribution corresponding to the second hidden layer expression; the cluster category comprises a head cluster and a tail cluster, and the output probability of the text classifications in the head cluster is different from the output probability of the text classifications in the tail cluster;
and fusing the probability distribution corresponding to each second hidden layer expression, and outputting the target text corresponding to the text based on the probability distribution obtained after fusion.
11. The apparatus of claim 10,
the output module is further configured to cluster the plurality of text classifications according to the frequency of occurrence of the text classifications in the training data to obtain at least one head cluster and at least one tail cluster.
12. The apparatus of claim 10,
the output module is further configured to apply a normalization index function corresponding to the head cluster to the second hidden layer expression to obtain a probability distribution corresponding to the second hidden layer expression;
determining a text corresponding to the maximum value of the probability distribution corresponding to the second hidden layer expression;
and determining the cluster category corresponding to the second hidden layer expression according to the determined text.
13. The apparatus of claim 10,
the output module is further used for determining a cluster category corresponding to a second hidden layer expression of the training data;
and taking the second hidden layer expression of the training data as input, taking the target data corresponding to the training data as output, and training the normalization index function corresponding to the cluster category so that it can predict the corresponding target data according to the second hidden layer expression of the training data.
14. A text prediction apparatus based on a neural network language model, comprising:
a memory for storing an executable program;
a processor for implementing the neural network language model-based text prediction method of any one of claims 1 to 9 when executing the executable program stored in the memory.
15. A storage medium storing an executable program which, when executed by a processor, implements the neural network language model-based text prediction method according to any one of claims 1 to 9.
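For illustration only (not part of the claims), a minimal Python sketch of the frequency-based construction of the head cluster and tail clusters described in claims 2 to 5; the concrete frequency shares used as the "preset condition" and the placeholder tail-cluster IDs are assumptions:

```python
def build_clusters(freqs, shares=(0.8, 0.1, 0.1)):
    """Split text classifications into a head cluster and tail clusters by
    cumulative frequency. `shares` is an illustrative stand-in for the
    "preset condition": the first share bounds the head cluster, the rest
    bound the tail clusters, and leftovers go to the last tail cluster.

    freqs: dict mapping text classification -> frequency in the training data
    """
    total = sum(freqs.values())
    ordered = sorted(freqs, key=freqs.get, reverse=True)  # high-to-low frequency

    clusters, current, accumulated = [], [], 0.0
    for cls in ordered:
        current.append(cls)
        accumulated += freqs[cls] / total
        # Close the current cluster once its share of frequency is reached,
        # keeping the last cluster open for all remaining classifications.
        if len(clusters) < len(shares) - 1 and accumulated >= shares[len(clusters)]:
            clusters.append(current)
            current, accumulated = [], 0.0
    clusters.append(current)

    head, tails = clusters[0], clusters[1:]
    # In the spirit of claim 5, the head cluster also carries one class ID per tail cluster.
    head = head + [f"<tail_{i}>" for i in range(len(tails))]
    return head, tails
```

For example, given word frequencies counted from a training corpus, build_clusters returns the head cluster (including one placeholder class ID per tail cluster) together with the list of tail clusters.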
CN201811435778.XA 2018-11-28 2018-11-28 Text prediction method and device based on neural network language model and storage medium Active CN110147444B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811435778.XA CN110147444B (en) 2018-11-28 2018-11-28 Text prediction method and device based on neural network language model and storage medium
CN201910745810.2A CN110442721B (en) 2018-11-28 2018-11-28 Neural network language model, training method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811435778.XA CN110147444B (en) 2018-11-28 2018-11-28 Text prediction method and device based on neural network language model and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN201910745810.2A Division CN110442721B (en) 2018-11-28 2018-11-28 Neural network language model, training method, device and storage medium

Publications (2)

Publication Number Publication Date
CN110147444A CN110147444A (en) 2019-08-20
CN110147444B true CN110147444B (en) 2022-11-04

Family

ID=67589307

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201811435778.XA Active CN110147444B (en) 2018-11-28 2018-11-28 Text prediction method and device based on neural network language model and storage medium
CN201910745810.2A Active CN110442721B (en) 2018-11-28 2018-11-28 Neural network language model, training method, device and storage medium

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201910745810.2A Active CN110442721B (en) 2018-11-28 2018-11-28 Neural network language model, training method, device and storage medium

Country Status (1)

Country Link
CN (2) CN110147444B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110880040A (en) * 2019-11-08 2020-03-13 支付宝(杭州)信息技术有限公司 Method and system for automatically generating cumulative features
CN113159080A (en) * 2020-01-22 2021-07-23 株式会社东芝 Information processing apparatus, information processing method, and storage medium
CN111667069B (en) * 2020-06-10 2023-08-04 中国工商银行股份有限公司 Pre-training model compression method and device and electronic equipment
CN111898145B (en) * 2020-07-22 2022-11-25 苏州浪潮智能科技有限公司 Neural network model training method, device, equipment and medium
CN113243018A (en) * 2020-08-01 2021-08-10 商汤国际私人有限公司 Target object identification method and device
CN115243270B (en) * 2021-04-07 2023-09-22 ***通信集团设计院有限公司 5G network planning method, device, computing equipment and storage medium


Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101546389A (en) * 2008-03-26 2009-09-30 中国科学院半导体研究所 Primary direction neural network system
US9235799B2 (en) * 2011-11-26 2016-01-12 Microsoft Technology Licensing, Llc Discriminative pretraining of deep neural networks
US20140156575A1 (en) * 2012-11-30 2014-06-05 Nuance Communications, Inc. Method and Apparatus of Processing Data Using Deep Belief Networks Employing Low-Rank Matrix Factorization
CN103823845B (en) * 2014-01-28 2017-01-18 浙江大学 Method for automatically annotating remote sensing images on basis of deep learning
CN104572504B (en) * 2015-02-02 2017-11-03 浪潮(北京)电子信息产业有限公司 A kind of method and device for realizing data pre-head
CN105760507B (en) * 2016-02-23 2019-05-03 复旦大学 Cross-module state topic relativity modeling method based on deep learning
CN107609055B (en) * 2017-08-25 2019-10-11 西安电子科技大学 Text image multi-modal retrieval method based on deep layer topic model
CN108563639B (en) * 2018-04-17 2021-09-17 内蒙古工业大学 Mongolian language model based on recurrent neural network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107578106A (en) * 2017-09-18 2018-01-12 中国科学技术大学 A kind of neutral net natural language inference method for merging semanteme of word knowledge
CN108197109A (en) * 2017-12-29 2018-06-22 北京百分点信息科技有限公司 A kind of multilingual analysis method and device based on natural language processing
CN108595632A (en) * 2018-04-24 2018-09-28 福州大学 A kind of hybrid neural networks file classification method of fusion abstract and body feature

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Chinese part-of-speech tagging model based on attention long short-term memory network; Si Nianwen et al.; Computer Science; 2018-04-15 (Issue 04); full text *
Text topic classification fusing latent topic information and convolutional semantic features; Chen Peixin et al.; Journal of Signal Processing; 2017-08-25 (Issue 08); full text *

Also Published As

Publication number Publication date
CN110442721B (en) 2023-01-06
CN110147444A (en) 2019-08-20
CN110442721A (en) 2019-11-12

Similar Documents

Publication Publication Date Title
CN110147444B (en) Text prediction method and device based on neural network language model and storage medium
Chen et al. Supervised feature selection with a stratified feature weighting method
Luo et al. Online learning of interpretable word embeddings
JP6212217B2 (en) Weight generation in machine learning
CN116861995A (en) Training of multi-mode pre-training model and multi-mode data processing method and device
CN111160000B (en) Composition automatic scoring method, device terminal equipment and storage medium
CN112749274A (en) Chinese text classification method based on attention mechanism and interference word deletion
CN111858898A (en) Text processing method and device based on artificial intelligence and electronic equipment
Dupre et al. Improving dataset volumes and model accuracy with semi-supervised iterative self-learning
CN117150026B (en) Text content multi-label classification method and device
CN112988548A (en) Improved Elman neural network prediction method based on noise reduction algorithm
CN113971733A (en) Model training method, classification method and device based on hypergraph structure
CN113934851A (en) Data enhancement method and device for text classification and electronic equipment
CN114444476A (en) Information processing method, apparatus and computer readable storage medium
CN111783688B (en) Remote sensing image scene classification method based on convolutional neural network
CN116910549A (en) Model training method, device, computer equipment and storage medium
CN115617971B (en) Dialog text generation method based on ALBERT-Coref model
CN112463964B (en) Text classification and model training method, device, equipment and storage medium
Maleki et al. Improvement of credit scoring by lstm autoencoder model
He et al. An incremental kernel density estimator for data stream computation
CN115438755A (en) Incremental training method and device of classification model and computer equipment
CN113128544B (en) Method and device for training artificial intelligent model
CN116306612A (en) Word and sentence generation method and related equipment
CN114282058A (en) Method, device and equipment for model training and video theme prediction
CN114254106A (en) Text classification method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant