CN108304501B - Invalid hypernym filtering method and device and storage medium - Google Patents

Invalid hypernym filtering method and device and storage medium

Info

Publication number
CN108304501B
Authority
CN
China
Prior art keywords
hypernym
invalid
classification
word
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810043574.5A
Other languages
Chinese (zh)
Other versions
CN108304501A (en)
Inventor
郑孙聪
李潇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201810043574.5A priority Critical patent/CN108304501B/en
Publication of CN108304501A publication Critical patent/CN108304501A/en
Application granted granted Critical
Publication of CN108304501B publication Critical patent/CN108304501B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F40/00 Handling natural language data
            • G06F40/20 Natural language analysis
              • G06F40/279 Recognition of textual entities
                • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
                • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
            • G06F40/30 Semantic analysis
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
              • G06F16/35 Clustering; Classification
            • G06F16/90 Details of database functions independent of the retrieved data types
              • G06F16/95 Retrieval from the web
                • G06F16/953 Querying, e.g. by the use of web search engines
                  • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a method, a device and a storage medium for filtering invalid hypernyms, applied in the field of information processing. To filter invalid hypernyms, word segmentation is performed on the short text to be processed to obtain a first word segmentation result; semantic features of the first word segmentation result are then extracted with a hypernym classification model, and information on whether the short text to be processed is an invalid hypernym is obtained from the semantic features so that filtering can be performed. Because the semantic features of every word included in the short text to be processed serve as the basis for judging whether it is an invalid hypernym, the obtained information on whether the short text is an invalid hypernym is more accurate, and the filtering of invalid hypernyms is therefore more accurate. In addition, the information on whether the short text to be processed is an invalid hypernym is obtained directly from the hypernym classification model and the words included in the short text, so the computation is simpler.

Description

Invalid hypernym filtering method and device and storage medium
Technical Field
The present invention relates to the field of information processing technologies, and in particular, to a method and an apparatus for filtering invalid hypernyms, and a storage medium.
Background
A hypernym is a descriptor that can summarize at least two entities, such as "animal" or "plant". Existing large-scale hypernym sets are obtained from contextual relations mined from plain text. Because plain text on the web is noisy and its expressions are complex, hypernyms without a concrete meaning, i.e. invalid hypernyms such as "blue" or "trivial", are also produced, and these invalid hypernyms need to be filtered out.
Filtering invalid hypernyms means identifying and removing them from the hypernym set, thereby improving the quality of the set. Existing filtering methods usually require manually defining the various kinds of invalid hypernyms and then filtering them from the hypernym set; this costs considerable manual effort, and the final filtering result does not generalize.
Another prior-art filtering method for invalid hypernyms is based on part-of-speech tagging and needs no manual participation; specifically, the part of speech of invalid hypernyms is determined, and words consistent with that part of speech are filtered from the hypernym set. However, some hypernyms take the form of a phrase or short sentence, such as "Tang dynasty poetry", which has no single part of speech, so a filtering method based on part-of-speech tagging has difficulty filtering hypernyms of this phrase-or-short-sentence type.
Disclosure of Invention
The embodiment of the invention provides a method and a device for filtering invalid hypernyms and a storage medium, which are used for determining whether a short text to be processed is an invalid hypernym according to a hypernym classification model.
A first aspect of an embodiment of the present invention provides a method for filtering an invalid hypernym, including:
performing word segmentation processing on the short text to be processed to obtain a first word segmentation processing result of the short text to be processed;
determining an hypernym classification model;
and extracting semantic features of the first word segmentation processing result according to the hypernym classification model, and acquiring information whether the short text to be processed is an invalid hypernym according to the semantic features so as to perform filtering processing.
A second aspect of the embodiments of the present invention provides a device for filtering invalid hypernyms, including:
the word segmentation unit is used for performing word segmentation processing on the short text to be processed to obtain a first word segmentation processing result of the short text to be processed;
the model determining unit is used for determining the hypernym classification model;
and the information classification unit is used for extracting the semantic features of the first word segmentation processing result according to the hypernym classification model and acquiring information on whether the short text to be processed is an invalid hypernym according to the semantic features, so as to carry out filtering processing.
A third aspect of the embodiments of the present invention provides a storage medium storing a plurality of instructions, the instructions being adapted to be loaded by a processor and to perform the method for filtering invalid hypernyms according to the first aspect of the embodiments of the present invention.
A fourth aspect of the embodiments of the present invention provides a terminal device, including a processor and a storage medium, where the processor is configured to execute instructions; the storage medium stores a plurality of instructions adapted to be loaded by the processor to perform the method for filtering invalid hypernyms according to the first aspect of the embodiments of the present invention.
As can be seen, in the method of this embodiment, when an invalid hypernym is to be filtered, word segmentation is first performed on the short text to be processed to obtain a first word segmentation result; semantic features of the first word segmentation result are then extracted according to the hypernym classification model, and information on whether the short text to be processed is an invalid hypernym is obtained according to these semantic features, so as to perform filtering. In this way, the semantic features of each word included in the short text to be processed serve as the basis for judging whether it is an invalid hypernym, so the obtained information is more accurate and the filtering of invalid hypernyms is more accurate; moreover, the information on whether the short text is an invalid hypernym can be obtained directly by using the hypernym classification model and the words included in the short text, and the computation is simpler.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of a method for filtering invalid hypernyms according to an embodiment of the present invention;
FIG. 2 is a flow diagram of training a hypernym classification model according to an embodiment of the invention;
FIG. 3 is a schematic illustration of determining a first training sample in one embodiment of the present invention;
FIG. 4 is a flow diagram of determining initial values of fixed parameters of a feature extraction module in an initial model of hypernym classification, in accordance with an embodiment of the present invention;
FIG. 5 is a diagram illustrating training of hypernym classification models in an exemplary embodiment of the present invention;
FIG. 6 is a diagram illustrating a structure of a hypernym classification model according to an embodiment of the present invention;
FIG. 7 is a diagram illustrating a structure of another hypernym classification model according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a filtering apparatus for invalid hypernyms according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of another filtering apparatus for invalid hypernyms according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiment of the invention provides a method for filtering invalid hypernyms, which is mainly used to determine, for any short text (namely the short text to be processed), whether that short text is an invalid hypernym, so that invalid hypernyms can be filtered out. Specifically, the method comprises the following steps:
performing word segmentation processing on the short text to be processed to obtain a first word segmentation processing result of the short text to be processed; determining an hypernym classification model; and extracting semantic features of the first word segmentation processing result according to the hypernym classification model, and acquiring information whether the short text to be processed is an invalid hypernym according to the semantic features so as to perform filtering processing.
In this way, the semantic features of each word included in the short text to be processed serve as the basis for judging whether it is an invalid hypernym, so the obtained information on whether the short text is an invalid hypernym is more accurate and the filtering of invalid hypernyms is more accurate; moreover, the information on whether the short text to be processed is an invalid hypernym can be obtained directly by using the hypernym classification model and the words included in the short text, and the computation is simpler.
An embodiment of the present invention provides a method for filtering invalid hypernyms, which is mainly a method executed by a filtering apparatus for invalid hypernyms, and a flowchart is shown in fig. 1, where the method includes:
and 101, performing word segmentation on the short text to be processed to obtain a first word segmentation result of the short text to be processed.
Here, the short text to be processed is a text including at least one word, that is, the short text to be processed may be a word, such as "animal" or the like, or may be a phrase, such as "flying object" or the like.
It can be understood that, when the filtering device for invalid hypernyms segments the short text to be processed, each Chinese character may be treated as one word, each digit as one word, and a run of English letters with no space between adjacent letters as one word; for example, since "china" contains no spaces between adjacent letters, "china" is treated as one whole word. The first word segmentation result obtained by the filtering device for invalid hypernyms therefore contains all the words included in the short text to be processed.
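For illustration only, the following Python sketch shows one possible implementation of the segmentation rule just described (each Chinese character as one word, each digit as one word, an unspaced run of English letters as one word); the function name and the regular expression are assumptions and are not specified by the patent.

```python
import re

def segment_short_text(text: str) -> list:
    """Split a short text into words: each CJK character is one word,
    each digit is one word, and an unspaced run of Latin letters is one word."""
    # Hypothetical pattern: run of Latin letters | single CJK character | single digit
    pattern = re.compile(r'[A-Za-z]+|[\u4e00-\u9fff]|[0-9]')
    return pattern.findall(text)

# Example: segment_short_text("china2008会议") -> ['china', '2', '0', '0', '8', '会', '议']
```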
And step 102, determining a hypernym classification model.
The hypernym classification model may be preset in the filtering device for invalid hypernyms before the flow of this embodiment is initiated; it is obtained mainly by using marked valid hypernyms and marked invalid hypernyms as training samples and training with a suitable training method. The hypernym classification model can be any type of classification model, such as a Convolutional Neural Network (CNN), a Long Short-Term Memory (LSTM) network, or a combination of LSTM and CNN networks.
Specifically, the hypernym classification model is mainly used to classify whether a hypernym is an invalid hypernym, where an invalid hypernym is a descriptor that cannot summarize two or more entities, such as "under the leadership of the Chinese political party" or "common and ordinary".
Step 103, extracting semantic features of the first word segmentation result according to the hypernym classification model determined in step 102, and acquiring information on whether the short text to be processed is an invalid hypernym according to the semantic features, so as to perform filtering.
Here, the semantic feature of a certain word is used to indicate a specific meaning of the word in the language, and the semantic feature of the first word segmentation processing result may specifically include word vectors corresponding to all words included in the short text to be processed, each of the word vectors respectively describing the semantic feature of the corresponding word.
As can be seen, in the method of this embodiment, when an invalid hypernym is to be filtered, word segmentation is first performed on the short text to be processed to obtain a first word segmentation result; semantic features of the first word segmentation result are then extracted according to the hypernym classification model, and information on whether the short text to be processed is an invalid hypernym is obtained according to these semantic features, so as to perform filtering. In this way, the semantic features of each word included in the short text to be processed serve as the basis for judging whether it is an invalid hypernym, so the obtained information is more accurate and the filtering of invalid hypernyms is more accurate; moreover, the information on whether the short text is an invalid hypernym can be obtained directly by using the hypernym classification model and the words included in the short text, and the computation is simpler.
It should be noted that the method in the foregoing embodiment determines, according to the hypernym classification model, whether a short text to be processed is an invalid hypernym, so that filtering can be performed. In a specific embodiment, the hypernym classification model is trained in advance by the filtering device for invalid hypernyms through the following steps; the flowchart is shown in fig. 2 and includes:
step 201, determining a first training sample, wherein the first training sample comprises marked effective hypernyms and marked ineffective hypernyms, and determining an initial model of hypernym classification.
It can be understood that the filtering means for invalid hypernyms can be specifically implemented by the following steps a1 to A3 when determining the first training sample, and the schematic diagram is shown in fig. 3:
A1, selecting invalid hypernyms from a preset hypernym set and setting the invalid flags of the selected invalid hypernyms. Specifically, the filtering device for invalid hypernyms may select at least one of the following types of invalid hypernyms: the orientation word type, the year type, the adjective type, the non-noun type, and other types with obvious rules. The first description templates of the various types of invalid hypernyms can be obtained as follows:
(1) Orientation word type
The first description template of an invalid hypernym of the orientation word type is: short text that ends with an orientation word and can begin with "at" or "in", for example: "under the leadership of the Chinese political party", "in a control system", or "on the basis of a pharmaceutical factory in Hebei". Here "under the leadership of the Chinese political party" is formed by attaching the orientation word "under" to "the leadership of the Chinese political party".
(2) Year type
The first description template of an invalid hypernym of the year type is: short text that contains a date, the special symbol "-", or the Chinese character meaning "to". For example: "1948-1998", "1862 to 1873", "1871 to 1945", or "down to germany three years", etc.
(3) Adjective type
The first description template of an invalid hypernym of the adjective type is: short text that ends with the particle "的", where the word before "的" is a non-noun, for example an adjective. Examples: "the earliest", "the happy", "the common and ordinary", "the trivial", or "the salty and fresh", etc.
(4) Non-noun type
The first description template of an invalid hypernym of the non-noun type is: short text that is not a noun. For example: "counterfeiting", "expanding", "on the other hand", or "sweet", etc.
(5) Other types with obvious rules
The first description template of an invalid hypernym of the other types with obvious rules is: short text that can begin with certain specific words, where the specific words may be, for example, "for" or "original name". For example, "for" may be added before "promoting socialist moral fashion" and "improving the steady-state and dynamic performance of the power system", and "original name" may be added before "monster".
When setting the invalid flag, the filtering device for invalid hypernyms may first generate the invalid flag, for example the digit "0" or some other mark, and then add the invalid flag at the very front or the very end of the selected invalid hypernym, or in another way not illustrated here. For example, if the selected invalid hypernym is "trivial", the invalid hypernym after the invalid flag is set is "0 trivial", and so on.
When determining the initial model of hypernym classification, the filtering device for invalid hypernyms may determine the structure of the initial model: specifically, the initial model of hypernym classification may comprise a feature extraction module and a classification module, where the feature extraction module is used to extract semantic features and the classification module is used to classify a hypernym as invalid or valid according to the semantic features extracted by the feature extraction module. The device then determines the initial values of the fixed parameters used in computation by the feature extraction module and the classification module of the initial model of hypernym classification.
The feature extraction module may specifically include the word vector layer, the convolution layer and the pooling layer of a CNN network, and the classification module may include a classification layer. The word vector layer is used to obtain a word vector for each word, the word vector expressing the semantic feature of that word; the convolution layer is used to fuse adjacent words according to the word vectors of the words, for example the word vector of "北" (north) and the word vector of "京" (capital) can be fused into a vector for "北京" (Beijing); and the pooling layer is used to reduce the dimensionality of the vectors fused by the convolution layer.
In other embodiments, the feature extraction module may further include at least one type of extraction network and a splicing layer, where each type of extraction network is used to obtain a word vector for each word, and the splicing layer is used to splice the word vectors extracted by the at least one type of extraction network with the vector obtained by the pooling layer before the result is input to the classification layer for classification.
Specifically, the at least one type of extraction network may be an LSTM network, and the splicing layer may simply concatenate the vectors: for example, if the vector produced by the pooling layer is 100-dimensional and the vector produced by the extraction network is 100-dimensional, the vector output by the splicing layer is 200-dimensional. In this way, the semantic features (i.e. word vectors) obtained by the various types of extraction networks for each word of a short text can be combined, so that the vector input to the classification layer better reflects the actual semantics of that short text.
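To make the structure described above concrete, the following sketch shows one possible realization of the feature extraction module (word vector layer, convolution layer, pooling layer) together with the classification layer; the use of PyTorch and all layer sizes are assumptions made for illustration and are not prescribed by the patent. The variant with an additional LSTM extraction network and a splicing layer is sketched later, in connection with FIG. 7.

```python
import torch
import torch.nn as nn

class HypernymClassifier(nn.Module):
    """Minimal sketch: word vector layer + convolution layer + pooling layer + classification layer."""
    def __init__(self, vocab_size, embed_dim=100, num_kernels=100, window=3, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)                  # word vector layer
        self.conv = nn.Conv1d(embed_dim, num_kernels, kernel_size=window,
                              padding=window // 2)                            # convolution layer
        self.classifier = nn.Linear(num_kernels, num_classes)                 # classification layer

    def forward(self, word_ids):                  # word_ids: (batch, seq_len) of word indices
        x = self.embedding(word_ids)              # (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)                     # Conv1d expects (batch, channels, seq_len)
        c = torch.relu(self.conv(x))              # fuse adjacent words
        z = c.max(dim=2).values                   # pooling layer: max over the word positions
        return self.classifier(z)                 # scores for the valid / invalid classes
```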
The fixed parameter values are the values of the fixed parameters used in computation by the feature extraction module and the classification module of the initial model of hypernym classification, parameters that do not need to be assigned anew each time, such as weights and angles. The initial value of a fixed parameter may be set directly to a preset value, or determined in another way. Since the extraction of semantic features by the feature extraction module is important and directly affects how accurately it is determined whether the short text to be processed is invalid, in one application example the initial values of the fixed parameters of the feature extraction module, in particular of its word vector layer, may be determined through the following steps A11 to A14; the flowchart is shown in fig. 4:
a11, determining a second training sample, wherein the second training sample comprises training texts.
A12, performing word segmentation processing on the training text to obtain a third word segmentation processing result, wherein the third word segmentation processing result comprises all words in the training text.
And A13, training according to the third character segmentation processing result to obtain a character vector of each character in the training text.
The training texts in the second training sample may be from the same source as the valid hypernyms and the invalid hypernyms in the first training sample, for example, both the training texts may be from a certain text system, so that it is ensured that the word vectors corresponding to the words included in the valid hypernyms and the invalid hypernyms in the first training sample can be found from the word vectors obtained by training in step a 13.
The word vectors of the words in the training text represent the semantic features of those words, and each word may correspond to one word vector. Specifically, when training the word vectors in this step, the filtering device for invalid hypernyms may use a deep-learning method, for example a model such as word2vec (Word Vector).
A14, determining the corresponding relation between each word and the word vector in the training text as the initial value of the fixed parameter of the word vector layer in the feature extraction module.
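Steps A11 to A14 could, for example, be carried out with the word2vec implementation in gensim (4.x), as in the sketch below; gensim and every parameter value shown are assumptions made for illustration, the embodiment only requiring that some word-vector training method such as word2vec be used.

```python
from gensim.models import Word2Vec

# Step A11/A12: training texts of the second training sample, already segmented into words.
segmented_texts = [["动", "物"], ["飞", "行", "物"], ["china"]]

# Step A13: train a word vector for every word that appears in the training text.
w2v = Word2Vec(sentences=segmented_texts, vector_size=100, window=5, min_count=1)

# Step A14: the word-to-vector correspondence initializes the word vector layer.
word_to_vector = {word: w2v.wv[word] for word in w2v.wv.key_to_index}
```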
And A2, determining a second description template of the valid hypernym, wherein the second description template has the common information with the first description template of the selected invalid hypernym.
Specifically, if the invalid hypernyms selected in step A1 include invalid hypernyms of the orientation word type, the corresponding second description template is determined to be: short text that ends with a noun and can begin with "at", for example "a web novel serialized on a reading site". The information shared by this second description template and the first description template is: the text can begin with "at".
If the invalid hypernyms selected in step A1 include invalid hypernyms of the year type, the corresponding second description template is determined to be: short text combining a year with a noun, such as "military general of the Three Kingdoms" or "poetry from the Tang to the Ming dynasty". The information shared by this second description template and the first description template is the year.
If the invalid hypernyms selected in step A1 include invalid hypernyms of the adjective type, the corresponding second description template is determined to be: short text that ends with the particle "的" and includes a noun, such as "of Fujian" or "of Tencent". The information shared by this second description template and the first description template is: the text ends with "的".
If the invalid hypernyms selected in step A1 include invalid hypernyms of the non-noun type, the corresponding second description template is determined to be: short text that includes a noun. The information shared by this second description template and the first description template is: both include words.
If the invalid hypernyms selected in step A1 include invalid hypernyms of other types with obvious rules, the corresponding second description template is determined to be: short text that can begin with certain specific words, such as "for", for example "a character in a game". The information shared by this second description template and the first description template is: the text can begin with certain specific words.
And A3, selecting the valid hypernym consistent with the second description template from the preset hypernym set, and setting the valid mark of the selected valid hypernym.
When setting the valid flag, the filtering device for invalid hypernyms may first generate the valid flag, for example the digit "1" or some other mark, and then add the valid flag at the very front or the very end of the selected valid hypernym, or in another way. For example, if the selected valid hypernym is "Fujian", the valid hypernym after the valid flag is set is "1 Fujian", and so on.
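The flag convention of steps A1 and A3 (prefixing "0" to an invalid hypernym and "1" to a valid one) can be expressed in a few lines, as in the following sketch; how the hypernyms are matched against the description templates is left outside the sketch, since those rules are language specific.

```python
def label_samples(invalid_hypernyms, valid_hypernyms):
    """Build the first training sample: "0" marks an invalid hypernym, "1" a valid one."""
    samples = []
    for h in invalid_hypernyms:        # selected because they match a first description template
        samples.append("0 " + h)       # invalid flag added at the very front
    for h in valid_hypernyms:          # selected because they match a second description template
        samples.append("1 " + h)       # valid flag added at the very front
    return samples

# Example: label_samples(["trivial"], ["Fujian"]) -> ["0 trivial", "1 Fujian"]
```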
Step 202, performing word segmentation processing on each hypernym in the first training sample respectively to obtain a second word segmentation processing result, where the second word segmentation processing result includes all words included in each hypernym.
And step 203, classifying each hypernym in the first training sample according to the initial model of the hypernym classification and the second word segmentation processing result to obtain an initial classification result of whether each hypernym is invalid or not.
Specifically, the feature extraction module in the initial model for hypernym classification may extract semantic features of the second segmentation processing result, and the classification module may obtain information on whether each hypernym in the first training sample is invalid, that is, an initial classification result, according to the semantic features.
And step 204, calculating a function value of a loss function related to the initial model of the hypernym classification according to the initial classification result.
Here, the loss function of the initial model of hypernym classification is computed from the initial classification result and may specifically be a cross-entropy loss function; it measures the difference, i.e. the error, between the information on whether each hypernym in the first training sample is invalid as determined by the initial model of hypernym classification and the actual valid or invalid flag of that hypernym in the training sample.
For example, if the initial model of hypernym classification determines that a hypernym is invalid and the hypernym has a valid flag, an error occurs; if the initial model of hypernym classification determines that a certain hypernym is invalid and the hypernym has an invalid flag, no errors are generated, and the errors are embodied by the cross entropy loss function.
Step 205, adjusting the fixed parameter value in the initial model of the hypernym classification according to the function value of the loss function to obtain the hypernym classification model.
Specifically, if the calculated loss function has a larger function value, for example, larger than a preset value, the fixed parameter value needs to be changed, for example, a weight value of a certain weight is increased, or an angle value of a certain angle is decreased, so that the function value of the loss function calculated according to the adjusted fixed parameter value is decreased.
It should be noted that, in the above steps 203 to 205, after performing the word segmentation processing on each hypernym in the first training sample to obtain the second word segmentation processing result, the fixed parameter value in the initial model for classifying hypernyms is adjusted once according to the second word segmentation processing result, and in practical applications, the above steps 203 to 205 need to be continuously executed in a loop until the adjustment on the fixed parameter value meets a certain stop condition.
Therefore, after the filtering apparatus for invalid hypernyms performs steps 201 to 205 of the above embodiment, it is further required to determine whether the current adjustment on the fixed parameter value meets the preset stop condition, and if so, the process is ended; if not, returning to the step of executing the steps 203 to 205 for the initial model of the hypernym classification after the fixed parameter value is adjusted. Namely, the steps of obtaining an initial classification result, calculating a function value of the loss function and adjusting a fixed parameter value are executed.
Wherein the preset stop condition includes but is not limited to any one of the following conditions: a first difference value between the currently adjusted fixed parameter value and the last adjusted fixed parameter value is smaller than a first threshold value, namely the adjusted fixed parameter value reaches convergence; and the adjustment times of the fixed parameter values reach preset times and the like.
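The loop over steps 203 to 205, together with the preset stop condition, could look like the following sketch; PyTorch, stochastic gradient descent and all threshold values are assumptions made for illustration, and `model` stands for an initial model of hypernym classification such as the earlier hypothetical `HypernymClassifier`.

```python
import torch
import torch.nn.functional as F

def train_classifier(model, batches, lr=0.01, max_rounds=50, eps=1e-4):
    """Repeat steps 203-205: classify, compute the loss, adjust the fixed parameter values,
    and stop when the parameter change is small enough or a preset number of rounds is reached."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(max_rounds):                            # preset number of adjustments
        previous = [p.detach().clone() for p in model.parameters()]
        for word_ids, labels in batches:                   # second segmentation results + 0/1 flags
            logits = model(word_ids)                       # step 203: initial classification result
            loss = F.cross_entropy(logits, labels)         # step 204: loss function value
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                               # step 205: adjust fixed parameter values
        delta = max((p.detach() - q).abs().max().item()
                    for p, q in zip(model.parameters(), previous))
        if delta < eps:                                    # first difference below the first threshold
            break
    return model
```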
The filtering method for invalid hypernyms of the present invention is described below with a specific application example, where the hypernym classification model in this embodiment is specifically a CNN model, and the method of this embodiment mainly includes the following two parts:
firstly, a schematic diagram of training a hypernym classification model is shown in fig. 5, and the training method includes:
step 301, selecting an effective hypernym and an ineffective hypernym from the preset hypernym combination, setting an effective mark of the effective hypernym, and setting an ineffective mark of the ineffective hypernym to form a first training sample.
Step 302, performing word segmentation processing on each hypernym in the first training sample respectively to obtain a second word segmentation processing result.
Step 303, determining an initial model of the hypernym classification, where the initial model may be specifically as shown in fig. 6, and includes: the word vector layer, the convolution layer, the pooling layer and the classification layer, and initializing parameter values of each fixed parameter in the initial model, namely determining initial values of the fixed parameters in each layer. Wherein:
(1) word vector layer:
suppose that a word sequence s corresponding to a certain hypernym is input as { w }1,w2...wi+1,wi+2...,wnGet the word vector of each word after passing through the word vector layer
Figure BDA0001550213330000101
Wherein
Figure BDA0001550213330000102
Representing the word vector corresponding to the ith word.
The word vector layer works mainly by lookup, so the correspondence between each word and its word vector needs to be preset in the word vector layer. Therefore, when the word vector layer is initialized, the training text may first be determined by the method shown in fig. 4, word segmentation may be performed on it to obtain all the words it contains, and a suitable method may be used to train the word vector corresponding to each word in the training text. The correspondence between each word in the training text and its word vector is then loaded into the word vector layer to initialize it.
(2) The convolution layer performs the convolution operation and fuses the word vectors produced by the word vector layer.
Suppose $W_c \in \mathbb{R}^{k \times d}$ denotes a convolution kernel, $b$ is the corresponding bias vector, $k$ is the window size of the convolution kernel, and $d$ is the dimension of the word vectors. The convolution kernel $W_c$ slides from $x_{1:k}$ to $x_{n-k+1:n}$, and the computation is shown in formula 1:

$$c_i = f\big(W_c \cdot x_{i:i+k-1} + b\big) \qquad (1)$$

where $c_i$ is the local semantic feature of the hypernym computed by the convolution kernel $W_c$ from the word vectors $x_i$ to $x_{i+k-1}$, and $f$ is the activation function. In practical applications, several different convolution kernels $W_{c_i}$ can be used to mine the semantic features of the keywords in a hypernym from different dimensions; the semantic features obtained by convolution kernel $W_{c_i}$ can therefore be written as formula 2:

$$C_i = \big[c_1, c_2, \ldots, c_m\big] \qquad (2)$$

where $W_{c_i}$ is the $i$-th convolution kernel and $m = n - k + 1$.

When the convolution layer is initialized, the initial values of fixed parameters such as $b$, $d$, $k$ and $W_c$ in the convolution layer can be determined.
(3) The pooling layer performs dimensionality reduction on the vectors produced by the convolution layer in order to select the most significant semantic features; the computation is shown in formula 3:

$$\hat{c}_i = \max(C_i) = \max\{c_1, c_2, \ldots, c_m\} \qquad (3)$$

The semantic feature obtained by the pooling layer can be written as $z = [\hat{c}_1, \hat{c}_2, \ldots]$, with one component per convolution kernel.
(4) The classification layer may specifically be a softmax classification layer, and the computation is shown in formula 4:

$$p_i = \operatorname{softmax}\big(W_{out}(z \odot r)\big)_i \qquad (4)$$

where $W_{out}$ is the weight matrix of the softmax classification layer, $T$ is the number of classes, $\odot$ denotes the element-wise product of vectors, $r$ is a binary 0/1 vector sampled from a Bernoulli distribution, and $p_i$ represents the probability that the input hypernym belongs to class $i$.

When the classification layer is initialized, the initial values of fixed parameters such as $T$, $W_{out}$ and $r$ in the classification layer can be determined.
Step 304, inputting the second segmentation processing result obtained in step 302 into the initial model of hypernym classification determined in step 303, and finally obtaining information whether each hypernym in the first training sample is invalid or valid, namely an initial classification result.
Step 305, calculating the function value of the loss function of the initial model of hypernym classification according to the initial classification result, as shown in formula 5:

$$L(S;\theta) = -\sum_{s \in S} \log p\big(y_s \mid s;\theta\big) \qquad (5)$$

where $S$ is the set of hypernyms marked as valid or invalid in the training sample, $y_s$ is the valid/invalid flag of hypernym $s$, and $\theta$ denotes the fixed parameter values of the initial model of hypernym classification.
Step 306, according to the function value of the loss function obtained in step 305, adjusting a fixed parameter value in the initial model of the hypernym classification, and returning to execute step 304 for the adjusted initial model of the hypernym classification. When the fixed parameter value in the initial model of the hypernym classification is adjusted, the fixed parameter value in the convolution layer and the classification layer is adjusted.
By repeating steps 304 to 306, the initial model of hypernym classification is adjusted continuously so that the function value of the loss function $L$ in formula 5 converges, i.e. is minimized. If the function value of $L$ is minimal after a certain adjustment of the initial model, the model as adjusted at that point is the hypernym classification model obtained by the final training.
Secondly, processing any short text according to the hypernym classification model
For any short text to be processed, word segmentation processing can be performed on the short text to be processed to obtain a first word segmentation processing result, and the first word segmentation processing result is input into the hypernym classification model obtained through training, so that information about whether the short text to be processed is an invalid hypernym can be obtained.
And if the short text is the invalid hypernym, filtering the short text to be processed, and if the short text is the valid hypernym, keeping the short text to be processed.
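Putting the two parts together, applying the trained model to a short text and filtering could look like the sketch below; `segment_short_text`, `word_to_index` and `model` refer to the earlier hypothetical sketches, and treating class 0 as "invalid" is likewise an assumption.

```python
import torch

def is_invalid_hypernym(model, word_to_index, short_text):
    """Segment the short text, run the hypernym classification model,
    and return True if it is classified as an invalid hypernym."""
    words = segment_short_text(short_text)                        # first word segmentation result
    ids = torch.tensor([[word_to_index.get(w, 0) for w in words]])
    with torch.no_grad():
        predicted = model(ids).argmax(dim=1).item()
    return predicted == 0                                         # assumed: class 0 = invalid

def filter_hypernyms(model, word_to_index, short_texts):
    # keep only the short texts that are not invalid hypernyms
    return [t for t in short_texts if not is_invalid_hypernym(model, word_to_index, t)]
```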
It should be noted that, the hypernym classification model described in the foregoing embodiment is specifically a CNN model, while in other specific embodiments, the hypernym classification model may be a combination of an LSTM network and a CNN network, and the specific structure may be as shown in fig. 7, specifically:
the word sequence corresponding to any hypernym can be respectively input into the CNN network and the LTSM network, wherein the word sequence can obtain the semantic features 1 (embodied in a vector form) of the keywords in the hypernym after passing through a word vector layer, a convolution layer and a pooling layer in the CNN network, and the word sequence also can obtain the semantic features 2 of the keywords in the hypernym after passing through the LTSM network.
After the semantic features 1 and the semantic features 2 are spliced through the splicing layer, the spliced semantic features are input into the classification layer, and the classification layer can obtain information whether a certain hypernym is an invalid hypernym according to the spliced semantic features.
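A corresponding sketch of the combined structure of FIG. 7, in which the CNN branch (semantic feature 1) and the LSTM branch (semantic feature 2) are concatenated by the splicing layer before classification, is given below; again, PyTorch and the layer sizes are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class CnnLstmHypernymClassifier(nn.Module):
    """Sketch of FIG. 7: CNN features and LSTM features are spliced and then classified."""
    def __init__(self, vocab_size, embed_dim=100, num_kernels=100, hidden=100, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)       # shared word vector layer
        self.conv = nn.Conv1d(embed_dim, num_kernels, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(num_kernels + hidden, num_classes)

    def forward(self, word_ids):
        x = self.embedding(word_ids)                                          # (batch, seq, dim)
        feat1 = torch.relu(self.conv(x.transpose(1, 2))).max(dim=2).values    # semantic feature 1 (CNN)
        _, (h, _) = self.lstm(x)
        feat2 = h[-1]                                                         # semantic feature 2 (LSTM)
        spliced = torch.cat([feat1, feat2], dim=1)                            # splicing layer
        return self.classifier(spliced)                                       # classification layer
```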
An embodiment of the present invention further provides a filtering apparatus for invalid hypernyms, where a schematic structural diagram of the filtering apparatus is shown in fig. 8, and the filtering apparatus may specifically include:
the word segmentation unit 10 is configured to perform word segmentation on the short text to be processed to obtain a first word segmentation result of the short text to be processed;
a model determining unit 11, configured to determine a hypernym classification model, where the hypernym classification model is used to perform classification processing on whether the hypernym is an invalid hypernym;
and the information classification unit 12 is configured to extract semantic features of the first word segmentation processing result obtained by the word segmentation unit 10 according to the hypernym classification model determined by the model determination unit 11, and obtain information on whether the short text to be processed is an invalid hypernym according to the semantic features, so as to perform filtering processing.
It can be seen that, in the apparatus of this embodiment, when an invalid hypernym is to be filtered, the word segmentation unit 10 first performs word segmentation on the short text to be processed to obtain a first word segmentation result, and the information classification unit 12 then extracts semantic features of the first word segmentation result according to the hypernym classification model and acquires information on whether the short text to be processed is an invalid hypernym according to the semantic features, so as to perform filtering. In this way, the semantic features of each word included in the short text serve as the basis for judging whether it is an invalid hypernym, so the obtained information is more accurate and the filtering of invalid hypernyms is more accurate; moreover, the information on whether the short text is an invalid hypernym can be obtained directly by using the hypernym classification model and the words included in the short text, and the computation is simpler.
Referring to fig. 9, in a specific embodiment, the filtering device for invalid hypernyms may further include, in addition to the structure shown in fig. 8: a sample determination unit 13, a function calculation unit 14, an adjustment unit 15, and a stop determination unit 16, wherein:
the sample determining unit 13 is configured to determine a first training sample, where the first training sample includes a marked valid hypernym and a marked invalid hypernym, and determine an initial model of the hypernym classification.
The sample determining unit 13 is configured to, when determining the first training sample, specifically select an invalid hypernym from a preset hypernym set, and set an invalid flag of the selected invalid hypernym; determining a second description template of a valid hypernym, the second description template having common information with the selected first description template of the invalid hypernym; and selecting effective hypernyms consistent with the second description template from the hypernym set, and setting effective marks of the selected effective hypernyms.
When selecting an invalid hypernym, the sample determining unit 13 may specifically select at least one of the following types of invalid hypernyms from a preset hypernym set: an orientation word type, a year type, an adjective type, and a non-noun type.
When the sample determining unit 13 determines the second description template: if the selected invalid hypernyms include invalid hypernyms of the orientation word type, the corresponding second description template is determined to be: short text that ends with a noun and can begin with "at"; if the selected invalid hypernyms include invalid hypernyms of the year type, the corresponding second description template is determined to be: short text combining a year with a noun; if the selected invalid hypernyms include invalid hypernyms of the adjective type, the corresponding second description template is determined to be: short text that ends with "的" and includes a noun; if the selected invalid hypernyms include invalid hypernyms of the non-noun type, the corresponding second description template is determined to be: short text that includes a noun.
The sample determining unit 13 is specifically configured to determine a structure of an initial model of a hypernym classification when determining the initial model of the hypernym classification, where the initial model of the hypernym classification includes a feature extraction module and a classification module, the feature extraction module is configured to extract semantic features, and the classification module is configured to classify an invalid hypernym or an valid hypernym according to the semantic features extracted by the feature extraction module; and determining initial values of fixed parameters in the feature extraction module and the classification module.
Wherein the feature extraction module comprises: word vector layer, convolutional layer and pooling layer, the classification module comprising: a classification layer; the word vector layer is used for obtaining a word vector of each word, the word vector is used for representing semantic features of the corresponding word, the convolution layer is used for fusing adjacent words according to the word vector of each word, and the pooling layer is used for performing dimension reduction processing on the vector fused by the convolution layer.
When determining the initial value of the fixed parameter in the feature extraction module, the sample determination unit 13 is specifically configured to determine a second training sample, where the second training sample includes a training text; performing word segmentation processing on the training text to obtain a third word segmentation processing result, wherein the third word segmentation processing result comprises all words in the training text; training according to the third character segmentation processing result to obtain character vectors of all characters in the training text; and determining the initial value of the fixed parameter in the feature extraction module according to the word vectors of all the words in the training text.
The word segmentation unit 10 is further configured to perform word segmentation on each hypernym in the first training sample determined by the sample determination unit 13, so as to obtain a second word segmentation result.
The information classification unit 12 is configured to classify each hypernym in the first training sample according to the initial model of the hypernym classification and the second segmentation processing result obtained by the segmentation unit 10, so as to obtain an initial classification result of whether each hypernym is invalid.
A function calculating unit 14, configured to calculate a function value of a loss function related to the initial model of the hypernym classification according to the initial classification result obtained by the information classifying unit 12;
the adjusting unit 15 is further configured to adjust a fixed parameter value of the initial model of the hypernym classification according to the function value of the loss function calculated by the function calculating unit 14, so as to obtain the hypernym classification model.
A stop determining unit 16, configured to determine whether the adjustment of the fixed parameter value by the adjusting unit meets a preset stop condition, and if not, notify the information classifying unit 12 to obtain the initial classification result for the initial model of the hypernym classification after the fixed parameter value is adjusted.
Here, the preset stop condition includes any one of the following conditions: a first difference value between the currently adjusted fixed parameter value and the last adjusted fixed parameter value is smaller than a first threshold value; and adjusting the fixed parameter value for a preset number of times.
The present invention further provides a terminal device, a schematic structural diagram of which is shown in fig. 10. The terminal device may differ considerably depending on its configuration or performance, and may include one or more central processing units (CPUs) 20 (e.g., one or more processors), a memory 21, and one or more storage media 22 (e.g., one or more mass storage devices) storing application programs 221 or data 222. The memory 21 and the storage medium 22 may be transient or persistent storage. The programs stored in the storage medium 22 may include one or more modules (not shown), each of which may include a series of instruction operations for the terminal device. Further, the central processing unit 20 may be configured to communicate with the storage medium 22 and execute, on the terminal device, the series of instruction operations in the storage medium 22.
Specifically, the application 221 stored in the storage medium 22 includes an application for filtering invalid hypernyms, and the application may include the word segmentation unit 10, the model determination unit 11, the information classification unit 12, the sample determination unit 13, the function calculation unit 14, the adjustment unit 15, and the stop determination unit 16 in the above-mentioned filtering device for invalid hypernyms, which will not be described in detail herein. Further, the central processor 20 may be configured to communicate with the storage medium 22 to execute a series of operations corresponding to the application of invalid hypernym filtering stored in the storage medium 22 on the terminal device.
The terminal device may also include one or more power supplies 23, one or more wired or wireless network interfaces 24, one or more input-output interfaces 25, and/or one or more operating systems 223, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
The steps executed by the filtering means for invalid hypernyms in the above-mentioned method embodiment may be based on the structure of the terminal device shown in fig. 10.
The embodiment of the invention also provides a storage medium, wherein the storage medium stores a plurality of instructions, and the instructions are suitable for being loaded by a processor and executing the invalid hypernym filtering method executed by the invalid hypernym filtering device.
The embodiment of the invention also provides terminal equipment, which comprises a processor and a storage medium, wherein the processor is used for realizing each instruction;
the storage medium is used for storing a plurality of instructions which are used for loading and executing the invalid hypernym filtering method by the processor.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable storage medium, and the storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
The method, the apparatus and the storage medium for filtering invalid hypernyms provided by the embodiments of the present invention are described in detail above, and a specific example is applied in the present disclosure to explain the principle and the implementation of the present invention, and the description of the above embodiments is only used to help understanding the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (14)

1. A method for filtering invalid hypernyms, comprising:
performing word segmentation processing on the short text to be processed to obtain a first word segmentation processing result of the short text to be processed;
determining a hypernym classification model; the hypernym classification model is obtained by training according to a first training sample, and determining the first training sample comprises: selecting an invalid hypernym from a preset hypernym set, and setting an invalid mark of the selected invalid hypernym; determining a second description template of a valid hypernym, the second description template having common information with the first description template of the selected invalid hypernym; and selecting a valid hypernym consistent with the second description template from the hypernym set, and setting a valid mark of the selected valid hypernym;
and extracting semantic features of the first word segmentation processing result according to the hypernym classification model, and acquiring information whether the short text to be processed is an invalid hypernym according to the semantic features so as to perform filtering processing.
2. The method of claim 1, wherein determining the hypernym classification model specifically comprises:
determining the first training sample, the first training sample comprising marked valid hypernyms and marked invalid hypernyms, and determining an initial model of hypernym classification;
performing word segmentation processing on each hypernym in the first training sample to obtain a second word segmentation processing result;
classifying each hypernym in the first training sample according to the initial model of the hypernym classification and the second word segmentation processing result to obtain an initial classification result indicating whether each hypernym is invalid;
calculating a function value of a loss function related to the initial model of the hypernym classification according to the initial classification result;
and adjusting the fixed parameter values of the initial model of the hypernym classification according to the function value of the loss function, so as to obtain the hypernym classification model.
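As a rough illustration of the training step recited above, and only under the assumption of a PyTorch-style model that maps padded word-id tensors to two logits (valid versus invalid), one loss-and-adjustment step might look as follows; the names model, word_ids, and labels are placeholders rather than interfaces from the disclosure.

    # Minimal training-step sketch, assuming PyTorch and a two-class classifier.
    import torch.nn as nn

    def training_step(model, word_ids, labels, optimizer):
        criterion = nn.CrossEntropyLoss()          # loss function over the classification result
        logits = model(word_ids)                   # initial classification result as logits
        loss = criterion(logits, labels)           # function value of the loss
        optimizer.zero_grad()
        loss.backward()                            # gradients with respect to the parameters
        optimizer.step()                           # adjust the parameter values
        return loss.item()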
3. The method of claim 2, wherein selecting an invalid hypernym from the preset hypernym set specifically comprises:
selecting, from the preset hypernym set, invalid hypernyms of at least one of the following types: an orientation word type, a year type, an adjective type, and a non-noun type.
4. The method of claim 3, wherein determining the second description template of valid hypernyms comprises:
if the selected invalid hypernym comprises an invalid hypernym of the orientation word type, determining that the corresponding second description template is: a short text that ends with a noun and can begin with a locative word such as "at";
if the selected invalid hypernym comprises an invalid hypernym of the year type, determining that the corresponding second description template is: a short text in which a year is combined with a noun;
if the selected invalid hypernym comprises an invalid hypernym of the adjective type, determining that the corresponding second description template is: a short text that includes a noun and can end with a specified adjectival marker;
if the selected invalid hypernym comprises an invalid hypernym of the non-noun type, determining that the corresponding second description template is: a short text that includes a noun.
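Purely as an illustration of pairing selected invalid hypernyms with valid hypernyms that share a description template, a sketch follows; the regular expression stands in for a template and is an assumption, since the concrete templates are defined in the description rather than reproduced here.

    # Illustrative sample construction: 0 marks an invalid hypernym, 1 marks a
    # valid hypernym drawn from the same set via an (assumed) template pattern.
    import re

    TEMPLATES = {
        # hypothetical pattern for the year type: a year followed by further text
        "year": re.compile(r"^(19|20)\d{2}\S+$"),
        # patterns for the orientation-word, adjective, and non-noun types would go here
    }

    def build_first_training_sample(invalid_by_type, hypernym_set):
        invalid = {w for ws in invalid_by_type.values() for w in ws}
        sample = [(w, 0) for w in invalid]                      # invalid marks
        for type_name, pattern in TEMPLATES.items():
            if type_name not in invalid_by_type:
                continue                                        # only types actually selected
            for candidate in hypernym_set:
                if candidate not in invalid and pattern.match(candidate):
                    sample.append((candidate, 1))               # valid marks
        return sample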
5. The method of claim 2, wherein determining the initial model of hypernym classification specifically comprises:
determining the structure of the initial model of the hypernym classification, wherein the initial model of the hypernym classification comprises a feature extraction module and a classification module, the feature extraction module being used for extracting semantic features, and the classification module being used for classifying a hypernym as invalid or valid according to the semantic features extracted by the feature extraction module;
and determining initial values of fixed parameters in the feature extraction module and the classification module.
6. The method of claim 5,
wherein the feature extraction module comprises a word vector layer, a convolutional layer, and a pooling layer, and the classification module comprises a classification layer;
the word vector layer is used for obtaining a word vector of each word, the word vector representing semantic features of the corresponding word; the convolutional layer is used for fusing adjacent words according to the word vector of each word; and the pooling layer is used for performing dimension reduction processing on the vectors fused by the convolutional layer.
7. The method of claim 6, wherein the feature extraction module further comprises at least one type of extraction network and a concatenation layer, each type of extraction network being used to obtain word vectors of respective words, and the concatenation layer being used to concatenate the word vectors extracted by the at least one type of extraction network with the vectors obtained by the pooling layer and to input the concatenated vectors into the classification layer.
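One possible, non-authoritative realization of the layers named in claims 5 to 7 is sketched below with PyTorch; the dimensions, and the choice of a plain embedding averaged over the sequence as the additional extraction network, are assumptions made only for illustration.

    # Sketch of a CNN-style hypernym classifier: word vector layer, convolutional
    # layer, pooling layer, concatenation layer, and classification layer.
    import torch
    import torch.nn as nn

    class HypernymClassifier(nn.Module):
        def __init__(self, vocab_size, emb_dim=128, conv_dim=64, num_classes=2):
            super().__init__()
            self.word_vectors = nn.Embedding(vocab_size, emb_dim)       # word vector layer
            self.conv = nn.Conv1d(emb_dim, conv_dim, kernel_size=3,
                                  padding=1)                            # fuses adjacent words
            self.pool = nn.AdaptiveMaxPool1d(1)                         # dimension reduction
            self.extra_vectors = nn.Embedding(vocab_size, emb_dim)      # one "extraction network"
            self.classify = nn.Linear(conv_dim + emb_dim, num_classes)  # classification layer

        def forward(self, word_ids):                                    # word_ids: (batch, seq_len)
            emb = self.word_vectors(word_ids)                           # (batch, seq_len, emb_dim)
            conv = torch.relu(self.conv(emb.transpose(1, 2)))           # (batch, conv_dim, seq_len)
            pooled = self.pool(conv).squeeze(-1)                        # (batch, conv_dim)
            extra = self.extra_vectors(word_ids).mean(dim=1)            # (batch, emb_dim)
            features = torch.cat([pooled, extra], dim=-1)               # concatenation layer
            return self.classify(features)                              # logits: valid vs. invalid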
8. The method of claim 6, wherein determining initial values for the fixed parameters of the word vector layer in the feature extraction module comprises:
determining a second training sample, wherein the second training sample comprises a training text;
performing word segmentation processing on the training text to obtain a third word segmentation processing result, wherein the third word segmentation processing result comprises all words in the training text;
training according to the third word segmentation processing result to obtain a word vector of each word in the training text;
and determining the correspondence between each word in the training text and its word vector as the initial values of the fixed parameters of the word vector layer.
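A sketch of how such initial word vectors could be pretrained from a second training sample follows, assuming jieba for segmentation and gensim 4.x Word2Vec; returning a vocabulary plus a weight matrix is an illustrative convention for seeding an embedding layer, not the disclosed procedure.

    # Illustrative pretraining of the word vector layer's initial values.
    import jieba
    from gensim.models import Word2Vec

    def pretrain_word_vectors(training_texts, dim=128):
        segmented = [jieba.lcut(text) for text in training_texts]   # third segmentation result
        w2v = Word2Vec(sentences=segmented, vector_size=dim,
                       window=5, min_count=1)                       # word-to-vector correspondence
        vocab = {word: idx for idx, word in enumerate(w2v.wv.index_to_key)}
        weights = w2v.wv.vectors                                    # initial values for the layer
        return vocab, weights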
9. The method of any one of claims 2 to 8, further comprising:
if the adjustment of the fixed parameter values does not meet a preset stop condition, performing, for the initial model of the hypernym classification whose fixed parameter values have been adjusted, the steps of obtaining the initial classification result, calculating the function value of the loss function, and adjusting the fixed parameter values.
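An illustrative outer loop with a preset stop condition is sketched below; the maximum-epoch and loss-delta criteria, and the step_fn argument (for example, a training step like the one sketched after claim 2), are assumptions rather than the claimed conditions.

    # Iterate training steps until a (hypothetical) stop condition is met.
    def train_until_stop(model, batches, optimizer, step_fn, max_epochs=50, min_delta=1e-4):
        previous = float("inf")
        for epoch in range(max_epochs):                   # stop condition, part 1: epoch limit
            total = 0.0
            for word_ids, labels in batches:
                total += step_fn(model, word_ids, labels, optimizer)
            if abs(previous - total) < min_delta:         # stop condition, part 2: loss change
                break
            previous = total
        return model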
10. A filtering apparatus for invalid hypernyms, comprising:
the word segmentation unit is used for performing word segmentation processing on a short text to be processed to obtain a first word segmentation processing result of the short text to be processed;
the model determination unit is used for determining the hypernym classification model, wherein the hypernym classification model is obtained by training according to a first training sample, and determining the first training sample comprises: selecting an invalid hypernym from a preset hypernym set and setting an invalid mark for the selected invalid hypernym; determining a second description template of valid hypernyms, the second description template sharing common information with a first description template of the selected invalid hypernym; and selecting, from the hypernym set, valid hypernyms that conform to the second description template and setting a valid mark for each selected valid hypernym;
and the information classification unit is used for extracting semantic features from the first word segmentation processing result according to the hypernym classification model and obtaining, according to the semantic features, information indicating whether the short text to be processed is an invalid hypernym, so as to perform filtering.
11. The apparatus of claim 10, further comprising: a sample determination unit, a function calculation unit, and an adjustment unit, wherein:
the sample determination unit is used for determining the first training sample, the first training sample comprising marked valid hypernyms and marked invalid hypernyms, and for determining an initial model of hypernym classification;
the word segmentation unit is further configured to perform word segmentation processing on each hypernym in the first training sample to obtain a second word segmentation processing result;
the information classification unit is further configured to classify each hypernym in the first training sample according to the initial model of the hypernym classification and the second word segmentation processing result to obtain an initial classification result indicating whether each hypernym is invalid;
the function calculation unit is used for calculating a function value of a loss function related to the initial model of the hypernym classification according to the initial classification result;
and the adjustment unit is configured to adjust the fixed parameter values of the initial model of the hypernym classification according to the function value of the loss function, so as to obtain the hypernym classification model.
12. The apparatus of claim 11, further comprising:
a stop determination unit, configured to determine whether the adjustment of the fixed parameter values by the adjustment unit satisfies a preset stop condition, and if not, to notify the information classification unit to obtain the initial classification result for the initial model of the hypernym classification whose fixed parameter values have been adjusted, the function calculation unit to calculate the function value of the loss function, and the adjustment unit to adjust the fixed parameter values.
13. A computer-readable storage medium storing instructions adapted to be loaded by a processor and to perform a method of filtering invalid hypernyms according to any of claims 1 to 9.
14. A terminal device, comprising a processor and a memory, wherein the processor is configured to execute instructions;
the memory is configured to store a plurality of instructions, the instructions being loaded by the processor to execute the method of filtering invalid hypernyms as claimed in any one of claims 1 to 9.
CN201810043574.5A 2018-01-17 2018-01-17 Invalid hypernym filtering method and device and storage medium Active CN108304501B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810043574.5A CN108304501B (en) 2018-01-17 2018-01-17 Invalid hypernym filtering method and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810043574.5A CN108304501B (en) 2018-01-17 2018-01-17 Invalid hypernym filtering method and device and storage medium

Publications (2)

Publication Number Publication Date
CN108304501A CN108304501A (en) 2018-07-20
CN108304501B true CN108304501B (en) 2020-09-04

Family

ID=62865641

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810043574.5A Active CN108304501B (en) 2018-01-17 2018-01-17 Invalid hypernym filtering method and device and storage medium

Country Status (1)

Country Link
CN (1) CN108304501B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115100665A (en) * 2022-07-22 2022-09-23 贵州中烟工业有限责任公司 Approximate trademark screening method, model construction method and computer-readable storage medium


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9852379B2 (en) * 2014-03-07 2017-12-26 Educational Testing Service Systems and methods for constructed response scoring using metaphor detection

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101957860A (en) * 2010-10-15 2011-01-26 北京思在信息技术有限责任公司 Method and device for releasing and searching information
CN107180023A (en) * 2016-03-11 2017-09-19 科大讯飞股份有限公司 A kind of file classification method and system
CN106649819A (en) * 2016-12-29 2017-05-10 北京奇虎科技有限公司 Method and device for extracting entity words and hypernyms
CN106874258A (en) * 2017-02-16 2017-06-20 西南石油大学 A kind of text similarity computational methods and system based on Hanzi attribute vector representation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research on Acquisition of Chinese Lexical Hypernym-Hyponym Relations and Its Applications; Song Wenjie; China Master's Theses Full-text Database, Information Science and Technology Series; 2016-02-15 (No. 2); I138-2187 *
Research on Automatic Acquisition of Hypernym-Hyponym Relations of Domain Entities; Cheng Yunru; China Master's Theses Full-text Database, Information Science and Technology Series; 2017-02-15 (No. 2); I138-4456 *

Also Published As

Publication number Publication date
CN108304501A (en) 2018-07-20

Similar Documents

Publication Publication Date Title
CN108287858B (en) Semantic extraction method and device for natural language
CN106528845B (en) Retrieval error correction method and device based on artificial intelligence
CN107436922B (en) Text label generation method and device
CN107729309B (en) Deep learning-based Chinese semantic analysis method and device
CN112711948B (en) Named entity recognition method and device for Chinese sentences
CN108197109A (en) A kind of multilingual analysis method and device based on natural language processing
CN108304373B (en) Semantic dictionary construction method and device, storage medium and electronic device
CN109902307A (en) Name the training method and device of entity recognition method, Named Entity Extraction Model
CN112528637B (en) Text processing model training method, device, computer equipment and storage medium
CN110569354B (en) Barrage emotion analysis method and device
CN108228758A (en) A kind of file classification method and device
CN109117474B (en) Statement similarity calculation method and device and storage medium
CN108304377B (en) Extraction method of long-tail words and related device
CN107688630B (en) Semantic-based weakly supervised microbo multi-emotion dictionary expansion method
CN113297366B (en) Emotion recognition model training method, device, equipment and medium for multi-round dialogue
WO2014022172A2 (en) Information classification based on product recognition
CN110222328B (en) Method, device and equipment for labeling participles and parts of speech based on neural network and storage medium
CN111475622A (en) Text classification method, device, terminal and storage medium
CN107291684A (en) The segmenting method and system of language text
CN111291177A (en) Information processing method and device and computer storage medium
CN110196910B (en) Corpus classification method and apparatus
CN106610990A (en) Emotional tendency analysis method and apparatus
CN112434533B (en) Entity disambiguation method, entity disambiguation device, electronic device, and computer-readable storage medium
CN108090099B (en) Text processing method and device
CN113449084A (en) Relationship extraction method based on graph convolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant