CN111159999B - Method and device for filling word slot, electronic equipment and storage medium - Google Patents


Info

Publication number
CN111159999B
Authority
CN
China
Prior art keywords
text data
word slot
matching
word
analyzed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911233540.3A
Other languages
Chinese (zh)
Other versions
CN111159999A (en)
Inventor
单彦会
周宇涵
郭晗暄
荣玉军
罗红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Hangzhou Information Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Hangzhou Information Technology Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN201911233540.3A priority Critical patent/CN111159999B/en
Publication of CN111159999A publication Critical patent/CN111159999A/en
Application granted granted Critical
Publication of CN111159999B publication Critical patent/CN111159999B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the invention relates to the field of computer technology and discloses a method and a device for filling word slots, an electronic device and a storage medium. The method for filling word slots obtains text data to be analyzed; selects, from a plurality of stored matching models, a matching model that matches the text category of the text data to be analyzed, where each matching model contains correspondences between word slot labels and text data and a word slot label is used to identify a word slot; determines the word slot labels of the text data to be analyzed according to the selected matching model; and extracts the text data corresponding to each word slot label from the text data to be analyzed, taking the extracted text data as the word slot content of the corresponding word slot. This embodiment improves both the speed and the accuracy of word slot filling.

Description

Method and device for filling word slot, electronic equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a method and a device for filling word slots, electronic equipment and a storage medium.
Background
With the development of neural networks, and in particular the rapid rise of deep neural network technology, the field of Natural Language Processing (NLP) has made great progress, and natural language understanding (NLU), the semantic understanding part of NLP, has likewise advanced remarkably. There is still a large gap between current performance and people's expectations, however, so more and more deep neural network techniques are being applied to NLP in the hope of revolutionary progress in the field. The core task in NLP is semantic understanding, and the core of semantic understanding is word slot filling: word slots are determined according to the content of an input text, the corresponding content is extracted to fill each slot, and semantic analysis of the input text can then be completed quickly through the filled slots.
The inventors found that the related art has at least the following problems. Existing word slot filling methods include deep learning, template matching, and the like. In the deep learning approach, a model is built and the text to be analyzed is input into the model to obtain its word slot content; however, obtaining such a model by training consumes a large amount of manpower and time, so generating the model is costly. In the template matching approach, matching templates are designed for the different ways the same meaning can be expressed, and the word slot content of the text to be analyzed is obtained by template matching; however, the workload of this approach is large, and if even one character of a string differs during matching, the match fails and accurate word slot content cannot be obtained, so the applicability of template matching is poor.
Disclosure of Invention
The embodiment of the invention aims to provide a method and a device for filling word slots, electronic equipment and a storage medium, which improve the speed and accuracy of filling the word slots.
In order to solve the technical problem, an embodiment of the present invention provides a method for filling word slots, which obtains text data to be analyzed; selecting a matching model matched with the text category of the text data to be analyzed from the stored multiple matching models, wherein the matching model comprises a corresponding relation between a word slot label and the text data, and the word slot label is used for identifying a word slot; determining a word slot label of the text data to be analyzed according to the matched matching model; and extracting text data corresponding to the word slot labels from the text data to be analyzed, and taking the extracted text data as word slot contents of the word slots.
The embodiment of the invention also provides a device for filling word slots, which comprises: the device comprises an acquisition module, a selection module, a determination module and an extraction module; the acquisition module is used for acquiring text data to be analyzed; the selecting module is used for selecting a matching model matched with the text type of the text data to be analyzed from a plurality of stored matching models, the matching model comprises a corresponding relation between a word slot label and the text data, and the word slot label is used for identifying the word slot; the determining module is used for determining a word slot label of the text data to be analyzed according to the matched matching model; the extraction module is used for extracting text data corresponding to the word slot labels from the text data to be analyzed, and taking the extracted text data as word slot content of the word slots.
An embodiment of the present invention also provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the above method for filling word slots.
Embodiments of the present invention also provide a computer-readable storage medium storing a computer program, which when executed by a processor implements the above method for filling a word slot.
Compared with the prior art, the embodiment of the invention first determines the matching model that matches the text category of the text data to be analyzed, where each matching model contains correspondences between word slot labels and text data; the word slot labels of the text data to be analyzed are then determined according to that matching model, and the text data corresponding to each word slot label is extracted to complete the filling of the word slots. Because matching models for a plurality of different text categories are stored and the matching model is determined first, the matching range for the text data to be analyzed is narrowed, wrong word slot filling is avoided, and both the accuracy and the speed of subsequent word slot filling are improved. Determining the matched matching model also determines the text category of the text data to be analyzed, which meets the classification requirement for that data. In addition, because a plurality of matching models are stored and the text category of each matching model is different, the number of matching models can conveniently be expanded by text category; and because each matching model contains correspondences between text data and word slot labels, the correspondences in each model can be flexibly expanded or changed without retraining, reducing the cost of constructing the matching models and of filling the word slots.
In addition, each matching model is a multi-pattern matching automaton constructed on the basis of a dictionary; the dictionary stores the correspondence between text data and word slot labels in the form of key-value pairs, and the text category of the matching model is the same as that of the word slot labels in its dictionary. Before selecting a matching model matching the text category of the text data to be analyzed from the stored matching models, the method for filling word slots further includes: determining each key-value pair in the dictionary according to a corpus, where the corpus contains text data and word slot labels; and selecting the key-value pairs whose word slot labels belong to the same text category to construct the matching model. Because a multi-pattern matching automaton stores data in a tree structure, the matching model can match text data quickly, which improves the matching speed for the text data to be analyzed. Moreover, since the word slot labels corresponding to the text data in each matching model share the same text category, it is easy to match against the text category of the text data to be analyzed and select a matching model suitable for it, which improves the accuracy of word slot filling.
In addition, determining each key-value pair in the dictionary according to the corpus specifically includes: determining text data and the word slot label corresponding to the text data from the corpus; taking the determined text data as the key of a key-value pair, and taking the corresponding word slot label as its value. Because keys are unique, using the text data as the key avoids different word slot labels corresponding to the same text data, which improves the accuracy of word slot filling.
In addition, determining text data and the corresponding word slot label from the corpus specifically includes: extracting initial word slot text data and the corresponding initial word slot labels from the corpus; judging whether any two extracted pieces of initial word slot text data are the same; if so, taking that initial word slot text data as the text data, obtaining the frequency of each of its initial word slot labels in the corpus, and selecting the most frequent initial word slot label as the word slot label corresponding to the text data; otherwise, taking the initial word slot text data as the text data and its initial word slot label as the corresponding word slot label. The corpus stores many pieces of initial word slot text data with their initial word slot labels, and the same piece of initial word slot text data may correspond to several different initial word slot labels; to guarantee the accuracy of the constructed matching model, the word slot label corresponding to such text data is therefore determined by its frequency of occurrence in the corpus.
In addition, selecting a matching model matching the text category of the text data to be analyzed from the stored matching models specifically includes: matching the text data to be analyzed in each matching model to obtain each model's matching result; determining the matching confidence of each matching model from its matching result, where the matching confidence is the ratio of the total length of the successfully matched text data to the total length of the text data to be analyzed; and determining the matching model that matches the text data to be analyzed according to the obtained confidences. Selecting the matching model based on each model's matching confidence improves both the speed and the accuracy of the selection.
In addition, matching the text data to be analyzed in each matching model to obtain each model's matching result specifically includes performing the following processing for each matching model: matching each key in the model against the text data to be analyzed; judging whether any of the word slot labels corresponding to the matched keys are the same; if so, selecting the matched key of maximum length as the successfully matched text data, and otherwise taking each matched key as successfully matched text data. By checking for duplicate word slot labels and keeping the matched key of maximum length, each retained word slot label corresponds to exactly one piece of successfully matched text data and repeated word slot labels are removed, which facilitates subsequent word slot filling and improves its accuracy.
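The duplicate-label rule above, keeping only the longest matched key when several matched keys carry the same word slot label, can be sketched as a small Python function. This is an illustrative sketch only; the function and variable names are assumptions, not the patent's code.

```python
def deduplicate_matches(matches):
    """Given raw (matched_key, word_slot_label) pairs from one matching
    model, keep a single key per word slot label; when the same label was
    matched more than once, the matched key of maximum length wins."""
    best = {}  # word_slot_label -> longest matched key seen so far
    for key, label in matches:
        if label not in best or len(key) > len(best[label]):
            best[label] = key
    return [(key, label) for label, key in best.items()]
```

The result then contains at most one matched key per word slot label, as required before filling the slots.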
In addition, determining the matching model that matches the text data to be analyzed according to the obtained matching confidences specifically includes: sorting the matching models by matching confidence; judging whether several models share the highest matching confidence; if so, acquiring user information and selecting one matching model from those with the highest confidence according to the user information, where the user information includes data on the user's intention; otherwise, selecting the single model with the highest matching confidence. Selecting among the models with the highest confidence on the basis of user information further improves the accuracy of the selection.
Drawings
One or more embodiments are illustrated by way of example in the accompanying drawings, which are not to be construed as limiting the embodiments; elements with the same reference numerals represent like elements throughout, and the drawings are not to scale unless otherwise specified.
Fig. 1 is a detailed flowchart of a method for filling word slots according to a first embodiment of the present invention;
fig. 2 is a schematic diagram of an implementation of determining text data and a word slot tag corresponding to the text data according to a first embodiment of the present invention;
FIG. 3 is a schematic diagram of an implementation of selecting a matched matching model according to a first embodiment of the present invention;
FIG. 4 is a flowchart of a method for filling word slots according to a second embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an apparatus for filling word slots according to a third embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in order to provide a better understanding of the present application; the technical solution claimed in the present application can nevertheless be implemented without these technical details, and various changes and modifications may be made based on the following embodiments.
The following embodiments are divided for convenience of description, and should not constitute any limitation to the specific implementation manner of the present invention, and the embodiments may be mutually incorporated and referred to without contradiction.
The inventor finds that, in the related art, achieving semantic understanding of text data to be analyzed usually means completing word slot filling on that data: by filling the word slots, the intention of the user is determined, and semantic understanding is thereby achieved. One current way to fill word slots is deep learning: a large amount of corpus data is collected and labeled with word slots according to preset rules, a deep learning model is trained on the labeled data, and the trained model then fills the word slots of newly input text data to be analyzed, predicting the user intention it expresses. However, labeling word slots on the corpus data consumes labor, the training time is long, and whenever the scope of use must be expanded the model must be retrained, which is expensive. Another way is to design templates for semantic understanding and fill the word slots by template matching; but designing the templates also consumes a lot of manpower, the designed templates are not necessarily comprehensive, so the word slots may not be filled well, and the approach has poor expansibility and high maintenance cost.
The first embodiment of the present invention relates to a method for filling a word slot, which can be used in an electronic device such as a server or a client, for example a cloud server or a robot. The specific flow of the method for filling word slots is shown in fig. 1.
Step 101: and acquiring text data to be analyzed.
Specifically, the text data to be analyzed may be entered by a user through an input interface, obtained by collecting the user's voice data and converting it into text data to be analyzed, or uploaded by another third-party device.
Step 102: and selecting, from the stored matching models, a matching model that matches the text category of the text data to be analyzed, where each matching model contains correspondences between word slot labels and text data, and a word slot label is used to identify a word slot.
Before selecting a matching model matching the text category of the text data to be analyzed, a plurality of matching models may be stored in advance. Each matching model is a multi-pattern matching automaton constructed on the basis of a dictionary; the dictionary stores the correspondence between text data and word slot labels in the form of key-value pairs, and the text category of the matching model is the same as the text category of the word slot labels in its dictionary.
Specifically, a multi-pattern matching automaton (an Aho-Corasick, or AC, automaton) stores data in a dictionary-tree (trie) structure and queries the stored data with a multi-pattern matching algorithm. The dictionary in a matching model stores the correspondence between text data and word slot labels as key-value pairs; N key-value pairs can be stored, where N is an integer greater than 0, and to ensure the accuracy of the matching model the dictionary may contain many key-value pairs. The specific process of constructing each matching model is as follows: determine each key-value pair in the dictionary according to a corpus containing text data and word slot labels, then select the key-value pairs whose word slot labels share the same text category to construct the matching model.
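The dictionary-plus-automaton construction described above can be sketched in Python. This is a minimal illustrative Aho-Corasick implementation, not the patent's code; the class and method names are assumptions.

```python
from collections import deque

class MatchingModel:
    """A minimal multi-pattern matching (Aho-Corasick) automaton built
    from a dictionary of {text_data: word_slot_label} key-value pairs.
    Illustrative sketch only, not the patent's implementation."""

    def __init__(self, key_value_pairs):
        self.goto = [{}]   # trie edges; node 0 is the root
        self.fail = [0]    # failure links
        self.out = [[]]    # (key, label) pairs recognized at each node
        for text, label in key_value_pairs.items():
            self._insert(text, label)
        self._build_failure_links()

    def _insert(self, text, label):
        node = 0
        for ch in text:
            nxt = self.goto[node].get(ch)
            if nxt is None:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                nxt = len(self.goto) - 1
                self.goto[node][ch] = nxt
            node = nxt
        self.out[node].append((text, label))

    def _build_failure_links(self):
        # Breadth-first over the trie; depth-1 nodes fail to the root.
        queue = deque(self.goto[0].values())
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                # Inherit outputs of the longest proper suffix node.
                self.out[child].extend(self.out[self.fail[child]])
                queue.append(child)

    def match(self, text):
        """Return every (matched_key, word_slot_label) found in `text`."""
        node, results = 0, []
        for ch in text:
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            results.extend(self.out[node])
        return results
```

With key-value pairs such as {"Shenzhen": "city"}, one such automaton would be built per text category; its `match` method corresponds to the per-model matching described later in sub-step S21.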
In one example, the specific process of determining each key-value pair in the dictionary from the corpus is: determining text data and the word slot labels corresponding to the text data from a corpus; and taking the determined text data as a key in a key value pair, and taking a word slot label corresponding to the text data as a numerical value in the key value pair.
Specifically, the corpus stores initial word slot text data of various text categories together with the corresponding initial word slot labels; a word slot label identifies a word slot. For example, if a word slot is the destination of a trip, it can be identified by the word slot label "city". Each word slot has a unique word slot label, which makes it convenient both to store the relation between word slot labels and text data in the matching model and to classify the text category of the matching model by its word slot labels later. The initial word slot text data are the different surface expressions of each word slot label, so different initial word slot text data can correspond to the same initial word slot label: for example, initial word slot text data 1 is "Shenzhen", one expression of the word slot [destination], and initial word slot text data 2 is "Shanghai", another expression of the same slot, so the initial word slot label corresponding to both initial word slot text data 1 and initial word slot text data 2 is "city".
The specific process of determining text data and the corresponding word slot labels from the corpus is described in detail below and may include the sub-steps shown in fig. 2.
Substep S11: and extracting initial word slot text data and initial word slot labels corresponding to the initial word slot text data from the corpus.
Specifically, a number of pieces of initial word slot text data and their corresponding initial word slot labels are extracted directly from the corpus; how many are extracted can be chosen according to actual needs. It will be appreciated that the number should be as large as possible so that word slots can subsequently be filled accurately.
Substep S12: and judging whether any two extracted pieces of initial word slot text data are the same; if so, executing sub-step S13, and if not, executing sub-step S15.
Specifically, the amount of initial word slot text data in the corpus is very large while the set of initial word slot labels is very limited, so the text data of a song may be identical to that of a book; for example, a song title and a book title may be exactly the same character string. Since the key of each key-value pair in the dictionary must be unique, it is first necessary to judge whether any two extracted pieces of initial word slot text data are the same, that is, whether their character strings are completely identical; if so, the two pieces of initial word slot text data are judged to be the same.
Substep S13: and taking the initial word slot text data as text data.
Substep S14: and respectively acquiring the frequency of each initial word slot label in the corpus, and selecting the initial word slot label with the maximum frequency as the word slot label corresponding to the text data.
When two pieces of initial word slot text data are identical, the accurate one must be chosen as the key of the key-value pair to guarantee the uniqueness of keys. This can be done by counting how frequently, say, the song title and the book title occur in the corpus and selecting the initial word slot label with the higher frequency of occurrence as the word slot label corresponding to the text data, which ensures that the matching model constructed later yields the best matching result for the text data to be analyzed.
Substep S15: and taking the initial word slot text data as text data, and taking the initial word slot label as a word slot label corresponding to the text data.
Specifically, if no two pieces of initial word slot text data are the same, the extracted initial word slot text data can directly be used as the text data, and its initial word slot label as the word slot label corresponding to the text data.
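Sub-steps S11 to S15 amount to a frequency-based disambiguation of duplicate keys, which can be sketched as follows. The function and variable names are illustrative assumptions, not the patent's code.

```python
from collections import Counter

def build_dictionary(corpus):
    """Build {text_data: word_slot_label} key-value pairs from a corpus of
    (initial_text, initial_label) occurrences. When the same string occurs
    with several labels, keep the most frequent label (sub-steps S12-S14);
    otherwise the label is taken as-is (sub-step S15)."""
    label_counts = {}  # text -> Counter of labels seen for that text
    for text, label in corpus:
        label_counts.setdefault(text, Counter())[label] += 1
    # For each key, the value is the label seen most often in the corpus.
    return {text: counts.most_common(1)[0][0]
            for text, counts in label_counts.items()}
```

Feeding the resulting dictionary into a per-category matching model then guarantees that each key maps to a single word slot label.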
The electronic device stores a plurality of matching models of different categories, and a specific process of selecting a matching model may include the sub-steps shown in fig. 3:
substep S21: and respectively matching the text data to be analyzed in each matching model to obtain the matching result of each matching model.
Specifically, each matching model automatically compares each of its keys with the text data to be analyzed; if text identical to a key is found in the text data to be analyzed, that key is a matched key, and the matching result consists of each matched key together with its corresponding word slot label.
Substep S22: and determining the matching confidence of each matching model according to each matching result, wherein the matching confidence is the ratio of the total length of the successfully matched text data to the total length of the text data to be analyzed.
From the matching result of each matching model, its matching confidence can be calculated as the ratio of the total length of the successfully matched text data to the total length of the text data to be analyzed, where the total length of the successfully matched text data equals the sum of the lengths of the matched keys. For example, if the text data to be analyzed is "abcdefg" and the matched keys are "bc" and "efg", then the matching confidence = (2+3)/7 = 5/7.
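The confidence computation in this sub-step can be written directly; this is a sketch and the function name is an assumption.

```python
def matching_confidence(matched_keys, text_to_analyze):
    """Matching confidence: total length of the successfully matched keys
    divided by the total length of the text data to be analyzed."""
    return sum(len(key) for key in matched_keys) / len(text_to_analyze)
```

For the worked example above, `matching_confidence(["bc", "efg"], "abcdefg")` gives 5/7.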
Substep S23: and determining a matching model matched with the text data to be analyzed according to the obtained matching confidence of each matching model.
The matching model with the highest matching confidence may be selected as the one matching the text data to be analyzed. Specifically, the matching models are sorted by matching confidence and it is judged whether several models share the highest confidence. If so, user information is acquired and one matching model is selected from those with the highest confidence according to the user information, where the user information includes data on the user's intention; otherwise the single model with the highest confidence is selected. For example, if the matching confidence of matching model A is 0.9, that of matching model B is 0.9, and that of matching model C is 0.8, sorting shows that models A and B tie for the highest confidence; user information, which may be historical data, is then acquired and used to decide, for instance, to select matching model A.
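The selection logic, including the tie-break on user information, can be sketched as follows. The exact tie-break policy and the names here are illustrative assumptions; the patent only states that user information (e.g. historical data on the user's intention) decides among tied models.

```python
def select_matching_model(confidences, user_preference=None):
    """Select the matching model with the highest matching confidence.
    `confidences` maps model name -> confidence; if several models tie
    for the top confidence, fall back to the user-information hint
    `user_preference` (an assumed representation of historical data)."""
    top = max(confidences.values())
    tied = sorted(name for name, c in confidences.items() if c == top)
    if len(tied) > 1 and user_preference in tied:
        return user_preference
    return tied[0]
```

With the example above, `select_matching_model({"A": 0.9, "B": 0.9, "C": 0.8}, user_preference="A")` selects model A.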
Step 103: and determining a word slot label of the text data to be analyzed according to the matched matching model.
Specifically, the text data to be analyzed can be input into the matched matching model and matched again to obtain the matched keys and their corresponding word slot labels; alternatively, the matching result already obtained for that model in sub-step S21 can be reused. The matching result contains the matched keys and their word slot labels; according to the text category of each word slot label, useful and useless labels can be distinguished, and the useful labels are taken as the word slot labels of the text data to be analyzed. For example, if the text data to be analyzed is "I want to listen to a certain song by a certain person", the matching result pairs are ("I want to listen", play), ("a certain person", singer), ("a certain song", song), plus a fragment carrying the label other; according to the text categories of the labels, the label "other" is judged useless, so the word slot labels of the text data to be analyzed are determined to be: play, singer and song.
Step 104: extracting text data corresponding to the word slot label from the text data to be analyzed, and taking the extracted text data as the word slot content of the word slot.
For example, if "I want to listen to a certain song of a certain person" is the text data to be analyzed, the matching result is <I want to listen, play>, <a certain person, singer>, <of, other>, <a certain song, song>. According to the text category of each word slot label, the word slot label "other" can be determined to be a useless word slot label, so the word slot labels of the text data to be analyzed are determined to be: play, singer, and song, wherein the word slot label "play" corresponds to the word slot [play action], the word slot label "singer" corresponds to the word slot [singer], and the word slot label "song" corresponds to the word slot [song name]. The word slots corresponding to the three word slot labels are respectively acquired, and the text data corresponding to each word slot label is extracted as the word slot content of that word slot: the word slot [play action] is filled with "I want to listen", the word slot [singer] is filled with "a certain person", and the word slot [song name] is filled with "a certain song". The word slot [play action] enables the electronic device to identify the play action; a word slot related to an action is independent of its specific word slot content and does not affect the electronic device's understanding of the text data to be analyzed.
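The extraction of step 104 can be sketched as follows; the label-to-slot mapping mirrors the example above, and all names are illustrative assumptions rather than the patented implementation.

```python
# Illustrative sketch (assumed names): fill word slots from a matching
# result; useless labels ("other") map to no slot and are dropped.

SLOT_OF_LABEL = {"play": "[play action]",
                 "singer": "[singer]",
                 "song": "[song name]"}

def fill_slots(matching_result):
    """matching_result: list of (matched_text, word_slot_label) pairs."""
    slots = {}
    for text, label in matching_result:
        slot = SLOT_OF_LABEL.get(label)  # None for useless labels
        if slot is not None:
            slots[slot] = text           # matched text is the slot content
    return slots

result = [("I want to listen", "play"),
          ("a certain person", "singer"),
          ("of", "other"),
          ("a certain song", "song")]
print(fill_slots(result))
```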
Compared with the prior art, in the embodiment of the invention, the matching model matched with the text category of the text data to be analyzed is determined first, each matching model comprising the correspondence between word slot labels and text data; the word slot labels of the text data to be analyzed are then determined according to the matched matching model, and the text data corresponding to each word slot label is extracted through that word slot label, completing the filling of the word slots. Because matching models of a plurality of different text categories are stored and the matched matching model is determined first, the range within which the text data to be analyzed is matched is narrowed, wrong word slot filling is avoided, and both the accuracy and the speed of subsequent word slot filling are improved. Meanwhile, determining the matched matching model also determines the text category of the text data to be analyzed, meeting the classification requirement for the text data to be analyzed. In addition, because a plurality of matching models are stored, each with a different text category, the number of matching models can conveniently be expanded according to text category; and because each matching model comprises the correspondence between text data and word slot labels, the correspondences in each matching model can be flexibly expanded or changed without retraining, reducing the cost of constructing the matching models and of filling the word slots.
A second embodiment of the invention is directed to a method of filling word slots. This embodiment is another implementation of obtaining the matching result of each matching model.
The processing steps shown in fig. 4 are performed for each matching model.
Step 201: matching each key in the matching model with the text data to be analyzed respectively.
Step 202: judging whether the same word slot label exists among the word slot labels corresponding to the matched keys; if so, executing step 203, otherwise, executing step 204.
Specifically, each key in the matching model is matched with the text data to be analyzed to obtain a plurality of matched keys. Because the keys are different expressions of the word slots, the same word slot label may correspond to a plurality of keys; and because the keys differ in length, one key may contain the entire character string of another key, so repeated word slot labels may be matched while matching the text data to be analyzed. Since a word slot label is the unique identifier of a word slot, filling the same word slot with different contents would affect the accuracy of word slot filling. In this embodiment, whether the same word slot label exists among the word slot labels corresponding to the multiple matched keys is therefore judged, and if so, a word slot label deduplication operation needs to be performed.
Step 203: selecting the matched key with the maximum length as the successfully matched text data.
For example, suppose key1 = "abcd" and key2 = "bc", and both key1 and key2 correspond to word slot label tag1. If the text data to be analyzed is input into the matching model and the matched keys are key1 and key2, it is determined that the word slot labels corresponding to key1 and key2 are the same. The matched key with the maximum length is then selected as the successfully matched text data, that is, key1 is selected as the successfully matched text data.
Step 204: taking the matched key as the successfully matched text data.
If the same word slot label does not exist among the word slot labels corresponding to the matched keys, each matched key can be directly used as successfully matched text data.
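Steps 201 to 204 can be sketched as follows; the names are illustrative assumptions, and the input is assumed to be the list of matched keys already produced by the matching model.

```python
# Illustrative sketch (assumed names) of steps 202-204: when several
# matched keys carry the same word slot label, keep only the longest key
# for that label (step 203); a label matched once is kept as-is (step 204).

def deduplicate(matches):
    """matches: list of (matched_key, word_slot_label) pairs."""
    longest = {}
    for key, label in matches:
        if label not in longest or len(key) > len(longest[label]):
            longest[label] = key
    return [(key, label) for label, key in longest.items()]

matches = [("abcd", "tag1"), ("bc", "tag1"), ("xy", "tag2")]
print(deduplicate(matches))  # -> [('abcd', 'tag1'), ('xy', 'tag2')]
```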
In the method for filling word slots provided by this embodiment, whether the same word slot label exists is judged, and the matched key with the maximum length is selected as the successfully matched text data, the word slot label corresponding to that matched key being the word slot label of the successfully matched text data. Repeated word slot labels are thereby removed, which benefits subsequent word slot filling and improves the accuracy of word slot filling.
The steps of the above methods are divided for clarity of description; in implementation, steps may be combined into one step, or one step may be split into multiple steps, and all such variants fall within the protection scope of this patent as long as the same logical relationship is included. Adding insignificant modifications to an algorithm or process, or introducing insignificant design changes, without changing the core design of the algorithm or process also falls within the protection scope of this patent.
A third embodiment of the present invention relates to a device for filling word slots, the specific structure of which is shown in fig. 5, comprising: an acquisition module 301, a selection module 302, a determination module 303, and an extraction module 304.
The acquisition module 301 is configured to acquire text data to be analyzed, and the selection module 302 is configured to select, from a plurality of stored matching models, a matching model matched with the text category of the text data to be analyzed, where the matching model comprises the correspondence between word slot labels and text data, and a word slot label is used to identify a word slot. The determination module 303 is configured to determine the word slot labels of the text data to be analyzed according to the matched matching model; the extraction module 304 is configured to extract the text data corresponding to each word slot label from the text data to be analyzed, and to take the extracted text data as the word slot content of the word slot.
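As a rough illustration of how the four modules of fig. 5 could cooperate, the following sketch reduces each matching model to a plain dictionary and uses substring containment in place of automaton matching; every name here is an assumption, not the patented implementation.

```python
# Illustrative sketch (assumed names): the four modules wired into one
# pipeline. Each matching model is a dict from key to word slot label,
# and the matching confidence is the ratio of total matched key length
# to the length of the text to be analyzed, as described in the text.

class WordSlotFillingDevice:
    def __init__(self, models):
        # models: dict mapping text category -> {key: word_slot_label}
        self.models = models

    def acquire(self, text):                       # acquisition module 301
        return text

    def select(self, text):                        # selection module 302
        def confidence(model):
            return sum(len(k) for k in model if k in text) / len(text)
        return max(self.models.values(), key=confidence)

    def determine(self, model, text):              # determination module 303
        return {label: key for key, label in model.items() if key in text}

    def extract(self, labels_to_text):             # extraction module 304
        return labels_to_text  # word slot label -> word slot content

    def fill(self, text):
        text = self.acquire(text)
        model = self.select(text)
        return self.extract(self.determine(model, text))

models = {"music": {"I want to listen": "play", "a certain song": "song"},
          "weather": {"weather": "query"}}
device = WordSlotFillingDevice(models)
print(device.fill("I want to listen to a certain song"))
# -> {'play': 'I want to listen', 'song': 'a certain song'}
```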
It should be understood that this embodiment is a device embodiment corresponding to the first embodiment and may be implemented in cooperation with the first embodiment. The related technical details mentioned in the first embodiment remain valid in this embodiment and are not repeated here in order to reduce repetition. Accordingly, the related technical details mentioned in this embodiment can also be applied to the first embodiment.
It should be noted that each module referred to in this embodiment is a logical module; in practical applications, one logical unit may be one physical unit, a part of one physical unit, or a combination of multiple physical units. In addition, in order to highlight the innovative part of the present invention, units not closely related to solving the technical problem proposed by the present invention are not introduced in this embodiment, but this does not indicate that no other units exist in this embodiment.
A fourth embodiment of the present invention relates to an electronic device, and a specific configuration of the electronic device 40 is as shown in fig. 6, and includes: at least one processor 401; and a memory 402 communicatively coupled to the at least one processor 401; the memory 402 stores instructions executable by the at least one processor 401, and the instructions are executable by the at least one processor 401 to enable the at least one processor 401 to perform the above-mentioned method of filling word slots.
The memory 402 and the processor 401 are connected by a bus, which may include any number of interconnected buses and bridges that link one or more of the various circuits of the processor 401 and the memory 402. The bus may also link various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. A bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor 401 may be transmitted over a wireless medium via an antenna, which may receive the data and transmit the data to the processor 401.
The processor 401 is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And memory 402 may be used to store data used by the processor in performing operations.
A fifth embodiment of the present invention relates to a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of filling a word slot in the first embodiment or the second embodiment.
Those skilled in the art can understand that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing related hardware. The program is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples for carrying out the invention, and that various changes in form and details may be made therein without departing from the spirit and scope of the invention in practice.

Claims (10)

1. A method of filling word slots, comprising:
acquiring text data to be analyzed;
selecting a matching model matched with the text category of the text data to be analyzed from a plurality of stored matching models, wherein the matching model comprises a corresponding relation between a word slot label and the text data, and the word slot label is used for identifying the word slot;
determining a word slot label of the text data to be analyzed according to the matched matching model;
and extracting text data corresponding to the word slot label from the text data to be analyzed, and taking the extracted text data as the word slot content of the word slot.
2. The method of filling word slots according to claim 1, wherein the matching model is a multi-pattern matching automaton constructed based on a dictionary storing the correspondence between the text data and the word slot labels in the form of key-value pairs, wherein a text category of the matching model is the same as a text category of the word slot labels in the dictionary;
before selecting a matching model matched with the text category of the text data to be analyzed from the stored multiple matching models, the method for filling the word slot further comprises the following steps:
determining each key-value pair in the dictionary according to a corpus, wherein the corpus comprises text data and word slot labels;
and selecting key value pairs where the word slot labels of the same text category are located to construct the matching model.
3. The method according to claim 2, wherein determining each key-value pair in the dictionary from a corpus comprises:
determining text data and the word slot labels corresponding to the text data from the corpus;
and taking the determined text data as a key in the key value pair, and taking the word slot label corresponding to the text data as a numerical value in the key value pair.
4. The method according to claim 3, wherein the determining text data and the word slot labels corresponding to the text data from the corpus specifically includes:
extracting initial word slot text data and an initial word slot label corresponding to the initial word slot text data from the corpus;
judging whether any two extracted initial word slot text data are the same, and if yes, taking the initial word slot text data as the text data; respectively acquiring the frequency of each initial word slot label in the corpus, and selecting the initial word slot label with the maximum frequency as a word slot label corresponding to the text data;
and if not, taking the initial word slot text data as the text data, and taking the initial word slot label as the word slot label corresponding to the text data.
5. The method for filling word slots according to claim 2, wherein the selecting a matching model matched with the text category of the text data to be analyzed from the stored multiple matching models specifically comprises:
respectively matching the text data to be analyzed in each matching model to obtain a matching result of each matching model;
determining the matching confidence coefficient of each matching model according to each matching result, wherein the matching confidence coefficient is the ratio of the total length of successfully matched text data to the total length of the text data to be analyzed;
and determining a matching model matched with the text data to be analyzed according to the obtained matching confidence of each matching model.
6. The method for filling word slots according to claim 5, wherein the text data to be analyzed is respectively matched in each matching model to obtain a matching result of each matching model, and specifically comprises:
for each of the matching models, the following is performed:
matching each key in the matching model with the text data to be analyzed respectively;
and judging whether the same word slot labels exist in the word slot labels corresponding to the matched keys, if so, selecting the key with the matched maximum length as the successfully matched text data, otherwise, using the matched key as the successfully matched text data.
7. The method according to claim 5 or 6, wherein the determining a matching model matching the text data to be parsed according to the obtained matching confidence of each matching model specifically comprises:
sorting each matching model according to the matching confidence;
judging whether a plurality of highest matching confidences exist, and if so, acquiring user information and selecting the matched matching model from the matching models corresponding to the highest matching confidence according to the user information; otherwise, selecting the matching model corresponding to the highest matching confidence as the matched matching model, wherein the user information comprises the data of the user intention.
8. An apparatus for filling word slots, comprising: the device comprises an acquisition module, a selection module, a determination module and an extraction module;
the acquisition module is used for acquiring text data to be analyzed;
the selection module is used for selecting a matching model matched with the text category of the text data to be analyzed from a plurality of stored matching models, wherein the matching model comprises a corresponding relation between a word slot label and the text data, and the word slot label is used for identifying the word slot;
the determining module is used for determining a word slot label of the text data to be analyzed according to the matched matching model;
the extraction module is used for extracting text data corresponding to the word slot labels from the text data to be analyzed, and taking the extracted text data as word slot content of the word slots.
9. An electronic device, comprising: at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of filling a word slot as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, wherein the computer program is configured to implement the method of filling a word slot according to any one of claims 1 to 7 when executed by a processor.
CN201911233540.3A 2019-12-05 2019-12-05 Method and device for filling word slot, electronic equipment and storage medium Active CN111159999B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911233540.3A CN111159999B (en) 2019-12-05 2019-12-05 Method and device for filling word slot, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911233540.3A CN111159999B (en) 2019-12-05 2019-12-05 Method and device for filling word slot, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111159999A CN111159999A (en) 2020-05-15
CN111159999B true CN111159999B (en) 2023-04-18

Family

ID=70556418

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911233540.3A Active CN111159999B (en) 2019-12-05 2019-12-05 Method and device for filling word slot, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111159999B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111831823B (en) * 2020-07-10 2022-05-13 亿咖通(湖北)技术有限公司 Corpus generation and model training method
CN112084770B (en) * 2020-09-14 2024-07-05 深圳前海微众银行股份有限公司 Word slot filling method, device and readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109241269A (en) * 2018-07-27 2019-01-18 深圳追科技有限公司 Task humanoid robot word slot fill method
CN109712617A (en) * 2018-12-06 2019-05-03 珠海格力电器股份有限公司 Voice control method and device, storage medium and air conditioner
CN109918479A (en) * 2019-02-28 2019-06-21 百度在线网络技术(北京)有限公司 For handling the method and device of information
CN110059163A (en) * 2019-04-29 2019-07-26 百度在线网络技术(北京)有限公司 Generate method and apparatus, the electronic equipment, computer-readable medium of template
US10453117B1 (en) * 2016-06-29 2019-10-22 Amazon Technologies, Inc. Determining domains for natural language understanding
CN110472030A (en) * 2019-08-08 2019-11-19 网易(杭州)网络有限公司 Man-machine interaction method, device and electronic equipment

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10453117B1 (en) * 2016-06-29 2019-10-22 Amazon Technologies, Inc. Determining domains for natural language understanding
CN109241269A (en) * 2018-07-27 2019-01-18 深圳追科技有限公司 Task humanoid robot word slot fill method
CN109712617A (en) * 2018-12-06 2019-05-03 珠海格力电器股份有限公司 Voice control method and device, storage medium and air conditioner
CN109918479A (en) * 2019-02-28 2019-06-21 百度在线网络技术(北京)有限公司 For handling the method and device of information
CN110059163A (en) * 2019-04-29 2019-07-26 百度在线网络技术(北京)有限公司 Generate method and apparatus, the electronic equipment, computer-readable medium of template
CN110472030A (en) * 2019-08-08 2019-11-19 网易(杭州)网络有限公司 Man-machine interaction method, device and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research and Design of a Service Robot Dialogue System Based on Emotional Interaction; Li Mengyuan; China Excellent Master's Degree Theses Electronic Journal; full text *
Research on Key Technologies of Query Intent Recognition; Cui Jianqing; China Excellent Master's Degree Theses Electronic Journal; full text *

Also Published As

Publication number Publication date
CN111159999A (en) 2020-05-15

Similar Documents

Publication Publication Date Title
CN107291783B (en) Semantic matching method and intelligent equipment
CN108959242B (en) Target entity identification method and device based on part-of-speech characteristics of Chinese characters
CN110619051B (en) Question sentence classification method, device, electronic equipment and storage medium
CN108334493B (en) Question knowledge point automatic extraction method based on neural network
CN104715063B (en) search ordering method and device
CN112633003A (en) Address recognition method and device, computer equipment and storage medium
CN112115232A (en) Data error correction method and device and server
CN108628830A (en) A kind of method and apparatus of semantics recognition
CN111159999B (en) Method and device for filling word slot, electronic equipment and storage medium
CN114005015B (en) Training method of image recognition model, electronic device and storage medium
CN113836316B (en) Processing method, training method, device, equipment and medium for ternary group data
CN111401034B (en) Semantic analysis method, semantic analysis device and terminal for text
CN110765276A (en) Entity alignment method and device in knowledge graph
CN112925895A (en) Natural language software operation and maintenance method and device
CN113326363A (en) Searching method and device, prediction model training method and device, and electronic device
KR20220068462A (en) Method and apparatus for generating knowledge graph
WO2024138859A1 (en) Cross-language entity word retrieval method, apparatus and device, and storage medium
EP4127957A1 (en) Methods and systems for searching and retrieving information
CN112966501B (en) New word discovery method, system, terminal and medium
CN114254642A (en) Entity information processing method, device, electronic equipment and medium
CN113536772A (en) Text processing method, device, equipment and storage medium
CN113553415A (en) Question and answer matching method and device and electronic equipment
CN112784600A (en) Information sorting method and device, electronic equipment and storage medium
CN111159421A (en) Knowledge graph-based fund query method and device
CN117992601B (en) Document generation method and device based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant