CN111462734A - Semantic slot filling model training method and system - Google Patents

Semantic slot filling model training method and system

Info

Publication number
CN111462734A
CN111462734A
Authority
CN
China
Prior art keywords
training
semantic slot
semantic
value pair
filling model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010248117.7A
Other languages
Chinese (zh)
Other versions
CN111462734B (en)
Inventor
俞凯
刘辰
朱苏
陈露
曹瑞升
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AI Speech Ltd
Original Assignee
AI Speech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by AI Speech Ltd filed Critical AI Speech Ltd
Priority to CN202010248117.7A priority Critical patent/CN111462734B/en
Publication of CN111462734A publication Critical patent/CN111462734A/en
Application granted granted Critical
Publication of CN111462734B publication Critical patent/CN111462734B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 - Training
    • G10L15/08 - Speech classification or search
    • G10L15/18 - Speech classification or search using natural language modelling
    • G10L15/1822 - Parsing for meaning understanding

Landscapes

  • Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the invention provides a semantic slot filling model training method. The method comprises the following steps: training a first training data set with labels to generate a first semantic slot filling model; inputting a second training data set of automatic speech recognition hypotheses into the first semantic slot filling model, and determining a first semantic slot value pair; correcting the first semantic slot value pair with a rule-based error correction module to determine a second semantic slot value pair, wherein the error correction module corrects the first semantic slot value pair based on preset rules; and performing policy gradient training on the first semantic slot filling model based on the second semantic slot value pair, and determining a trained second semantic slot filling model. The embodiment of the invention also provides a semantic slot filling model training system. Through reinforcement learning, the embodiments introduce rule-based error correction directly into the training method for the slot filling task in spoken language semantic understanding, thereby improving the robustness of semantic understanding to speech recognition errors.

Description

Semantic slot filling model training method and system
Technical Field
The invention relates to the field of intelligent speech, and in particular to a semantic slot filling model training method and system.
Background
Spoken language semantic understanding is a technique for converting the output of automatic speech recognition into a structured semantic representation, and it is therefore very sensitive to speech recognition errors. Semantic slot filling is typically used in semantic understanding. To improve the robustness of semantic understanding to speech recognition errors, the slot values predicted by semantic slot filling are corrected with a rule-based correction model, thereby ensuring the accuracy of spoken language semantic understanding.
In the process of implementing the invention, the inventor finds that at least the following problems exist in the related art:
the drawback of these methods is that the slot filling model and the rule-based error correction model are independent of each other: because the two models are trained separately, the quality of the corrected result is largely limited by the rule-based error correction model. Error correction, however, should remain a post-processing module and should not overly dominate spoken language semantic understanding. As a result, spoken language semantic understanding is not robust to speech recognition errors.
Disclosure of Invention
The method aims to at least solve the problem in the prior art that the slot filling model and the rule-based error correction model in spoken language semantic understanding are independent of each other, so that spoken language understanding is not robust to speech recognition errors.
In a first aspect, an embodiment of the present invention provides a semantic slot filling model training method, including:
training a first training data set with labels to generate a first semantic slot filling model;
inputting a second training data set of automatic speech recognition into the first semantic slot filling model, and determining a first semantic slot value pair;
correcting the first semantic slot value pair by an error correction module based on rules to determine a second semantic slot value pair, wherein the error correction module corrects the first semantic slot value pair based on preset rules;
and performing policy gradient training on the first semantic slot filling model based on the second semantic slot value pair, and determining a trained second semantic slot filling model.
In a second aspect, an embodiment of the present invention provides a semantic slot filling model training system, including:
the data training program module is used for training a first training data set with labels to generate a first semantic slot filling model;
a semantic slot value pair determining program module, configured to input a second training data set for automatic speech recognition to the first semantic slot filling model, and determine a first semantic slot value pair;
a correcting program module, configured to correct the first semantic slot value pair by using a rule-based error correcting module, and determine a second semantic slot value pair, where the error correcting module corrects the first semantic slot value pair based on a preset rule;
and the semantic slot filling model training program module is used for performing policy gradient training on the first semantic slot filling model based on the second semantic slot value pair to determine a trained second semantic slot filling model.
In a third aspect, an embodiment of the present invention provides an electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the semantic slot filling model training method of any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention provides a storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the steps of the semantic slot filling model training method according to any embodiment of the present invention.
The embodiment of the invention has the following beneficial effects: rule-based error correction is introduced directly into the training method through reinforcement learning and used for the slot filling task in spoken language semantic understanding. On the one hand, domain knowledge is utilized; on the other hand, the slot filling and error correction modules are connected, thereby improving the robustness of semantic understanding to speech recognition errors.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flowchart of a semantic slot filling model training method according to an embodiment of the present invention;
FIG. 2 is a model architecture diagram of a semantic slot filling model training method according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating results of a test set of a semantic slot filling model training method according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating an example of a semantic slot filling model training method according to an embodiment of the present invention;
FIG. 5 is a performance diagram of a semantic slot filling model training method according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a semantic slot filling model training system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a semantic slot filling model training method according to an embodiment of the present invention, which includes the following steps:
S11: training a first training data set with labels to generate a first semantic slot filling model;
S12: inputting a second training data set of automatic speech recognition into the first semantic slot filling model, and determining a first semantic slot value pair;
S13: correcting the first semantic slot value pair by a rule-based error correction module to determine a second semantic slot value pair, wherein the error correction module corrects the first semantic slot value pair based on preset rules;
S14: performing policy gradient training on the first semantic slot filling model based on the second semantic slot value pair, and determining a trained second semantic slot filling model.
In this embodiment, to overcome the defects of the prior art, an error correction module is additionally introduced into the training process of the slot filling model. Since the correction process is rule-based and non-differentiable, training is performed with a policy gradient method from reinforcement learning. Because the correction module is taken into account during training, the output of the slot filling model is better suited to the correction module, which improves the robustness of semantic understanding to speech recognition. The method thus comprises two modules: a semantic slot filling model and a rule-based error correction module.
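For illustration only, the following minimal Python sketch mirrors steps S11-S14 with toy stand-ins. Every name in it (pretrain, rule_based_correct, policy_gradient_step, the one-rule table, and the last-two-words "tagger") is a hypothetical simplification for exposition, not the patent's actual implementation:

```python
def pretrain(model, labeled_data):
    # S11: supervised training on the labeled first training data set (stub).
    model["bias"] = 0.0

def rule_based_correct(slot_value):
    # S13: one illustrative preset rule mapping a known ASR confusion.
    rules = {"quiet bay": "quiet zone"}
    slot, value = slot_value
    return (slot, rules.get(value, value))

def policy_gradient_step(model, reward, lr=0.1):
    # S14: nudge the model toward outputs that score well AFTER correction (stub).
    model["bias"] += lr * reward

def train_pipeline(labeled_data, asr_data, epochs=3):
    model = {"bias": 0.0}                    # first semantic slot filling model
    pretrain(model, labeled_data)            # S11
    for _ in range(epochs):
        for utterance, gold in asr_data:
            # S12: stub tagger predicting the last two words as the slot value.
            predicted = ("endpoint", " ".join(utterance.split()[-2:]))
            corrected = rule_based_correct(predicted)    # S13
            reward = 1.0 if corrected == gold else -1.0  # scored after correction
            policy_gradient_step(model, reward)          # S14
    return model                             # trained second model

trained = train_pipeline(
    labeled_data=[("i want to go to the quiet zone", ("endpoint", "quiet zone"))],
    asr_data=[("i want to go to the quiet bay", ("endpoint", "quiet zone"))],
)
```

The essential point the sketch preserves is that the reward is computed on the corrected output, so the error correction module participates in training rather than remaining an isolated post-processor.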
For step S11, appropriate data needs to be prepared for training the semantic slot filling model, including manually labeled real text and the text of speech recognition hypotheses; both are used during the training phase. The first kind of data is manually labeled real text in which every word is explicitly labeled, so the slot filling task can be treated as a sequence labeling task for training the semantic slot filling model.
As an embodiment, the first training data set with labels is trained via a bidirectional long short-term memory (BLSTM) network.
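As a non-authoritative sketch, such a BLSTM tagger could be written in PyTorch as follows. The layer sizes, the tag inventory, and the plain linear output layer are assumptions chosen for brevity (the decoder actually described later in this patent is an attentive LSTM):

```python
import torch
import torch.nn as nn

class BLSTMTagger(nn.Module):
    """Minimal BLSTM sequence tagger producing per-token BIO slot labels."""
    def __init__(self, vocab_size, num_tags, emb_dim=200, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.blstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                             bidirectional=True)
        self.out = nn.Linear(2 * hidden, num_tags)

    def forward(self, token_ids):                 # (batch, seq_len)
        h, _ = self.blstm(self.embed(token_ids))  # (batch, seq_len, 2*hidden)
        return self.out(h)                        # per-token tag logits

# Supervised training on the labeled data (cross-entropy = negative
# log-likelihood over the gold BIO tags):
model = BLSTMTagger(vocab_size=1000, num_tags=7)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
tokens = torch.randint(0, 1000, (2, 6))           # toy batch
tags = torch.randint(0, 7, (2, 6))
loss = nn.CrossEntropyLoss()(model(tokens).reshape(-1, 7), tags.reshape(-1))
loss.backward()
opt.step()
```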
For step S12, the text of the speech recognition hypotheses, i.e., the automatic speech recognition output, is input into the semantic slot filling model trained in step S11. As shown in the model architecture diagram of FIG. 2, the user says "I want to go to the quiet zone", but because of a speech recognition error this is misrecognized as "I want to go to the quiet bay". An erroneous first semantic slot value pair is thus obtained; for convenience of representation it is written as a semantic triple, yielding inform(endpoint=quiet bay). Because the slot value pair is wrong, "I want to go to the quiet bay" cannot receive correct alignment labels.
For step S13, the error correction module consists of a number of rules that are gradually enriched by continuously collecting speech recognition errors from daily use. The triple inform(endpoint=quiet bay) is corrected by the error correction module, recovering the user's original meaning "I want to go to the quiet zone" and giving the corrected slot value pair inform(endpoint=quiet zone). The real text is then aligned and labeled, for example with the "BIO" tags O, O, B-inform-endpoint, I-inform-endpoint, I-inform-endpoint.
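A toy version of such a rule table, reusing the example above, might look like the following sketch; the rule contents are purely illustrative:

```python
# Hypothetical preset rules collected from observed ASR confusions.
CORRECTION_RULES = {
    ("endpoint", "quiet bay"): "quiet zone",   # known misrecognition
}

def correct(slot_value_pair):
    """Return the corrected (slot, value) pair if a preset rule matches,
    otherwise return the pair unchanged (EC is a post-processing lookup)."""
    slot, value = slot_value_pair
    return (slot, CORRECTION_RULES.get((slot, value), value))

assert correct(("endpoint", "quiet bay")) == ("endpoint", "quiet zone")
assert correct(("endpoint", "museum")) == ("endpoint", "museum")
```

In practice the module described further below also matches corrupted values against the domain ontology by n-gram similarity rather than by exact lookup alone.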
For step S14, the correction process is rule-based and therefore non-differentiable, so it is trained with a policy gradient method from reinforcement learning, which comprises a pre-training stage and an RL (reinforcement learning) training stage.
It can be seen from this embodiment that rule-based error correction is introduced directly into the training method through reinforcement learning for the slot filling task in spoken language semantic understanding. On the one hand, domain knowledge is utilized; on the other hand, the slot filling and error correction modules are connected, thereby improving the robustness of semantic understanding to speech recognition errors.
As an implementation manner, in this embodiment, after determining the trained second semantic slot filling model, the method further includes:
receiving a test data set;
inputting the test data set into the second semantic slot filling model, and determining slot value pairs before correction;
and inputting the slot value pair before correction into the error correction module to obtain a final slot value pair.
In this embodiment, to verify the effect of the trained second semantic slot filling model, a prepared test data set (for example, hypothesis texts from speech recognition) is input into the second semantic slot filling model to obtain the slot value pairs before correction. The slot value pairs before correction are then input into the error correction module to obtain the final slot value pairs.
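A hypothetical test loop matching this procedure might look as follows; model, error_correction, and the data format are assumed interfaces, not the patent's API:

```python
def evaluate(model, error_correction, test_set):
    """Predict slot value pairs with the trained second model, pass them
    through the EC module, then score sentence-level accuracy."""
    num_correct = 0
    for utterance, gold_pairs in test_set:
        raw_pairs = model.predict(utterance)           # pairs before correction
        final_pairs = {error_correction(p) for p in raw_pairs}
        num_correct += int(final_pairs == set(gold_pairs))
    return num_correct / len(test_set)
```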
It can be seen from this embodiment that the robustness of semantic understanding to speech recognition errors is further improved by test checking.
To address this issue, the method proposes a policy-gradient-based reinforcement learning (RL) method to optimize the SLU (Spoken Language Understanding) model so as to take into account the final performance after error correction.
The method is now fully described after defining some symbols used hereinafter. Let r = (r_1, ..., r_|r|) and u = (u_1, ..., u_|u|) denote the ASR (Automatic Speech Recognition) best hypothesis text and the real text, respectively; let y = (y_1, ..., y_|y|) denote the sentence-level semantic labels in the form of act(slot=value) triples; and let o = (o_1, ..., o_|u|) denote the word-level labels on u in the "BIO" scheme (B-begin, I-inside, O-outside).
The BLSTM (Bidirectional Long Short-Term Memory) encoder reads the input sequence x (u or r) and generates a hidden state h_t = [→h_t; ←h_t] at the t-th time step, where →h_t and ←h_t are the forward and backward hidden states. The LSTM decoder recursively updates its hidden state at the t-th time step by s_t = LSTM(s_{t-1}, ψ(o_{t-1}), c_t), where ψ(·) is a label embedding function and c_t comes from the focus mechanism, i.e., only the aligned hidden state h_t is considered. s_0 is initialized from the final hidden state of the encoder. Then the slot tag o_t is generated by P(o_t | o_{<t}; x) = g(s_t), where g denotes a linear layer followed by a softmax function for classification.
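A simplified PyTorch sketch of this decoder follows. The sizes are illustrative, s_0 is zero-initialized here instead of from the encoder state, and greedy decoding stands in for the beam search used later; none of this is the patent's exact implementation:

```python
import torch
import torch.nn as nn

class FocusDecoder(nn.Module):
    """LSTM decoder whose step-t input is the embedded previous tag
    psi(o_{t-1}) and the aligned encoder state c_t = h_t (focus mechanism),
    followed by a linear layer + softmax tag classifier g."""
    def __init__(self, num_tags, enc_dim=512, tag_emb=32, hidden=256):
        super().__init__()
        self.psi = nn.Embedding(num_tags + 1, tag_emb)     # +1 for <bos>
        self.cell = nn.LSTMCell(tag_emb + enc_dim, hidden)
        self.g = nn.Linear(hidden, num_tags)

    def forward(self, enc_states):                 # (seq_len, enc_dim)
        s = torch.zeros(1, self.cell.hidden_size)  # s_0 (simplified init)
        mem = torch.zeros_like(s)
        prev_tag = torch.tensor([self.psi.num_embeddings - 1])  # <bos>
        tags = []
        for t in range(enc_states.size(0)):
            c_t = enc_states[t:t + 1]              # focus: aligned state h_t
            x = torch.cat([self.psi(prev_tag), c_t], dim=-1)
            s, mem = self.cell(x, (s, mem))        # s_t = LSTM(s_{t-1}, ., c_t)
            prev_tag = self.g(s).softmax(-1).argmax(-1)  # P(o_t | o_<t; x)
            tags.append(int(prev_tag))
        return tags

tags = FocusDecoder(num_tags=7)(torch.randn(6, 512))   # one tag per token
```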
Assume there is a predicted act triple a(s=v). Denote the candidate value set for the corresponding act-slot pair in the current domain ontology as V = (V_1, ..., V_|V|). Based on the ontology, an n-gram vocabulary G_n is first constructed. Each value is treated as a word sequence v = (v_1, ..., v_M) with n-gram set v_n = {(v_i, ..., v_{i+n-1}) | i = 1, ..., M-n+1}. Then a binary feature vector d = (d_1, ..., d_{|G_n|}) is built for v, where d_i indicates whether the i-th n-gram of G_n occurs in v, and d is normalized by its L2 norm. Similarly, the candidate value set V can be represented as a feature matrix (after normalization) D, whose k-th column is the feature vector of V_k. The best candidate value can therefore be found in a manner similar to cosine similarity; the index of the best value is k* = argmax_k d^T D_{:,k}. Since some slots have many possible values in the ontology, computing this as a single matrix multiplication greatly improves efficiency. In practice, n ranges from 1 to 2, so the vocabulary size equals |G_1| + |G_2|. A threshold (here 0.5) is set to reject bad selections.
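A minimal NumPy sketch of this n-gram matching, with a toy three-value ontology (the values and names are illustrative):

```python
import numpy as np

def ngram_set(words, n):
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def feature_vector(words, vocab):
    """Binary n-gram indicator vector d for a value, L2-normalized."""
    grams = ngram_set(words, 1) | ngram_set(words, 2)     # n in {1, 2}
    d = np.array([1.0 if g in grams else 0.0 for g in vocab])
    norm = np.linalg.norm(d)
    return d / norm if norm > 0 else d

candidates = [["quiet", "zone"], ["people", "square"], ["west", "station"]]
vocab = sorted({g for v in candidates for n in (1, 2) for g in ngram_set(v, n)})
D = np.stack([feature_vector(v, vocab) for v in candidates], axis=1)

def best_value(predicted_words, threshold=0.5):
    """One matrix product scores all candidates at once (cosine similarity,
    since every column is unit-norm); reject matches below the threshold."""
    d = feature_vector(predicted_words, vocab)
    scores = d @ D
    k = int(np.argmax(scores))
    return candidates[k] if scores[k] >= threshold else None

print(best_value(["quiet", "bay"]))   # -> ['quiet', 'zone'] (score ~0.58)
```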
To prune the large search space, the model is pre-trained on labeled real text to bootstrap the RL training.
Let D_tscp = {(u, o)} denote the real text with alignment labels. The slot filling model is supervised with the negative log-likelihood loss L(θ) = -Σ_{(u,o)∈D_tscp} Σ_t log P(o_t | o_{<t}; u), where θ denotes the model parameters.
In the RL training phase, automatic speech recognition hypotheses without alignment labels are used, denoted D_hyp = {(r, y)}. The slot filling model samples K tag sequences via beam search, which are then converted into act(slot=value) triples. Finally, a set of semantic tuples {ŷ^(1), ..., ŷ^(K)} is produced after the EC (error correction) module. For each input utterance r, the reward considers both the triple level and the sentence level: R(ŷ, y) = -(FP + FN) + 1(ŷ = y), where the first term penalizes false positives (FP) and false negatives (FN) at the triple level, and the second term is a binary value indicating whether the entire sentence is predicted correctly. The model is optimized by maximizing the expected cumulative reward with policy gradient methods. The policy gradient can be estimated as ∇_θ J(θ) ≈ (1/K) Σ_{k=1}^{K} (R(ŷ^(k), y) - b) ∇_θ log P(ŷ^(k) | r; θ), where b is a baseline for reducing the variance of the gradient estimate, obtained by averaging the rewards inside the beam.
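The gradient estimator above is REINFORCE with a beam-mean baseline. A minimal PyTorch sketch, with toy sequence log-probabilities standing in for the decoder's beam search output:

```python
import torch

def policy_gradient_loss(log_probs, rewards):
    """Loss whose gradient matches the estimator above:
    -(1/K) * sum_k (R_k - b) * log P(y_k | r), with b = mean beam reward."""
    rewards = torch.as_tensor(rewards, dtype=torch.float)
    baseline = rewards.mean()            # reduces variance of the estimate
    advantages = rewards - baseline
    return -(advantages * log_probs).mean()

# Toy beam of K = 3 hypotheses; requires_grad stands in for the decoder graph.
log_probs = torch.tensor([-1.2, -2.5, -3.1], requires_grad=True)
rewards = [1.0, -1.0, -2.0]              # -(FP + FN) + sentence-level bonus
loss = policy_gradient_loss(log_probs, rewards)
loss.backward()                          # raises log P of high-reward beams
```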
To stabilize the training process, it is beneficial to train alternately on D_tscp and D_hyp.
Experiments were conducted on the first Chinese Audio-Textual Spoken Language Understanding challenge (CATSLU) dataset, which contains four dialogue domains (map, music, video, weather).
The 200-dimensional character embeddings are initialized by pre-training an LSTM-based bidirectional language model (biLM) on the zhwiki corpus. The LSTM is a single layer with 256 hidden units. During training, parameters are uniformly sampled in the range (-0.2, 0.2), and dropout with probability 0.5 is applied to the non-recurrent layers. Adam is selected as the optimizer; the learning rate is set to 0.001 in pre-training and 0.0005 in RL training, and is fixed during training. The beam size is set to 5 in the decoding phase. The best model is selected based on performance on the validation set, and the F-score and sentence-level accuracy of act(slot=value) triples are then evaluated.
The main results are compared with different baselines. In the evaluation phase, error correction is applied to all experiments. The following baselines were studied:
HD: only unaligned data is employed.
Focus: trained on the annotated real text and evaluated on the ASR hypotheses.
UA: changes the slot filling model from BLSTM to Focus.
DA: a data augmentation method in which the training data is augmented with pseudo-aligned ASR hypotheses in two ways: (1) generated by a pre-trained tagging model (Gen); (2) aligned with the real text by minimum edit distance (Align).
FIG. 3 shows the overall results on the test set. The results show that models trained in an end-to-end fashion on unaligned data ("HD") are less effective than models trained on labeled data ("Focus"). The "UA" method transfers from real text to ASR hypotheses and obtains results comparable to Focus. No improvement was found with the "UA" and "DA" methods, possibly due to the noisy dataset. Compared with the "Focus" and "DA" baselines, the proposed model achieves significant improvements except in the music domain (at the 95% significance level in the video and weather domains and 90% in the map domain).
FIG. 4 gives an example of how RL training benefits slot filling. The baseline model identifies two value chunks, "company" and "Ganhezi town", separated by the special word "is", which would generate erroneous slot value pairs. Through RL, the SLU model learns to produce outputs that are more suitable for correction.
The effectiveness of each sub-module in the model was studied through ablation experiments. As can be seen from the upper half of the performance diagram in FIG. 5, if training uses only the ASR hypotheses D_hyp (i.e., without "Tscp"), performance degrades due to the lack of the strong supervisory signal from the real text. Without pre-training ("PT"), system performance also decreases (by 0.47% in F-score and 0.72% in joint accuracy), indicating the importance of pre-training. Furthermore, without any real-text supervision, average performance drops dramatically, because searching in a large space is difficult.
Fig. 6 is a schematic structural diagram of a semantic slot filling model training system according to an embodiment of the present invention, which can execute the semantic slot filling model training method according to any of the above embodiments and is configured in a terminal.
The semantic slot filling model training system provided by the embodiment comprises: data training program module 11, semantic slot value pair determination program module 12, correction program module 13, and semantic slot filling model training program module 14.
The data training program module 11 is configured to train a first training data set with labels to generate a first semantic slot filling model; the semantic slot value pair determining program module 12 is configured to input the second training data set for automatic speech recognition to the first semantic slot filling model, and determine a first semantic slot value pair; the correcting program module 13 is configured to correct the first semantic slot value pair by using a rule-based error correcting module, and determine a second semantic slot value pair, where the error correcting module corrects the first semantic slot value pair based on a preset rule; the semantic slot filling model training program module 14 is configured to perform policy gradient training on the first semantic slot filling model based on the second semantic slot value pair, and determine a trained second semantic slot filling model.
Further, the system includes a test program module for:
receiving a test data set;
inputting the test data set into the second semantic slot filling model, and determining slot value pairs before correction;
and inputting the slot value pair before correction into the error correction module to obtain a final slot value pair.
Further, the data training program module is to:
and training the first training data set with the labels through a bidirectional long-time memory network.
Further, the semantic slot value pair comprises a semantic triple.
Further, the policy gradient training comprises a pre-training stage and an RL (reinforcement learning) training stage.
The embodiment of the invention also provides a nonvolatile computer storage medium, wherein the computer storage medium stores computer executable instructions which can execute the semantic slot filling model training method in any method embodiment;
as one embodiment, a non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
training a first training data set with labels to generate a first semantic slot filling model;
inputting a second training data set of automatic speech recognition into the first semantic slot filling model, and determining a first semantic slot value pair;
correcting the first semantic slot value pair by an error correction module based on rules to determine a second semantic slot value pair, wherein the error correction module corrects the first semantic slot value pair based on preset rules;
and performing policy gradient training on the first semantic slot filling model based on the second semantic slot value pair, and determining a trained second semantic slot filling model.
As a non-volatile computer-readable storage medium, it may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the methods in the embodiments of the present invention. One or more program instructions are stored in the non-volatile computer-readable storage medium and, when executed by a processor, perform the semantic slot filling model training method of any of the method embodiments described above.
The non-volatile computer-readable storage medium may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the device, and the like. Further, the non-volatile computer-readable storage medium may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the non-transitory computer readable storage medium optionally includes memory located remotely from the processor, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
An embodiment of the present invention further provides an electronic device, which includes: the apparatus includes at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the semantic slot filling model training method of any of the embodiments of the present invention.
The client of the embodiment of the present application exists in various forms, including but not limited to:
(1) Mobile communication devices: such devices are characterized by mobile communication capabilities and are primarily aimed at providing voice and data communication. Such terminals include smart phones, multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices: such devices belong to the category of personal computers, have computing and processing functions, and generally also have mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as tablet computers.
(3) Portable entertainment devices: such devices can display and play multimedia content, and include audio and video players, handheld game consoles, e-book readers, smart toys, and portable in-vehicle navigation devices.
(4) Other electronic devices with data processing capabilities.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A semantic slot filling model training method comprises the following steps:
training a first training data set with labels to generate a first semantic slot filling model;
inputting a second training data set of automatic speech recognition into the first semantic slot filling model, and determining a first semantic slot value pair;
correcting the first semantic slot value pair by an error correction module based on rules to determine a second semantic slot value pair, wherein the error correction module corrects the first semantic slot value pair based on preset rules;
and performing policy gradient training on the first semantic slot filling model based on the second semantic slot value pair, and determining a trained second semantic slot filling model.
2. The method of claim 1, wherein after determining the trained second semantic slot filling model, the method further comprises:
receiving a test data set;
inputting the test data set into the second semantic slot filling model, and determining slot value pairs before correction;
and inputting the slot value pair before correction into the error correction module to obtain a final slot value pair.
3. The method of claim 1, wherein the training the labeled first training data set comprises:
and training the first training data set with the labels through a bidirectional long-time memory network.
4. The method of claim 1, wherein the semantic slot value pairs comprise semantic triples.
5. The method of claim 1, wherein the policy gradient training comprises a pre-training stage and an RL (reinforcement learning) training stage.
6. A semantic slot filling model training system, comprising:
the data training program module is used for training a first training data set with labels to generate a first semantic slot filling model;
a semantic slot value pair determining program module, configured to input a second training data set for automatic speech recognition to the first semantic slot filling model, and determine a first semantic slot value pair;
a correcting program module, configured to correct the first semantic slot value pair by using a rule-based error correcting module, and determine a second semantic slot value pair, where the error correcting module corrects the first semantic slot value pair based on a preset rule;
and the semantic slot filling model training program module is used for performing policy gradient training on the first semantic slot filling model based on the second semantic slot value pair to determine a trained second semantic slot filling model.
7. The system of claim 6, wherein the system further comprises a test program module to:
receiving a test data set;
inputting the test data set into the second semantic slot filling model, and determining slot value pairs before correction;
and inputting the slot value pair before correction into the error correction module to obtain a final slot value pair.
8. The system of claim 6, wherein the data training program module is to:
and training the first training data set with the labels through a bidirectional long-time memory network.
9. The system of claim 6, wherein the semantic slot value pairs comprise semantic triples.
10. The system of claim 6, wherein the policy gradient training comprises a pre-training stage and an RL (reinforcement learning) training stage.
CN202010248117.7A 2020-03-31 2020-03-31 Semantic slot filling model training method and system Active CN111462734B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010248117.7A CN111462734B (en) 2020-03-31 2020-03-31 Semantic slot filling model training method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010248117.7A CN111462734B (en) 2020-03-31 2020-03-31 Semantic slot filling model training method and system

Publications (2)

Publication Number Publication Date
CN111462734A (en) 2020-07-28
CN111462734B CN111462734B (en) 2022-07-26

Family

ID=71684351

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010248117.7A Active CN111462734B (en) 2020-03-31 2020-03-31 Semantic slot filling model training method and system

Country Status (1)

Country Link
CN (1) CN111462734B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111951789A (en) * 2020-08-14 2020-11-17 北京达佳互联信息技术有限公司 Training of speech recognition model, speech recognition method, apparatus, device and medium
CN112380327A (en) * 2020-11-09 2021-02-19 天翼爱音乐文化科技有限公司 Cold-start slot filling method, system, device and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110144986A1 (en) * 2009-12-10 2011-06-16 Microsoft Corporation Confidence calibration in automatic speech recognition systems
CN107240398A (en) * 2017-07-04 2017-10-10 科大讯飞股份有限公司 Intelligent sound exchange method and device
CN108417205A (en) * 2018-01-19 2018-08-17 苏州思必驰信息科技有限公司 Semantic understanding training method and system
CN108628830A (en) * 2018-04-24 2018-10-09 北京京东尚科信息技术有限公司 A kind of method and apparatus of semantics recognition
CN108920497A (en) * 2018-05-23 2018-11-30 北京奇艺世纪科技有限公司 A kind of man-machine interaction method and device
CN108962224A (en) * 2018-07-19 2018-12-07 苏州思必驰信息科技有限公司 Speech understanding and language model joint modeling method, dialogue method and system
CN110929875A (en) * 2019-10-12 2020-03-27 平安国际智慧城市科技股份有限公司 Intelligent language learning method, system, device and medium based on machine learning

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110144986A1 (en) * 2009-12-10 2011-06-16 Microsoft Corporation Confidence calibration in automatic speech recognition systems
CN107240398A (en) * 2017-07-04 2017-10-10 科大讯飞股份有限公司 Intelligent sound exchange method and device
CN108417205A (en) * 2018-01-19 2018-08-17 苏州思必驰信息科技有限公司 Semantic understanding training method and system
CN108628830A (en) * 2018-04-24 2018-10-09 北京京东尚科信息技术有限公司 A kind of method and apparatus of semantics recognition
CN108920497A (en) * 2018-05-23 2018-11-30 北京奇艺世纪科技有限公司 A kind of man-machine interaction method and device
CN108962224A (en) * 2018-07-19 2018-12-07 苏州思必驰信息科技有限公司 Speech understanding and language model joint modeling method, dialogue method and system
CN110929875A (en) * 2019-10-12 2020-03-27 平安国际智慧城市科技股份有限公司 Intelligent language learning method, system, device and medium based on machine learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HOU Lixian (侯丽仙) et al.: "Joint recognition of intent and semantic slot filling incorporating multiple constraints", Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111951789A (en) * 2020-08-14 2020-11-17 北京达佳互联信息技术有限公司 Training of speech recognition model, speech recognition method, apparatus, device and medium
CN111951789B (en) * 2020-08-14 2021-08-17 北京达佳互联信息技术有限公司 Training of speech recognition model, speech recognition method, apparatus, device and medium
CN112380327A (en) * 2020-11-09 2021-02-19 天翼爱音乐文化科技有限公司 Cold-start slot filling method, system, device and storage medium
CN112380327B (en) * 2020-11-09 2022-03-04 天翼爱音乐文化科技有限公司 Cold-start slot filling method, system, device and storage medium

Also Published As

Publication number Publication date
CN111462734B (en) 2022-07-26

Similar Documents

Publication Publication Date Title
US11238845B2 (en) Multi-dialect and multilingual speech recognition
US11586930B2 (en) Conditional teacher-student learning for model training
CN110516253B (en) Chinese spoken language semantic understanding method and system
CN110556100B (en) Training method and system of end-to-end speech recognition model
CN107844481B (en) Text recognition error detection method and device
CN112528637B (en) Text processing model training method, device, computer equipment and storage medium
CN111382231B (en) Intention recognition system and method
CN116127953B (en) Chinese spelling error correction method, device and medium based on contrast learning
CN114596844A (en) Acoustic model training method, voice recognition method and related equipment
CN111462734B (en) Semantic slot filling model training method and system
CN112767921A (en) Voice recognition self-adaption method and system based on cache language model
CN110992943B (en) Semantic understanding method and system based on word confusion network
CN115017890A (en) Text error correction method and device based on character pronunciation and character font similarity
CN113571045B (en) Method, system, equipment and medium for identifying Minnan language voice
CN117877460A (en) Speech synthesis method, device, speech synthesis model training method and device
CN113705207A (en) Grammar error recognition method and device
CN113160801B (en) Speech recognition method, device and computer readable storage medium
CN115525749A (en) Voice question-answering method, device, electronic equipment and storage medium
CN115376547A (en) Pronunciation evaluation method and device, computer equipment and storage medium
CN115713082A (en) Named entity identification method, device, equipment and storage medium
CN113012685B (en) Audio recognition method and device, electronic equipment and storage medium
CN115240712A (en) Multi-mode-based emotion classification method, device, equipment and storage medium
CN112560431A (en) Method, apparatus, device, storage medium, and computer program product for generating test question tutoring information
CN113096646A (en) Audio recognition method and device, electronic equipment and storage medium
CN112735380B (en) Scoring method and voice recognition method for re-scoring language model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province

Applicant after: Sipic Technology Co.,Ltd.

Address before: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province

Applicant before: AI SPEECH Co.,Ltd.

GR01 Patent grant
GR01 Patent grant