CN109637536B - Method and device for automatically identifying semantic accuracy - Google Patents

Method and device for automatically identifying semantic accuracy

Info

Publication number
CN109637536B
CN109637536B CN201811611680.5A CN201811611680A
Authority
CN
China
Prior art keywords
corpus
semantic
accuracy
voice
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811611680.5A
Other languages
Chinese (zh)
Other versions
CN109637536A (en)
Inventor
林婷
吴有宝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sipic Technology Co Ltd
Original Assignee
AI Speech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by AI Speech Ltd filed Critical AI Speech Ltd
Priority to CN201811611680.5A priority Critical patent/CN109637536B/en
Publication of CN109637536A publication Critical patent/CN109637536A/en
Application granted granted Critical
Publication of CN109637536B publication Critical patent/CN109637536B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
        • G06 - COMPUTING; CALCULATING OR COUNTING
            • G06F - ELECTRIC DIGITAL DATA PROCESSING
                • G06F40/00 - Handling natural language data
                    • G06F40/30 - Semantic analysis
        • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
            • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
                • G10L15/00 - Speech recognition
                    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
                        • G10L2015/223 - Execution procedure of a spoken command
                    • G10L15/26 - Speech to text systems
                • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
                    • G10L25/48 - Speech or voice analysis techniques specially adapted for particular use
                        • G10L25/51 - Speech or voice analysis techniques specially adapted for comparison or discrimination
                        • G10L25/69 - Speech or voice analysis techniques specially adapted for evaluating synthetic or decoded voice signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method for automatically identifying semantic accuracy, comprising the following steps: acquiring a voice instruction; recognizing and parsing the voice instruction to obtain a semantic parsing result; checking the semantic parsing result and determining its accuracy; and outputting a detection result according to that accuracy. The invention also discloses a device for automatically identifying semantic accuracy. With the disclosed method and device, voice information can be recognized in batches and the corresponding correct semantic content output; a single one-key operation is enough to see clearly the domain, slot, control value and so on corresponding to each corpus in the voice information. The accuracy of semantic recognition can therefore be counted efficiently and its performance analyzed, improving the working efficiency and adaptation accuracy of the voice integration client.

Description

Method and device for automatically identifying semantic accuracy
Technical Field
The invention relates to the technical field of voice processing, in particular to a method and a device for automatically identifying semantic accuracy.
Background
Speech recognition is a technology for recognizing corresponding text content from speech waveforms, and is one of important technologies in the field of artificial intelligence. Current speech recognition methods are generally based on: an acoustic model, a pronunciation dictionary, and a language model. The acoustic model is trained through a deep neural network, the language model is generally a statistical language model, and the pronunciation dictionary records the corresponding relation between words and phonemes and is a link connecting the acoustic model and the language model.
With the wide application of voice interaction technology, voice-based interactive devices are increasingly favored by users. The key for such a device to complete a voice interaction scenario is to use speech recognition technology to recognize voice instructions and respond correctly; the accuracy of speech recognition is therefore a key factor determining both the performance of a voice interaction device and the user experience.
In order to provide users with a high-quality voice interaction product, the product must first pass a voice test before going to market, verifying that voice instructions are recognized and responded to accurately. In the existing approach, testers read voice instructions aloud and judge the responses one by one; this is very inefficient and the testing cost is high.
Disclosure of Invention
In order to solve the above problems, an objective of the present invention is to provide a tool capable of automatically identifying semantic accuracy, so as to meet the test requirement for processing a large amount of speech and improve the test efficiency.
Meanwhile, the invention also aims to automatically judge the voice processing effect, such as accuracy, on the basis of automatically processing the voice in batch, so as to further improve the testing efficiency and reduce the testing cost.
In addition, the invention also aims to simplify the implementation method of the tool, so that the tool is easy to implement.
Based on this, according to a first aspect of the present invention, there is provided a method for automatically recognizing semantic accuracy, comprising the steps of:
acquiring a voice instruction;
recognizing and analyzing the voice command to obtain a semantic analysis result;
detecting the semantic analysis result, and determining the accuracy of the semantic analysis result;
and outputting a detection result according to the accuracy of the semantic parsing result.
According to a second aspect of the present invention, there is provided an apparatus for automatically identifying semantic accuracy, comprising:
the voice acquisition module is used for acquiring a voice instruction;
the voice parsing module is used for recognizing and parsing the voice instruction to obtain a semantic parsing result;
the checking module is used for detecting the semantic analysis result and determining the accuracy of the semantic analysis result;
and the result presentation module is used for outputting the detection result according to the accuracy of the semantic parsing result.
According to a third aspect of the present invention, there is provided an electronic device comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the steps of the above method.
According to a fourth aspect of the invention, a storage medium is provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
The device and method provided by the invention can recognize voice information in batches and present the accuracy of the recognized semantic content; a single one-key operation is enough to see clearly the detection result for the accuracy of each semantic parsing result. The accuracy of semantic recognition can therefore be counted efficiently and its performance analyzed, improving the working efficiency and adaptation accuracy of the voice integration client.
Drawings
FIG. 1 is a flow chart of a method for automatically identifying semantic accuracy in accordance with one embodiment of the present invention;
FIG. 2 is a schematic block diagram of an apparatus for automatically identifying semantic accuracy in accordance with an embodiment of the present invention;
fig. 3 is a block diagram of an electronic device according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
As used in this disclosure, "module," "device," "system," and the like are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, or software in execution. In particular, for example, an element may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. Also, an application or script running on a server, or a server, may be an element. One or more elements may be in a process and/or thread of execution and an element may be localized on one computer and/or distributed between two or more computers and may be operated by various computer-readable media. The elements may also communicate by way of local and/or remote processes based on a signal having one or more data packets, e.g., from a data packet interacting with another element in a local system, distributed system, and/or across a network in the internet with other systems by way of the signal.
Finally, it should also be noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Moreover, the terms "comprises", "comprising" and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article or apparatus that comprises it.
The method for automatically identifying semantic accuracy in the embodiments of the invention can be applied to any terminal device configured with a voice function, such as a smart phone, a tablet computer or a smart home device; the invention is not limited in this respect.
The present invention will be described in further detail with reference to the accompanying drawings.
FIG. 1 schematically shows a flow diagram of a method for automatically identifying semantic accuracy according to one embodiment of the present invention. As shown in fig. 1, the present embodiment includes the following steps:
step S101: and acquiring a voice instruction.
For example, the voice instruction may be obtained by synthesizing configured corpora. Specifically, a corpus configuration file storing the specific corpus information is first obtained: the file is stored at a specified path position and can be obtained by reading the corresponding file at that path. Preferably, the corpus configuration file is implemented as an excel file; the corpus content is read from the excel-format file by a python script and then synthesized into TTS speech, i.e. the voice instruction, through speech synthesis technology.
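As a non-authoritative sketch of this step, the corpus rows could be read from the excel configuration file by a python script using the `openpyxl` library. The column layout and the `synthesize_tts` helper are illustrative assumptions, not details published in the patent:

```python
from openpyxl import load_workbook

def read_corpus_file(path):
    """Read corpus rows from the excel corpus configuration file.

    Assumes (hypothetically) that row 1 is a header and each later row
    holds one corpus entry; returns the rows as tuples of cell values.
    """
    ws = load_workbook(path).active
    return list(ws.iter_rows(min_row=2, values_only=True))

def synthesize_tts(text):
    """Placeholder for the TTS step: a real system would call a speech
    synthesis engine here and return audio for the voice instruction."""
    raise NotImplementedError("plug in a speech synthesis engine")
```

Each returned row would then be synthesized into a voice instruction and fed to the recognizer under test.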
In some embodiments, acquiring the voice instruction may instead mean directly acquiring the voice instruction as an audio file. In this case, the audio files corresponding to the voice instructions are stored under a specified path, and when processing starts, all of them can be fetched directly from that path.
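A minimal sketch of fetching the pre-recorded voice instructions from the specified path; the directory layout and the accepted extensions are assumptions for illustration:

```python
from pathlib import Path

def collect_audio_files(directory, extensions=(".wav", ".mp3")):
    """Gather all audio files of voice instructions stored under the
    specified path, in a stable (sorted) order."""
    return sorted(p for p in Path(directory).iterdir()
                  if p.suffix.lower() in extensions)
```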
Step S102: recognize and parse the voice instruction to obtain a semantic parsing result.
The voice instruction can be recognized and parsed using existing speech recognition and semantic parsing technology: each voice instruction is converted into text through speech recognition, and the recognized text is semantically parsed to obtain the final semantic parsing result. As a preferred embodiment, the semantic parsing result may comprise a domain, a classification (slot), a control value and the like, where the domain identifies the service domain corresponding to the voice instruction (such as navigation or music), the classification identifies the object the voice instruction points to, and the control value identifies the action the voice instruction is to perform.
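The three-part parsing result described above could be modeled, purely for illustration, as a small record type (the field names are assumptions, not the patent's interface):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SemanticResult:
    domain: str         # service domain, e.g. "navigation" or "music"
    slot: str           # the object the instruction points to (classification)
    control_value: str  # the action the instruction asks for
```

Two results can then be compared field by field with plain equality, which is all the checking step needs.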
Step S103: detect the semantic parsing result and determine its accuracy. Judging the accuracy of the semantic parsing result is mainly achieved by comparison against standard corpora. Depending on how the voice instruction was acquired, two concrete implementation examples can be given:
firstly, as for the semantic parsing result of the voice instruction synthesized by the corpus configuration file, when the corpus configuration file is configured, the corpus content is set in the corpus configuration file, at this time, the corpus in the corpus configuration file is used as a standard column, namely, a standard corpus, the semantic parsing result is compared with the corresponding corpus in the corpus configuration file, and the accuracy is determined according to the comparison result.
Secondly, for a voice instruction obtained directly as an audio file, the semantic parsing result is checked against a pre-configured standard corpus. Specifically, a standard corpus file is configured for the audio files, and an identifier of each audio file (such as its name, path or ID) is bound to the standard corpus content in the standard corpus file to form a mapping file. After the semantic parsing result is obtained, it is compared with the standard corpus content, and the accuracy is determined from the comparison result.
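Both variants reduce to comparing a parsing result with its standard corpus entry. A hedged sketch of that comparison (the dict-based records and the `accuracy` helper are illustrative assumptions):

```python
FIELDS = ("domain", "slot", "control_value")

def check_result(parsed, expected):
    """Compare one semantic parsing result against the standard corpus
    entry; returns (is_correct, list_of_mismatched_fields)."""
    mismatches = [f for f in FIELDS if parsed.get(f) != expected.get(f)]
    return (not mismatches, mismatches)

def accuracy(parsed_results, standards):
    """Fraction of results whose fields all match the standard corpus."""
    if not standards:
        return 0.0
    correct = sum(check_result(p, s)[0] for p, s in zip(parsed_results, standards))
    return correct / len(standards)
```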
Step S104: output a detection result according to the accuracy of the semantic parsing result.
The detection result may be output, for example, by outputting and displaying the voice instructions whose detection result is an error, or by highlighting the file row corresponding to each erroneous voice instruction. Taking the latter as an example, for the different ways of acquiring the voice instruction, this step may be implemented as follows:
for the case of obtaining the voice command through the corpus configuration file, the background color of the corpus row corresponding to the voice command (i.e. the row where the corpus corresponding to the voice command is located) is highlighted, which may be the background color of the row where the corresponding corpus of the corpus configuration file is located is modified through a python script according to the accuracy of the semantic parsing result. Illustratively, if the semantic result of a certain voice instruction is low in accuracy, the background color of the corpus with low accuracy of the corpus configuration excel file corresponding to the voice instruction is modified to red by the corresponding function (calling the corresponding interface function) of the python script. The method has the advantages that the file reading and the file background color modification are carried out through the pyhton script, the method is simple to realize, convenient to research and develop, high in execution and response efficiency, and easy to realize and simplify the development process compared with other realization modes.
In the case of a voice instruction obtained through an audio file, the background color of the corresponding row of the standard corpus file may be modified instead.
Through the above steps, voice instructions can be imported in batches with one key for recognition testing; the accuracy of the recognition results is analyzed automatically and displayed visually, which is simple and intuitive. Thus, according to the method of this embodiment, large batches of voice instructions can be recognized automatically, the accuracy of semantic recognition counted efficiently, and its performance analyzed, improving the working efficiency and adaptation accuracy of the voice integration client.
As a preferred embodiment, besides outputting the detection result, the semantic parsing result itself may be output and displayed directly in the corresponding corpus row, which also makes it convenient for the user to check directly whether the voice processing result is reasonable. This can be combined with the background-color change, or the semantic parsing result can be output alone without any color change and compared and confirmed by the user; the former implementation is obviously preferable. Because the output semantic parsing result includes the domain, classification and control value (the action corresponding to the voice instruction) of the corpus for each voice instruction, this also achieves an efficient, intuitive and fast effect compared with the traditional test approach.
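Writing the parsing result back into the corpus row, as this preferred embodiment describes, could look like the following sketch; the column positions next to the corpus cell are an assumed layout:

```python
def write_parse_result(ws, row_idx, result, start_col=2):
    """Write domain, classification (slot) and control value into the
    columns beside the corpus cell of the given openpyxl worksheet row."""
    values = (result["domain"], result["slot"], result["control_value"])
    for offset, value in enumerate(values):
        ws.cell(row=row_idx, column=start_col + offset, value=value)
```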
Fig. 2 schematically shows a schematic block diagram of an apparatus for automatically identifying semantic accuracy according to an embodiment of the present invention, as shown in fig. 2,
the device for automatically identifying semantic accuracy comprises a voice acquisition module 201, a voice analysis module 202, a proofreading module 203 and a result presentation module 204.
The voice acquiring module 201 is configured to acquire a voice instruction. It may be implemented as an audio acquisition apparatus with a sound pickup function, or as a file reading module configured to read a corpus configuration file or an audio file; in the latter case the module may be implemented by a python script.
When the voice instruction is obtained by reading the corpus configuration file, the voice acquiring module 201 is implemented to include a corpus acquiring unit 2011 and a voice synthesizing unit 2012. The corpus acquiring unit 2011 is configured to obtain the corpus configuration file and read the corpora; the voice synthesizing unit 2012 is configured to synthesize the read corpora into voice instructions for output.
The voice parsing module 202 is configured to recognize and parse the voice command to obtain a semantic parsing result, and the implementation manner of the voice parsing module may refer to the voice recognition and semantic parsing technology in the prior art.
The checking module 203 is configured to detect a semantic parsing result and determine the accuracy of the semantic parsing result, and the implementation manner of the checking module may refer to the above method.
The result presenting module 204 is configured to output a detection result according to the accuracy of the semantic parsing result, for example by highlighting the background color of the row where the corpus corresponding to a voice instruction with low semantic parsing accuracy is located, e.g. changing the background color to red.
In other embodiments, the device may omit the checking module and instead output and display the semantic parsing result directly through the result presentation module 204. The displayed semantic parsing result includes the domain, classification and control value (the action corresponding to the voice instruction), shown in different columns of the row where the standard corpus corresponding to the voice instruction is located, which makes checking and comparison convenient.
The device of this embodiment can automatically perform speech recognition on large batches of acquired voice instructions and present the domain, object, control value, accuracy and other parameters of each voice instruction, so the accuracy of semantic recognition can be counted efficiently and its performance analyzed, improving the working efficiency and adaptation accuracy of the voice integration client.
In the above embodiments, the operations on the corpus configuration file (such as reading corpora, writing semantic parsing results, writing corpora and changing background colors) and the judgment of the accuracy of the semantic parsing result may be implemented by a python script, while converting the input corpus into a voice instruction and generating the semantic parsing result may be implemented in the C language, so as to simplify the development process.
In some embodiments, the present invention provides a non-transitory computer readable storage medium, in which one or more programs including executable instructions are stored, and the executable instructions can be read and executed by an electronic device (including but not limited to a computer, a server, or a network device, etc.) to perform any one of the above methods for automatically recognizing semantic accuracy.
In some embodiments, the present invention further provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform any of the above methods for automatically identifying semantic accuracy.
In some embodiments, an embodiment of the present invention further provides an electronic device, which includes: the system includes at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform a method of automatically identifying semantic accuracy.
In some embodiments, the present invention further provides a storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method for automatically identifying semantic accuracy.
The device for automatically identifying semantic accuracy of the embodiment of the invention can be used for executing the method for automatically identifying semantic accuracy of the embodiment of the invention, and accordingly achieves the technical effect achieved by the method for automatically identifying semantic accuracy of the embodiment of the invention, and is not repeated here. In the embodiment of the present invention, the relevant functional module may be implemented by a hardware processor (hardware processor).
Fig. 3 is a schematic hardware structure diagram of an electronic device for performing a method for automatically identifying semantic accuracy according to an embodiment of the present invention, and as shown in fig. 3, the electronic device includes:
one or more processors 410 and a memory 420, with one processor 410 being an example in fig. 3.
The apparatus for performing the method of automatically recognizing semantic accuracy may further include: an input device 430 and an output device 440.
The processor 410, the memory 420, the input device 430, and the output device 440 may be connected by a bus or other means, such as the bus connection in fig. 3.
The memory 420, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions/modules corresponding to the method for automatically recognizing semantic accuracy in the embodiments of the present application. The processor 410 executes various functional applications of the server and data processing by executing nonvolatile software programs, instructions and modules stored in the memory 420, namely, implementing the method for automatically recognizing semantic accuracy of the above-described method embodiment.
The memory 420 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created from use of a device that automatically recognizes semantic accuracy, and the like. Further, the memory 420 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, memory 420 may optionally include memory located remotely from processor 410, which may be connected over a network to a device that automates the recognition of semantic accuracy. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 430 may receive input numeric or character information and generate signals related to user settings and function control of the device for automated recognition of semantic accuracy. The output device 440 may include a display device such as a display screen.
The one or more modules described above are stored in the memory 420 and, when executed by the one or more processors 410, perform a method of automatically identifying semantic accuracy in any of the method embodiments described above.
The product can execute the method provided by the embodiment of the application, and has the corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to the methods provided in the embodiments of the present application.
The electronic device of the embodiments of the present application exists in various forms, including but not limited to:
(1) mobile communication devices, which are characterized by mobile communication capabilities and are primarily targeted at providing voice and data communications. Such terminals include smart phones (e.g., iphones), multimedia phones, functional phones, and low-end phones, among others.
(2) The ultra-mobile personal computer equipment belongs to the category of personal computers, has calculation and processing functions and generally has the characteristic of mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as ipads.
(3) An in-vehicle device: the equipment is applied to vehicle-mounted driving, and can be connected with other auxiliary systems of the automobile and the like.
(4) The server is similar to a general computer architecture, but has higher requirements on processing capability, stability, reliability, safety, expandability, manageability and the like because of the need of providing highly reliable services.
(5) And other electronic devices with data interaction functions.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a general-purpose hardware platform, and certainly can also be implemented by hardware. Based on this understanding, the above technical solutions, or the part thereof that contributes to the related art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk, or an optical disk, and which includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the method of the embodiments or of some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A method for automatically identifying semantic accuracy, characterized by comprising the following steps:
setting specific corpus information in an Excel file, and setting a standard corpus column, to form a corpus configuration file;
storing the corpus configuration file at a specified path position;
reading a corpus configuration file from a specified path position, and acquiring a voice instruction according to the corpus configuration file;
obtaining a semantic parsing result of the voice instruction, wherein the semantic parsing result comprises a field, a classification, and a control value, the field being used to identify the service domain corresponding to the voice instruction, the classification being used to identify the object to which the voice instruction points, and the control value being used to identify the action to be performed by the voice instruction;
detecting the semantic parsing result, and determining the accuracy of the semantic parsing result;
and outputting a detection result according to the accuracy of the semantic parsing result, wherein the detection result is output by displaying the semantic parsing result in the corresponding corpus row of the corpus configuration file and, at the same time, modifying the background color of the row in which the corresponding corpus is located in the corpus configuration file according to the accuracy of the semantic parsing result.
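The three-part comparison that claim 1 relies on can be sketched in a few lines of Python. The key names (`field`, `classification`, `control_value`) follow the claim's wording, but the dictionary representation, the function name, and the sample values are illustrative assumptions, not part of the patent text.

```python
# Hypothetical sketch of the semantic-accuracy check described in claim 1.
# The dict layout and sample values are assumptions for illustration.

def check_semantic_result(parsed: dict, expected: dict) -> bool:
    """Compare a semantic parsing result against the standard corpus entry.

    Both dicts carry the three parts named in the claim: the field
    (service domain), the classification (target object), and the
    control value (requested action). The result is accurate only
    when all three parts match.
    """
    keys = ("field", "classification", "control_value")
    return all(parsed.get(k) == expected.get(k) for k in keys)

parsed = {"field": "home", "classification": "light", "control_value": "on"}
expected = {"field": "home", "classification": "light", "control_value": "on"}
print(check_semantic_result(parsed, expected))  # True: all three parts match
```

A mismatch in any one of the three parts (for example, a wrong control value) would make the check return `False`, which is what drives the red/green row coloring described later in the claim.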
2. The method according to claim 1, wherein said obtaining a voice instruction according to the corpus configuration file comprises:
obtaining the corpus configuration file and reading a corpus;
synthesizing the read corpus into a voice instruction and outputting the voice instruction;
and the outputting of a detection result according to the accuracy of the semantic parsing result comprises:
modifying the background color of the row in which the corresponding corpus of the corpus configuration file is located according to the accuracy of the semantic parsing result.
3. The method of claim 2, wherein the detecting of the semantic parsing result and the determining of its accuracy comprise:
comparing the semantic parsing result with the standard corpus in the standard corpus column of the corresponding corpus in the corpus configuration file, and determining the accuracy according to the comparison result.
4. The method according to claim 2, wherein the obtaining of the corpus configuration file, the reading of the corpus, and the modifying of the background color of the row of the corresponding corpus of the corpus configuration file according to the accuracy of the semantic parsing result are all implemented by a Python script.
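Claim 4 says the Excel read-back and row coloring are done by a Python script. A minimal sketch using the openpyxl library is shown below; the workbook layout (corpus in column A, parse result written to column B) and the specific green/red fill colors are assumptions, since the patent does not fix them.

```python
# Minimal sketch, assuming openpyxl, of the background-color step in claim 4:
# color the row of each corpus green when the parse was accurate, red otherwise.
# The column layout and colors are illustrative assumptions.
from openpyxl import Workbook
from openpyxl.styles import PatternFill

GREEN = PatternFill(start_color="FF00B050", end_color="FF00B050", fill_type="solid")
RED = PatternFill(start_color="FFFF0000", end_color="FFFF0000", fill_type="solid")

def mark_row(ws, row: int, result: str, accurate: bool) -> None:
    """Write the parse result back into the corpus row and recolor it."""
    ws.cell(row=row, column=2, value=result)  # result column (assumed to be B)
    fill = GREEN if accurate else RED
    for cell in ws[row]:                      # recolor every cell in the row
        cell.fill = fill

wb = Workbook()
ws = wb.active
ws.append(["turn on the light"])              # corpus column (assumed to be A)
mark_row(ws, 1, "home/light/on", accurate=True)
```

In the scenario the claims describe, the script would open the stored corpus configuration file with `openpyxl.load_workbook(...)` instead of creating a fresh workbook, then save it back so the tester sees the colored rows.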
5. A method for automatically identifying semantic accuracy, characterized by comprising the following steps:
storing the audio file corresponding to the voice instruction to a specified path;
configuring a standard corpus file for the audio file, wherein an identifier of the audio file and a standard corpus corresponding to the audio file are arranged in the standard corpus file;
reading an audio file from a specified path position, and acquiring a voice instruction according to the audio file;
obtaining a semantic parsing result of the voice instruction, wherein the semantic parsing result comprises a field, a classification, and a control value, the field being used to identify the service domain corresponding to the voice instruction, the classification being used to identify the object to which the voice instruction points, and the control value being used to identify the action to be performed by the voice instruction;
detecting the semantic parsing result, and determining the accuracy of the semantic parsing result;
and outputting a detection result according to the accuracy of the semantic parsing result, wherein the detection result is output by displaying the semantic parsing result in the corresponding corpus row of the standard corpus file corresponding to the audio file and, at the same time, modifying the background color of the row in which the corresponding corpus in that standard corpus file is located according to the accuracy of the semantic parsing result.
6. The method of claim 5, wherein the detecting of the semantic parsing result and the determining of its accuracy comprise:
comparing the semantic parsing result with the semantic content in the standard corpus file, and determining the accuracy according to the comparison result.
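The audio-file variant of claims 5 and 6 hinges on a standard corpus file that pairs each audio file's identifier with its expected corpus. A short sketch of such a lookup is below; the tab-separated layout and the sample file names are assumptions for illustration, since the patent does not specify the file format.

```python
# Hedged sketch of the standard corpus file in claim 5: each line maps an
# audio file's identifier to its standard corpus. The tab-separated layout
# and sample identifiers are illustrative assumptions.
import io

def load_standard_corpus(fp) -> dict:
    """Parse lines of '<audio-id>\\t<standard corpus>' into a lookup table."""
    table = {}
    for line in fp:
        line = line.strip()
        if line:
            audio_id, corpus = line.split("\t", 1)
            table[audio_id] = corpus
    return table

# In practice fp would be open(path), with path under the specified location.
sample = io.StringIO("cmd_001.wav\tturn on the light\ncmd_002.wav\tplay some music\n")
table = load_standard_corpus(sample)
print(table["cmd_001.wav"])  # turn on the light
```

The checking step of claim 6 then looks up the identifier of the audio file just played, and compares the semantic parsing result against the standard corpus entry retrieved from this table.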
7. An apparatus for automatically identifying semantic accuracy, comprising:
a voice acquisition module, configured to read the corpus configuration file or the audio file from a specified path position and obtain a voice instruction according to the corpus configuration file or the audio file;
a voice parsing module, configured to recognize and parse the voice instruction to obtain a semantic parsing result, wherein the semantic parsing result comprises a field, a classification, and a control value, the field being used to identify the service domain corresponding to the voice instruction, the classification being used to identify the object to which the voice instruction points, and the control value being used to identify the action to be performed by the voice instruction;
a checking module, configured to detect the semantic parsing result and determine its accuracy;
and a result presentation module, configured to output the detection result according to the accuracy of the semantic parsing result, by displaying the semantic parsing result in the corresponding corpus row of the corpus configuration file or of the standard corpus file corresponding to the audio file and, at the same time, modifying the background color of the row in which the corresponding corpus is located according to the accuracy of the semantic parsing result.
8. The apparatus of claim 7, wherein the voice acquisition module comprises:
a corpus acquiring unit, configured to obtain the corpus configuration file from a specified path position and read a corpus;
and a voice synthesis unit, configured to synthesize the read corpus into a voice instruction and output the voice instruction.
9. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method of any one of claims 1-6.
10. A storage medium on which a computer program is stored, wherein the program, when executed by a processor, carries out the steps of the method of any one of claims 1 to 6.
CN201811611680.5A 2018-12-27 2018-12-27 Method and device for automatically identifying semantic accuracy Active CN109637536B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811611680.5A CN109637536B (en) 2018-12-27 2018-12-27 Method and device for automatically identifying semantic accuracy


Publications (2)

Publication Number Publication Date
CN109637536A CN109637536A (en) 2019-04-16
CN109637536B true CN109637536B (en) 2020-11-13

Family

ID=66078314

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811611680.5A Active CN109637536B (en) 2018-12-27 2018-12-27 Method and device for automatically identifying semantic accuracy

Country Status (1)

Country Link
CN (1) CN109637536B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110263322B (en) * 2019-05-06 2023-09-05 平安科技(深圳)有限公司 Audio corpus screening method and device for speech recognition and computer equipment
CN112151014B (en) * 2020-11-04 2023-07-21 平安科技(深圳)有限公司 Speech recognition result evaluation method, device, equipment and storage medium
CN112420019B (en) * 2020-11-18 2023-02-03 青岛海尔科技有限公司 Equipment testing method and device
CN113343711B (en) * 2021-06-29 2024-05-10 南方电网数字电网研究院有限公司 Work order generation method, device, equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10074364B1 (en) * 2016-02-02 2018-09-11 Amazon Technologies, Inc. Sound profile generation based on speech recognition results exceeding a threshold
US10733989B2 (en) * 2016-11-30 2020-08-04 Dsp Group Ltd. Proximity based voice activation
CN109003602B (en) * 2018-09-10 2020-03-24 百度在线网络技术(北京)有限公司 Voice product testing method, device, equipment and computer readable medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Patentee after: Sipic Technology Co.,Ltd.

Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Patentee before: AI SPEECH Ltd.