CN111223481A - Information extraction method and device, computer readable storage medium and electronic equipment


Info

Publication number
CN111223481A
Authority
CN
China
Prior art keywords
character, vector, character sequence, sequence, segment
Prior art date
Legal status
Granted
Application number
CN202010022597.5A
Other languages
Chinese (zh)
Other versions
CN111223481B (en)
Inventor
葛屾
王锴
晏阳天
乔治
吴贤
范伟
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202010022597.5A
Publication of CN111223481A
Application granted
Publication of CN111223481B
Current legal status: Active

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/26 - Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

The present disclosure provides an information extraction method, an information extraction apparatus, a computer-readable storage medium, and an electronic device, and relates to the technical field of natural language processing. The method comprises the following steps: converting a received audio signal into a character sequence; selecting, from a set of field recognition models, target field recognition models in one-to-one correspondence with the text types contained in the character sequence; identifying reference character segments in the character sequence through the target field recognition models; determining the set of reference character segments output by each target field recognition model; and deduplicating the set and, according to the specific field corresponding to the character sequence, extracting information corresponding to that specific field from the deduplicated set. The method can thereby improve the accuracy of speech recognition and effectively meet users' speech recognition needs.

Description

Information extraction method and device, computer readable storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of natural language processing technologies, and in particular, to an information extraction method, an information extraction device, a computer-readable storage medium, and an electronic device.
Background
With the development of science and technology, mobile terminals can recognize not only input text information but also the text corresponding to input voice information, so as to determine user requirements from the recognized text and execute corresponding operations. For example, when a user speaks "open map", the mobile terminal can recognize the instruction and open a map application.
Beyond search applications, users can also dictate diaries, search for articles, and so on through voice input. The current speech recognition method mainly compares the voice signal with preset signals in a database to determine the text information corresponding to the matching preset signal. However, dictating diaries, searching for articles, and similar scenarios demand high recognition accuracy, while the current speech recognition method offers only limited accuracy and cannot effectively meet users' requirements.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The purpose of the present disclosure is to provide an information extraction method, an information extraction apparatus, a computer-readable storage medium, and an electronic device that improve the accuracy of speech recognition and thereby effectively meet users' speech recognition requirements.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to a first aspect of the present disclosure, there is provided an information extraction method including:
converting the received audio signal into a character sequence;
selecting, from among field recognition models, target field recognition models in one-to-one correspondence with the text types contained in the character sequence;
identifying a reference character segment in the character sequence through a target field identification model;
determining a set of reference character segments respectively output by each target field recognition model;
and deduplicating the set, and extracting, according to the specific field corresponding to the character sequence, information corresponding to the specific field from the deduplicated set.
In an exemplary embodiment of the present disclosure, converting a received audio signal into a character sequence includes:
slicing a received audio signal into at least two audio signal segments, wherein the at least two audio signal segments are time-domain signals;
converting the time domain signal into a frequency domain signal, and extracting acoustic features in the frequency domain signal;
and encoding the acoustic features through an encoder, decoding the encoding result through a decoder corresponding to the encoder, and generating a character sequence according to the decoding result.
In an exemplary embodiment of the present disclosure, selecting a target field recognition model corresponding to a text type one to one from field recognition models according to the text type included in a character sequence includes:
converting the character sequence into a first feature vector;
generating a second feature vector for representing the context relation in the character sequence through the first feature vector;
classifying the second feature vector, and determining the text type contained in the character sequence according to the classification result;
at least one target field recognition model belonging to the text type is selected from the at least two field recognition models, and the at least one target field recognition model is matched with the character sequence.
In an exemplary embodiment of the present disclosure, if the target field recognition model is used for entity recognition, recognizing the reference character segment in the character sequence by the target field recognition model includes:
converting the character sequence into a word vector and a pinyin vector through a target field recognition model, and splicing the word vector and the pinyin vector to obtain a first reference vector;
extracting first reference features in the first reference vector and classifying the first reference features;
and determining the reference character segment in the character sequence according to the classification result.
In an exemplary embodiment of the present disclosure, extracting a first reference feature in a first reference vector includes:
extracting character features in the first reference vector through a character feature extraction network, and extracting context features in the first reference vector through a context feature extraction network;
and splicing the character features and the context features, and determining a splicing result as a first reference feature in the first reference vector.
In an exemplary embodiment of the present disclosure, after determining the reference character segment in the character sequence according to the classification result, the method may further include the steps of:
updating the character sequence according to the reference character segment and calculating a conditional random field loss function corresponding to the updated character sequence;
and updating parameters in the target field recognition model according to the conditional random field loss function.
In an exemplary embodiment of the present disclosure, if the target field recognition model is used for number recognition, recognizing the reference character segment in the character sequence by the target field recognition model includes:
converting the character sequence into a character vector through a target field recognition model, and extracting a context vector corresponding to the character sequence according to the character vector;
splicing the character vector and the context vector to obtain a second reference vector;
extracting second reference features in the second reference vector and classifying the second reference features;
and determining the reference character segment in the character sequence according to the classification result.
In an exemplary embodiment of the disclosure, after determining the reference character segment in the character sequence according to the classification result, the method may further include the following steps:
and when detecting that the character metering unit to be converted exists in the character sequence, converting the character metering unit to be converted into a specific character metering unit through a preset conversion rule.
In an exemplary embodiment of the disclosure, after determining the reference character segment in the character sequence according to the classification result, the method may further include the following steps:
calculating a cross entropy loss function between a reference character segment in the character sequence and a standard character segment in the character sequence;
and updating parameters in the target field recognition model according to the cross entropy loss function.
In an exemplary embodiment of the present disclosure, before converting the received audio signal into a character sequence, the method may further include:
and receiving the audio signal when the user touch operation aiming at the audio detection identifier is detected.
In an exemplary embodiment of the present disclosure, deduplication is performed on a set, comprising:
sorting the reference character segments in the set in descending order of confidence;
and selecting, according to the sorting result, the target reference character segment with the highest confidence from any group of intersecting reference character segments, and deleting the other reference character segments in that group, until no reference character segments in the set intersect.
In an exemplary embodiment of the present disclosure, the specific field includes at least one of blood pressure, weight, heartbeat, and taking medicine.
According to a second aspect of the present disclosure, there is provided an information extraction apparatus including a speech recognition module, a scene selection module, a character segment recognition module, and an information extraction module, wherein:
the voice recognition module is used for converting the received audio signal into a character sequence;
the scene selection module is used for selecting target field identification models corresponding to the text types one by one from the field identification models according to the text types contained in the character sequence;
the character segment recognition module is used for recognizing the reference character segment in the character sequence through the target field recognition model;
the information extraction module is used for determining a set of reference character segments respectively output by each target field recognition model, deduplicating the set, and extracting, according to the specific field corresponding to the character sequence, information corresponding to the specific field from the deduplicated set.
In an exemplary embodiment of the disclosure, a manner for the speech recognition module to convert the received audio signal into the character sequence may specifically be:
the voice recognition module divides the received audio signal into at least two audio signal segments, wherein the at least two audio signal segments are time-domain signals;
the voice recognition module converts the time domain signal into a frequency domain signal and extracts acoustic features in the frequency domain signal;
the voice recognition module encodes the acoustic features through an encoder, decodes the encoding result through a decoder corresponding to the encoder, and generates a character sequence according to the decoding result.
In an exemplary embodiment of the present disclosure, a manner in which the scene selection module selects the target field recognition model corresponding to the text type one to one from the field recognition models according to the text type included in the character sequence may specifically be:
the scene selection module converts the character sequence into a first feature vector;
the scene selection module generates a second feature vector for representing the context relation in the character sequence through the first feature vector;
the scene selection module classifies the second feature vector and determines the text type contained in the character sequence according to the classification result;
the scene selection module selects at least one target field recognition model belonging to the text type from the at least two field recognition models, and the at least one target field recognition model is matched with the character sequence.
In an exemplary embodiment of the disclosure, if the target field recognition model is used for entity recognition, the manner in which the character segment recognition module recognizes the reference character segment in the character sequence through the target field recognition model may specifically be:
the character segment recognition module converts the character sequence into a word vector and a pinyin vector through a target field recognition model, and splices the word vector and the pinyin vector to obtain a first reference vector;
the character segment recognition module extracts first reference features in the first reference vector and classifies the first reference features;
and the character segment recognition module determines the reference character segment in the character sequence according to the classification result.
In an exemplary embodiment of the disclosure, the manner of extracting the first reference feature in the first reference vector by the character segment recognition module may specifically be:
the character segment recognition module extracts character features in the first reference vector through a character feature extraction network and extracts context features in the first reference vector through a context feature extraction network;
the character segment recognition module splices the character features and the context features and determines a splicing result as a first reference feature in a first reference vector.
In an exemplary embodiment of the present disclosure, the character segment identifying module is further configured to, after determining a reference character segment in the character sequence according to the classification result, update the character sequence according to the reference character segment and calculate a conditional random field loss function corresponding to the updated character sequence;
and the character fragment recognition module is also used for updating parameters in the target field recognition model according to the conditional random field loss function.
In an exemplary embodiment of the disclosure, if the target field recognition model is used for performing number recognition, the manner in which the character segment recognition module recognizes the reference character segment in the character sequence through the target field recognition model may specifically be:
the character segment recognition module converts the character sequence into a character vector through a target field recognition model and extracts a context vector corresponding to the character sequence according to the character vector;
the character segment recognition module splices the character vector and the context vector to obtain a second reference vector;
the character segment recognition module extracts second reference features in the second reference vector and classifies the second reference features;
and the character segment recognition module determines the reference character segment in the character sequence according to the classification result.
In an exemplary embodiment of the present disclosure, the apparatus may further include a unit conversion module, wherein:
and the unit conversion module is used for converting the character metering units to be converted into specific character metering units through a preset conversion rule after the reference character segments in the character sequence are determined according to the classification result and when the character metering units to be converted are detected to exist in the character sequence.
In an exemplary embodiment of the disclosure, the character segment identifying module is further configured to calculate a cross entropy loss function between a reference character segment in the character sequence and a standard character segment in the character sequence after determining the reference character segment in the character sequence according to the classification result;
and the character segment recognition module is also used for updating parameters in the target field recognition model according to the cross entropy loss function.
In an exemplary embodiment of the present disclosure, before converting the received audio signal into a character sequence, the apparatus may further include a signal receiving unit, wherein:
and the signal receiving unit is used for receiving the audio signal when the user touch operation aiming at the audio detection identifier is detected.
In an exemplary embodiment of the disclosure, the manner of the information extraction module performing deduplication on the set may specifically be:
the information extraction module sorts the reference character segments in the set in descending order of confidence;
and the information extraction module selects, according to the sorting result, the target reference character segment with the highest confidence from any group of intersecting reference character segments, and deletes the other reference character segments in that group, until no reference character segments in the set intersect.
In an exemplary embodiment of the present disclosure, the specific field includes at least one of blood pressure, weight, heartbeat, and taking medicine.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the method of any one of the above via execution of the executable instructions.
According to a fourth aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any one of the above.
Exemplary embodiments of the present disclosure may have some or all of the following benefits:
in the information extraction method provided by an example embodiment of the present disclosure, a received audio signal (i.e., a voice input by a user) may be converted into a character sequence (i.e., text information); target field recognition models in one-to-one correspondence with the text types contained in the character sequence may be selected from the field recognition models; reference character segments in the character sequence (i.e., keywords in the recognized text information) may be recognized by the target field recognition models; and the set of reference character segments output by each target field recognition model may be determined, deduplicated, and, according to the specific fields corresponding to the character sequence, used to extract the information corresponding to those specific fields. On one hand, this scheme improves the accuracy of speech recognition and effectively meets users' speech recognition needs; on the other hand, the improved accuracy enhances the user experience and, in turn, user stickiness.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
Fig. 1 is a schematic diagram illustrating an exemplary system architecture to which an information extraction method and an information extraction apparatus according to an embodiment of the present disclosure may be applied;
FIG. 2 illustrates a schematic structural diagram of a computer system suitable for use with the electronic device used to implement embodiments of the present disclosure;
FIG. 3 schematically shows a flow diagram of an information extraction method according to one embodiment of the present disclosure;
FIG. 4 schematically shows a block diagram for performing an information extraction method according to one embodiment of the present disclosure;
FIG. 5 schematically shows a flow diagram of an information extraction method according to another embodiment of the present disclosure;
fig. 6 schematically shows a block diagram of the structure of an information extraction apparatus in one embodiment according to the present disclosure;
fig. 7 schematically shows an application diagram of the information extraction method in one embodiment according to the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
Fig. 1 is a schematic diagram illustrating a system architecture of an exemplary application environment to which an information extraction method and an information extraction apparatus according to an embodiment of the present disclosure can be applied.
As shown in fig. 1, the system architecture 100 may include one or more of terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few. The terminal devices 101, 102, 103 may be various electronic devices having a display screen, including but not limited to desktop computers, portable computers, smart phones, tablet computers, and the like. It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, server 105 may be a server cluster comprised of multiple servers, or the like.
The information extraction method provided by the embodiment of the present disclosure is generally performed by the server 105, and accordingly, the information extraction device is generally disposed in the server 105. However, it is easily understood by those skilled in the art that the information extraction method provided in the embodiment of the present disclosure may also be executed by the terminal devices 101, 102, and 103, and accordingly, the information extraction apparatus may also be disposed in the terminal devices 101, 102, and 103, which is not particularly limited in the exemplary embodiment. For example, in one exemplary embodiment, the server 105 may convert the received audio signal into a character sequence; select, from the field recognition models, target field recognition models in one-to-one correspondence with the text types contained in the character sequence; identify reference character segments in the character sequence through the target field recognition models; determine the set of reference character segments respectively output by each target field recognition model; and deduplicate the set and extract, according to the specific field corresponding to the character sequence, information corresponding to the specific field from the deduplicated set.
FIG. 2 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present disclosure.
It should be noted that the computer system 200 of the electronic device shown in fig. 2 is only an example, and should not bring any limitation to the functions and the scope of the application of the embodiments of the present disclosure.
As shown in fig. 2, the computer system 200 includes a Central Processing Unit (CPU)201 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)202 or a program loaded from a storage section 208 into a Random Access Memory (RAM) 203. In the RAM 203, various programs and data necessary for system operation are also stored. The CPU201, ROM 202, and RAM 203 are connected to each other via a bus 204. An input/output (I/O) interface 205 is also connected to bus 204.
The following components are connected to the I/O interface 205: an input portion 206 including a keyboard, a mouse, and the like; an output section 207 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 208 including a hard disk and the like; and a communication section 209 including a network interface card such as a LAN card, a modem, or the like. The communication section 209 performs communication processing via a network such as the internet. A drive 210 is also connected to the I/O interface 205 as needed. A removable medium 211 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 210 as necessary, so that a computer program read out therefrom is mounted into the storage section 208 as necessary.
In particular, the processes described below with reference to the flowcharts may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 209 and/or installed from the removable medium 211. The computer program, when executed by a Central Processing Unit (CPU)201, performs various functions defined in the methods and apparatus of the present application.
The technical solution of the embodiment of the present disclosure is explained in detail below:
for disease management, it is often desirable to record the user's persistent signs. Due to the difference of mobile equipment use proficiency between patients, most patients can be conveniently recorded by utilizing the voice recognition technology, namely, the physical sign recording can be carried out by means of voice input. At present, voice of a user is mostly converted into characters by voice technology, semantic characteristics of the voice are not considered, only simple character conversion is carried out, errors may exist in the characters directly converted, and therefore deviation exists between the characters and original intentions of a patient. Therefore, the applicant thinks that the method can perform error correction processing on the characters converted by the voice, optimize the voice recognition scene at the same time, and further extract corresponding information, so that the recognition accuracy of the voice input by the user can be improved, and the accuracy of the extracted information can be improved.
Accordingly, the present example embodiments provide an information extraction method based on one or more of the problems described above. The information extraction method may be applied to the server 105, and may also be applied to one or more of the terminal devices 101, 102, and 103, which is not particularly limited in this exemplary embodiment. Referring to fig. 3, the information extraction method may include the following steps S310 to S340:
step S310: the received audio signal is converted into a sequence of characters.
Step S320: selecting, from the field recognition models, target field recognition models in one-to-one correspondence with the text types contained in the character sequence.
Step S330: identifying the reference character segments in the character sequence through the target field recognition models.
Step S340: determining a set of reference character segments respectively output by each target field recognition model, deduplicating the set, and extracting, according to the specific field corresponding to the character sequence, information corresponding to the specific field from the deduplicated set.
The above steps of the present exemplary embodiment will be described in more detail below.
Before step S310, optionally, the following steps may further be included before converting the received audio signal into a character sequence: detect the audio signal input by the user in real time and perform voiceprint detection on the detected audio signal; if the voiceprint detection result indicates that the user who input the audio signal is an authorized user, execute step S310. When the embodiment of the disclosure is applied to a scenario in which patients record daily sign parameters and medication status (i.e., the information corresponding to the specific fields), several patients in the same ward may all be making such voice records; this optional embodiment reduces cases in which misrecognized voices of other, unauthorized users produce extracted specific-field information that cannot be matched to the authorized user, which would harm the accuracy of that user's condition records.
In step S310, the received audio signal is converted into a character sequence.
The character sequence may include one or more characters, and the characters may be numbers, letters, chinese characters, and the like, and the embodiments of the present disclosure are not limited.
In an alternative embodiment, converting the received audio signal into a sequence of characters comprises:
slicing a received audio signal into at least two audio signal segments, wherein the at least two audio signal segments are time-domain signals;
converting the time domain signal into a frequency domain signal, and extracting acoustic features in the frequency domain signal;
and encoding the acoustic features through an encoder, decoding the encoding result through a decoder corresponding to the encoder, and generating a character sequence according to the decoding result.
Specifically, the manner of dividing the received audio signal into at least two audio signal segments may be: the received audio signal is divided into at least two audio signal segments with a preset duration (e.g., 10 ms). The duration corresponding to the audio signal may be 2min, and the duration corresponding to each audio signal segment may be 10 ms.
Specifically, the conversion of the time-domain signal into the frequency-domain signal is performed based on the Fast Fourier Transform (FFT), as follows: perform an FFT on the time-domain signal to obtain the frequency-domain signal corresponding to each frame of the audio signal, where the frequency-domain signal characterizes the relationship between frequency and energy. Further, the acoustic features in the frequency-domain signal may be extracted as follows: combine the spectra corresponding to each frame of the audio signal into a spectrogram, in which the spectra are arranged in time order and characterize the phonemes in the audio signal; an envelope of the spectrogram can then be extracted, and by extracting the formants in the envelope, the acoustic features in the frequency-domain signal can be determined. The acoustic features may be output as discrete values.
Specifically, the manner of encoding the acoustic features by an encoder and decoding the encoding result by a decoder corresponding to the encoder, and generating the character sequence according to the decoding result may be: converting the acoustic features into feature vectors through an encoder, and extracting the features of the feature vectors through a self-attention mechanism of the encoder; furthermore, the feature extraction result can be decoded by a decoder corresponding to the encoder; further, a character sequence corresponding to the audio signal may be generated from the decoding result. The encoder may be one or more, and the decoder is the same, and the embodiments of the present disclosure are not limited. Both the feature extraction result and the decoding result can be expressed as vectors.
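For illustration, here is a minimal NumPy sketch of the framing and FFT steps described above; the 16 kHz sample rate, the 10 ms frame length, and all function names are assumptions for exposition, not the patent's reference implementation:

```python
import numpy as np

def audio_to_spectrogram(signal: np.ndarray, sample_rate: int = 16000,
                         frame_ms: int = 10) -> np.ndarray:
    """Slice a time-domain signal into fixed-length segments and FFT each one."""
    frame_len = sample_rate * frame_ms // 1000      # samples per 10 ms segment
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    # The FFT turns each time-domain frame into a frequency-domain slice whose
    # magnitudes describe the relationship between frequency and energy.
    spectra = np.abs(np.fft.rfft(frames, axis=1))
    # Stacking the per-frame spectra in time order yields the spectrogram from
    # which envelopes and formants (the acoustic features) can be extracted.
    return spectra

wave = np.random.randn(16000 * 2)        # 2 s of synthetic audio
print(audio_to_spectrogram(wave).shape)  # (200, 81): 200 frames of 10 ms each
```

The encoder-decoder step described above would then consume such acoustic features and emit the character sequence.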
It can be seen that implementing this alternative embodiment, the input audio signal can be converted to text for subsequent information extraction. Compared with the mode of directly comparing with the corpus in the prior art, the character conversion mode can achieve higher accuracy.
And, in an optional embodiment, before converting the received audio signal into a character sequence, the method may further include the steps of:
and receiving the audio signal when the user touch operation aiming at the audio detection identifier is detected. The audio detection identifier can be a dynamic identifier or a static identifier, and the screen occupation ratio of the audio detection identifier is smaller than the preset screen occupation ratio.
Therefore, the implementation of the optional embodiment can reduce the probability of triggering the device to receive signals due to mistaken touch on the screen, and reduce the waste of computing resources.
In step S320, a target field recognition model corresponding to the text type is selected from the field recognition models according to the text type included in the character sequence.
The field recognition model may be used for entity recognition, number recognition, chinese recognition, english recognition, and the like, and the embodiment of the present disclosure is not limited. In addition, the number of the target field identification models may be one or more, and the embodiments of the present disclosure are not limited.
In an alternative embodiment, selecting a target field recognition model corresponding to a text type one to one from the field recognition models according to the text type contained in the character sequence includes:
converting the character sequence into a first feature vector;
generating a second feature vector for representing the context relation in the character sequence through the first feature vector;
classifying the second feature vector, and determining the text type contained in the character sequence according to the classification result;
at least one target field recognition model belonging to the text type is selected from the at least two field recognition models, and the at least one target field recognition model is matched with the character sequence.
Specifically, the manner of converting the character sequence into the first feature vector may be: and determining a vector corresponding to each character in the character sequence through pre-training word embedding, and splicing the vectors corresponding to each character to obtain a first characteristic vector.
Specifically, the second feature vector for characterizing the context relationship in the character sequence may be generated from the first feature vector as follows: input the first feature vector into a bidirectional recurrent neural network (BiRNN), so that the BiRNN performs feature extraction on the first feature vector through forward and backward propagation among a plurality of neurons, obtaining the second feature vector. The forward-propagation weight parameters differ from the backward-propagation weight parameters among the plurality of neurons. In addition, the context relationships captured by the second feature vector may include, for example, conjunctions and predicates, and the embodiment of the present disclosure is not limited thereto.
Specifically, the manner of classifying the second feature vector and determining the text types contained in the character sequence according to the classification result may be: input the second feature vector into a multi-layer perceptron (MLP), which performs feature extraction on it to obtain the feature vector to be classified; then normalize the feature vector to be classified with an activation function (e.g., softmax, sigmoid, etc.), so that each element of the normalized feature vector lies in [0, 1]; then calculate, through a classifier, the probability that the normalized feature vector belongs to each text type; and finally select the first N text types in descending order of probability as the text types contained in the character sequence, where N is a positive integer. The character sequence may contain one or more text types, and the embodiment of the present disclosure is not limited; a text type may be a number type, a word type, a Chinese character type, an entity type (e.g., a medical entity type), and so on. In addition, it is noted that medical entities may include proper nouns, drug names, disease names, and the like.
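As a rough illustration of this scene-selection pipeline (embedding, BiRNN context features, MLP plus softmax classification, top-N selection), consider the following PyTorch sketch; the GRU realization of the BiRNN, the mean pooling, and all dimensions are assumptions:

```python
import torch
import torch.nn as nn

class SceneSelector(nn.Module):
    """Hypothetical text-type classifier mirroring the steps described above."""
    def __init__(self, vocab_size: int, n_types: int, dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)   # pre-trained word embedding
        self.birnn = nn.GRU(dim, dim, bidirectional=True, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                 nn.Linear(dim, n_types))

    def forward(self, char_ids: torch.Tensor) -> torch.Tensor:
        f = self.embed(char_ids)              # first feature vector
        c, _ = self.birnn(f)                  # second (context) feature vector
        logits = self.mlp(c.mean(dim=1))      # pool over the sequence
        return torch.softmax(logits, dim=-1)  # per-type probabilities in [0, 1]

probs = SceneSelector(vocab_size=6000, n_types=4)(torch.randint(0, 6000, (1, 20)))
top_n = probs.topk(k=2, dim=-1).indices       # the N most probable text types
```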
Specifically, the manner of selecting at least one target field recognition model belonging to the text type from the at least two field recognition models may be: and traversing the text types corresponding to all the field recognition models, and selecting at least one target field recognition model of which the text type is matched with the character sequence.
Therefore, by implementing the optional embodiment, the field recognition model matched with the character sequence can be determined through feature extraction of the character sequence, so that the character segments in the character sequence can be recognized in a targeted manner, and the accuracy of information extraction is further improved.
In step S330, a reference character segment in the character sequence is identified by the target field identification model.
The reference character segment may coincide, partially coincide or not coincide with the character sequence, and the embodiments of the present disclosure are not limited thereto. The reference character segments in the character sequence may be one or more.
In an alternative embodiment, if the target field recognition model is used for entity recognition, recognizing the reference character segment in the character sequence through the target field recognition model includes:
converting the character sequence into a word vector and a pinyin vector through a target field recognition model, and splicing the word vector and the pinyin vector to obtain a first reference vector;
extracting first reference features in the first reference vector and classifying the first reference features;
and determining the reference character segment in the character sequence according to the classification result.
And the dimension of the first reference vector is the sum of the dimensions of the word vector and the pinyin vector. In addition, the way of extracting and classifying the first reference features in the first reference vector is the same as the way of generating a second feature vector for characterizing the context relationship in the character sequence by the first feature vector and classifying the second feature vector.
Specifically, the method of converting the character sequence into the word vector and the pinyin vector through the target field recognition model may be: inputting the character sequence into a target field recognition model, so that the target field recognition model converts the character sequence into a word vector and a pinyin vector through pre-trained word embedding.
Specifically, the method for determining the reference character segment in the character sequence according to the classification result may be: and determining the character segment with the highest corresponding probability in the classification result as a reference character segment of the character sequence. Because the character sequence can comprise a plurality of reference character segments, the character segment with the highest corresponding probability in each classification result can be determined as the reference character segment of the character sequence, so that the problem of inaccurate speech recognition caused by accent of a user or misstatement of the user and the like can be avoided to a certain extent.
Therefore, by implementing the optional embodiment, the entity recognition can be carried out on the character sequence, the accuracy of the extracted information is improved, and the problem of poor user experience caused by inaccurate recognition of the audio signal is solved.
Further, extracting the first reference feature in the first reference vector includes:
extracting character features in the first reference vector through a character feature extraction network, and extracting context features in the first reference vector through a context feature extraction network;
and splicing the character features and the context features, and determining a splicing result as a first reference feature in the first reference vector.
The context feature and the character feature can be represented in a vector mode, the character feature extraction network can be BiRNN, and the context feature extraction network can be a one-dimensional convolutional neural network.
It can be seen that implementing this alternative embodiment can improve the accuracy of entity identification by combining the character features and the context features.
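The entity-recognition branch described above might be sketched in PyTorch as follows; the LSTM realization of the BiRNN, the convolution kernel size, and all dimensions are illustrative assumptions rather than the patent's specification:

```python
import torch
import torch.nn as nn

class EntityRecognizer(nn.Module):
    """Word and pinyin vectors are spliced into a first reference vector; a
    BiRNN extracts character features and a 1-D CNN extracts context features,
    which are spliced again before per-character classification."""
    def __init__(self, n_chars: int, n_pinyin: int, n_labels: int, dim: int = 64):
        super().__init__()
        self.char_embed = nn.Embedding(n_chars, dim)
        self.pinyin_embed = nn.Embedding(n_pinyin, dim)
        self.char_net = nn.LSTM(2 * dim, dim, bidirectional=True, batch_first=True)
        self.ctx_net = nn.Conv1d(2 * dim, dim, kernel_size=3, padding=1)
        self.classifier = nn.Linear(2 * dim + dim, n_labels)

    def forward(self, chars: torch.Tensor, pinyin: torch.Tensor) -> torch.Tensor:
        # The first reference vector's dimension is the sum of the two
        # embedding dimensions, as noted above.
        ref = torch.cat([self.char_embed(chars), self.pinyin_embed(pinyin)], dim=-1)
        char_feat, _ = self.char_net(ref)                             # (B, T, 2*dim)
        ctx_feat = self.ctx_net(ref.transpose(1, 2)).transpose(1, 2)  # (B, T, dim)
        first_ref_feat = torch.cat([char_feat, ctx_feat], dim=-1)
        return self.classifier(first_ref_feat)      # per-character label scores
```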
In addition, after determining the reference character segment in the character sequence according to the classification result, the method may further include the steps of:
updating the character sequence according to the reference character segment and calculating a conditional random field (CRF) loss function corresponding to the updated character sequence;
and updating parameters in the target field recognition model according to the conditional random field loss function.
Specifically, the manner of updating the character sequence according to the reference character segment may be: if the character sequence has a field which is different from or partially same as the reference character segment, replacing the field by the reference character segment; if no field in the character sequence is different from or partially identical to the reference character segment, the reference character segment is discarded.
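A minimal sketch of this update rule, assuming the differing field has already been located and that simple substring replacement suffices (the matching strategy itself is not spelled out above):

```python
def update_sequence(sequence: str, reference: str, field: str) -> str:
    """Replace a field that differs from the reference character segment;
    if no such field exists, the reference segment is simply discarded."""
    if field and field != reference and field in sequence:
        return sequence.replace(field, reference)
    return sequence

print(update_sequence("I weigh 51 kg", "50 kg", "51 kg"))  # -> "I weigh 50 kg"
```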
Specifically, the conditional random field loss function corresponding to the updated character sequence may be calculated as follows: determine a feature function $f(s, W_i, L_i, L_{i-1})$ for each character in the updated character sequence, where $s$ denotes the character sequence, $W_i$ denotes the $i$-th character in the character sequence, $L_i$ denotes the part of speech to be labeled for the $i$-th character, and $L_{i-1}$ denotes the part of speech to be labeled for the $(i-1)$-th character; further, from the feature functions corresponding to the character segment, where $f_j$ denotes the $j$-th feature function and $\omega_j$ the weight corresponding to each feature function, the conditional random field loss function corresponding to the updated character sequence may be calculated from the expression

$$\mathrm{score}(L \mid s) = \sum_j \sum_i \omega_j \, f_j(s, W_i, L_i, L_{i-1}).$$
Therefore, by implementing the optional embodiment, the conditional random field loss function can be calculated through the updated character sequence, and then the model parameters are updated according to the conditional random field loss function, so that the identification accuracy of the model is improved.
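For concreteness, the weighted feature-function sum in the expression above could be evaluated as in the following sketch; the feature-function signature follows the definitions above, while the toy feature function and weight are assumptions:

```python
def crf_score(s, labels, feature_fns, weights):
    """sum_j sum_i w_j * f_j(s, W_i, L_i, L_{i-1}) over all positions i."""
    return sum(
        w * f(s, s[i], labels[i], labels[i - 1] if i > 0 else None)
        for f, w in zip(feature_fns, weights)
        for i in range(len(s))
    )

# Toy feature function: fires when a digit character is labeled as a number.
f_digit = lambda s, w, l, l_prev: 1.0 if w.isdigit() and l == "NUM" else 0.0
print(crf_score("bp120", ["O", "O", "NUM", "NUM", "NUM"], [f_digit], [2.0]))  # 6.0
```

The loss actually used for training would be derived from such scores (e.g., a negative log-likelihood over label sequences), which this sketch does not attempt to reproduce.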
In another alternative embodiment, if the target field recognition model is used for number recognition, recognizing the reference character segment in the character sequence by the target field recognition model includes:
converting the character sequence into a character vector through a target field recognition model, and extracting a context vector corresponding to the character sequence according to the character vector;
splicing the character vector and the context vector to obtain a second reference vector;
extracting second reference features in the second reference vector and classifying the second reference features;
and determining the reference character segment in the character sequence according to the classification result.
The character vector is a vector corresponding to each character in the character sequence; the character vectors belong to the same vector space; the character vector may be one or more. In addition, context vectors are used to characterize the logical relationships between characters in a sequence of characters. The dimension of the second reference vector is the sum of the dimensions of the character vector and the context vector. In addition, the way of extracting and classifying the second reference features in the second reference vector is the same as the way of generating the second feature vector for characterizing the context relationship in the character sequence by the first feature vector and classifying the second feature vector.
It should be noted that, in the target field recognition model for performing number recognition, the number of reference character segments is the same as the number of characters in the character sequence, that is, the reference character segments correspond one to one to the characters in the character sequence. In addition, the target field recognition model for performing number recognition may recognize a number class as well as a non-number class, and the embodiment of the present disclosure is not limited thereto.
Therefore, by implementing the alternative embodiment, the wrong digits in the character sequence can be identified as the correct digits by identifying the digits in the character sequence, so that the correctness of the extracted information is improved.
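A hedged PyTorch sketch of this number-recognition branch, in which each character vector is spliced with a per-position context vector and classified character by character; the GRU realization and all dimensions are assumptions:

```python
import torch
import torch.nn as nn

class NumberRecognizer(nn.Module):
    """Emits one label per input character (number class or non-number class),
    so the outputs correspond one to one to the characters, as noted above."""
    def __init__(self, n_chars: int, n_labels: int, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(n_chars, dim)
        self.context = nn.GRU(dim, dim, bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(dim + 2 * dim, n_labels)

    def forward(self, chars: torch.Tensor) -> torch.Tensor:
        char_vec = self.embed(chars)            # character vectors
        ctx_vec, _ = self.context(char_vec)     # context vector per position
        # The second reference vector's dimension is the sum of the character
        # vector and context vector dimensions.
        second_ref = torch.cat([char_vec, ctx_vec], dim=-1)
        return self.classifier(second_ref)      # one label per input character
```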
Further, after determining the reference character segment in the character sequence according to the classification result, the method may further include the following steps:
and when detecting that the character metering unit to be converted exists in the character sequence, converting the character metering unit to be converted into a specific character metering unit through a preset conversion rule.
The preset conversion rule is used for representing conversions among character metering units: the conversion coefficient $r$ corresponding to the character metering unit to be converted is determined according to the preset conversion rule, and the specific character metering unit is then obtained from the expression $V_N = r \times V_c$, where $V_N$ is the value in the specific character metering unit and $V_c$ is the value in the character metering unit to be converted. For example, the conversion coefficient $r$ for converting jin (a traditional Chinese unit of mass) into kilograms may be 0.5: if jin is the character metering unit to be converted, the kilogram is the specific character metering unit.
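A minimal sketch of such a preset conversion rule; the rule table is hypothetical (the jin-to-kilogram entry mirrors the example above, the pound entry is purely illustrative):

```python
# Hypothetical preset conversion rules: unit -> (coefficient r, specific unit).
CONVERSION_RULES = {
    "jin": (0.5, "kilogram"),       # as in the example above
    "pound": (0.4536, "kilogram"),  # illustrative entry
}

def convert_unit(value: float, unit: str):
    """Apply V_N = r * V_c when the unit appears in the preset rules."""
    if unit in CONVERSION_RULES:
        r, target = CONVERSION_RULES[unit]
        return value * r, target
    return value, unit

print(convert_unit(100, "jin"))  # (50.0, 'kilogram')
```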
Therefore, by implementing the optional embodiment, the consistency of the extracted information can be improved through the Unicode metering unit, and the problem of poor use experience of users caused by different metering units of the same type of information is avoided.
In addition, after determining the reference character segment in the character sequence according to the classification result, the method may further include the following steps:
calculating a cross entropy loss function between a reference character segment in the character sequence and a standard character segment in the character sequence;
and updating parameters in the target field recognition model according to the cross entropy loss function.
Wherein, the reference character segment is a character segment (e.g., {51 kg, aspirin }) in the character sequence predicted by the target field recognition model, and the standard character segment is a real character segment (e.g., {50 kg, aspirin }) corresponding to the character sequence. The parameters in the target field recognition model may include weight values, bias items, and the like corresponding to hidden layers, and the embodiments of the present disclosure are not limited.
Specifically, the cross entropy loss function between the reference character segment in the character sequence and the standard character segment in the character sequence may be calculated as follows: determine the probability distribution $p(x)$ corresponding to the reference character segment in the character sequence and the probability distribution $q(x)$ corresponding to the standard character segment in the character sequence; the loss between the reference character segment and the standard character segment, denoted $D_{KL}(p \| q)$, may then be calculated from the expression

$$D_{KL}(p \| q) = \sum_x p(x) \log \frac{p(x)}{q(x)}.$$
Therefore, by implementing the optional embodiment, the target field recognition model can be trained by adjusting the model parameters, so that the recognition accuracy of the target field recognition model can be improved.
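For illustration, the expression above could be evaluated with NumPy as follows; the clipping constant is an assumption added only for numerical safety:

```python
import numpy as np

def kl_loss(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """D_KL(p || q) = sum_x p(x) * log(p(x) / q(x)), as in the expression above."""
    p, q = np.clip(p, eps, 1.0), np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

p = np.array([0.7, 0.2, 0.1])  # distribution of the predicted reference segment
q = np.array([0.8, 0.1, 0.1])  # distribution of the standard segment
print(kl_loss(p, q))           # small positive value; 0 when p equals q
```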
In addition, optionally, after determining the reference character segment in the character sequence according to the classification result, the method may further include the following steps: calculating the Mean Square Error (MSE) between the reference character segment in the character sequence and the standard character segment in the character sequence; further, parameters in the target field identification model may be updated based on the mean square error.
In yet another alternative embodiment, the way of identifying the reference character segment in the character sequence by the target field identification model may be: determining pinyin fragments corresponding to the character sequences, and correcting errors of the pinyin fragments according to preset error correction rules to determine the corrected pinyin fragments and reference character fragments corresponding to the corrected pinyin fragments; the preset error correction rule is used for representing the mapping relation between the correct pinyin segment and the wrong pinyin segments similar to the correct pinyin segment, and one correct pinyin segment can have the mapping relation with one or more similar wrong pinyin segments.
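A minimal sketch of such a preset error-correction rule; the mapping entries are hypothetical and only illustrate the one-correct-to-many-errors structure described above:

```python
# Hypothetical rules: each correct pinyin segment maps to similar error segments.
ERROR_CORRECTION = {
    "a si pi lin": ["a si ke lin", "a si pi ling"],  # assumed "aspirin" variants
}

def correct_pinyin(segment: str) -> str:
    """Return the correct pinyin segment when the input matches a known error."""
    for correct, errors in ERROR_CORRECTION.items():
        if segment in errors:
            return correct
    return segment

print(correct_pinyin("a si ke lin"))  # -> "a si pi lin"
```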
In step S340, a set of reference character segments respectively output by each target field recognition model is determined, the set is deduplicated, and information corresponding to a specific field is extracted from the deduplicated set according to the specific field corresponding to the character sequence.
Wherein the specific field includes at least one of blood pressure, weight, heartbeat, and taking medicine. For example, the specific fields corresponding to the character sequence are weight and medicine, the reference character segment includes 50 kg and aspirin, and then the information corresponding to the specific fields extracted from the reference character segment may be 50 kg and aspirin. Further, after step S340, the following steps may be further included: outputting and storing information corresponding to the specific field; the output mode may be a voice output, a text output, and the like, and the embodiment of the disclosure is not limited.
For example, if the character sequence is "I weigh one kilogram today and ate one piece of Ascolin" (an imperfect transcription), then the set of reference character segments may be {50 kilograms, aspirin}. In addition, there may be one or more specific fields corresponding to the character sequence, which is not limited in the embodiment of the disclosure.
In addition, the set may be deduplicated based on a non-maximum suppression (NMS) algorithm (a minimal sketch follows these steps), which specifically includes:
sorting the reference character segments in the set according to the sequence of the confidence degrees from high to low;
and selecting the target reference character segment with the highest confidence from the reference character segments having an intersection according to the sorting result, and deleting the reference character segments other than the target reference character segment from the reference character segments having an intersection, until no intersection exists among the reference character segments in the set.
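The sketch below illustrates these two steps under an assumed data model in which each reference character segment carries a confidence and a character-index span (neither detail is fixed by the patent):

```python
def non_max_suppression(segments):
    """Keep the most confident segment among any group of intersecting segments.

    segments: list of dicts like {"text": str, "span": (start, end), "conf": float}.
    """
    def intersects(a, b):
        return a["span"][0] < b["span"][1] and b["span"][0] < a["span"][1]

    # Step 1: sort by confidence, from high to low.
    ordered = sorted(segments, key=lambda s: s["conf"], reverse=True)
    kept = []
    for seg in ordered:
        # Step 2: delete any segment that intersects an already-kept segment,
        # which by the sort order has a higher or equal confidence.
        if not any(intersects(seg, k) for k in kept):
            kept.append(seg)
    return kept

segments = [
    {"text": "50 kg", "span": (3, 8), "conf": 0.92},
    {"text": "0 kg", "span": (4, 8), "conf": 0.41},   # intersects "50 kg"
    {"text": "aspirin", "span": (12, 19), "conf": 0.88},
]
print([s["text"] for s in non_max_suppression(segments)])  # ['50 kg', 'aspirin']
```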
Therefore, this optional embodiment can be implemented to deduplicate a plurality of reference character segments according to their confidences, so as to improve the information extraction efficiency.
Therefore, implementing the information extraction method shown in fig. 3 can improve the accuracy of speech recognition and effectively meet the speech recognition requirements of the user; moreover, improving the speech recognition accuracy can improve the use experience of the user to a certain extent, thereby improving user stickiness.
Referring to fig. 4, fig. 4 schematically illustrates a module diagram for performing an information extraction method according to an embodiment of the present disclosure. As shown in fig. 4, the module for performing the information extraction method includes: a speech recognition module 410, a scene selection module 420, a character fragment recognition module 430 for performing number recognition, a character fragment recognition module 440 for performing entity recognition, a unit conversion module 450, and an information extraction module 460.
The speech recognition module 410 includes an FFT submodule 411 and a Transformer submodule 412; the scene selection module 420 includes a word embedding submodule 421, a BiRNN submodule 422, and a classification module 423; the character segment recognition module 430 for number recognition includes a character vector calculation submodule 431, a context vector calculation submodule 432, a vector splicing submodule 433, and a character segment calculation submodule 434; the character segment recognition module 440 for entity recognition includes a vector calculation submodule 441, a character feature extraction submodule 442, a context feature extraction submodule 443, a feature splicing submodule 444, and a character segment calculation submodule 445; the unit conversion module 450 includes a measurement unit detection submodule 451 and a measurement unit conversion submodule 452; and the information extraction module 460 includes a set generation submodule 461, a confidence sorting submodule 462, a character fragment deduplication submodule 463, and an information extraction submodule 464.
Specifically, the audio signal may be received and divided into at least two audio signal segments; the FFT submodule 411 then converts the time domain signal into a frequency domain signal and extracts the acoustic features in the frequency domain signal, the encoder in the Transformer submodule 412 encodes the acoustic features, the decoder corresponding to the encoder in the Transformer submodule 412 decodes the encoding result, and a character sequence is generated according to the decoding result. Furthermore, the word embedding submodule 421 may convert the character sequence into a first feature vector F = Embedding(T) and input it into the BiRNN submodule 422; the BiRNN submodule 422 may then generate a second feature vector C = BiRNN(F) for characterizing the context in the character sequence, and the classification module 423 classifies the second feature vector; according to the classification result α_S = Softmax(S), the text type contained in the character sequence is determined, and at least one target field recognition model belonging to that text type is selected from the at least two field recognition models.
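A minimal PyTorch sketch of this scene selection path follows; the vocabulary and layer sizes, the choice of a GRU as the BiRNN, and the mean-pooling step are all assumptions made for illustration:

```python
import torch
import torch.nn as nn

class SceneSelector(nn.Module):
    """F = Embedding(T), C = BiRNN(F), alpha_S = Softmax(S) as a toy model."""

    def __init__(self, vocab_size=6000, emb_dim=128, hidden=128, num_types=4):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)   # first feature vector F
        self.birnn = nn.GRU(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, num_types)   # text-type scores S

    def forward(self, token_ids):
        f = self.embedding(token_ids)        # (batch, seq, emb_dim)
        c, _ = self.birnn(f)                 # second feature vector C: (batch, seq, 2*hidden)
        s = self.classifier(c.mean(dim=1))   # pool over the sequence, then score
        return torch.softmax(s, dim=-1)      # alpha_S = Softmax(S)

model = SceneSelector()
probs = model(torch.randint(0, 6000, (1, 20)))  # a fake 20-character sequence
print(probs.shape)  # torch.Size([1, 4])
```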
In fig. 4, a case where the number of the target field recognition models is 2 and specifically includes a character fragment recognition module 430 for performing number recognition and a character fragment recognition module 440 for performing entity recognition is exemplarily shown.
In the character segment recognition module 430 for number recognition, the character sequence may be converted into a character vector F_N = CharEmb(T) by the character vector calculation submodule 431, and the context vector C_N = BiRNN(F_N) corresponding to the character sequence is extracted by the context vector calculation submodule 432 according to the character vector; the character vector and the context vector are spliced by the vector splicing submodule 433 to obtain a second reference vector C_ALL = C_N ⊕ F_N; the second reference feature F_SA = softmax(W2·tanh(W1·C_ALL))·C_ALL in the second reference vector is extracted by the character segment calculation submodule 434 and classified, and the reference character segment in the character sequence is determined according to the classification result P_Cor = softmax(MLP(F_SA)). Further, the measurement unit detection submodule 451 may detect whether a character measurement unit U to be converted exists in the character sequence; if so, the measurement unit conversion submodule 452 converts, according to the preset conversion rule V_N = r × V_C with r = Ratio(U, U_S), the character measurement unit U to be converted into the specific character measurement unit U_S, and converts the number in the reference character segment into the number matching U_S.
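The feature extraction F_SA = softmax(W2·tanh(W1·C_ALL))·C_ALL in the number recognition branch can be sketched as below; the dimensions and the bias-free linear layers are assumptions:

```python
import torch
import torch.nn as nn

class SegmentAttention(nn.Module):
    """Toy version of F_SA = softmax(W2 * tanh(W1 * C_ALL)) * C_ALL."""

    def __init__(self, dim=256, att_dim=64):
        super().__init__()
        self.w1 = nn.Linear(dim, att_dim, bias=False)  # W1
        self.w2 = nn.Linear(att_dim, 1, bias=False)    # W2

    def forward(self, c_all):
        # c_all: (batch, seq, dim), the spliced character and context vectors.
        scores = self.w2(torch.tanh(self.w1(c_all)))   # (batch, seq, 1)
        weights = torch.softmax(scores, dim=1)         # attention over positions
        return (weights * c_all).sum(dim=1)            # F_SA: (batch, dim)

att = SegmentAttention()
print(att(torch.randn(1, 20, 256)).shape)  # torch.Size([1, 256])
```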
In the character fragment recognition module 440 for entity recognition, the character sequence may be converted into a word vector CharEmb(T) and a pinyin vector PinyinEmb(T) through the target field recognition model by the vector calculation submodule 441, and the word vector and the pinyin vector are spliced to obtain a first reference vector F_E = CharEmb(T) ⊕ PinyinEmb(T); the character feature C_R = SelfAtt(BiRNN(F_E)) in the first reference vector is extracted by the character feature extraction submodule 442; the context feature C_C = CNN(F_E) in the first reference vector is extracted by the context feature extraction submodule 443; the character feature and the context feature are spliced by the feature splicing submodule 444, and the splicing result is determined as the first reference feature in the first reference vector; the first reference feature in the first reference vector is extracted and classified by the character segment calculation submodule 445, and the reference character segment in the character sequence is determined according to the classification result P_Cor = CRF(MLP(C_R ⊕ C_C)).
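A compact sketch of this entity recognition branch is given below. The layer sizes are assumptions, the self-attention is approximated with a standard multi-head attention layer, and the CRF decoding stage of P_Cor = CRF(MLP(...)) is deliberately omitted, since its implementation is not specified here:

```python
import torch
import torch.nn as nn

class EntityBranch(nn.Module):
    """F_E = CharEmb(T) (+) PinyinEmb(T); C_R = SelfAtt(BiRNN(F_E));
    C_C = CNN(F_E); per-character scores = MLP(C_R (+) C_C)."""

    def __init__(self, char_vocab=6000, pinyin_vocab=500, emb=64, hidden=64, labels=5):
        super().__init__()
        self.char_emb = nn.Embedding(char_vocab, emb)
        self.pinyin_emb = nn.Embedding(pinyin_vocab, emb)
        self.birnn = nn.GRU(2 * emb, hidden, bidirectional=True, batch_first=True)
        self.self_att = nn.MultiheadAttention(2 * hidden, num_heads=2, batch_first=True)
        self.cnn = nn.Conv1d(2 * emb, 2 * hidden, kernel_size=3, padding=1)
        self.mlp = nn.Sequential(nn.Linear(4 * hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, labels))

    def forward(self, chars, pinyins):
        f_e = torch.cat([self.char_emb(chars), self.pinyin_emb(pinyins)], dim=-1)
        r, _ = self.birnn(f_e)                                # BiRNN over F_E
        c_r, _ = self.self_att(r, r, r)                       # character features C_R
        c_c = self.cnn(f_e.transpose(1, 2)).transpose(1, 2)   # context features C_C
        return self.mlp(torch.cat([c_r, c_c], dim=-1))        # spliced, then MLP

branch = EntityBranch()
scores = branch(torch.randint(0, 6000, (1, 20)), torch.randint(0, 500, (1, 20)))
print(scores.shape)  # torch.Size([1, 20, 5]) -- one label distribution per character
```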
Further, the set R = C_N ∪ C_E of the reference character segments output by each target field recognition model may be determined by the set generation submodule 461; the reference character segments in the set are sorted by the confidence sorting submodule 462 in descending order of confidence to obtain the sorting result R_S; according to R_S, the character fragment deduplication submodule 463 selects the target reference character segment with the highest confidence from the reference character segments having an intersection, and deletes the reference character segments other than the target reference character segment from the reference character segments having an intersection, until no intersection exists between the reference character segments in the set, thereby obtaining R_filter = NonMaxSup(R_S); R_filter is then corrected against the corpus by the information extraction submodule 464 to obtain R_final = Rule(R_filter), and the information corresponding to the specific field is extracted from R_final according to the specific field corresponding to the character sequence, for example blood pressure BP = Cor(BP_RAW), body weight W = Cor(W_RAW), heartbeat HB = Cor(HB_RAW), medication M_E = Cor(T_RAW), and so on.
Therefore, implementing the method shown in fig. 4 can improve the accuracy of speech recognition and thus effectively meet the speech recognition requirements of users; moreover, improving the speech recognition accuracy can improve the use experience of the user to a certain extent, thereby improving user stickiness.
Referring to fig. 5, fig. 5 schematically shows a flow chart of an information extraction method according to another embodiment of the present disclosure. As shown in fig. 5, the information extraction method of another embodiment includes steps S500 to S538, in which:
step S500: slicing a received audio signal into at least two audio signal segments; wherein the at least two audio signals are time domain signals.
Step S502: and converting the time domain signal into a frequency domain signal, and extracting acoustic features in the frequency domain signal.
Step S504: and encoding the acoustic features through an encoder, decoding the encoding result through a decoder corresponding to the encoder, and generating a character sequence according to the decoding result.
Step S506: the character sequence is converted into a first feature vector.
Step S508: and generating a second feature vector for characterizing the context relation in the character sequence by the first feature vector.
Step S510: and classifying the second feature vector, and determining the text type contained in the character sequence according to the classification result.
Step S512: at least one target field recognition model belonging to the text type is selected from the at least two field recognition models, and the at least one target field recognition model is matched with the character sequence.
Step S514: the target field recognition model is used for entity recognition, the character sequence is converted into a word vector and a pinyin vector through the target field recognition model, and the word vector and the pinyin vector are spliced to obtain a first reference vector.
Step S516: and extracting character features in the first reference vector through a character feature extraction network, and extracting context features in the first reference vector through a context feature extraction network.
Step S518: and splicing the character features and the context features, determining a splicing result as first reference features in the first reference vector, and classifying the first reference features.
Step S520: and determining the reference character segment in the character sequence according to the classification result.
Step S522: the target field recognition model is used for carrying out digital recognition, converting the character sequence into a character vector through the target field recognition model, and extracting a context vector corresponding to the character sequence according to the character vector.
Step S524: and splicing the character vector and the context vector to obtain a second reference vector.
Step S526: and extracting second reference features in the second reference vector and classifying the second reference features.
Step S528: and determining the reference character segment in the character sequence according to the classification result.
Step S530: and when detecting that the character metering unit to be converted exists in the character sequence, converting the character metering unit to be converted into a specific character metering unit through a preset conversion rule.
Step S532: and if a plurality of different target field recognition models exist, determining a set of reference character segments respectively output by each target field recognition model.
Step S534: and sorting the reference character segments in the set according to the order of the confidence degrees from high to low.
Step S536: and selecting a target reference character segment with the highest confidence from the reference character segments with the intersection according to the sequencing result, and deleting the reference character segments except the target reference character segments in the reference character segments with the intersection until no intersection exists among the reference character segments in the set.
Step S538: and extracting information corresponding to the specific field from the set after the duplication is removed according to the specific field corresponding to the character sequence.
It should be noted that the embodiment of the present disclosure does not limit the execution order between step S514 and step S522. Further, steps S500 to S538 correspond to specific examples of steps S310 to S340 in fig. 3; therefore, please refer to the embodiment corresponding to fig. 3 for details, which are not described herein again.
Therefore, the information extraction method shown in fig. 5 can improve the accuracy of speech recognition, and thus effectively meet the speech recognition requirements of users; moreover, improving the speech recognition accuracy can improve the use experience of the user to a certain extent, thereby improving user stickiness.
Further, in the present exemplary embodiment, an information extraction apparatus is also provided. The information extraction device can be applied to a server or a terminal device. Referring to fig. 6, the information extraction apparatus 600 may include a speech recognition module 601, a scene selection module 602, a character segment recognition module 603, and an information extraction module 604, wherein:
a speech recognition module 601, configured to convert a received audio signal into a character sequence;
a scene selection module 602, configured to select, from the field identification models, a target field identification model corresponding to a text type one to one according to the text type included in the character sequence;
a character segment recognition module 603, configured to recognize a reference character segment in the character sequence through the target field recognition model;
the information extraction module 604 is configured to determine a set of reference character segments output by each target field recognition model, perform deduplication on the set, and extract information corresponding to a specific field from the deduplicated set according to the specific field corresponding to the character sequence.
Wherein the specific field includes at least one of blood pressure, weight, heartbeat, and taking medicine.
Therefore, the information extraction device shown in fig. 6 can improve the accuracy of speech recognition, and thus effectively meet the speech recognition requirements of users; moreover, improving the speech recognition accuracy can improve the use experience of the user to a certain extent, thereby improving user stickiness.
In an exemplary embodiment of the disclosure, a manner for the speech recognition module 601 to convert the received audio signal into a character sequence may specifically be:
the speech recognition module 601 divides the received audio signal into at least two audio signal segments; wherein the at least two audio signal segments are time domain signals;
the voice recognition module 601 converts the time domain signal into a frequency domain signal and extracts acoustic features in the frequency domain signal;
the speech recognition module 601 encodes the acoustic features through an encoder and decodes the encoded result through a decoder corresponding to the encoder, and generates a character sequence according to the decoded result.
It can be seen that implementing this exemplary embodiment, the input audio signal can be converted into text to facilitate subsequent information extraction. Compared with the mode of directly comparing with the corpus in the prior art, the character conversion mode can achieve higher accuracy.
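As an illustrative sketch only (the frame length, hop size, windowing, and plain FFT-magnitude features are assumptions; the embodiment does not fix these details), the time-domain-to-frequency-domain conversion could look like:

```python
import numpy as np

def acoustic_features(segment, sample_rate=16000, frame_ms=25, hop_ms=10):
    """Frame one time-domain audio signal segment and take the FFT magnitude
    of each frame as a simple acoustic feature matrix."""
    frame = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    window = np.hanning(frame)
    frames = [segment[i:i + frame] * window
              for i in range(0, len(segment) - frame + 1, hop)]
    # rfft converts each time-domain frame into the frequency domain.
    return np.array([np.abs(np.fft.rfft(f)) for f in frames])

one_second = np.random.randn(16000)         # a fake audio signal segment
print(acoustic_features(one_second).shape)  # (98, 201): frames x frequency bins
```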
In an exemplary embodiment of the disclosure, a manner that the scene selection module 602 selects the target field recognition model corresponding to the text type one to one from the field recognition models according to the text type included in the character sequence may specifically be:
the scene selection module 602 converts the character sequence into a first feature vector;
the scene selection module 602 generates a second feature vector for characterizing the context relationship in the character sequence through the first feature vector;
the scene selection module 602 classifies the second feature vector, and determines a text type included in the character sequence according to a classification result;
the scene selection module 602 selects at least one target field recognition model belonging to the text type from the at least two field recognition models, the at least one target field recognition model matching the character sequence.
Therefore, by implementing the exemplary embodiment, the field recognition model matched with the character sequence can be determined through feature extraction of the character sequence, so that the character segments in the character sequence can be recognized in a targeted manner, and the accuracy of information extraction is further improved.
In an exemplary embodiment of the disclosure, if the target field recognition model is used for entity recognition, the manner in which the character segment recognition module 603 recognizes the reference character segment in the character sequence through the target field recognition model may specifically be:
the character segment recognition module 603 converts the character sequence into a word vector and a pinyin vector through the target field recognition model, and splices the word vector and the pinyin vector to obtain a first reference vector;
the character segment recognition module 603 extracts a first reference feature in the first reference vector and classifies the first reference feature;
the character segment recognition module 603 determines a reference character segment in the character sequence according to the classification result.
Therefore, the implementation of the exemplary embodiment can perform entity recognition on the character sequence, improve the accuracy of the extracted information, and reduce the problem of poor user experience caused by inaccurate recognition of the audio signal.
In an exemplary embodiment of the disclosure, the way for the character segment recognition module 603 to extract the first reference feature in the first reference vector may specifically be:
the character segment recognition module 603 extracts character features in the first reference vector through a character feature extraction network, and extracts context features in the first reference vector through a context feature extraction network;
the character segment recognition module 603 concatenates the character feature and the context feature and determines the concatenation result as the first reference feature in the first reference vector.
It can be seen that implementing the exemplary embodiment can improve the accuracy of entity identification by combining the character features and the context features.
In an exemplary embodiment of the disclosure, the character segment recognition module 603 is further configured to, after determining the reference character segment in the character sequence according to the classification result, update the character sequence according to the reference character segment and calculate a conditional random field loss function corresponding to the updated character sequence;
the character segment recognition module 603 is further configured to update the parameters in the target field recognition model according to the conditional random field loss function.
Therefore, by implementing the exemplary embodiment, the conditional random field loss function can be calculated through the updated character sequence, and then the model parameters are updated according to the conditional random field loss function, so that the identification accuracy of the model is improved.
In an exemplary embodiment of the disclosure, if the target field recognition model is used for performing number recognition, the manner in which the character segment recognition module 603 recognizes the reference character segment in the character sequence through the target field recognition model may specifically be:
the character segment recognition module 603 converts the character sequence into a character vector through the target field recognition model, and extracts a context vector corresponding to the character sequence according to the character vector;
the character segment recognition module 603 concatenates the character vector and the context vector to obtain a second reference vector;
the character segment recognition module 603 extracts a second reference feature in the second reference vector and classifies the second reference feature;
the character segment recognition module 603 determines a reference character segment in the character sequence according to the classification result.
Therefore, by implementing the exemplary embodiment, wrongly transcribed numbers in the character sequence can be recognized and corrected into the right numbers through number recognition on the character sequence, so as to improve the correctness of the extracted information.
In an exemplary embodiment of the present disclosure, the apparatus may further include a unit conversion module (not shown), wherein:
and the unit conversion module is used for converting the character metering units to be converted into specific character metering units through a preset conversion rule after the reference character segments in the character sequence are determined according to the classification result and when the character metering units to be converted are detected to exist in the character sequence.
Therefore, by implementing the exemplary embodiment, the consistency of the extracted information can be improved by unifying the measurement units, avoiding the poor user experience caused by the same type of information being expressed in different measurement units.
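A minimal sketch of such a preset conversion rule, using an assumed ratio table (it matches the 100 jin to 50 kg example later in this disclosure):

```python
# Ratio table for the preset conversion rule; entries are illustrative.
UNIT_RATIOS = {("jin", "kg"): 0.5, ("g", "kg"): 0.001}

def convert_unit(value, unit, target_unit="kg"):
    """Convert a number from the detected measurement unit to the specific one."""
    if unit == target_unit:
        return value
    return value * UNIT_RATIOS[(unit, target_unit)]

print(convert_unit(100, "jin"))  # 50.0
```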
In an exemplary embodiment of the disclosure, the character segment recognition module 603 is further configured to calculate a cross entropy loss function between the reference character segment in the character sequence and the standard character segment in the character sequence after determining the reference character segment in the character sequence according to the classification result;
the character segment recognition module 603 is further configured to update parameters in the target field recognition model according to the cross entropy loss function.
Therefore, by implementing the exemplary embodiment, the target field recognition model can be trained by adjusting the model parameters, so that the recognition accuracy of the target field recognition model can be improved.
In an exemplary embodiment of the disclosure, if there are multiple different target field recognition models, the manner in which the information extraction module 604 extracts the information corresponding to the specific field according to the specific field corresponding to the character sequence may specifically be: determining the set of reference character segments respectively output by each target field recognition model, deduplicating the set, and extracting the information corresponding to the specific field from the deduplicated set according to the specific field corresponding to the character sequence.
in an exemplary embodiment of the present disclosure, before converting the received audio signal into a character sequence, the apparatus may further include a signal receiving unit (not shown), wherein:
and the signal receiving unit is used for receiving the audio signal when the user touch operation aiming at the audio detection identifier is detected.
Therefore, the implementation of the optional embodiment can reduce the probability of triggering the device to receive signals due to mistaken touch on the screen, and reduce the waste of computing resources.
In an exemplary embodiment of the disclosure, the manner of the information extraction module 604 performing deduplication on the set may specifically be:
the information extraction module 604 sorts the reference character segments in the set according to the order of the confidence degrees from high to low;
the information extraction module 604 selects a target reference character segment with the highest confidence from the reference character segments with intersection according to the sorting result, and deletes the reference character segments except the target reference character segment from the reference character segments with intersection until no intersection exists between the reference character segments in the set.
Therefore, by implementing the exemplary embodiment, the multiple reference character segments can be deduplicated according to their confidences, so as to improve the information extraction efficiency.
Further, referring to fig. 7, fig. 7 schematically illustrates an application of the information extraction method according to an embodiment of the disclosure. As shown in fig. 7, a user inputs to the terminal device 700 an audio signal saying "I ate two asippilins this morning, and weigh 100 jin." The terminal device 700 may then recognize, through the speech recognition module 710, the character sequence "I ate two asippilins this morning, and weigh 100 jin." corresponding to the audio signal. Since the character sequence relates to the entity "asippilin" and the number "100", the scene selection module 720 may trigger the activation of the character segment recognition module 730 for entity recognition and the character segment recognition module 740 for number recognition. The character segment recognition module 730 for entity recognition can recognize "asippilin" in the character sequence and output the correct reference character segment "aspirin", and the character segment recognition module 740 for number recognition can recognize the number "100" in the character sequence and output the correct reference character segment "100". Since the measurement unit corresponding to "100" is "jin", and "jin" does not belong to the specific character measurement units, the unit conversion module 750 can convert "100 jin" into "50 kg" according to the preset conversion rule. Further, the information extraction module 760 may determine the set {aspirin, 50 kg} of reference character segments according to the "aspirin" output by the character segment recognition module 730 and the "50 kg" output by the unit conversion module 750; since there are no reference character segments with an intersection in the set, the set obtained after deduplication by the character fragment deduplication submodule (not shown) is still {aspirin, 50 kg}. The information extraction module 760 may then extract from {aspirin, 50 kg} according to the fields blood pressure, weight, heartbeat, and taking medicine, and the output result may be: blood pressure is empty; weight is 50 kg; heartbeat is empty; the medicine taken is aspirin.
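The fig. 7 flow can be condensed into the toy, self-contained walk-through below; the rule tables and the regular expression are stand-ins for the trained recognition modules described above, not the patent's actual implementation:

```python
import re

ENTITY_RULES = {"asippilin": "aspirin"}   # entity error correction (illustrative)
UNIT_RATIOS = {"jin": ("kg", 0.5)}        # unit conversion rules (illustrative)

def extract(character_sequence):
    # Entity recognition branch: map mis-transcribed drug names to correct ones.
    drugs = [fixed for wrong, fixed in ENTITY_RULES.items()
             if wrong in character_sequence]
    # Number recognition branch plus unit conversion.
    weight = None
    match = re.search(r"(\d+(?:\.\d+)?)\s*(jin|kg)", character_sequence)
    if match:
        value, unit = float(match.group(1)), match.group(2)
        if unit in UNIT_RATIOS:
            unit, ratio = UNIT_RATIOS[unit]
            value *= ratio
        weight = f"{value:g} {unit}"
    # Information extraction over the specific fields.
    return {"blood pressure": None, "weight": weight,
            "heartbeat": None, "medicine": drugs or None}

print(extract("I ate two asippilins this morning, and weigh 100 jin"))
# {'blood pressure': None, 'weight': '50 kg', 'heartbeat': None, 'medicine': ['aspirin']}
```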
Therefore, implementing the application shown in fig. 7 can improve the accuracy of speech recognition and effectively meet the speech recognition requirements of users; moreover, improving the speech recognition accuracy can improve the use experience of the user to a certain extent, thereby improving user stickiness.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
For details not disclosed in the apparatus embodiments of the present disclosure, please refer to the embodiments of the information extraction method of the present disclosure.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method described in the above embodiments.
It should be noted that the computer readable media shown in the present disclosure may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software, or may be implemented by hardware, and the described units may also be disposed in a processor, where the names of the units do not, in some cases, constitute a limitation on the units themselves.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (15)

1. An information extraction method, comprising:
converting the received audio signal into a character sequence;
selecting a target field recognition model corresponding to the text type one by one from field recognition models according to the text type contained in the character sequence;
identifying a reference character segment in the character sequence through the target field identification model;
determining a set of reference character segments respectively output by each target field recognition model;
and removing the duplicate of the set, and extracting information corresponding to the specific field from the removed duplicate set according to the specific field corresponding to the character sequence.
2. The method of claim 1, wherein converting the received audio signal into a sequence of characters comprises:
slicing a received audio signal into at least two audio signal segments; wherein the at least two audio signal segments are both time domain signals;
converting the time domain signal into a frequency domain signal, and extracting acoustic features in the frequency domain signal;
and encoding the acoustic features through an encoder, decoding an encoding result through a decoder corresponding to the encoder, and generating a character sequence according to the decoding result.
3. The method of claim 1, wherein selecting a target field recognition model corresponding to a text type from field recognition models according to the text type included in the character sequence comprises:
converting the character sequence into a first feature vector;
generating a second feature vector for characterizing the context relation in the character sequence through the first feature vector;
classifying the second feature vector, and determining the text type contained in the character sequence according to the classification result;
selecting at least one target field recognition model belonging to the text type from at least two field recognition models, wherein the at least one target field recognition model is matched with the character sequence.
4. The method of claim 1, wherein, if the target field recognition model is used for entity recognition, identifying the reference character segment in the character sequence through the target field recognition model comprises:
converting the character sequence into a word vector and a pinyin vector through the target field recognition model, and splicing the word vector and the pinyin vector to obtain a first reference vector;
extracting first reference features in the first reference vector and classifying the first reference features;
and determining the reference character segment in the character sequence according to the classification result.
5. The method of claim 4, wherein extracting the first reference feature in the first reference vector comprises:
extracting character features in the first reference vector through a character feature extraction network, and extracting context features in the first reference vector through a context feature extraction network;
and splicing the character features and the context features, and determining a splicing result as a first reference feature in the first reference vector.
6. The method of claim 4, wherein after determining the reference character segment in the character sequence according to the classification result, the method further comprises:
updating the character sequence according to the reference character segment and calculating a conditional random field loss function corresponding to the updated character sequence;
and updating parameters in the target field recognition model according to the conditional random field loss function.
7. The method of claim 1, wherein, if the target field recognition model is used for number recognition, identifying the reference character segment in the character sequence through the target field recognition model comprises:
converting the character sequence into a character vector through the target field recognition model, and extracting a context vector corresponding to the character sequence according to the character vector;
splicing the character vector and the context vector to obtain a second reference vector;
extracting second reference features in the second reference vector and classifying the second reference features;
and determining the reference character segment in the character sequence according to the classification result.
8. The method of claim 7, wherein after determining the reference character segment in the character sequence according to the classification result, the method further comprises:
and when detecting that the character metering unit to be converted exists in the character sequence, converting the character metering unit to be converted into a specific character metering unit through a preset conversion rule.
9. The method of claim 7, wherein after determining the reference character segment in the character sequence according to the classification result, the method further comprises:
calculating a cross entropy loss function between a reference character segment in the character sequence and a standard character segment in the character sequence;
and updating parameters in the target field identification model according to the cross entropy loss function.
10. The method of claim 1, wherein prior to converting the received audio signal into a sequence of characters, the method further comprises:
and receiving the audio signal when the user touch operation aiming at the audio detection identifier is detected.
11. The method of claim 1, wherein de-duplicating the set comprises:
sorting the reference character segments in the set according to the sequence of confidence degrees from high to low;
and selecting the target reference character segment with the highest confidence from the reference character segments having an intersection according to the sorting result, and deleting the reference character segments other than the target reference character segment from the reference character segments having an intersection until no intersection exists between the reference character segments in the set.
12. The method of claim 1, wherein the specific field includes at least one of blood pressure, weight, heartbeat, and taking medication.
13. An information extraction apparatus characterized by comprising:
the voice recognition module is used for converting the received audio signal into a character sequence;
the scene selection module is used for selecting a target field recognition model corresponding to the text type one by one from field recognition models according to the text type contained in the character sequence;
the character segment recognition module is used for recognizing the reference character segment in the character sequence through the target field recognition model;
the information extraction module is used for determining a set of reference character segments output by each target field recognition model; and removing the duplicate of the set, and extracting information corresponding to the specific field from the removed duplicate set according to the specific field corresponding to the character sequence.
14. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1-12.
15. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of any of claims 1-12 via execution of the executable instructions.
CN202010022597.5A 2020-01-09 2020-01-09 Information extraction method, information extraction device, computer readable storage medium and electronic equipment Active CN111223481B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010022597.5A CN111223481B (en) 2020-01-09 2020-01-09 Information extraction method, information extraction device, computer readable storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010022597.5A CN111223481B (en) 2020-01-09 2020-01-09 Information extraction method, information extraction device, computer readable storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN111223481A true CN111223481A (en) 2020-06-02
CN111223481B CN111223481B (en) 2023-10-13

Family

ID=70832310

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010022597.5A Active CN111223481B (en) 2020-01-09 2020-01-09 Information extraction method, information extraction device, computer readable storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111223481B (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019020597A (en) * 2017-07-18 2019-02-07 日本放送協会 End-to-end japanese voice recognition model learning device and program
CN109388795A (en) * 2017-08-07 2019-02-26 芋头科技(杭州)有限公司 A kind of name entity recognition method, language identification method and system
WO2019071660A1 (en) * 2017-10-09 2019-04-18 平安科技(深圳)有限公司 Bill information identification method, electronic device, and readable storage medium
US10147428B1 (en) * 2018-05-30 2018-12-04 Green Key Technologies Llc Computer systems exhibiting improved computer speed and transcription accuracy of automatic speech transcription (AST) based on a multiple speech-to-text engines and methods of use thereof
CN109785840A (en) * 2019-03-05 2019-05-21 湖北亿咖通科技有限公司 The method, apparatus and vehicle mounted multimedia host, computer readable storage medium of natural language recognition
CN110162795A (en) * 2019-05-30 2019-08-23 重庆大学 A kind of adaptive cross-cutting name entity recognition method and system
CN110399616A (en) * 2019-07-31 2019-11-01 国信优易数据有限公司 Name entity detection method, device, electronic equipment and readable storage medium storing program for executing

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111914822A (en) * 2020-07-23 2020-11-10 腾讯科技(深圳)有限公司 Text image labeling method and device, computer readable storage medium and equipment
CN111914822B (en) * 2020-07-23 2023-11-17 腾讯科技(深圳)有限公司 Text image labeling method, device, computer readable storage medium and equipment
CN112183055A (en) * 2020-08-17 2021-01-05 北京来也网络科技有限公司 Information acquisition method and device combining RPA and AI, computer equipment and medium
CN114386423A (en) * 2022-01-18 2022-04-22 平安科技(深圳)有限公司 Text duplicate removal method and device, electronic equipment and storage medium
CN114386423B (en) * 2022-01-18 2023-07-14 平安科技(深圳)有限公司 Text deduplication method and device, electronic equipment and storage medium
CN114399996A (en) * 2022-03-16 2022-04-26 阿里巴巴达摩院(杭州)科技有限公司 Method, apparatus, storage medium, and system for processing voice signal

Also Published As

Publication number Publication date
CN111223481B (en) 2023-10-13


Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40024766; Country of ref document: HK)
SE01 Entry into force of request for substantive examination
GR01 Patent grant