WO2015067116A1 - Method and apparatus for processing voice text - Google Patents

Method and apparatus for processing voice text (处理语音文本的方法及装置)

Info

Publication number
WO2015067116A1
Authority
WO
WIPO (PCT)
Prior art keywords
matching
rule
mapping
elimination
processing
Prior art date
Application number
PCT/CN2014/088371
Other languages
English (en)
French (fr)
Inventor
王飞
徐浩
褚攀
韩贵平
廖玲
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2015067116A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/685Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using automatically derived transcript of audio data, e.g. lyrics

Definitions

  • The present invention relates to the field of information processing technologies, and in particular to a method and apparatus for processing voice text.
  • Embodiments of the present invention provide a method and apparatus for processing voice text. The technical solution is as follows.
  • In one aspect, an embodiment of the present invention provides a method for processing voice text, the method comprising: performing named entity mapping on the voice text to obtain a first mapping result; performing vocabulary mapping on the first mapping result to obtain a second mapping result; and matching the second mapping result against preset rules and processing the voice text according to an obtained matching rule, where the preset rules include regular-expression rules.
  • In another aspect, an embodiment of the present invention provides an apparatus for processing voice text, the apparatus comprising:
  • a first mapping module configured to perform named entity mapping on the voice text to obtain a first mapping result;
  • a second mapping module configured to perform vocabulary mapping on the first mapping result to obtain a second mapping result;
  • a matching module configured to match the second mapping result against preset rules, where the preset rules include regular-expression rules; and
  • a first processing module configured to process the voice text according to an obtained matching rule.
  • In yet another aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium storing computer-executable instructions. When the executable instructions are run on a computer, the following steps are performed: performing named entity mapping on the voice text to obtain a first mapping result; performing vocabulary mapping on the first mapping result to obtain a second mapping result; and matching the second mapping result against preset rules that include regular-expression rules, and processing the voice text according to the obtained matching rule.
  • By unifying the configuration format of the regular-expression rules and the named entity rules, the voice text processing technology applies to both actual language environments and fixed language environments, which expands its scope of application and optimizes the way voice text is processed.
  • FIG. 1 is a flowchart of a method for processing voice text according to Embodiment 1 of the present invention;
  • FIG. 2 is a flowchart of a method for processing voice text according to Embodiment 2 of the present invention;
  • FIG. 3 is a schematic structural diagram of a first apparatus for processing voice text according to Embodiment 3 of the present invention;
  • FIG. 4 is a schematic structural diagram of a second apparatus for processing voice text according to Embodiment 3 of the present invention;
  • FIG. 5 is a schematic structural diagram of a third apparatus for processing voice text according to Embodiment 3 of the present invention;
  • FIG. 6 is a schematic structural diagram of a terminal for processing voice text according to Embodiment 4 of the present invention.
  • An embodiment of the present invention provides a method for processing voice text. The method flow includes steps 101 to 103.
  • In step 101, named entity mapping is performed on the voice text to obtain a first mapping result.
  • In step 102, vocabulary mapping is performed on the first mapping result to obtain a second mapping result.
  • In an embodiment, before vocabulary mapping is performed on the first mapping result, the method may further include: expanding one or more named entities in the first mapping result, in turn, into the corresponding voice text before mapping, to obtain at least two third mapping results.
  • Performing vocabulary mapping on the first mapping result then includes: performing vocabulary mapping on the voice text in each third mapping result that has not been mapped to a named entity, to obtain the second mapping result.
  • In step 103, the second mapping result is matched against the preset rules, and the voice text is processed according to an obtained matching rule, where the preset rules include regular-expression rules.
  • In an embodiment, after the second mapping result is matched against the preset rules including the regular-expression rules, the method may further include:
  • if at least two matching rules are obtained, sequentially performing at least one of matching width disambiguation, matching weight disambiguation, matching density disambiguation, hit count disambiguation, and named entity weight disambiguation on all obtained matching rules, until a single disambiguated matching rule is obtained; and
  • processing the voice text according to the obtained disambiguated matching rule.
  • In an embodiment, the matching width disambiguation may include: determining the matching width of the second mapping result corresponding to each matching rule, and taking the matching rule with the largest matching width as the matching rule that has undergone matching width disambiguation.
  • In an embodiment, the matching weight disambiguation may include: determining the weight of each matching rule to undergo matching weight disambiguation according to preset weights of vocabulary terms and named entities; and taking the matching rule with the largest weight as the matching rule that has undergone matching weight disambiguation; where each matching rule to undergo matching weight disambiguation is a matching rule that has undergone matching width disambiguation.
  • In an embodiment, the matching density disambiguation may include: determining the matching proportion between each matching rule to undergo matching density disambiguation and the second mapping result, and taking the matching rule with the largest matching proportion as the matching rule that has undergone matching density disambiguation; where each matching rule to undergo matching density disambiguation is a matching rule that has undergone matching width disambiguation.
  • In an embodiment, the hit count disambiguation may include: determining the parameter hit count of each matching rule to undergo hit count disambiguation, a parameter being one of a named entity, a vocabulary term, and a position parameter; and taking the matching rule with the largest parameter hit count as the matching rule that has undergone hit count disambiguation; where each matching rule to undergo hit count disambiguation is a matching rule that has undergone matching width disambiguation.
  • In an embodiment, the named entity weight disambiguation may include: determining the weight value of the named entity in each matching rule to undergo named entity weight disambiguation, and taking the matching rule whose named entity has the largest weight value as the matching rule that has undergone named entity weight disambiguation; where each matching rule to undergo named entity weight disambiguation is a matching rule that has undergone matching width disambiguation.
  • The method provided in this embodiment obtains a first mapping result by performing named entity mapping on the voice text, and a second mapping result by performing vocabulary mapping on the first mapping result; the second mapping result is then matched against preset rules that include regular-expression rules, and the voice text is processed according to an obtained matching rule. The configuration format of the regular-expression rules and the named entity rules is thereby unified, so that the voice text processing technology applies to both actual and fixed language environments, which expands its scope of application and optimizes the way voice text is processed.
  • An embodiment of the present invention provides a method for processing voice text. The method flow includes steps 201 to 206.
  • In step 201, named entity mapping is performed on the voice text to obtain a first mapping result.
  • In this embodiment, performing named entity mapping on the voice text may include, but is not limited to: building a named entity library; searching the voice text for segments that can be identified as named entities in the library; and replacing the found segments with the corresponding named entities.
  • Note that named entities can be collected from a large amount of information on the network, and that recognition is implemented with a separate dictionary trie for each domain, so that all named entities can be found even when they fully or partially overlap.
  • For ease of understanding, take the voice text "I want to listen to The First Time" (我想听第一次) as an example, with named entities denoted by brackets [].
  • In the named entity library, the segments "I want" (我想) and "The First Time" (第一次) both correspond to the named entity [song name]; both segments are found in the voice text and recognized as [song name] entries in the library. Replacing them with [song name] yields the first mapping result, namely "[song name] listen to [song name]".
  • In step 202, one or more named entities in the first mapping result are expanded, in turn, into the corresponding voice text before mapping, and at least two third mapping results are obtained.
  • Because a named entity may overlap with a vocabulary term, expanding the named entities of the first mapping result in turn back into their pre-mapping voice text yields at least two third mapping results, which increases the number of mapping results and avoids the situation where voice text cannot be processed accurately when named entities and vocabulary terms overlap.
  • Continuing with the example "I want to listen to The First Time": the first mapping result is "[song name] listen to [song name]". Expanding its two named entities in turn into their pre-mapping voice text yields four third mapping results: "[song name] listen to [song name]", "I want to listen to [song name]", "[song name] listen to The First Time", and "I want to listen to The First Time".
  • In step 203, vocabulary mapping is performed on the voice text in each third mapping result that has not been mapped to a named entity, to obtain the second mapping results.
  • The vocabulary used in vocabulary mapping can be configured manually; for example, voice text expressing the same meaning in everyday language can be mapped to the same vocabulary term. Since voice text with the same meaning may take different surface forms in an actual language environment, performing vocabulary mapping on the segments that have not been mapped to named entities reduces the workload of repeated matching. This embodiment does not specifically limit the manner of performing vocabulary mapping on such segments of each third mapping result.
  • Continuing with the third mapping results from step 202: "I want to listen" maps to the vocabulary term <play>. Performing vocabulary mapping on the unmapped segments of each third mapping result gives the second mapping results: "[song name] listen to [song name]", "<play> [song name]", "[song name] listen to The First Time", and "<play> The First Time".
  • In step 204, the second mapping results are matched against the preset rules, where the preset rules include regular-expression rules.
  • In this embodiment, the preset rules may include, but are not limited to, regular-expression rules and other preset rules. The other preset rules may include, but are not limited to, rules set to conform to language habits; this embodiment does not specifically limit them, and in practice they can be set as needed.
  • Matching the second mapping results against the preset rules may include, but is not limited to: extracting position parameters from the second mapping results through rule slots, to obtain slot extraction results; and matching the slot extraction results against the other preset rules. A rule slot is a specified position obtained from a regular-expression rule; extracting a position parameter from a second mapping result through a rule slot means extracting it at the position specified by the regular-expression rule. When a second mapping result contains no position parameter extractable through a rule slot, the second mapping result can be matched directly against the other preset rules to obtain a matching rule.
  • Note that after the matching, step 205 is performed if one matching rule is obtained, and step 206 is performed if at least two matching rules are obtained.
  • Take the second mapping results "[song name] listen to [song name]", "<play> [song name]", "[song name] listen to The First Time", and "<play> The First Time" as an example. Since they contain no position parameters extractable through rule slots, they are matched directly against the other preset rules. When one of the other preset rules is "<play> [song name]", the corresponding second mapping result is "<play> [song name]", and one matching rule is obtained, namely "<play> [song name]".
  • As another example, for the voice text "play Corner With Love" (播放转角遇到爱), processing according to steps 201 to 203 yields three second mapping results: "<play> [video name]", "<play> [song name] meets love", and "<play> Corner With Love". Since none of them contains a position parameter extractable through a rule slot, the three second mapping results are matched directly against the other preset rules. Because the other preset rules include both "<play> [song name]" and "<play> [video name]", the second mapping results corresponding to these two rules are "<play> [video name]" and "<play> [song name] meets love"; two matching rules are therefore obtained, namely "<play> [song name]" and "<play> [video name]".
  • In step 205, the voice text is processed according to the obtained matching rule.
  • If one matching rule is obtained in step 204, the voice text is processed according to that matching rule. This embodiment does not specifically limit the manner of processing the voice text.
  • For example, still taking "play Corner With Love": if the obtained matching rule is "<play> [video name]", then when step 205 processes the voice text according to the obtained matching rule, the video titled "Corner With Love" is played.
  • In step 206, at least one of matching width disambiguation, matching weight disambiguation, matching density disambiguation, hit count disambiguation, and named entity weight disambiguation is performed sequentially on all obtained matching rules, until a single disambiguated matching rule is obtained.
  • If at least two matching rules are obtained in step 204, then to make the processing result more accurate, the method provided in this embodiment disambiguates among all matching rules. The disambiguation may include, but is not limited to: sequentially performing at least one of the five disambiguation steps above on all obtained matching rules, until a single disambiguated matching rule is obtained.
  • The matching width disambiguation may include, but is not limited to: determining the matching width of the second mapping result corresponding to each matching rule, and taking the matching rule with the largest matching width as the matching rule that has undergone matching width disambiguation. The matching width runs from the start position of the first parameter to the end position of the last parameter in the second mapping result, where a parameter may be, but is not limited to, a named entity, a vocabulary term, or a position parameter.
  • Note that, to prevent voice text without actual meaning from affecting the processing, a threshold needs to be set before the matching width is determined; the threshold is used when determining the matching width of the second mapping results, meaning that matching widths within the threshold range are considered identical. This embodiment does not limit the size of the threshold; in practice any threshold can be set as needed.
  • For ease of understanding, take the voice text "play Corner With Love" with a threshold of 2 bytes. Matching yields two rules, "<play> [song name]" and "<play> [video name]", corresponding to the second mapping results "<play> [song name] meets love" and "<play> [video name]". For "<play> [song name] meets love", the first parameter is "<play>" at the start of the sentence and the last parameter is "[song name]"; since the threshold is 2 bytes, the trailing characters "meets love" (遇到爱) cannot be ignored, so the matching width of "<play> [song name]" does not span from the start to the end of the sentence. For "<play> [video name]", the first parameter is "<play>" and the last is "[video name]", so its matching width spans the whole sentence and is larger. The rule "<play> [video name]" with the largest matching width is therefore taken as the matching rule that has undergone matching width disambiguation, yielding a single disambiguated matching rule.
  • The matching weight disambiguation may include, but is not limited to: determining the weight of each matching rule to undergo matching weight disambiguation according to preset weights of vocabulary terms and named entities; and taking the matching rule with the largest weight as the matching rule that has undergone matching weight disambiguation; where each such rule has already undergone matching width disambiguation.
  • This embodiment does not specifically limit the preset weights of vocabulary terms and named entities. Since vocabulary can be configured manually while named entities are collected from large amounts of information on the network, a vocabulary term usually carries more weight than a named entity.
  • For ease of understanding, take the voice text "find a nearby restaurant" (找一下附近的餐馆). Matching yields two rules, "<find> <restaurant>" and "<find> [restaurant name]", corresponding to the second mapping results "<find> nearby <restaurant>" and "<find> nearby [restaurant name]". Matching width disambiguation finds that the two rules have the same matching width; since more than one rule remains, matching weight disambiguation is performed. If the preset weight ratio of vocabulary terms to named entities is 2:1, the weight of "<find> <restaurant>" is determined to be larger than that of "<find> [restaurant name]", so the rule with the largest weight is "<find> <restaurant>", which is taken as the matching rule that has undergone matching weight disambiguation, yielding a single disambiguated matching rule.
  • The matching density disambiguation may include, but is not limited to: determining the matching proportion between each matching rule to undergo matching density disambiguation and the second mapping result, and taking the matching rule with the largest matching proportion as the matching rule that has undergone matching density disambiguation; where each such rule has already undergone matching width disambiguation.
  • Note that matching density disambiguation targets the case where the matching rules contain parameters of the same type; for example, the rules all contain vocabulary terms, or all contain named entities.
  • For ease of understanding, take the voice text "play Tiny Times" (播放小时代). Matching yields two rules, "<play> [movie name]" and "<play> [song name]", corresponding to the second mapping results "<play> [movie name]" and "<play> Tiny [song name]". Matching width disambiguation finds that the two rules have the same matching width; matching weight disambiguation then finds that they have the same weight. Since more than one rule remains after matching weight disambiguation, matching density disambiguation is performed. The matching proportion of "<play> [movie name]" against "<play> [movie name]" is determined to be 100%, while that of "<play> [song name]" against "<play> Tiny [song name]" is 80%, so the rule with the largest matching proportion is "<play> [movie name]", which is taken as the matching rule that has undergone matching density disambiguation, yielding a single disambiguated matching rule.
  • The hit count disambiguation may include, but is not limited to: determining the parameter hit count of each matching rule to undergo hit count disambiguation, a parameter being one of a named entity, a vocabulary term, and a position parameter; and taking the matching rule with the largest parameter hit count as the matching rule that has undergone hit count disambiguation; where each such rule has already undergone matching width disambiguation.
  • For ease of understanding, take the voice text "play the song Wind" (播放歌曲风). Matching yields two rules, "<play> <song> %s1" and "<play> <song>", where %s1 is a position parameter; both correspond to the same second mapping result, "<play> <song> Wind". With the threshold set to 2 bytes, matching width disambiguation finds that the two rules have the same matching width, and matching weight disambiguation then finds that they have the same weight. Since "<play> <song>" contains no position parameter, matching density disambiguation cannot be applied to the remaining rules, so hit count disambiguation is performed on them next in order. The parameter hit count of "<play> <song> %s1" is 3 and that of "<play> <song>" is 2, so the rule with the largest parameter hit count is "<play> <song> %s1", which is taken as the matching rule that has undergone hit count disambiguation, yielding a single disambiguated matching rule.
  • The named entity weight disambiguation may include, but is not limited to: determining the weight value of the named entity in each matching rule to undergo named entity weight disambiguation, and taking the matching rule whose named entity has the largest weight value as the matching rule that has undergone named entity weight disambiguation; where each such rule has already undergone matching width disambiguation.
  • For ease of understanding, take the voice text "play So Young" (播放致青春). Matching yields two rules, "<play> [movie name]" and "<play> [song name]", corresponding to the second mapping results "<play> [movie name]" and "<play> [song name]". Matching width, matching weight, matching density, and hit count disambiguation in turn find the two rules identical on each criterion; since more than one matching rule still remains after hit count disambiguation, named entity weight disambiguation is performed on the remaining rules. When the weight value of the named entity [movie name] is determined to be larger than that of [song name], the rule whose named entity has the largest weight value is "<play> [movie name]", which is taken as the matching rule that has undergone named entity weight disambiguation, yielding a single disambiguated matching rule.
  • In step 207, the voice text is processed according to the obtained disambiguated matching rule.
  • This embodiment does not specifically limit the manner of processing the voice text according to the obtained disambiguated matching rule; the processing principle is the same as in step 205 above, which may be referred to for details.
  • The method provided in this embodiment obtains a first mapping result by performing named entity mapping on the voice text, and a second mapping result by performing vocabulary mapping on the first mapping result; the second mapping result is then matched against preset rules that include regular-expression rules, and the voice text is processed according to an obtained matching rule. The configuration format of the regular-expression rules and the named entity rules is thereby unified, so that the voice text processing technology applies to both actual and fixed language environments, which expands its scope of application and optimizes the way voice text is processed.
  • Referring to FIG. 3, an embodiment of the present invention provides an apparatus for processing voice text, the apparatus comprising:
  • a first mapping module 301 configured to perform named entity mapping on the voice text to obtain a first mapping result;
  • a second mapping module 302 configured to perform vocabulary mapping on the first mapping result to obtain a second mapping result;
  • a matching module 303 configured to match the second mapping result against preset rules, where the preset rules include regular-expression rules; and
  • a first processing module 304 configured to process the voice text according to an obtained matching rule.
  • Referring to FIG. 4, in an embodiment, the apparatus may further include: an expansion module 305 configured to expand one or more named entities in the first mapping result, in turn, into the corresponding voice text before mapping, to obtain at least two third mapping results. The second mapping module 302 is further configured to perform vocabulary mapping on the voice text in each third mapping result that has not been mapped to a named entity, to obtain the second mapping result.
  • Referring to FIG. 5, in an embodiment, the apparatus may further include: a disambiguation module 306 configured to, when at least two matching rules are obtained, sequentially perform at least one of matching width disambiguation, matching weight disambiguation, matching density disambiguation, hit count disambiguation, and named entity weight disambiguation on all obtained matching rules, until a single disambiguated matching rule is obtained; and a second processing module 307 configured to process the voice text according to the obtained disambiguated matching rule.
  • In an embodiment, the disambiguation module 306 is configured to determine the matching width of the second mapping result corresponding to each matching rule, and take the matching rule with the largest matching width as the matching rule that has undergone matching width disambiguation.
  • In an embodiment, the disambiguation module 306 is configured to determine the weight of each matching rule to undergo matching weight disambiguation according to preset weights of vocabulary terms and named entities, and take the matching rule with the largest weight as the matching rule that has undergone matching weight disambiguation; each such rule has already undergone matching width disambiguation.
  • In an embodiment, the disambiguation module 306 is configured to determine the matching proportion between each matching rule to undergo matching density disambiguation and the second mapping result, and take the matching rule with the largest matching proportion as the matching rule that has undergone matching density disambiguation; each such rule has already undergone matching width disambiguation.
  • In an embodiment, the disambiguation module 306 is configured to determine the parameter hit count of each matching rule to undergo hit count disambiguation, a parameter being one of a named entity, a vocabulary term, and a position parameter, and take the matching rule with the largest parameter hit count as the matching rule that has undergone hit count disambiguation; each such rule has already undergone matching width disambiguation.
  • In an embodiment, the disambiguation module 306 is configured to determine the weight value of the named entity in each matching rule to undergo named entity weight disambiguation, and take the matching rule whose named entity has the largest weight value as the matching rule that has undergone named entity weight disambiguation; each such rule has already undergone matching width disambiguation.
  • In summary, the apparatus provided by this embodiment obtains a first mapping result by performing named entity mapping on the voice text, and a second mapping result by performing vocabulary mapping on the first mapping result; the second mapping result is then matched against preset rules that include regular-expression rules, and the voice text is processed according to an obtained matching rule. The configuration format of the regular-expression rules and the named entity rules is thereby unified, so that the voice text processing technology applies to both actual and fixed language environments, which expands its scope of application and optimizes the way voice text is processed.
  • FIG. 6 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
  • the terminal may be used to implement the method for processing voice text provided in the foregoing embodiment. Specifically:
  • The terminal 600 may include a radio frequency (RF) circuit 110, a memory 120 including one or more computer-readable storage media, an input unit 130, a display unit 140, a sensor 150, an audio circuit 160, a wireless fidelity (WiFi) module 170, a processor 180 including one or more processing cores, a power supply 190, and other components.
  • Those skilled in the art will appreciate that the terminal structure shown in FIG. 6 does not constitute a limitation on the terminal, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components. Here:
  • The RF circuit 110 can be used to receive and send signals during the sending and receiving of information or during a call. In particular, after receiving downlink information from a base station, it hands the information to one or more processors 180 for processing; it also sends uplink data to the base station.
  • Generally, the RF circuit 110 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a Subscriber Identity Module (SIM) card, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like.
  • In addition, the RF circuit 110 can also communicate with the network and other devices via wireless communication.
  • The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, Short Messaging Service (SMS), and the like.
  • The memory 120 can be used to store software programs and modules; the processor 180 executes various functional applications and performs data processing by running the software programs and modules stored in the memory 120.
  • The memory 120 may mainly include a program storage area and a data storage area: the program storage area may store the operating system, applications required for at least one function (such as a sound playing function or an image playing function), and the like; the data storage area may store data created through the use of the terminal 600 (such as audio data or a phone book) and the like.
  • In addition, the memory 120 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory 120 may further include a memory controller to provide the processor 180 and the input unit 130 with access to the memory 120.
  • The input unit 130 can be configured to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.
  • Specifically, the input unit 130 may include a touch-sensitive surface 131 as well as other input devices 132.
  • The touch-sensitive surface 131, also referred to as a touchscreen or trackpad, can collect touch operations by the user on or near it (such as operations performed on or near the touch-sensitive surface 131 by the user with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connecting apparatus according to a preset program.
  • Optionally, the touch-sensitive surface 131 may include two parts: a touch detection apparatus and a touch controller. The touch detection apparatus detects the position touched by the user, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection apparatus, converts it into contact coordinates, sends these to the processor 180, and can receive and execute commands sent by the processor 180.
  • In addition, the touch-sensitive surface 131 can be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave.
  • Besides the touch-sensitive surface 131, the input unit 130 can also include other input devices 132. Specifically, the other input devices 132 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control buttons and a switch button), a trackball, a mouse, a joystick, and the like.
  • The display unit 140 can be used to display information input by the user or information provided to the user, as well as the various graphical user interfaces of the terminal 600, which can be composed of graphics, text, icons, video, and any combination thereof.
  • The display unit 140 may include a display panel 141; optionally, the display panel 141 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.
  • Further, the touch-sensitive surface 131 may cover the display panel 141; when the touch-sensitive surface 131 detects a touch operation on or near it, the operation is transmitted to the processor 180 to determine the type of the touch event, and the processor 180 then provides the corresponding visual output on the display panel 141 according to the type of the touch event.
  • Although in FIG. 6 the touch-sensitive surface 131 and the display panel 141 implement the input and output functions as two separate components, in some embodiments the touch-sensitive surface 131 can be integrated with the display panel 141 to implement the input and output functions.
  • The terminal 600 may also include at least one type of sensor 150, such as a light sensor, a motion sensor, and other sensors.
  • Specifically, the light sensor may include an ambient light sensor and a proximity sensor: the ambient light sensor may adjust the brightness of the display panel 141 according to the brightness of the ambient light, and the proximity sensor may turn off the display panel 141 and/or the backlight when the terminal 600 is moved to the ear.
  • As one type of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally three axes), can detect the magnitude and direction of gravity when stationary, and can be used in applications that recognize the posture of the device (such as switching between landscape and portrait modes, related games, and magnetometer attitude calibration) and in functions related to vibration recognition (such as a pedometer and tapping). As for the gyroscope, barometer, hygrometer, thermometer, infrared sensor, and other sensors that may also be configured on the terminal 600, details are not repeated here.
  • The audio circuit 160, a speaker 161, and a microphone 162 can provide an audio interface between the user and the terminal 600.
  • The audio circuit 160 can transmit an electrical signal, converted from received audio data, to the speaker 161, which converts it into a sound signal for output; conversely, the microphone 162 converts a collected sound signal into an electrical signal, which the audio circuit 160 receives and converts into audio data; after the audio data is processed by the audio data output processor 180, it is sent through the RF circuit 110 to, for example, another terminal, or output to the memory 120 for further processing.
  • The audio circuit 160 may also include an earphone jack to provide communication between peripheral earphones and the terminal 600.
  • WiFi is a short-range wireless transmission technology. Through the WiFi module 170, the terminal 600 can help users send and receive e-mail, browse web pages, access streaming media, and so on; it provides users with wireless broadband Internet access.
  • Although FIG. 6 shows the WiFi module 170, it can be understood that the module is not an essential part of the terminal 600 and may be omitted as needed without changing the essence of the invention.
  • The processor 180 is the control center of the terminal 600. It connects the various parts of the entire terminal using various interfaces and lines, and performs the various functions of the terminal 600 and processes data by running or executing the software programs and/or modules stored in the memory 120 and invoking the data stored in the memory 120, thereby monitoring the terminal as a whole.
  • Optionally, the processor 180 may include one or more processing cores; preferably, the processor 180 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interfaces, applications, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may alternatively not be integrated into the processor 180.
  • The terminal 600 also includes a power supply 190 (such as a battery) that supplies power to the various components. Preferably, the power supply can be logically connected to the processor 180 through a power management system, so that functions such as charging, discharging, and power consumption management are managed through the power management system.
  • The power supply 190 may also include any one or more of a DC or AC power source, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and other components.
  • Although not shown, the terminal 600 may further include a camera, a Bluetooth module, and the like; details are not described here.
  • Specifically, in this embodiment, the display unit of the terminal is a touchscreen display, and the terminal further includes a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for performing the following operations:
  • performing named entity mapping on the voice text to obtain a first mapping result; performing vocabulary mapping on the first mapping result to obtain a second mapping result; and matching the second mapping result against the preset rules and processing the voice text according to an obtained matching rule, where the preset rules include regular-expression rules.
  • The memory of the terminal further includes instructions for performing the following operations: before vocabulary mapping is performed on the first mapping result, the method may further include: expanding one or more named entities in the first mapping result, in turn, into the corresponding voice text before mapping, to obtain at least two third mapping results; performing vocabulary mapping on the first mapping result then includes: performing vocabulary mapping on the voice text in each third mapping result that has not been mapped to a named entity, to obtain the second mapping result.
  • The memory of the terminal further includes instructions for performing the following operations: the method further includes: if at least two matching rules are obtained, sequentially performing at least one of matching width disambiguation, matching weight disambiguation, matching density disambiguation, hit count disambiguation, and named entity weight disambiguation on all obtained matching rules, until a single disambiguated matching rule is obtained; and processing the voice text according to the obtained disambiguated matching rule.
  • The memory of the terminal further includes instructions for performing the following operations: the matching width disambiguation includes: determining the matching width of the second mapping result corresponding to each matching rule, and taking the matching rule with the largest matching width as the matching rule that has undergone matching width disambiguation.
  • The memory of the terminal further includes instructions for performing the following operations: the matching weight disambiguation includes: determining the weight of each matching rule to undergo matching weight disambiguation according to preset weights of vocabulary terms and named entities; and taking the matching rule with the largest weight as the matching rule that has undergone matching weight disambiguation; where each matching rule to undergo matching weight disambiguation is a matching rule that has undergone matching width disambiguation.
  • The memory of the terminal further includes instructions for performing the following operations: the matching density disambiguation includes: determining the matching proportion between each matching rule to undergo matching density disambiguation and the second mapping result, and taking the matching rule with the largest matching proportion as the matching rule that has undergone matching density disambiguation; where each matching rule to undergo matching density disambiguation is a matching rule that has undergone matching width disambiguation.
  • The memory of the terminal further includes instructions for performing the following operations: the hit count disambiguation includes: determining the parameter hit count of each matching rule to undergo hit count disambiguation, a parameter being one of a named entity, a vocabulary term, and a position parameter; and taking the matching rule with the largest parameter hit count as the matching rule that has undergone hit count disambiguation; where each matching rule to undergo hit count disambiguation is a matching rule that has undergone matching width disambiguation.
  • The memory of the terminal further includes instructions for performing the following operations: the named entity weight disambiguation includes: determining the weight value of the named entity in each matching rule to undergo named entity weight disambiguation, and taking the matching rule whose named entity has the largest weight value as the matching rule that has undergone named entity weight disambiguation; where each matching rule to undergo named entity weight disambiguation is a matching rule that has undergone matching width disambiguation.
  • The terminal provided by this embodiment of the present invention obtains a first mapping result by performing named entity mapping on the voice text, and a second mapping result by performing vocabulary mapping on the first mapping result; the second mapping result is then matched against preset rules that include regular-expression rules, and the voice text is processed according to the obtained matching rule. The configuration format of the regular-expression rules and the named entity rules is thereby unified, so that the voice text processing technology applies to both actual and fixed language environments, which expands its scope of application and optimizes the way voice text is processed.
  • An embodiment of the present invention further provides a computer-readable storage medium, which may be the computer-readable storage medium included in the memory in the above embodiment, or may exist separately without being assembled into the terminal. The computer-readable storage medium stores one or more programs that are used by one or more processors to perform a method for processing voice text, the method comprising:
  • performing named entity mapping on the voice text to obtain a first mapping result; performing vocabulary mapping on the first mapping result to obtain a second mapping result; and matching the second mapping result against the preset rules and processing the voice text according to an obtained matching rule, where the preset rules include regular-expression rules.
  • The storage medium further includes instructions for performing the following operations: before vocabulary mapping is performed on the first mapping result, expanding one or more named entities in the first mapping result, in turn, into the corresponding voice text before mapping, to obtain at least two third mapping results; performing vocabulary mapping on the first mapping result then includes: performing vocabulary mapping on the voice text in each third mapping result that has not been mapped to a named entity, to obtain the second mapping result.
  • The storage medium further includes instructions for performing the following operations: the method further includes: if at least two matching rules are obtained, sequentially performing at least one of matching width disambiguation, matching weight disambiguation, matching density disambiguation, hit count disambiguation, and named entity weight disambiguation on all obtained matching rules, until a single disambiguated matching rule is obtained; and processing the voice text according to the obtained disambiguated matching rule.
  • The storage medium further includes instructions for performing the following operations: the matching width disambiguation includes: determining the matching width of the second mapping result corresponding to each matching rule, and taking the matching rule with the largest matching width as the matching rule that has undergone matching width disambiguation.
  • The storage medium further includes instructions for performing the following operations: the matching weight disambiguation includes: determining the weight of each matching rule to undergo matching weight disambiguation according to preset weights of vocabulary terms and named entities; and taking the matching rule with the largest weight as the matching rule that has undergone matching weight disambiguation; where each matching rule to undergo matching weight disambiguation is a matching rule that has undergone matching width disambiguation.
  • The storage medium further includes instructions for performing the following operations: the matching density disambiguation includes: determining the matching proportion between each matching rule to undergo matching density disambiguation and the second mapping result, and taking the matching rule with the largest matching proportion as the matching rule that has undergone matching density disambiguation; where each matching rule to undergo matching density disambiguation is a matching rule that has undergone matching width disambiguation.
  • The storage medium further includes instructions for performing the following operations: the hit count disambiguation includes: determining the parameter hit count of each matching rule to undergo hit count disambiguation, a parameter being one of a named entity, a vocabulary term, and a position parameter; and taking the matching rule with the largest parameter hit count as the matching rule that has undergone hit count disambiguation; where each matching rule to undergo hit count disambiguation is a matching rule that has undergone matching width disambiguation.
  • The storage medium further includes instructions for performing the following operations: the named entity weight disambiguation includes: determining the weight value of the named entity in each matching rule to undergo named entity weight disambiguation, and taking the matching rule whose named entity has the largest weight value as the matching rule that has undergone named entity weight disambiguation; where each matching rule to undergo named entity weight disambiguation is a matching rule that has undergone matching width disambiguation.
  • The computer-readable storage medium provided by this embodiment obtains a first mapping result by performing named entity mapping on the voice text, and a second mapping result by performing vocabulary mapping on the first mapping result; the second mapping result is then matched against preset rules that include regular-expression rules, and the voice text is processed according to the obtained matching rule. The configuration format of the regular-expression rules and the named entity rules is thereby unified, so that the voice text processing technology applies to both actual and fixed language environments, which expands its scope of application and optimizes the way voice text is processed.
  • An embodiment of the present invention further provides a graphical user interface. The graphical user interface is used on a terminal comprising a touchscreen display, a memory, and one or more processors for executing one or more programs; the graphical user interface includes:
  • performing named entity mapping on the voice text to obtain a first mapping result; performing vocabulary mapping on the first mapping result to obtain a second mapping result; and matching the second mapping result against the preset rules and processing the voice text according to an obtained matching rule, where the preset rules include regular-expression rules.
  • The graphical user interface provided by this embodiment obtains a first mapping result by performing named entity mapping on the voice text, and a second mapping result by performing vocabulary mapping on the first mapping result; the second mapping result is then matched against preset rules that include regular-expression rules, and the voice text is processed according to the obtained matching rule, thereby unifying the configuration format of the regular-expression rules and the named entity rules, expanding the scope of application of the voice text processing technology, and optimizing the way voice text is processed.
  • It should be noted that when the apparatus for processing voice text provided in the above embodiments processes voice text, the description uses only the division into the above functional modules as an example; in practical applications, the functions may be assigned to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above.
  • In addition, the apparatus for processing voice text provided in the above embodiments belongs to the same concept as the embodiments of the method for processing voice text; its specific implementation process is detailed in the method embodiments and is not repeated here.
  • A person of ordinary skill in the art may understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware, and the program may be stored in a computer-readable storage medium. The storage medium mentioned may be a read-only memory, a magnetic disk, an optical disc, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)

Abstract

A method and apparatus for processing voice text, belonging to the field of information processing technologies. The method includes: performing named entity mapping on voice text to obtain a first mapping result (101); performing vocabulary mapping on the first mapping result to obtain a second mapping result (102); and matching the second mapping result against preset rules and processing the voice text according to an obtained matching rule, where the preset rules include regular-expression rules (103). Unifying the configuration format of the regular-expression rules and the named entity rules expands the scope of application of the voice text processing technology, thereby optimizing the way voice text is processed.

Description

Method and apparatus for processing voice text
This application claims priority to Chinese Patent Application No. 201310554808.X, filed with the Chinese Patent Office on November 7, 2013 and entitled "处理语音文本的方法及装置" ("Method and apparatus for processing voice text"), the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of information processing technologies, and in particular to a method and apparatus for processing voice text.
Background
With the continuous development of information processing technologies, human-computer interaction in natural language has become a reality. The key to such interaction is to accurately understand the natural language instructions issued by the user and to perform the corresponding operations. After the user issues a natural language instruction, the instruction is converted into voice text; how to process this voice text has therefore become a question of general concern.
However, existing voice text processing technologies have certain limitations, so the way voice text is processed is not sufficiently optimized.
Summary
To solve the problems of the prior art, embodiments of the present invention provide a method and apparatus for processing voice text. The technical solution is as follows.
In one aspect, an embodiment of the present invention provides a method for processing voice text, the method comprising:
performing named entity mapping on voice text to obtain a first mapping result;
performing vocabulary mapping on the first mapping result to obtain a second mapping result; and
matching the second mapping result against preset rules, and processing the voice text according to an obtained matching rule, where the preset rules include regular-expression rules.
In another aspect, an embodiment of the present invention provides an apparatus for processing voice text, the apparatus comprising:
a first mapping module configured to perform named entity mapping on voice text to obtain a first mapping result;
a second mapping module configured to perform vocabulary mapping on the first mapping result to obtain a second mapping result;
a matching module configured to match the second mapping result against preset rules, where the preset rules include regular-expression rules; and
a first processing module configured to process the voice text according to an obtained matching rule.
In yet another aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium storing computer-executable instructions that, when run on a computer, perform the following steps:
performing named entity mapping on voice text to obtain a first mapping result;
performing vocabulary mapping on the first mapping result to obtain a second mapping result; and
matching the second mapping result against preset rules, and processing the voice text according to the obtained matching rule, where the preset rules include regular-expression rules.
A first mapping result is obtained by performing named entity mapping on the voice text, and a second mapping result by performing vocabulary mapping on the first mapping result; the second mapping result is then matched against preset rules that include regular-expression rules, and the voice text is processed according to an obtained matching rule. The configuration format of the regular-expression rules and the named entity rules is thereby unified, so that the voice text processing technology applies to both actual and fixed language environments, which expands its scope of application and optimizes the way voice text is processed.
Brief Description of the Drawings
FIG. 1 is a flowchart of a method for processing voice text according to Embodiment 1 of the present invention;
FIG. 2 is a flowchart of a method for processing voice text according to Embodiment 2 of the present invention;
FIG. 3 is a schematic structural diagram of a first apparatus for processing voice text according to Embodiment 3 of the present invention;
FIG. 4 is a schematic structural diagram of a second apparatus for processing voice text according to Embodiment 3 of the present invention;
FIG. 5 is a schematic structural diagram of a third apparatus for processing voice text according to Embodiment 3 of the present invention;
FIG. 6 is a schematic structural diagram of a terminal for processing voice text according to Embodiment 4 of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described below in further detail with reference to the accompanying drawings.
Embodiment 1
An embodiment of the present invention provides a method for processing voice text. Referring to FIG. 1, the method flow includes steps 101 to 103.
In step 101, named entity mapping is performed on the voice text to obtain a first mapping result.
In step 102, vocabulary mapping is performed on the first mapping result to obtain a second mapping result.
In an embodiment, before vocabulary mapping is performed on the first mapping result, the method may further include:
expanding one or more named entities in the first mapping result, in turn, into the corresponding voice text before mapping, to obtain at least two third mapping results.
Performing vocabulary mapping on the first mapping result then includes:
performing vocabulary mapping on the voice text in each third mapping result that has not been mapped to a named entity, to obtain the second mapping result.
In step 103, the second mapping result is matched against the preset rules, and the voice text is processed according to an obtained matching rule, where the preset rules include regular-expression rules.
In an embodiment, after the second mapping result is matched against the preset rules including the regular-expression rules, the method may further include:
if at least two matching rules are obtained, sequentially performing at least one of matching width disambiguation, matching weight disambiguation, matching density disambiguation, hit count disambiguation, and named entity weight disambiguation on all obtained matching rules, until a single disambiguated matching rule is obtained; and
processing the voice text according to the obtained disambiguated matching rule.
In an embodiment, the matching width disambiguation may include:
determining the matching width of the second mapping result corresponding to each matching rule, and taking the matching rule with the largest matching width as the matching rule that has undergone matching width disambiguation.
In an embodiment, the matching weight disambiguation may include:
determining the weight of each matching rule to undergo matching weight disambiguation according to preset weights of vocabulary terms and named entities; and
taking the matching rule with the largest weight as the matching rule that has undergone matching weight disambiguation;
where each matching rule to undergo matching weight disambiguation is a matching rule that has undergone matching width disambiguation.
In an embodiment, the matching density disambiguation may include:
determining the matching proportion between each matching rule to undergo matching density disambiguation and the second mapping result, and taking the matching rule with the largest matching proportion as the matching rule that has undergone matching density disambiguation;
where each matching rule to undergo matching density disambiguation is a matching rule that has undergone matching width disambiguation.
In an embodiment, the hit count disambiguation may include:
determining the parameter hit count of each matching rule to undergo hit count disambiguation, a parameter being one of a named entity, a vocabulary term, and a position parameter; and
taking the matching rule with the largest parameter hit count as the matching rule that has undergone hit count disambiguation;
where each matching rule to undergo hit count disambiguation is a matching rule that has undergone matching width disambiguation.
In an embodiment, the named entity weight disambiguation may include:
determining the weight value of the named entity in each matching rule to undergo named entity weight disambiguation, and taking the matching rule whose named entity has the largest weight value as the matching rule that has undergone named entity weight disambiguation;
where each matching rule to undergo named entity weight disambiguation is a matching rule that has undergone matching width disambiguation.
The method provided in this embodiment obtains a first mapping result by performing named entity mapping on the voice text, and a second mapping result by performing vocabulary mapping on the first mapping result; the second mapping result is then matched against preset rules that include regular-expression rules, and the voice text is processed according to an obtained matching rule. The configuration format of the regular-expression rules and the named entity rules is thereby unified, so that the voice text processing technology applies to both actual and fixed language environments, which expands its scope of application and optimizes the way voice text is processed.
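When matching yields more than one rule, the disambiguation steps above are applied in order until a single rule remains. The sketch below is a minimal illustration of that control flow, not the patent's implementation; it assumes each step is a function that filters a list of candidate (rule, result) pairs, and all names are illustrative:

```python
def disambiguate(candidates, steps):
    """Run disambiguation steps in order over candidate (rule, result)
    pairs, stopping as soon as a single matching rule remains."""
    for step in steps:
        if len(candidates) <= 1:
            break
        candidates = step(candidates)
    return candidates[0] if candidates else None

# Usage, assuming step functions like those sketched in Embodiment 2 below:
# best = disambiguate(matches, [width_step, weight_step, density_step,
#                               hit_count_step, entity_weight_step])
```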
Embodiment 2
An embodiment of the present invention provides a method for processing voice text, incorporating the content of Embodiment 1 above. Referring to FIG. 2, the method flow includes steps 201 to 206.
In step 201, named entity mapping is performed on the voice text to obtain a first mapping result.
In this embodiment, performing named entity mapping on the voice text may include, but is not limited to: building a named entity library; searching the voice text for segments that can be identified as named entities in the library; and replacing the found segments with the corresponding named entities. Note that the named entities can be collected from a large amount of information on the network, and that recognition is implemented with a separate dictionary trie for each domain, so that all named entities can be found even when they fully or partially overlap.
For ease of understanding, take the voice text "我想听第一次" ("I want to listen to The First Time") as an example, with named entities denoted by Chinese brackets 【】. In the library, both the segments "我想" ("I want") and "第一次" ("The First Time") correspond to the named entity 【song name】; both are found in the voice text and recognized as 【song name】 entries in the library. Replacing them with 【song name】 yields the first mapping result, namely 【song name】听【song name】 ("【song name】 listen to 【song name】").
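As a rough illustration of step 201 (not the patent's actual implementation, which uses one dictionary trie per domain), the sketch below uses a toy entity library and a brute-force scan that can still report fully or partially overlapping entities; the library contents and function names are assumptions for the example:

```python
# Toy entity library; the patent's system uses one dictionary trie per domain.
ENTITY_LIBRARY = {
    "我想": "song_name",    # "I want" is also a song title
    "第一次": "song_name",  # "The First Time" is a song title
}

def find_entities(text: str):
    """Return every (start, end, tag) whose span is a library phrase,
    including fully or partially overlapping occurrences."""
    hits = []
    for i in range(len(text)):
        for j in range(i + 1, len(text) + 1):
            tag = ENTITY_LIBRARY.get(text[i:j])
            if tag:
                hits.append((i, j, tag))
    return hits

def first_mapping(text: str) -> str:
    """Replace non-overlapping hits (leftmost-longest) with entity tags to
    form the first mapping result."""
    hits = sorted(find_entities(text), key=lambda h: (h[0], -(h[1] - h[0])))
    out, pos = [], 0
    for start, end, tag in hits:
        if start >= pos:                 # skip hits inside an already-used span
            out.append(text[pos:start])
            out.append(f"[{tag}]")
            pos = end
    out.append(text[pos:])
    return "".join(out)

print(first_mapping("我想听第一次"))  # -> [song_name]听[song_name]
```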
In step 202, one or more named entities in the first mapping result are expanded, in turn, into the corresponding voice text before mapping, and at least two third mapping results are obtained.
Because a named entity may overlap with a vocabulary term, expanding the named entities of the first mapping result in turn back into their pre-mapping voice text yields at least two third mapping results, which increases the number of mapping results and avoids the situation where voice text cannot be processed accurately when named entities and vocabulary terms overlap.
Note that if no named entities in the first mapping result partially overlap and the first mapping result contains n named entities in total, expanding one or more of them in turn into the corresponding pre-mapping voice text yields 2^n third mapping results.
For ease of understanding, continue with the voice text "我想听第一次". Its first mapping result is 【song name】听【song name】. Expanding the two named entities in turn into their pre-mapping voice text yields four third mapping results: 【song name】听【song name】, 我想听【song name】, 【song name】听第一次, and 我想听第一次.
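A minimal sketch of step 202, assuming the first mapping result is kept as a list of (tag, surface) tokens so that each named entity can either stay mapped or revert to its pre-mapping voice text; with n non-overlapping entities this enumerates the 2^n third mapping results noted above:

```python
from itertools import product

def expand(tokens):
    """tokens: list of (tag, surface) pairs, tag=None for plain text.
    Each entity token either stays as its tag or reverts to its surface
    form, giving 2**n third mapping results for n entities."""
    choices = []
    for tag, surface in tokens:
        if tag is None:
            choices.append([surface])
        else:
            choices.append([f"[{tag}]", surface])  # keep tag or revert
    return ["".join(parts) for parts in product(*choices)]

tokens = [("song_name", "我想"), (None, "听"), ("song_name", "第一次")]
for result in expand(tokens):
    print(result)
# [song_name]听[song_name]
# [song_name]听第一次
# 我想听[song_name]
# 我想听第一次
```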
In step 203, vocabulary mapping is performed on the voice text in each third mapping result that has not been mapped to a named entity, to obtain the second mapping results.
The vocabulary used in vocabulary mapping can be configured manually; for example, voice text expressing the same meaning in everyday language can be mapped to the same vocabulary term. Since voice text with the same meaning may take different surface forms in an actual language environment, performing vocabulary mapping on the segments that have not been mapped to named entities reduces the workload of repeated matching. This embodiment does not specifically limit the manner of performing vocabulary mapping on such segments of each third mapping result.
For ease of understanding, take the third mapping results from step 202 above, where "我想听" ("I want to listen") maps to the vocabulary term <播放> (<play>). Performing vocabulary mapping on the segments of each third mapping result that have not been mapped to a named entity gives the second mapping results: 【song name】听【song name】, <播放>【song name】, 【song name】听第一次, and <播放>第一次.
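A minimal sketch of step 203, assuming a hand-configured toy vocabulary; longest phrases are substituted first so that a longer everyday phrasing wins over any shorter key it contains:

```python
# Toy vocabulary: everyday phrasings with the same meaning map to one term.
VOCABULARY = {
    "我想听": "<播放>",  # "I want to listen" -> <play>
    "放一下": "<播放>",  # "play ... for me"  -> <play>
}

def vocabulary_mapping(third_result: str) -> str:
    """Map segments not already mapped to a named entity onto vocabulary
    terms; longest phrases are substituted first."""
    for phrase in sorted(VOCABULARY, key=len, reverse=True):
        third_result = third_result.replace(phrase, VOCABULARY[phrase])
    return third_result

for r in ["[song_name]听[song_name]", "我想听[song_name]",
          "[song_name]听第一次", "我想听第一次"]:
    print(vocabulary_mapping(r))
# [song_name]听[song_name]
# <播放>[song_name]
# [song_name]听第一次
# <播放>第一次
```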
In step 204, the second mapping results are matched against the preset rules, where the preset rules include regular-expression rules.
In this embodiment, the preset rules may include, but are not limited to, regular-expression rules and other preset rules. The other preset rules may include, but are not limited to, rules set to conform to language habits. This embodiment does not specifically limit the other preset rules; in practice they can be set as needed.
In this embodiment, matching the second mapping results against the preset rules may include, but is not limited to: extracting position parameters from the second mapping results through rule slots to obtain slot extraction results, and matching the slot extraction results against the other preset rules. A rule slot is a specified position obtained from a regular-expression rule; extracting a position parameter from a second mapping result through a rule slot means extracting it at the position specified by the regular-expression rule. When a second mapping result contains no position parameter extractable through a rule slot, the second mapping result can be matched directly against the other preset rules to obtain a matching rule.
Note that after the second mapping results are matched against the preset rules, step 205 is performed if one matching rule is obtained, and step 206 is performed if at least two matching rules are obtained.
For ease of understanding, take the second mapping results 【song name】听【song name】, <播放>【song name】, 【song name】听第一次, <播放>第一次 as an example. Since they contain no position parameters extractable through rule slots, they are matched directly against the other preset rules. When one of the other preset rules is <播放>【song name】, the corresponding second mapping result is <播放>【song name】, and one matching rule is obtained, namely <播放>【song name】.
As another example, for the voice text "播放转角遇到爱" ("play Corner With Love"), processing according to steps 201 to 203 above yields three second mapping results: <播放>【video name】, <播放>【song name】遇到爱, and <播放>转角遇到爱. Since none of them contains a position parameter extractable through a rule slot, the three second mapping results are matched directly against the other preset rules. Because the other preset rules include both <播放>【song name】 and <播放>【video name】, the second mapping results corresponding to these two rules are <播放>【video name】 and <播放>【song name】遇到爱; two matching rules are therefore obtained, namely <播放>【song name】 and <播放>【video name】.
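A rough sketch of the direct matching branch of step 204 (no rule-slot extraction), assuming second mapping results and rules are plain strings; the rule set and the prefix test are simplifying assumptions, standing in for full regex slot matching:

```python
# Assumed "other preset rules", written over vocabulary terms and entity tags.
PRESET_RULES = [
    "<播放>[song_name]",
    "<播放>[video_name]",
]

def match_rules(second_results):
    """Return every (rule, second mapping result) pair that matches. A
    simple prefix test stands in for full regex slot matching, so trailing
    residue such as 遇到爱 does not block the match."""
    matches = []
    for rule in PRESET_RULES:
        for result in second_results:
            if result.startswith(rule):
                matches.append((rule, result))
    return matches

second = ["<播放>[video_name]", "<播放>[song_name]遇到爱", "<播放>转角遇到爱"]
print(match_rules(second))
# [('<播放>[song_name]', '<播放>[song_name]遇到爱'),
#  ('<播放>[video_name]', '<播放>[video_name]')]
```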
In step 205, the voice text is processed according to the obtained matching rule.
If one matching rule is obtained in step 204 above, the voice text is processed according to that matching rule. This embodiment does not specifically limit the manner of processing the voice text.
For example, still taking the voice text "播放转角遇到爱": if the obtained matching rule is <播放>【video name】, then when step 205 processes the voice text according to the obtained matching rule, the video titled "转角遇到爱" ("Corner With Love") is played.
In step 206, at least one of matching width disambiguation, matching weight disambiguation, matching density disambiguation, hit count disambiguation, and named entity weight disambiguation is performed sequentially on all obtained matching rules, until a single disambiguated matching rule is obtained.
If at least two matching rules are obtained in step 204 above, then to make the processing result more accurate, the method provided in this embodiment disambiguates among all matching rules. The disambiguation may include, but is not limited to: sequentially performing at least one of matching width disambiguation, matching weight disambiguation, matching density disambiguation, hit count disambiguation, and named entity weight disambiguation on all obtained matching rules, until a single disambiguated matching rule is obtained.
The matching width disambiguation may include, but is not limited to: determining the matching width of the second mapping result corresponding to each matching rule, and taking the matching rule with the largest matching width as the matching rule that has undergone matching width disambiguation. The matching width runs from the start position of the first parameter to the end position of the last parameter in the second mapping result, where a parameter may be, but is not limited to, a named entity, a vocabulary term, or a position parameter. Note that, to prevent voice text without actual meaning from affecting the processing, a threshold needs to be set before the matching width is determined; the threshold is used when determining the matching width of the second mapping results, meaning that matching widths within the threshold range are considered identical. This embodiment does not specifically limit the size of the threshold; in practice any threshold can be set as needed.
For ease of understanding, take the voice text "播放转角遇到爱" with a threshold of 2 bytes. Matching the voice text yields two matching rules, <播放>【song name】 and <播放>【video name】, corresponding to the two second mapping results <播放>【song name】遇到爱 and <播放>【video name】. Matching width disambiguation is performed on all obtained matching rules. For the rule <播放>【song name】, in its second mapping result <播放>【song name】遇到爱 the first parameter is <播放> at the start of the sentence and the last parameter is 【song name】; since the threshold is 2 bytes, the three trailing characters 遇到爱 cannot be ignored, so the matching width of <播放>【song name】 does not span from the start to the end of the sentence. For the rule <播放>【video name】, in its second mapping result <播放>【video name】 the first parameter is <播放> and the last is 【video name】, so its matching width spans the whole sentence and is larger than that of <播放>【song name】. The rule <播放>【video name】 with the largest matching width is therefore taken as the matching rule that has undergone matching width disambiguation, yielding a single disambiguated matching rule.
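A minimal sketch of matching width disambiguation under the same string assumptions as above: the uncovered tail of each second mapping result stands in for lost width, and tails that differ by no more than the threshold count as equal:

```python
def width_disambiguation(matches, threshold=2):
    """Keep the (rule, result) pairs with the largest matching width. The
    uncovered tail of each result stands in for lost width, and tails that
    differ by no more than `threshold` characters count as equal."""
    def tail(rule, result):
        return len(result) - len(rule)  # e.g. len("遇到爱") == 3
    best = min(tail(rule, result) for rule, result in matches)
    return [(rule, result) for rule, result in matches
            if tail(rule, result) - best <= threshold]

matches = [("<播放>[song_name]", "<播放>[song_name]遇到爱"),
           ("<播放>[video_name]", "<播放>[video_name]")]
print(width_disambiguation(matches))
# [('<播放>[video_name]', '<播放>[video_name]')]
```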
The matching weight disambiguation may include, but is not limited to: determining the weight of each matching rule to undergo matching weight disambiguation according to preset weights of vocabulary terms and named entities; and
taking the matching rule with the largest weight as the matching rule that has undergone matching weight disambiguation;
where each matching rule to undergo matching weight disambiguation is a matching rule that has undergone matching width disambiguation.
Note that this embodiment does not specifically limit the preset weights of vocabulary terms and named entities. Since vocabulary can be configured manually while named entities are collected from large amounts of information on the network, a vocabulary term usually carries more weight than a named entity.
For ease of understanding, take the voice text "找一下附近的餐馆" ("find a nearby restaurant"). Matching the voice text yields two matching rules, <查找><餐厅> ("<find><restaurant>") and <查找>【restaurant name】, corresponding to the two second mapping results <查找>附近的<餐厅> and <查找>附近的【restaurant name】. Matching width disambiguation on all obtained matching rules finds that the two rules have the same matching width. Since more than one matching rule remains after matching width disambiguation, matching weight disambiguation is performed on the remaining rules. If the preset weight ratio of vocabulary terms to named entities is 2:1, the weight of the rule <查找><餐厅> is determined to be larger than that of <查找>【restaurant name】, so the rule with the largest weight is <查找><餐厅>, which is taken as the matching rule that has undergone matching weight disambiguation, yielding a single disambiguated matching rule.
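A minimal sketch of matching weight disambiguation, assuming the example's 2:1 vocabulary-to-entity weight ratio and counting <...> terms and [...] entities in each rule:

```python
import re

# Assumed weights; the example uses a 2:1 vocabulary-to-entity ratio.
VOCAB_WEIGHT, ENTITY_WEIGHT = 2, 1

def rule_weight(rule: str) -> int:
    """Sum preset weights over a rule's vocabulary terms <...> and named
    entities [...]."""
    vocab = len(re.findall(r"<[^>]+>", rule))
    entities = len(re.findall(r"\[[^\]]+\]", rule))
    return vocab * VOCAB_WEIGHT + entities * ENTITY_WEIGHT

def weight_disambiguation(rules):
    """Keep the rule(s) with the largest weight."""
    best = max(rule_weight(rule) for rule in rules)
    return [rule for rule in rules if rule_weight(rule) == best]

print(weight_disambiguation(["<查找><餐厅>", "<查找>[restaurant_name]"]))
# ['<查找><餐厅>']  (weight 4 beats weight 3 with the 2:1 ratio)
```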
Matching density disambiguation may be performed as follows, though not exclusively: determine the matching proportion between each matching rule awaiting matching density disambiguation and its second mapping result, and take the rule with the largest matching proportion as the rule surviving matching density disambiguation;
where each rule awaiting matching density disambiguation is one that survived matching width disambiguation.
It should be noted that matching density disambiguation addresses the case where the matching rules contain parameters of the same kind, for example where all the rules contain vocabulary items, or all contain named entities.
For ease of understanding, take the voice text "播放小时代" ("play 小时代") as an example. Matching the voice text yields two matching rules, <播放>【电影名】 (where 【电影名】 denotes a film title) and <播放>【歌名】, corresponding to the second mapping results <播放>【电影名】 and <播放>小【歌名】 respectively. Matching width disambiguation on all the obtained rules finds their matching widths to be the same. Since more than one rule survives width disambiguation, matching weighted value disambiguation is performed and finds their weighted values to be the same. Since more than one rule survives weighted value disambiguation, matching density disambiguation is performed on the survivors. The matching proportion of <播放>【电影名】 against <播放>【电影名】 is determined to be 100%, and that of <播放>【歌名】 against <播放>小【歌名】 is 80%, i.e. the rule with the largest matching proportion is <播放>【电影名】; it is therefore taken as the rule surviving matching density disambiguation, and a single disambiguated matching rule is obtained.
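Using the character counts of this example, the stage can be sketched as:

    def density(covered_chars, total_chars):
        # Share of the underlying text covered by the rule's parameters.
        return covered_chars / total_chars

    # "播放小时代" has 5 characters: 【电影名】 = 小时代 covers 播放 + 小时代 (5/5),
    # while 【歌名】 = 时代 leaves 小 uncovered (4/5).
    candidates = {"<播放>【电影名】": density(5, 5), "<播放>【歌名】": density(4, 5)}
    print(max(candidates, key=candidates.get))  # <播放>【电影名】 (100% vs 80%)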
Hit quantity disambiguation may be performed as follows, though not exclusively: determine the parameter hit quantity of each matching rule awaiting hit quantity disambiguation, where a parameter is one of a named entity, a vocabulary item, and a position parameter; and
take the matching rule with the largest parameter hit quantity as the rule surviving hit quantity disambiguation;
where each rule awaiting hit quantity disambiguation is one that survived matching width disambiguation.
For ease of understanding, take the voice text "播放歌曲风" ("play the song 风") as an example. Matching the voice text yields two matching rules, <播放><歌曲>%s1 and <播放><歌曲> (<歌曲> = song), where %s1 is a position parameter; both rules correspond to the same second mapping result, <播放><歌曲>风. Matching width disambiguation is performed on all the obtained rules; with the threshold set to 2 bytes, their matching widths are determined to be the same. Since more than one rule survives width disambiguation, matching weighted value disambiguation is performed and finds their weighted values to be the same. Since more than one rule survives weighted value disambiguation, matching density disambiguation would follow; but because <播放><歌曲> contains no position parameter, matching density disambiguation cannot be performed on the survivors, and hit quantity disambiguation is performed on them in order instead. The parameter hit quantity of <播放><歌曲>%s1 is determined to be 3 and that of <播放><歌曲> to be 2, i.e. the rule with the largest parameter hit quantity is <播放><歌曲>%s1; it is therefore taken as the rule surviving hit quantity disambiguation, and a single disambiguated matching rule is obtained.
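A sketch of this stage, counting a rule's parameters with a simple pattern over the rule string, might read:

    import re

    PARAM = re.compile(r"<[^>]+>|【[^】]+】|%s\d+")  # vocab, entity, position parameter

    def hit_count(rule):
        return len(PARAM.findall(rule))

    rules = ["<播放><歌曲>%s1", "<播放><歌曲>"]
    print({rule: hit_count(rule) for rule in rules})
    # {'<播放><歌曲>%s1': 3, '<播放><歌曲>': 2}
    print(max(rules, key=hit_count))  # <播放><歌曲>%s1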
Named entity weight disambiguation may be performed as follows, though not exclusively: determine the weight value of the named entity in each matching rule awaiting named entity weight disambiguation, and take the rule whose named entity has the largest weight value as the rule surviving named entity weight disambiguation;
where each rule awaiting named entity weight disambiguation is one that survived matching width disambiguation.
For ease of understanding, take the voice text "播放致青春" ("play 致青春") as an example. Matching the voice text yields two matching rules, <播放>【电影名】 and <播放>【歌名】, corresponding to the second mapping results <播放>【电影名】 and <播放>【歌名】 respectively. Matching width disambiguation on all the obtained rules finds their matching widths to be the same. Since more than one rule survives width disambiguation, matching weighted value disambiguation is performed and finds their weighted values to be the same. Since more than one rule survives weighted value disambiguation, matching density disambiguation is performed and finds their matching densities to be the same. Since more than one rule survives density disambiguation, hit quantity disambiguation is performed and finds their parameter hit quantities to be the same. Since more than one rule survives hit quantity disambiguation, named entity weight disambiguation is performed on the survivors. When the weight value of the named entity 【电影名】 is determined to be greater than that of 【歌名】, the named entity weight of <播放>【电影名】 is greater than that of <播放>【歌名】, i.e. the rule whose named entity has the largest weight value is <播放>【电影名】; it is therefore taken as the rule surviving named entity weight disambiguation, and a single disambiguated matching rule is obtained.
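Assuming hypothetical entity weight values that favour 【电影名】, the stage can be sketched as:

    ENTITY_WEIGHTS = {"【电影名】": 3, "【歌名】": 2}  # hypothetical weight values

    def entity_weight(rule):
        return max((w for entity, w in ENTITY_WEIGHTS.items() if entity in rule),
                   default=0)

    rules = ["<播放>【电影名】", "<播放>【歌名】"]
    print(max(rules, key=entity_weight))  # <播放>【电影名】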
In step 207, the voice text is processed according to the single disambiguated matching rule obtained.
This embodiment does not specifically limit the manner of processing the voice text according to the disambiguated matching rule; the processing follows the same principle as in step 205 above, which may be consulted for details.
In the method provided by this embodiment, named entity mapping is performed on the voice text to obtain a first mapping result; vocabulary mapping is performed on the first mapping result to obtain second mapping results; the second mapping results are then matched against preset rules including regular rules, and the voice text is processed according to the single matching rule obtained. The configuration format of regular rules and named entity rules is thereby unified, so that the voice text processing technique applies both to real language environments and to fixed language environments, broadening its applicability and optimizing the way voice text is processed.
Embodiment 3
Referring to FIG. 3, an embodiment of the present invention provides an apparatus for processing voice text, the apparatus including:
a first mapping module 301, configured to perform named entity mapping on voice text to obtain a first mapping result;
a second mapping module 302, configured to perform vocabulary mapping on the first mapping result to obtain a second mapping result;
a matching module 303, configured to match the second mapping result against preset rules, where the preset rules include regular rules; and
a first processing module 304, configured to process the voice text according to the single matching rule obtained.
Referring to FIG. 4, in an embodiment, the apparatus may further include:
an expansion module 305, configured to expand one or more named entities in the first mapping result in turn into the voice text they correspond to before mapping, to obtain at least two third mapping results;
the second mapping module 302 is further configured to perform vocabulary mapping on the voice text in each third mapping result that has not been mapped to a named entity, to obtain the second mapping results.
Referring to FIG. 5, in an embodiment, the apparatus may further include:
a disambiguation module 306, configured to, when at least two matching rules are obtained, perform in order at least one of matching width disambiguation, matching weighted value disambiguation, matching density disambiguation, hit quantity disambiguation, and named entity weight disambiguation on all the matching rules obtained, until a single disambiguated matching rule remains; and
a second processing module 307, configured to process the voice text according to the disambiguated matching rule obtained.
In an embodiment, the disambiguation module 306 is configured to determine the matching width of the second mapping result corresponding to each matching rule, and to take the rule with the largest matching width as the rule surviving matching width disambiguation.
In an embodiment, the disambiguation module 306 is configured to determine the weighted value of each matching rule awaiting matching weighted value disambiguation according to preset weighted values of vocabulary items and named entities, and to take the rule with the largest weighted value as the rule surviving matching weighted value disambiguation;
where the rules awaiting matching weighted value disambiguation are those that survived matching width disambiguation.
In an embodiment, the disambiguation module 306 is configured to determine the matching proportion between each matching rule awaiting matching density disambiguation and its second mapping result, and to take the rule with the largest matching proportion as the rule surviving matching density disambiguation;
where each rule awaiting matching density disambiguation is one that survived matching width disambiguation.
In an embodiment, the disambiguation module 306 is configured to determine the parameter hit quantity of each matching rule awaiting hit quantity disambiguation, where a parameter is one of a named entity, a vocabulary item, and a position parameter, and to take the matching rule with the largest parameter hit quantity as the rule surviving hit quantity disambiguation;
where each rule awaiting hit quantity disambiguation is one that survived matching width disambiguation.
In an embodiment, the disambiguation module 306 is configured to determine the weight value of the named entity in each matching rule awaiting named entity weight disambiguation, and to take the rule whose named entity has the largest weight value as the rule surviving named entity weight disambiguation;
where each rule awaiting named entity weight disambiguation is one that survived matching width disambiguation.
In summary, in the apparatus provided by this embodiment, named entity mapping is performed on the voice text to obtain a first mapping result; vocabulary mapping is performed on the first mapping result to obtain a second mapping result; the second mapping result is then matched against preset rules including regular rules, and the voice text is processed according to the single matching rule obtained. The configuration format of regular rules and named entity rules is thereby unified, so that the voice text processing technique applies both to real language environments and to fixed language environments, broadening its applicability and optimizing the way voice text is processed.
Embodiment 4
An embodiment of the present invention provides a terminal. FIG. 6 shows a schematic structural diagram of the terminal involved in this embodiment; the terminal can be used to implement the method for processing voice text provided in the above embodiments. Specifically:
The terminal 600 may include a radio frequency (RF) circuit 110, a memory 120 including one or more computer-readable storage media, an input unit 130, a display unit 140, a sensor 150, an audio circuit 160, a wireless fidelity (WiFi) module 170, a processor 180 including one or more processing cores, a power supply 190, and other components. Those skilled in the art will understand that the terminal structure shown in FIG. 6 does not limit the terminal, which may include more or fewer components than shown, combine certain components, or arrange the components differently. In particular:
The RF circuit 110 may be used to receive and send signals during information transmission and reception or during a call; in particular, after receiving downlink information from a base station, it hands the information to the one or more processors 180 for processing, and it sends uplink data to the base station. Generally, the RF circuit 110 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a Subscriber Identity Module (SIM) card, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 110 may communicate with networks and other devices via wireless communication, which may use any communication standard or protocol, including but not limited to Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, Short Messaging Service (SMS), and so on.
The memory 120 may be used to store software programs and modules; the processor 180 executes various functional applications and performs data processing by running the software programs and modules stored in the memory 120. The memory 120 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, application programs required for at least one function (such as a sound playing function or an image playing function), and the like, and the data storage area may store data created according to the use of the terminal 600 (such as audio data or a phone book). In addition, the memory 120 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 120 may further include a memory controller to provide the processor 180 and the input unit 130 with access to the memory 120.
The input unit 130 may be used to receive input digit or character information and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control. Specifically, the input unit 130 may include a touch-sensitive surface 131 and other input devices 132. The touch-sensitive surface 131, also called a touch display screen or a touchpad, can collect touch operations performed by the user on or near it (such as operations performed on or near the touch-sensitive surface 131 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection devices according to a preset program. Optionally, the touch-sensitive surface 131 may include two parts: a touch detection device and a touch controller. The touch detection device detects the position touched by the user, detects the signal produced by the touch operation, and passes the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, sends them to the processor 180, and can receive and execute commands sent by the processor 180. In addition, the touch-sensitive surface 131 may be implemented in multiple types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch-sensitive surface 131, the input unit 130 may also include other input devices 132, which may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and a power key), a trackball, a mouse, a joystick, and the like.
The display unit 140 may be used to display information input by the user or provided to the user, as well as the various graphical user interfaces of the terminal 600, which may be composed of graphics, text, icons, video, and any combination thereof. The display unit 140 may include a display panel 141, which may optionally be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED) display, or the like. Further, the touch-sensitive surface 131 may cover the display panel 141; when the touch-sensitive surface 131 detects a touch operation on or near it, it passes the operation to the processor 180 to determine the type of touch event, and the processor 180 then provides the corresponding visual output on the display panel 141 according to the type of touch event. Although in FIG. 6 the touch-sensitive surface 131 and the display panel 141 are implemented as two separate components for the input and output functions, in some embodiments they may be integrated to implement the input and output functions.
The terminal 600 may further include at least one sensor 150, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor: the ambient light sensor can adjust the brightness of the display panel 141 according to the brightness of the ambient light, and the proximity sensor can turn off the display panel 141 and/or the backlight when the terminal 600 is moved close to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in all directions (generally three axes), can detect the magnitude and direction of gravity when at rest, and can be used in applications that recognize the posture of the mobile phone (such as switching between landscape and portrait, related games, and magnetometer posture calibration) and in vibration-recognition-related functions (such as a pedometer and tap detection). Other sensors that may also be configured on the terminal 600, such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, are not described here.
The audio circuit 160, a loudspeaker 161, and a microphone 162 may provide an audio interface between the user and the terminal 600. The audio circuit 160 can transmit the electrical signal converted from received audio data to the loudspeaker 161, which converts it into a sound signal for output; conversely, the microphone 162 converts a collected sound signal into an electrical signal, which is received by the audio circuit 160 and converted into audio data; after the audio data is output to the processor 180 for processing, it is sent via the RF circuit 110 to, for example, another terminal, or output to the memory 120 for further processing. The audio circuit 160 may also include an earphone jack to allow communication between an external headset and the terminal 600.
WiFi is a short-range wireless transmission technology. Through the WiFi module 170, the terminal 600 can help the user send and receive e-mail, browse web pages, access streaming media, and so on; it provides the user with wireless broadband Internet access. Although FIG. 6 shows the WiFi module 170, it is understood that it is not an essential component of the terminal 600 and may be omitted as needed without changing the essence of the invention.
The processor 180 is the control center of the terminal 600. It connects all parts of the entire mobile phone through various interfaces and lines, and performs the various functions of the terminal 600 and processes data by running or executing the software programs and/or modules stored in the memory 120 and invoking the data stored in the memory 120, thereby monitoring the mobile phone as a whole. Optionally, the processor 180 may include one or more processing cores; preferably, the processor 180 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, while the modem processor mainly handles wireless communication. It is understood that the modem processor may also not be integrated into the processor 180.
The terminal 600 further includes a power supply 190 (such as a battery) that supplies power to the components. Preferably, the power supply may be logically connected to the processor 180 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system. The power supply 190 may also include any components such as one or more direct-current or alternating-current power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.
Although not shown, the terminal 600 may further include a camera, a Bluetooth module, and the like, which are not described here. Specifically, in this embodiment the display unit of the terminal is a touch screen display, and the terminal further includes a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs containing instructions for performing the following operations:
performing named entity mapping on voice text to obtain a first mapping result;
performing vocabulary mapping on the first mapping result to obtain a second mapping result; and
matching the second mapping result against preset rules and processing the voice text according to the matching rule obtained, where the preset rules include regular rules.
Assuming the above is a first possible implementation, in a second possible implementation provided on its basis, the memory of the terminal further contains instructions for performing the following operations:
before the vocabulary mapping is performed on the first mapping result, the operations may further include:
expanding one or more named entities in the first mapping result in turn into the voice text they correspond to before mapping, to obtain at least two third mapping results;
and performing vocabulary mapping on the first mapping result includes:
performing vocabulary mapping on the voice text in each third mapping result that has not been mapped to a named entity, to obtain the second mapping results.
In a third possible implementation provided on the basis of the first or second possible implementation, the memory of the terminal further contains instructions for performing the following operations:
after the second mapping result is matched against the preset rules including regular rules, the operations further include:
if at least two matching rules are obtained, performing, in order, at least one of matching width disambiguation, matching weighted value disambiguation, matching density disambiguation, hit quantity disambiguation, and named entity weight disambiguation on all the matching rules obtained, until a single disambiguated matching rule is obtained; and
processing the voice text according to the single disambiguated matching rule obtained.
In a fourth possible implementation provided on the basis of the third, the memory of the terminal further contains instructions for performing the following operations:
the matching width disambiguation includes:
determining the matching width of the second mapping result corresponding to each matching rule, and taking the rule with the largest matching width as the rule surviving matching width disambiguation.
In a fifth possible implementation provided on the basis of the fourth, the memory of the terminal further contains instructions for performing the following operations:
the matching weighted value disambiguation includes:
determining the weighted value of each matching rule awaiting matching weighted value disambiguation according to preset weighted values of vocabulary items and named entities; and
taking the rule with the largest weighted value as the rule surviving matching weighted value disambiguation;
where the rules awaiting matching weighted value disambiguation are those that survived matching width disambiguation.
In a sixth possible implementation provided on the basis of the fifth, the memory of the terminal further contains instructions for performing the following operations:
the matching density disambiguation includes:
determining the matching proportion between each matching rule awaiting matching density disambiguation and its second mapping result, and taking the rule with the largest matching proportion as the rule surviving matching density disambiguation;
where each rule awaiting matching density disambiguation is one that survived matching width disambiguation.
In a seventh possible implementation provided on the basis of the sixth, the memory of the terminal further contains instructions for performing the following operations:
the hit quantity disambiguation includes:
determining the parameter hit quantity of each matching rule awaiting hit quantity disambiguation, where a parameter is one of a named entity, a vocabulary item, and a position parameter; and
taking the matching rule with the largest parameter hit quantity as the rule surviving hit quantity disambiguation;
where each rule awaiting hit quantity disambiguation is one that survived matching width disambiguation.
In an eighth possible implementation provided on the basis of the seventh, the memory of the terminal further contains instructions for performing the following operations:
the named entity weight disambiguation includes:
determining the weight value of the named entity in each matching rule awaiting named entity weight disambiguation, and taking the rule whose named entity has the largest weight value as the rule surviving named entity weight disambiguation;
where each rule awaiting named entity weight disambiguation is one that survived matching width disambiguation.
In summary, the terminal provided by this embodiment performs named entity mapping on the voice text to obtain a first mapping result; performs vocabulary mapping on the first mapping result to obtain a second mapping result; then matches the second mapping result against preset rules including regular rules and processes the voice text according to the single matching rule obtained. The configuration format of regular rules and named entity rules is thereby unified, so that the voice text processing technique applies both to real language environments and to fixed language environments, broadening its applicability and optimizing the way voice text is processed.
Embodiment 5
An embodiment of the present invention further provides a computer-readable storage medium, which may be the computer-readable storage medium contained in the memory of the above embodiments, or may exist separately without being assembled into a terminal. The computer-readable storage medium stores one or more programs used by one or more processors to perform a method for processing voice text, the method including:
performing named entity mapping on voice text to obtain a first mapping result;
performing vocabulary mapping on the first mapping result to obtain a second mapping result; and
matching the second mapping result against preset rules and processing the voice text according to the matching rule obtained, where the preset rules include regular rules.
Assuming the above is a first possible implementation, in a second possible implementation provided on its basis, the storage medium further contains instructions for performing the following operations:
before the vocabulary mapping is performed on the first mapping result, the method includes:
expanding one or more named entities in the first mapping result in turn into the voice text they correspond to before mapping, to obtain at least two third mapping results;
and performing vocabulary mapping on the first mapping result includes:
performing vocabulary mapping on the voice text in each third mapping result that has not been mapped to a named entity, to obtain the second mapping results.
In a third possible implementation provided on the basis of the first or second possible implementation, the storage medium further contains instructions for performing the following operations:
after the second mapping result is matched against the preset rules including regular rules, the method further includes:
if at least two matching rules are obtained, performing, in order, at least one of matching width disambiguation, matching weighted value disambiguation, matching density disambiguation, hit quantity disambiguation, and named entity weight disambiguation on all the matching rules obtained, until a single disambiguated matching rule is obtained; and
processing the voice text according to the single disambiguated matching rule obtained.
In a fourth possible implementation provided on the basis of the third, the storage medium further contains instructions for performing the following operations:
the matching width disambiguation includes:
determining the matching width of the second mapping result corresponding to each matching rule, and taking the rule with the largest matching width as the rule surviving matching width disambiguation.
In a fifth possible implementation provided on the basis of the fourth, the storage medium further contains instructions for performing the following operations:
the matching weighted value disambiguation includes:
determining the weighted value of each matching rule awaiting matching weighted value disambiguation according to preset weighted values of vocabulary items and named entities; and
taking the rule with the largest weighted value as the rule surviving matching weighted value disambiguation;
where the rules awaiting matching weighted value disambiguation are those that survived matching width disambiguation.
In a sixth possible implementation provided on the basis of the fifth, the storage medium further contains instructions for performing the following operations:
the matching density disambiguation includes:
determining the matching proportion between each matching rule awaiting matching density disambiguation and its second mapping result, and taking the rule with the largest matching proportion as the rule surviving matching density disambiguation;
where each rule awaiting matching density disambiguation is one that survived matching width disambiguation.
In a seventh possible implementation provided on the basis of the sixth, the storage medium further contains instructions for performing the following operations:
the hit quantity disambiguation includes:
determining the parameter hit quantity of each matching rule awaiting hit quantity disambiguation, where a parameter is one of a named entity, a vocabulary item, and a position parameter; and
taking the matching rule with the largest parameter hit quantity as the rule surviving hit quantity disambiguation;
where each rule awaiting hit quantity disambiguation is one that survived matching width disambiguation.
In an eighth possible implementation provided on the basis of the seventh, the storage medium further contains instructions for performing the following operations:
the named entity weight disambiguation includes:
determining the weight value of the named entity in each matching rule awaiting named entity weight disambiguation, and taking the rule whose named entity has the largest weight value as the rule surviving named entity weight disambiguation;
where each rule awaiting named entity weight disambiguation is one that survived matching width disambiguation.
In summary, with the computer-readable storage medium provided by this embodiment, named entity mapping is performed on the voice text to obtain a first mapping result; vocabulary mapping is performed on the first mapping result to obtain a second mapping result; the second mapping result is then matched against preset rules including regular rules, and the voice text is processed according to the single matching rule obtained. The configuration format of regular rules and named entity rules is thereby unified, so that the voice text processing technique applies both to real language environments and to fixed language environments, broadening its applicability and optimizing the way voice text is processed.
Embodiment 6
An embodiment of the present invention provides a graphical user interface used on a terminal, the terminal including a touch screen display, a memory, and one or more processors for executing one or more programs; the graphical user interface includes:
performing named entity mapping on voice text to obtain a first mapping result;
performing vocabulary mapping on the first mapping result to obtain a second mapping result; and
matching the second mapping result against preset rules and processing the voice text according to the matching rule obtained, where the preset rules include regular rules.
In summary, the graphical user interface provided by this embodiment performs named entity mapping on the voice text to obtain a first mapping result; performs vocabulary mapping on the first mapping result to obtain a second mapping result; then matches the second mapping result against preset rules including regular rules and processes the voice text according to the single matching rule obtained. The configuration format of regular rules and named entity rules is thereby unified, so that the voice text processing technique applies both to real language environments and to fixed language environments, broadening its applicability and optimizing the way voice text is processed.
It should be noted that when the apparatus for processing voice text provided by the above embodiments processes voice text, the division into the functional modules described above is merely illustrative; in practice, the above functions may be assigned to different functional modules as needed, i.e. the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus embodiments for processing voice text provided above belong to the same concept as the method embodiments; their specific implementation is detailed in the method embodiments and is not repeated here.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
Those of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are merely preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (19)

  1. A method for processing voice text, the method comprising:
    performing named entity mapping on voice text to obtain a first mapping result;
    performing vocabulary mapping on the first mapping result to obtain a second mapping result; and
    matching the second mapping result against preset rules, and processing the voice text according to an obtained matching rule, wherein the preset rules comprise regular rules.
  2. The method according to claim 1, wherein before the performing vocabulary mapping on the first mapping result, the method comprises:
    expanding one or more named entities in the first mapping result in turn into the voice text they correspond to before mapping, to obtain at least two third mapping results.
  3. The method according to claim 2, wherein the performing vocabulary mapping on the first mapping result comprises:
    performing vocabulary mapping on the voice text in each third mapping result that has not been mapped to a named entity, to obtain the second mapping result.
  4. The method according to any one of claims 1 to 3, wherein after the matching the second mapping result against preset rules, the method further comprises:
    if at least two matching rules are obtained, performing, in order, at least one of matching width disambiguation, matching weighted value disambiguation, matching density disambiguation, hit quantity disambiguation, and named entity weight disambiguation on all the matching rules obtained, until one disambiguated matching rule is obtained; and
    processing the voice text according to the one disambiguated matching rule obtained.
  5. The method according to claim 4, wherein the matching width disambiguation comprises:
    determining the matching width of the second mapping result corresponding to each matching rule, and taking the matching rule with the largest matching width as the matching rule having undergone matching width disambiguation.
  6. The method according to claim 5, wherein the matching weighted value disambiguation comprises:
    determining the weighted value of each matching rule awaiting matching weighted value disambiguation according to preset weighted values of vocabulary items and named entities; and
    taking the matching rule with the largest weighted value as the matching rule having undergone matching weighted value disambiguation;
    wherein the matching rules awaiting matching weighted value disambiguation are matching rules having undergone matching width disambiguation.
  7. The method according to claim 6, wherein the matching density disambiguation comprises:
    determining the matching proportion between each matching rule awaiting matching density disambiguation and the second mapping result, and taking the matching rule with the largest matching proportion as the matching rule having undergone matching density disambiguation;
    wherein each matching rule awaiting matching density disambiguation is a matching rule having undergone matching width disambiguation.
  8. The method according to claim 7, wherein the hit quantity disambiguation comprises:
    determining the parameter hit quantity of each matching rule awaiting hit quantity disambiguation, wherein a parameter is one of a named entity, a vocabulary item, and a position parameter; and
    taking the matching rule with the largest parameter hit quantity as the matching rule having undergone hit quantity disambiguation;
    wherein each matching rule awaiting hit quantity disambiguation is a matching rule having undergone matching width disambiguation.
  9. The method according to claim 8, wherein the named entity weight disambiguation comprises:
    determining the weight value of the named entity in each matching rule awaiting named entity weight disambiguation, and taking the matching rule whose named entity has the largest weight value as the matching rule having undergone named entity weight disambiguation;
    wherein each matching rule awaiting named entity weight disambiguation is a matching rule having undergone matching width disambiguation.
  10. An apparatus for processing voice text, the apparatus comprising:
    a first mapping module, configured to perform named entity mapping on voice text to obtain a first mapping result;
    a second mapping module, configured to perform vocabulary mapping on the first mapping result to obtain a second mapping result;
    a matching module, configured to match the second mapping result against preset rules, wherein the preset rules comprise regular rules; and
    a first processing module, configured to process the voice text according to an obtained matching rule.
  11. The apparatus according to claim 10, further comprising:
    an expansion module, configured to expand one or more named entities in the first mapping result in turn into the voice text they correspond to before mapping, to obtain at least two third mapping results.
  12. The apparatus according to claim 11, wherein
    the second mapping module is further configured to perform vocabulary mapping on the voice text in each third mapping result that has not been mapped to a named entity, to obtain the second mapping result.
  13. The apparatus according to any one of claims 10 to 12, further comprising:
    a disambiguation module, configured to, when at least two matching rules are obtained, perform in order at least one of matching width disambiguation, matching weighted value disambiguation, matching density disambiguation, hit quantity disambiguation, and named entity weight disambiguation on all the matching rules obtained, until one disambiguated matching rule is obtained; and
    a second processing module, configured to process the voice text according to the disambiguated matching rule obtained.
  14. The apparatus according to claim 13, wherein the disambiguation module is configured to determine the matching width of the second mapping result corresponding to each matching rule, and to take the matching rule with the largest matching width as the matching rule having undergone matching width disambiguation.
  15. The apparatus according to claim 14, wherein the disambiguation module is configured to determine the weighted value of each matching rule awaiting matching weighted value disambiguation according to preset weighted values of vocabulary items and named entities, and to take the matching rule with the largest weighted value as the matching rule having undergone matching weighted value disambiguation;
    wherein the matching rules awaiting matching weighted value disambiguation are matching rules having undergone matching width disambiguation.
  16. The apparatus according to claim 14, wherein the disambiguation module is configured to determine the matching proportion between each matching rule awaiting matching density disambiguation and the second mapping result, and to take the matching rule with the largest matching proportion as the matching rule having undergone matching density disambiguation;
    wherein each matching rule awaiting matching density disambiguation is a matching rule having undergone matching width disambiguation.
  17. The apparatus according to claim 16, wherein the disambiguation module is configured to determine the parameter hit quantity of each matching rule awaiting hit quantity disambiguation, the parameter being one of a named entity, a vocabulary item, and a position parameter, and to take the matching rule with the largest parameter hit quantity as the matching rule having undergone hit quantity disambiguation;
    wherein each matching rule awaiting hit quantity disambiguation is a matching rule having undergone matching width disambiguation.
  18. The apparatus according to claim 17, wherein the disambiguation module is configured to determine the weight value of the named entity in each matching rule awaiting named entity weight disambiguation, and to take the matching rule whose named entity has the largest weight value as the matching rule having undergone named entity weight disambiguation;
    wherein each matching rule awaiting named entity weight disambiguation is a matching rule having undergone matching width disambiguation.
  19. A non-transitory computer-readable storage medium having stored thereon computer-executable instructions which, when run in a computer, perform the following steps:
    performing named entity mapping on voice text to obtain a first mapping result;
    performing vocabulary mapping on the first mapping result to obtain a second mapping result; and
    matching the second mapping result against preset rules, and processing the voice text according to the obtained matching rule, wherein the preset rules comprise regular rules.
PCT/CN2014/088371 2013-11-07 2014-10-11 Method and apparatus for processing voice text WO2015067116A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310554808.XA CN104636323B (zh) 2013-11-07 2013-11-07 Method and apparatus for processing voice text
CN201310554808.X 2013-11-07

Publications (1)

Publication Number Publication Date
WO2015067116A1 true WO2015067116A1 (zh) 2015-05-14

Family

ID=53040884

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/088371 WO2015067116A1 (zh) 2013-11-07 2014-10-11 Method and apparatus for processing voice text

Country Status (2)

Country Link
CN (1) CN104636323B (zh)
WO (1) WO2015067116A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109885823A * 2017-12-01 2019-06-14 武汉楚鼎信息技术有限公司 Distributed semantic recognition method and system apparatus for the financial industry
WO2020018525A1 (en) 2018-07-17 2020-01-23 iT SpeeX LLC Method, system, and computer program product for an intelligent industrial assistant
CN111079435B * 2019-12-09 2021-04-06 深圳追一科技有限公司 Named entity disambiguation method, apparatus, device, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101454750A * 2006-03-31 2009-06-10 谷歌公司 Disambiguation of named entities
WO2009086312A1 * 2007-12-21 2009-07-09 Kondadadi, Ravi, Kumar Entity, event, and relationship extraction
CN101510221A * 2009-02-17 2009-08-19 北京大学 Query sentence analysis method and system for information retrieval
US20120078918A1 * 2010-09-28 2012-03-29 Siemens Corporation Information Relation Generation
CN102654866A * 2011-03-02 2012-09-05 北京百度网讯科技有限公司 Method and device for creating an example sentence index, and method and device for retrieving example sentences
JP2013134625A * 2011-12-26 2013-07-08 Fujitsu Ltd Extraction device, extraction program, and extraction method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102831892B * 2012-09-07 2014-10-22 深圳市信利康电子有限公司 Toy control method and system based on Internet voice interaction
CN103021403A * 2012-12-31 2013-04-03 威盛电子股份有限公司 Voice-recognition-based selection method, and mobile terminal device and information system thereof


Also Published As

Publication number Publication date
CN104636323A (zh) 2015-05-20
CN104636323B (zh) 2018-04-03

Similar Documents

Publication Publication Date Title
WO2017206916A1 Method for determining kernel running configuration in a processor, and related product
CN105788612B Method and device for detecting sound quality
WO2016119580A1 Method, device, and terminal for enabling the voice input function of a terminal
WO2015078293A1 Sound effect processing method and device, plug-in manager, and sound effect plug-in
CN106528545B Voice information processing method and device
CN105208056B Information interaction method and terminal
TWI519999B Method and device for optimizing the running environment of a terminal
KR101600999B1 Method, device, terminal device, program, and recording medium for selecting a character
TWI522917B Method and device for controlling application startup, and computer-readable storage medium
WO2015000429A1 Method and device for intelligent word selection
WO2015000430A1 Method and device for intelligent word selection
WO2017206915A1 Method for determining kernel running configuration in a processor, and related product
WO2017215635A1 Sound effect processing method and mobile terminal
US9921735B2 Apparatuses and methods for inputting a uniform resource locator
CN104281568B Paraphrase display method and device
WO2019007414A1 Method for enabling an application to support multiple languages, storage device, and mobile terminal
WO2015172705A1 Method and system for collecting statistics on streaming media data, and related apparatus
WO2017215661A1 Method for controlling scene sound effects, and electronic device
WO2015067142A1 Webpage display method and device
CN107885718B Semantic determination method and device
WO2017128986A1 Method and device for selecting a multimedia menu item, and storage medium
WO2017206860A1 Processing method for a mobile terminal, and mobile terminal
WO2015067116A1 Method and apparatus for processing voice text
CN111897916A Voice instruction recognition method and device, terminal device, and storage medium
CN105159655B Method and device for playing behavior events

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14859369

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 05.10.2016)

122 Ep: pct application non-entry in european phase

Ref document number: 14859369

Country of ref document: EP

Kind code of ref document: A1