WO2024120450A1 - Voice interaction method, server, and computer-readable storage medium - Google Patents

Voice interaction method, server, and computer-readable storage medium

Info

Publication number
WO2024120450A1
Authority
WO
WIPO (PCT)
Prior art keywords
sentence
voice request
breakpoint position
breakpoint
user voice
Prior art date
Application number
PCT/CN2023/136846
Other languages
English (en)
French (fr)
Inventor
张熙康
赵耀
王天一
Original Assignee
广州小鹏汽车科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州小鹏汽车科技有限公司
Publication of WO2024120450A1 publication Critical patent/WO2024120450A1/zh

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 Execution procedure of a spoken command
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60R VEHICLES, VEHICLE FITTINGS, OR VEHICLE PARTS, NOT OTHERWISE PROVIDED FOR
    • B60R 16/00 Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for
    • B60R 16/02 Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for electric constitutive elements
    • B60R 16/037 Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for electric constitutive elements for occupant comfort, e.g. for automatic adjustment of appliances according to personal settings, e.g. seats, mirrors, steering wheel
    • B60R 16/0373 Voice control

Definitions

  • the present application relates to the field of vehicle-mounted voice technology, and in particular to a voice interaction method, a server, and a computer-readable storage medium.
  • in-vehicle voice technology allows users in the vehicle cabin to interact through voice, for example by controlling vehicle parts or interacting with components in the in-vehicle system user interface.
  • however, when the user issues a voice request containing multiple commands, only one of the intentions is understood. For example, for the voice request "turn on the air conditioner and close the windows", only the intention of turning on the air conditioner is understood. This affects the fluency and convenience of voice interaction, and the user experience is poor.
  • the present application provides a voice interaction method, a server and a computer-readable storage medium.
  • the voice interaction method of the present application includes:
  • Natural language understanding is performed based on the sentence segmentation results to complete voice interaction.
  • in a scenario where the user issues a voice request containing multiple commands, the voice request can be segmented to obtain the breakpoint positions and their corresponding confidence levels, and the sentence segmentation result can be obtained on this basis.
  • natural language understanding is then performed on the segmentation result, and the vehicle executes the control instructions corresponding to the intention of each command in the user request, finally completing the voice interaction.
  • the voice interaction method of the present application can identify the various functional requirements in a voice request containing multiple commands issued by the user and understand all of the intentions therein. The user can express multiple intentions at once through a single voice request instead of issuing multiple voice requests, which makes expression more convenient, improves the convenience of voice interaction and the efficiency of using vehicle system applets, and improves the user experience.
  • the performing sentence segmentation processing on the user voice request to obtain a breakpoint position and a confidence level corresponding to the breakpoint position includes:
  • the sentence to be processed is subjected to secondary segmentation processing to obtain the breakpoint position and the corresponding confidence level.
  • the initial breakpoint position and its corresponding confidence level can be determined according to the quantization processing result of the user voice request, and then the sentence to be processed can be determined, so as to perform secondary sentence segmentation processing on the sentence to be processed.
  • the determining, according to the initial breakpoint position and the first confidence level, the sentence to be processed of the user voice request includes:
  • a first sentence with the largest number of characters among the plurality of first sentences is determined as the sentence to be processed.
  • the sentence with the largest number of characters among the multiple sentences obtained according to the multiple initial breakpoints can be determined as the sentence to be processed for subsequent secondary sentence segmentation processing, thereby improving the recall rate of multiple sentences after segmentation.
  • the determining, according to the initial breakpoint position and the first confidence level, the sentence to be processed of the user voice request includes:
  • the two sentences obtained according to the initial breakpoint can be determined as sentences to be processed for subsequent secondary segmentation processing, thereby improving the recall rate of multiple sentences after segmentation.
  • the determining, according to the initial breakpoint position and the first confidence level, the sentence to be processed of the user voice request includes:
  • if the first confidence levels corresponding to all of the initial breakpoint positions are not greater than a first preset value, the user voice request is segmented at the initial breakpoint position with the largest first confidence to obtain two first sentences;
  • the one with the highest confidence can be determined as the initial breakpoint, and the two sentences obtained according to the initial breakpoint can be determined as the sentences to be processed, so as to perform subsequent secondary segmentation processing and improve the recall rate of multiple sentences after segmentation.
  • the performing secondary segmentation processing on the sentence to be processed to obtain the breakpoint position and the corresponding confidence level includes:
  • the second sentence vector is segmented to obtain the breakpoint position and the confidence level corresponding to the breakpoint position.
  • the breakpoint positions and their corresponding confidence levels can be determined according to the quantization processing results of the text to be subjected to secondary segmentation processing, so that the segmentation results can be obtained according to the obtained breakpoint positions and their corresponding confidence levels.
  • the processing of the user voice request according to the breakpoint position and the confidence level to obtain a sentence segmentation result includes:
  • the entire parse tree is traversed from the root node of the parse tree to obtain the sentence segmentation result.
  • a parse tree for sentence component analysis of the user voice request can be established, and a Cartesian product can be performed based on the result of traversing the associated nodes to obtain the sentence segmentation result.
  • the processing of the user voice request according to the breakpoint position and the confidence level to obtain a sentence segmentation result includes:
  • sentence component analysis is performed on the second sentence to establish a parse tree according to the analysis result;
  • the entire parse tree is traversed from the root node of the parse tree to obtain the sentence segmentation result.
  • a parse tree for sentence component analysis of the user voice request can be established, and a Cartesian product can be performed based on the result of traversing the associated nodes to obtain the segmentation result.
  • the processing of the user voice request according to the breakpoint position and the confidence level to obtain a sentence segmentation result includes:
  • the user voice request is segmented according to the target breakpoint position to obtain the segmentation result.
  • a user voice request that is determined not to need segmentation after the secondary segmentation processing can be matched with regular expressions against the preset category voice request to obtain a matching result; the target breakpoint position is then determined in combination with the breakpoint position and its confidence, and the voice request is segmented according to the target breakpoint position to obtain a segmentation result.
  • the processing of the user voice request according to the breakpoint position and the confidence level to obtain a sentence segmentation result includes:
  • the second sentence is subjected to regular matching with a preset category voice request to obtain a matching result;
  • the user voice request is segmented according to the target breakpoint position to obtain the segmentation result.
  • the sentences obtained after the secondary segmentation of the user's voice request can be regularly matched with the preset category voice request to obtain a matching result, and the target breakpoint position can be determined in combination with the breakpoint position and its confidence.
  • the voice request can be segmented according to the target breakpoint position to obtain a segmentation result.
  • the determining the target breakpoint position according to the matching result, the breakpoint position and the confidence level includes:
  • the breakpoint position at the first or last position of the sequence in the user voice request that matches the preset category voice request is determined as the target breakpoint position.
  • that is, according to the user voice request or the sentence obtained after the secondary segmentation processing, the sequence matching the preset category voice request can be located, and the breakpoint position at the first or last position of that sequence can be determined as the target breakpoint position.
  • the server of the present application includes a processor and a memory, wherein a computer program is stored in the memory, and when the computer program is executed by the processor, the above method is implemented.
  • the computer-readable storage medium of the present application stores a computer program, and when the computer program is executed by one or more processors, the above method is implemented.
  • FIG. 1 is a first flow chart of the voice interaction method of the present application.
  • FIG. 2 is a second flow chart of the voice interaction method of the present application.
  • FIG. 3 is a third flow chart of the voice interaction method of the present application.
  • FIG. 4 is a fourth flow chart of the voice interaction method of the present application.
  • FIG. 5 is a fifth flow chart of the voice interaction method of the present application.
  • FIG. 6 is a sixth flow chart of the voice interaction method of the present application.
  • FIG. 7 is a seventh flow chart of the voice interaction method of the present application.
  • FIG. 8 is an eighth flow chart of the voice interaction method of the present application.
  • FIG. 9 is a ninth flow chart of the voice interaction method of the present application.
  • FIG. 10 is a tenth flow chart of the voice interaction method of the present application.
  • FIG. 11 is an eleventh flow chart of the voice interaction method of the present application.
  • the present application provides a voice interaction method, including:
  • the present application also provides a server, which includes a memory and a processor.
  • the voice interaction method of the present application can be implemented by the server of the present application.
  • a computer program is stored in the memory, and the processor is used to receive a user voice request forwarded by a vehicle, and to segment the user voice request to obtain a breakpoint position and a confidence level corresponding to the breakpoint position, and to process the user voice request according to the breakpoint position and the confidence level to obtain a segmentation result, and to perform natural language understanding based on the segmentation result to complete the speech interaction.
  • a user issues a voice request "turn on the air conditioner and close the windows".
  • if the two intentions "turn on the air conditioner" and "close the windows" cannot be separated, only the intention "turn on the air conditioner", which comes earlier in the time sequence, can be recognized.
  • the user needs to manually close the windows or issue a voice request of "close the windows” again to achieve all the needs.
  • This application can support the segmentation of voice requests that include multiple commands.
  • the user issues a voice request "turn on the air conditioner and close the car window".
  • the output segmentation result is "turn on the air conditioner#close the car window".
  • This solution can segment voice requests that include multiple commands according to different intentions, without the need for users to supplement the unrecognized voice request content.
  • the sentence segmentation process involved in this application is the process of separating a voice request issued by a user including multiple instructions into multiple sentences according to the voice.
  • the breakpoint position is the position that needs to be broken after the user's voice request is processed through sentence segmentation.
  • the user's voice request generally expresses different intentions before and after the breakpoint position.
  • the confidence level corresponding to the breakpoint position is used to characterize the credibility of the breakpoint position. The higher the confidence level, the higher the credibility of the breakpoint position, and the more obvious the difference in intent between the content before and after the breakpoint. In the above example, the user makes a voice request "turn on the air conditioner and close the windows".
  • the confidence of each breakpoint position is as follows: [1.0456085650090152e-11, 1.758101442206339e-10, 1.534110837496383e-12, 6.312194655855308e-10, 0.9969684481620789, 2.1762635096789218e-12, 2.6937018154232772e-12, 9.654541055503785e-12]; the confidence before the word "close" is the highest, so that position is most likely to be regarded as the initial breakpoint.
  • the segmentation result is the result of the user's voice request after the breakpoint is inserted, that is, "turn on the air conditioner # close the car window".
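As a minimal illustrative sketch (not the patent's actual implementation), the step from per-character breakpoint confidences to a "#"-delimited segmentation result could look like the following; the 0.9 threshold, the function name, and the placeholder text are assumptions:

```python
# Sketch: turn per-character breakpoint confidences into a segmentation
# result by inserting '#' before every high-confidence position.
# Threshold and names are illustrative assumptions.

def segment(text: str, confidences: list[float], threshold: float = 0.9) -> str:
    """Insert '#' before each character whose breakpoint confidence
    exceeds the threshold (never at the very start of the text)."""
    parts: list[str] = []
    for ch, conf in zip(text, confidences):
        if conf > threshold and parts:
            parts.append("#")
        parts.append(ch)
    return "".join(parts)

# Eight placeholder characters standing in for the 8-character request;
# only position 4 carries a high confidence, mirroring the example above.
confs = [1.0e-11, 1.8e-10, 1.5e-12, 6.3e-10, 0.997, 2.2e-12, 2.7e-12, 9.7e-12]
print(segment("abcdefgh", confs))  # abcd#efgh
```

With the example request, the single confident position splits the text into the two command halves, matching the "turn on the air conditioner#close the car window" result described above.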
  • after receiving the user's voice request, the server can segment the voice request, obtain the breakpoint positions and their corresponding confidence levels, and obtain the segmentation result accordingly. Furthermore, natural language understanding can be performed on the segmentation result, and the vehicle can execute the control instructions corresponding to the intention of each command in the user's request to complete the voice interaction process.
  • the voice interaction method of the present application can make the voice request issued by the user including multiple instructions recognized by the vehicle voice assistant, and execute the functional requirements through the vehicle without the need for secondary supplementation through voice requests.
  • in a scenario where the user issues a voice request containing multiple commands, the voice request can be segmented to obtain the breakpoint positions and their corresponding confidence levels, and the sentence segmentation result can be obtained on this basis.
  • natural language understanding is then performed on the segmentation result, and the vehicle executes the control instructions corresponding to the intention of each command in the user request, finally completing the voice interaction.
  • the voice interaction method of the present application can identify the various functional requirements in a voice request containing multiple commands issued by the user and understand all of the intentions therein. The user can express multiple intentions at once through a single voice request instead of issuing multiple voice requests, which makes expression more convenient, improves the convenience of voice interaction and the efficiency of using vehicle system applets, and improves the user experience.
  • step 02 includes:
  • the processor is used to perform text vectorization processing on the user voice request to obtain a first sentence vector; perform sentence segmentation processing on the first sentence vector to obtain an initial breakpoint position and a first confidence level corresponding to the initial breakpoint position; determine a sentence to be processed of the user voice request based on the initial breakpoint position and the first confidence level; and perform secondary sentence segmentation processing on the sentence to be processed to obtain a breakpoint position and a corresponding confidence level.
  • the received user voice request can be subjected to text vectorization processing to obtain a first sentence vector, that is, each character is converted into a numerical value, and the sentence composed of the characters is converted into a sentence vector composed of the numerical values corresponding to the characters.
  • the first sentence vector can be used as the input of the sentence segmentation processing model.
  • the initial breakpoint position is determined, and the first confidence corresponding to the initial breakpoint position is obtained.
  • the initial breakpoint position is the breakpoint position obtained by performing sentence segmentation processing on the first sentence vector obtained after the text is vectorized.
  • the first confidence is a standard for judging the possibility of sentence segmentation at the initial breakpoint position.
  • the sentence to be processed of the user voice request can be determined and a secondary sentence segmentation process can be performed on it.
  • the voice request is vectorized to obtain the first sentence vector, and it is determined whether a breakpoint needs to be inserted before each input character: if a breakpoint needs to be added, the value is 1; if not, the value is 0. The vector obtained by sentence segmentation is then (0,0,0,0,1,0,0,0), that is, the probability of adding a breakpoint before "close" is very high, and no breakpoint is added before the other characters.
  • the training of the sentence segmentation model can be supervised training based on a pre-trained ALBERT model; the specific sentence segmentation model used and its training method are not limited here.
  • the initial breakpoint position and its corresponding confidence level can be determined according to the quantization processing result of the user voice request, and then the sentence to be processed can be determined, so as to perform secondary sentence segmentation processing on the sentence to be processed.
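The vectorize-then-classify step above can be sketched as follows. This is a toy stand-in, not the patent's system: the real segmenter is a supervised model fine-tuned from ALBERT, whereas the stub model, the code-point "vectorization", and the 0.5 cutoff here are illustrative assumptions:

```python
# Sketch of the two-stage idea: vectorize the request character by
# character, then have a (stub) segmentation model emit a 0/1 decision
# per position (1 = insert a breakpoint before this character).

def vectorize(text: str) -> list[int]:
    """Toy text vectorization: map each character to its code point."""
    return [ord(ch) for ch in text]

def segment_positions(sentence_vector: list[int], model) -> list[int]:
    """Ask the model for per-position breakpoint probabilities and
    binarize them with a 0.5 cutoff."""
    probs = model(sentence_vector)
    return [1 if p > 0.5 else 0 for p in probs]

def stub_model(vec: list[int]) -> list[float]:
    """Stand-in for the ALBERT-based segmenter: fixed probabilities
    mirroring the (0,0,0,0,1,0,0,0) example above."""
    return [0.0, 0.0, 0.0, 0.0, 0.997, 0.0, 0.0, 0.0][: len(vec)]

vec = vectorize("abcdefgh")
print(segment_positions(vec, stub_model))  # [0, 0, 0, 0, 1, 0, 0, 0]
```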
  • step 0203 includes:
  • 02032 Determine the first sentence with the largest number of characters among the multiple first sentences as the sentence to be processed.
  • the processor is used to segment the user voice request to obtain multiple first sentences according to the multiple initial breakpoint positions whose first confidences are greater than the first preset value if there are multiple initial breakpoint positions among the initial breakpoint positions whose corresponding first confidences are greater than the first preset value; and determine the first sentence with the largest number of characters among the multiple first sentences as the sentence to be processed.
  • the sentence to be processed of the user voice request can be determined according to the initial breakpoint position and the first confidence.
  • the user voice request can be segmented according to the multiple initial breakpoint positions with the first confidence greater than the first preset value, thereby obtaining multiple first sentences.
  • the first preset value can be any value between 0 and 1 that is close to 1, such as 0.8, 0.9, or 0.95, which is not limited here.
  • the first sentence with the largest number of characters can be determined as the sentence to be processed, and subsequent secondary segmentation processing can be performed.
  • the user sends a voice request of "turn on the air conditioner, close the car windows, and play the song b by singer A".
  • the initial breakpoints in the user's voice request whose corresponding confidence is greater than the first preset value are the positions before "close" and before "play".
  • the output text is "turn on the air conditioner#close the car windows#play the song b by singer A”.
  • the third first sentence "play the song b by singer A” is the first sentence with the largest number of characters, which is determined as the sentence to be processed and is subjected to secondary sentence segmentation processing in the subsequent process.
  • the sentence with the largest number of characters among the multiple sentences obtained according to the multiple initial breakpoints can be determined as the sentence to be processed for subsequent secondary sentence segmentation processing, thereby improving the recall rate of multiple sentences after segmentation.
  • step 0203 includes:
  • 02034 Determine the two first sentences as the sentences to be processed.
  • the processor is used to segment the user voice request to obtain two first sentences according to the initial breakpoint position whose first confidence is greater than the first preset value if there is only one initial breakpoint position among the initial breakpoint positions whose corresponding first confidence is greater than the first preset value; and determine the two first sentences as sentences to be processed.
  • the sentence to be processed of the user voice request can be determined according to the initial breakpoint position and the first confidence.
  • the user voice request can be segmented according to the single initial breakpoint position whose first confidence is greater than the first preset value, thereby obtaining two first sentences.
  • the first preset value can be any value between 0 and 1 that is close to 1, such as 0.8, 0.9, or 0.95, which is not limited here.
  • both first sentences are determined as sentences to be processed, and subsequent secondary segmentation processing is performed without comparing the number of characters in the two sentences.
  • the user sends a voice request of "turn on the air conditioner and close the windows".
  • the first preset value is set to 0.9.
  • the confidence levels corresponding to the positions in the user's voice request are [1.0456085650090152e-11, 1.758101442206339e-10, 1.534110837496383e-12, 6.312194655855308e-10, 0.9969684481620789, 2.1762635096789218e-12, 2.6937018154232772e-12, 9.654541055503785e-12]; the only initial breakpoint with a confidence greater than 0.9 is before "close", so the two sentences "turn on the air conditioner" and "close the car window" can be determined as the sentences to be processed, on which secondary sentence segmentation processing is performed in the subsequent flow.
  • the two sentences obtained according to the initial breakpoint can be determined as sentences to be processed for subsequent secondary segmentation processing, thereby improving the recall rate of multiple sentences after segmentation.
  • step 0203 includes:
  • 02036 Determine the two first sentences as the sentences to be processed.
  • the processor is used to segment the user voice request according to the initial breakpoint position with the largest first confidence to obtain two first sentences if none of the first confidence levels corresponding to the initial breakpoint positions is greater than the first preset value; and to determine the two first sentences as the sentences to be processed.
  • the sentence to be processed of the user voice request can be determined according to the initial breakpoint position and the first confidence.
  • in this case, no first confidence corresponding to any initial breakpoint position is greater than the first preset value.
  • the user voice request can be segmented according to the initial breakpoint position with the largest first confidence to obtain two first sentences.
  • the first preset value can be any value between 0 and 1, such as 0.8, 0.9, or 0.95, which is not limited here.
  • both first sentences are determined as sentences to be processed, and subsequent secondary segmentation processing is performed without comparing the number of characters in the two sentences.
  • for example, the user issues a voice request of "turn on the air conditioner car window", and the first preset value is set to 0.9.
  • the corresponding confidence levels in the user's voice request are [1.0456085650090152e-11, 1.758101442206339e-10, 1.534110837496383e-12, 6.312194655855308e-10, 0.6945016523749128, 2.1762635096789218e-12, 2.6937018154232772e-12, 9.654541055503785e-12], and it can be determined that there is no initial breakpoint with a confidence higher than 0.9.
  • the position before the word "car", which has the highest confidence, is determined as the initial breakpoint, and the two resulting sentences "turn on the air conditioner" and "car window" can be determined as the sentences to be processed for further processing in the subsequent flow.
  • the one with the highest confidence can be determined as the initial breakpoint, and the two sentences obtained according to the initial breakpoint can be determined as the sentences to be processed, so as to perform subsequent secondary segmentation processing and improve the recall rate of multiple sentences after segmentation.
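The three cases above (multiple initial breakpoints above the threshold, exactly one, or none) can be sketched together as one selection routine. This is an illustrative assumption of the decision logic, with hypothetical names and the 0.9 threshold taken from the examples:

```python
# Sketch of choosing the sentence(s) to process from initial breakpoints
# and their confidences. confs maps breakpoint position -> confidence.

def split_at(text: str, positions: list[int]) -> list[str]:
    """Split text before each breakpoint position."""
    bounds = [0] + sorted(positions) + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]

def sentences_to_process(text: str, confs: dict[int, float],
                         threshold: float = 0.9) -> list[str]:
    above = [pos for pos, c in confs.items() if c > threshold]
    if len(above) > 1:
        # Multiple confident breakpoints: keep only the longest clause
        # for secondary segmentation.
        return [max(split_at(text, above), key=len)]
    if len(above) == 1:
        # Exactly one confident breakpoint: both clauses go forward.
        return split_at(text, above)
    # None above threshold: fall back to the single most confident one.
    best = max(confs, key=confs.get)
    return split_at(text, [best])

# No position above 0.9 -> split at the argmax position (4).
print(sentences_to_process("abcdefgh", {4: 0.69}))  # ['abcd', 'efgh']
```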
  • step 0204 includes:
  • the processor is used to perform text vectorization processing on the sentence to be processed to obtain a second sentence vector, and to perform sentence segmentation processing on the second sentence vector to obtain breakpoint positions and confidence levels corresponding to the breakpoint positions.
  • the sentence to be processed obtained by the first sentence segmentation processing can be subjected to text vectorization processing to obtain a second sentence vector, that is, each character is converted into a numerical value, and the sentence composed of characters is converted into a sentence vector composed of numerical values corresponding to the characters.
  • the second sentence vector can be input into the sentence segmentation processing model, and the breakpoint position is determined by judging whether a breakpoint needs to be added before each input character, and the confidence corresponding to the breakpoint position can be obtained according to the breakpoint position.
  • the breakpoint position is obtained by sentence segmentation processing of the second sentence vector obtained after text vectorization of the sentence to be processed.
  • the confidence corresponding to the breakpoint position is a standard for judging the possibility of sentence segmentation at the breakpoint position.
  • the training of the sentence segmentation model can be supervised training based on a pre-trained ALBERT model; the specific sentence segmentation model used and its training method are not limited here.
  • the breakpoint positions and their corresponding confidence levels can be determined according to the quantization processing results of the text to be subjected to secondary segmentation processing, so that the segmentation results can be obtained according to the obtained breakpoint positions and their corresponding confidence levels.
  • step 03 includes:
  • sentence component analysis is performed on the user voice request to establish a parse tree according to the analysis result;
  • 0303 Traverse the entire parse tree from the root node of the parse tree to obtain the segmentation result.
  • the processor is used to perform sentence component analysis on the user voice request if it is determined based on the breakpoint position and the confidence level that the user voice request is not a sentence break, so as to establish a parse tree based on the analysis results; traverse the associated nodes of the parse tree to perform a Cartesian product; and traverse the entire parse tree from the root node of the parse tree to obtain a sentence break result.
  • in this case, it is determined based on the breakpoint position and confidence level that the voice request does not need to be segmented.
  • the sentence components of the initial user voice request are analyzed, and a parse tree is established based on the analysis results.
  • the associated nodes of the parse tree are traversed to perform a Cartesian product, and the entire parse tree is traversed from the root node of the parse tree, finally obtaining the sentence segmentation result.
  • the voice request sent by the user is "turn on the front passenger window and seat heating".
  • the voice request needs to be rewritten into two sentences, namely "turn on the front passenger window” and "turn on the front passenger seat heating” to accurately express the functional requirements.
  • the voice request sent by the above user is analyzed for sentence components, and the two clauses share the predicate "open” and the attributive "front passenger".
  • a sentence component parse tree for the above voice request "turn on the front passenger window and seat heating” is established, as shown in Figure 3.
  • the associated nodes can be traversed from the root node of the parse tree in pre-order, and the specific content of the sentence components can be filled in the corresponding nodes of the sentence component parse tree.
  • the traversal order of the parse tree can be changed according to the structure and content of the parse tree, which is not limited here.
  • each complete path corresponds to a rewritten sentence, and the two sentences finally obtained are "turn on the front passenger window" and "turn on the front passenger seat heating".
  • a parse tree for sentence component analysis of the user voice request can be established, and a Cartesian product can be performed based on the result of traversing the associated nodes to obtain the sentence segmentation result.
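The Cartesian-product rewrite can be sketched for the "turn on the front passenger window and seat heating" example: the shared predicate and attributive are combined with each object. The component lists below are assumptions modeled on that example, not the patent's exact parse-tree structure:

```python
from itertools import product

# Sketch: rewrite a multi-object request by taking the Cartesian
# product of shared sentence components (predicate, attributive)
# with the objects found under the parse tree.

predicates = ["turn on"]            # shared predicate
attributives = ["the front passenger"]  # shared attributive
objects = ["window", "seat heating"]    # one object per complete path

rewritten = [
    f"{pred} {attr} {obj}"
    for pred, attr, obj in product(predicates, attributives, objects)
]
print(rewritten)
# ['turn on the front passenger window', 'turn on the front passenger seat heating']
```

Each tuple of the product corresponds to one complete root-to-leaf path of the parse tree, i.e. one rewritten sentence.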
  • step 03 further includes:
  • the processor is used for, if the second sentence is obtained by segmenting the user voice request according to the breakpoint position and the confidence, then performing sentence component analysis on the second sentence to establish a parse tree according to the analysis result; traversing the associated nodes of the parse tree to make a Cartesian product; and traversing the entire parse tree from the root node of the parse tree to obtain the segmentation result.
  • the sentence components of the second sentence can be analyzed. Further, a parse tree is established based on the results of the sentence component analysis, and the associated nodes of the parse tree are traversed to make a Cartesian product, and the entire parse tree is traversed from the root node of the parse tree to finally obtain the segmentation result, that is, the breakpoint position and confidence level.
  • a parse tree for sentence-component analysis of the user voice request can be established, and a Cartesian product can be computed over the results of traversing its nodes to obtain the segmentation result.
  • step 03 includes:
  • the processor is used to perform regular-expression matching between the user voice request and the preset-category voice requests to obtain a matching result if it is determined, based on the breakpoint positions and confidences, that the user voice request is not to be segmented; determine the target breakpoint position based on the matching result, the breakpoint positions and the confidences; and segment the user voice request at the target breakpoint position to obtain a segmentation result.
  • certain preset categories of voice requests that express special meanings can be set according to actual needs, and standard expressions of voice requests of this category can be set.
  • for example, it can be a request to play a particular singer's songs on demand.
  • if the user voice request is determined not to require segmentation based on the breakpoint positions and confidences during the secondary segmentation process, it is matched against the preset-category voice requests using regular expressions to obtain a matching result, and the target breakpoint position is determined in combination with the breakpoint positions and confidences to segment the voice request, thereby obtaining a segmentation result.
  • the preset category voice request is a pre-set voice request that satisfies certain special meanings.
  • the above regular-expression matching is the process of determining whether the voice request issued by the user hits a preset-category voice request.
  • the voice request is matched with the preset category voice request, and the matching result is that the preset category voice request "Play singer A's songs” is hit.
  • the target breakpoint position can be determined.
  • the voice request is segmented according to the target breakpoint position to obtain the segmentation result of "Play singer A's songs#Turn on the air conditioner".
  • the user voice request determined not to require segmentation after secondary segmentation processing can be matched against the preset-category voice requests using regular expressions to obtain a matching result; the target breakpoint position can then be determined in combination with the breakpoint positions and their confidences, and the voice request segmented at the target breakpoint position to obtain a segmentation result.
  • step 03 further includes:
  • the processor is used for, if a second sentence is obtained by segmenting the user voice request according to the breakpoint position and the confidence level, matching the second sentence against a preset-category voice request using regular expressions to obtain a matching result; determining a target breakpoint position according to the matching result, the breakpoint position and the confidence level; and segmenting the user voice request according to the target breakpoint position to obtain a segmentation result.
  • certain preset categories of voice requests that express special meanings can be set according to actual needs, and standard expressions of voice requests of this category can be set.
  • for example, it can be a request to play a particular singer's songs on demand.
  • if, after secondary segmentation processing based on the breakpoint positions and confidences, the user's voice request yields a second sentence containing singer-on-demand content, that second sentence is matched against the preset-category voice requests using regular expressions to obtain a matching result, and the target breakpoint position is determined in combination with the breakpoint positions and confidences to segment the voice request, thereby obtaining a segmentation result.
  • the preset category voice request is a pre-set voice request that satisfies certain special meanings.
  • the above regular-expression matching is the process of determining whether the voice request issued by the user hits a preset-category voice request.
  • suppose the second sentence obtained after secondary segmentation processing based on the breakpoint positions and confidences is "Play singer A's songs turn on the air conditioner".
  • the second sentence is matched against the preset-category voice requests using regular expressions, and the matching result is a hit on the preset-category request "Play singer A's songs".
  • the target breakpoint position can be determined.
  • the voice request is segmented according to the target breakpoint position to obtain the segmentation result of "Play singer A's songs#Turn on the air conditioner".
  • the sentences obtained after secondary segmentation of the user's voice request can be matched against the preset-category voice requests using regular expressions to obtain a matching result, and the target breakpoint position can be determined in combination with the breakpoint positions and their confidences.
  • the voice request can be segmented according to the target breakpoint position to obtain a segmentation result.
  • Step 0308 includes:
  • the breakpoint position at the head or tail of the sequence in the user voice request that matches the preset-category voice request is determined to be the target breakpoint position.
  • the processor is used to determine the breakpoint position at the head or tail of the matched sequence in the user voice request as the target breakpoint position if the matching result is a hit on a preset-category voice request and the confidence of that breakpoint position is greater than a second preset value.
  • when the content of a preset-category voice request exists in the user voice request,
  • the preset-category content in the sentence needs to be separated from the rest of the content.
  • the voice request is matched with the preset category voice request, and the matching result is that the preset category voice request "play singer A's songs” is hit.
  • the confidence of the breakpoint position at the tail of the sequence "play singer A's songs" that matches the preset-category voice request in the user voice request is greater than the second preset value, so the tail position of "play singer A's songs" in the user voice request, that is, the breakpoint before the word "turn", is determined as the target breakpoint position.
  • the second preset value can be any value between 0 and 1 close to 1, such as 0.8, 0.9, or 0.95, which is not limited here.
  • the voice request is segmented according to the target breakpoint position to obtain the segmentation result of "play singer A's songs#turn on the air conditioner".
  • the preset-category voice requests are still "play songs by singer A" and "I want to listen to songs by singer A".
  • the voice request sent by the user, or the sentence obtained after secondary segmentation processing, selects a specific song by that singer, that is, "play song b by singer A".
  • the voice request or its clause is matched against the preset-category voice requests using regular expressions, and the matching result is a hit on the preset-category request "play songs by singer A".
  • because the confidence of the breakpoint position at the tail of the matched sequence "play songs by singer A" in the user's voice request is less than the second preset value, no breakpoint is added before the first character of the song name and no segmentation is required.
  • the final segmentation result is "play song b by singer A".
  • the breakpoint position at the head or tail of the sequence matching a preset-category voice request can be located in the user voice request or in the sentence obtained after secondary segmentation processing.
  • when a split is required, the breakpoint position at the head or tail of the matched sequence can be determined as the target breakpoint position.
  • the result of the secondary segmentation processing still undergoes component analysis and regular-expression matching to determine whether further segmentation is needed; this second pass over the segmentation result yields a more accurate final segmentation result.
  • the computer-readable storage medium of the present application stores a computer program, and when the computer program is executed by one or more processors, the above method is implemented.
  • Any process or method description in a flowchart, or otherwise described herein, may be understood to represent a module, fragment or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present application includes alternative implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functionality involved, as should be understood by those skilled in the art to which the embodiments of the present application belong.

Landscapes

  • Engineering & Computer Science (AREA)
  • Mechanical Engineering (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

This application discloses a voice interaction method, including: receiving a user voice request forwarded by a vehicle; performing segmentation processing on the user voice request to obtain breakpoint positions and confidences corresponding to the breakpoint positions; processing the user voice request according to the breakpoint positions and the confidences to obtain a segmentation result; and performing natural language understanding according to the segmentation result to complete the voice interaction.

Description

Voice interaction method, server, and computer-readable storage medium
This application claims priority to Chinese patent application No. 202211558626.5, filed on December 6, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of in-vehicle voice, and in particular to a voice interaction method, a server, and a computer-readable storage medium.
Background
At present, in-vehicle voice technology allows users to interact by voice inside the vehicle cabin, for example to control vehicle components or to operate elements of the in-vehicle system user interface. In a scenario where the user issues a voice request containing multiple commands, only one of the intents is understood; for example, for the request "turn on the air conditioner and close the window", only the intent to turn on the air conditioner can be understood. This impairs the fluency and convenience of voice interaction and degrades the user experience.
Technical Problem
This application provides a voice interaction method, a server, and a computer-readable storage medium.
Technical Solution
The voice interaction method of this application includes:
receiving a user voice request forwarded by a vehicle;
performing segmentation processing on the user voice request to obtain breakpoint positions and confidences corresponding to the breakpoint positions;
processing the user voice request according to the breakpoint positions and the confidences to obtain a segmentation result;
performing natural language understanding according to the segmentation result to complete the voice interaction.
In this way, in a scenario where the user issues a voice request containing multiple commands, this application can perform segmentation processing on the request to obtain breakpoint positions and their corresponding confidences, derive a segmentation result from them, pass the segmentation result to natural language understanding, and have the vehicle execute the control commands corresponding to the intent of each command in the user's request, finally completing the voice interaction. The voice interaction method of this application can recognize each functional requirement in a user's multi-command voice request and understand all of its intents, so the user can express multiple intent requirements at once through a single voice request instead of issuing multiple requests. This makes expression more convenient, improves the convenience of voice interaction and the usage efficiency of in-vehicle applications, and improves the user experience.
The performing segmentation processing on the user voice request to obtain breakpoint positions and confidences corresponding to the breakpoint positions includes:
performing text vectorization processing on the user voice request to obtain a first sentence vector;
performing segmentation processing on the first sentence vector to obtain initial breakpoint positions and first confidences corresponding to the initial breakpoint positions;
determining pending clauses of the user voice request according to the initial breakpoint positions and the first confidences;
performing secondary segmentation processing on the pending clauses to obtain the breakpoint positions and the corresponding confidences.
In this way, the initial breakpoint positions and their corresponding confidences can be determined from the vectorization result of the user voice request, and the pending clauses determined accordingly, so that secondary segmentation processing can be performed on them.
The determining pending clauses of the user voice request according to the initial breakpoint positions and the first confidences includes:
if the first confidences corresponding to multiple initial breakpoint positions are greater than a first preset value, segmenting the user voice request according to those initial breakpoint positions to obtain multiple first clauses;
determining the first clause with the largest number of characters among the multiple first clauses as the pending clause.
In this way, when the first segmentation pass yields multiple initial breakpoints whose confidences exceed the preset value, the clause with the most characters among the resulting clauses is taken as the pending clause for subsequent secondary segmentation, improving the recall of multi-clause segmentation.
The determining pending clauses of the user voice request according to the initial breakpoint positions and the first confidences includes:
if only one initial breakpoint position has a first confidence greater than the first preset value, segmenting the user voice request according to that initial breakpoint position to obtain two first clauses;
determining the two first clauses as the pending clauses.
In this way, when the first segmentation pass yields a single initial breakpoint whose confidence exceeds the preset value, the two resulting clauses are taken as the pending clauses for subsequent secondary segmentation, improving the recall of multi-clause segmentation.
The determining pending clauses of the user voice request according to the initial breakpoint positions and the first confidences includes:
if none of the first confidences corresponding to the initial breakpoint positions is greater than the first preset value, segmenting the user voice request according to the initial breakpoint position with the largest first confidence to obtain two first clauses;
determining the two first clauses as the pending clauses.
In this way, when no initial breakpoint in the first segmentation pass has a confidence above the preset value, the highest-confidence position is taken as the initial breakpoint and the two resulting clauses are taken as the pending clauses for subsequent secondary segmentation, improving the recall of multi-clause segmentation.
The performing secondary segmentation processing on the pending clauses to obtain the breakpoint positions and the corresponding confidences includes:
performing text vectorization processing on the pending clauses to obtain a second sentence vector;
performing segmentation processing on the second sentence vector to obtain the breakpoint positions and the confidences corresponding to the breakpoint positions.
In this way, the breakpoint positions and their corresponding confidences can be determined from the vectorization result of the text requiring secondary segmentation, so that the segmentation result can be obtained from them.
The processing the user voice request according to the breakpoint positions and the confidences to obtain a segmentation result includes:
if it is determined according to the breakpoint positions and the confidences that the user voice request is not to be segmented, performing sentence-component analysis on the user voice request to establish a parse tree according to the analysis result;
traversing the associated nodes of the parse tree to form a Cartesian product;
traversing the entire parse tree from the root node of the parse tree to obtain the segmentation result.
In this way, for a user voice request determined after segmentation processing not to require segmentation, a parse tree for sentence-component analysis of the request can be established, and a Cartesian product computed over the results of traversing its nodes, so as to obtain the segmentation result.
The processing the user voice request according to the breakpoint positions and the confidences to obtain a segmentation result includes:
if a second clause is obtained by segmenting the user voice request according to the breakpoint positions and the confidences, performing sentence-component analysis on the second clause to establish a parse tree according to the analysis result;
traversing the associated nodes of the parse tree to form a Cartesian product;
traversing the entire parse tree from the root node of the parse tree to obtain the segmentation result.
In this way, for clauses obtained from the secondary segmentation of the user voice request, a parse tree for sentence-component analysis can be established, and a Cartesian product computed over the results of traversing its nodes, so as to obtain the segmentation result.
The processing the user voice request according to the breakpoint positions and the confidences to obtain a segmentation result includes:
if it is determined according to the breakpoint positions and the confidences that the user voice request is not to be segmented, matching the user voice request against preset-category voice requests with regular expressions to obtain a matching result;
determining a target breakpoint position according to the matching result, the breakpoint positions and the confidences;
segmenting the user voice request according to the target breakpoint position to obtain the segmentation result.
In this way, a user voice request determined after secondary segmentation processing not to require segmentation can be regex-matched against the preset-category voice requests to obtain a matching result, the target breakpoint position determined in combination with the breakpoint positions and their confidences, and the request segmented at the target breakpoint position to obtain the segmentation result.
The processing the user voice request according to the breakpoint positions and the confidences to obtain a segmentation result includes:
if a second clause is obtained by segmenting the user voice request according to the breakpoint positions and the confidences, matching the second clause against preset-category voice requests with regular expressions to obtain a matching result;
determining a target breakpoint position according to the matching result, the breakpoint positions and the confidences;
segmenting the user voice request according to the target breakpoint position to obtain the segmentation result.
In this way, the clauses obtained from the secondary segmentation of the user voice request can be regex-matched against the preset-category voice requests to obtain a matching result, the target breakpoint position determined in combination with the breakpoint positions and their confidences, and the request segmented at the target breakpoint position to obtain the segmentation result.
The determining a target breakpoint position according to the matching result, the breakpoint positions and the confidences includes:
if the matching result is a hit on a preset-category voice request, and the confidence of the breakpoint position at the head or tail of the sequence in the user voice request that matches the preset-category voice request is greater than a second preset value, determining the breakpoint position at the head or tail of that matched sequence as the target breakpoint position.
In this way, based on the breakpoint position at the head or tail of the matched sequence in the user voice request or in a clause obtained after secondary segmentation, that head or tail breakpoint position can be determined as the target breakpoint position when a split is required.
The server of this application includes a processor and a memory; the memory stores a computer program which, when executed by the processor, implements the above method.
The computer-readable storage medium of this application stores a computer program which, when executed by one or more processors, implements the above method.
Additional aspects and advantages of the embodiments of this application will be set forth in part in the following description, and in part will become apparent from the following description or be learned through practice of the embodiments of this application.
Description of Drawings
The above and/or additional aspects and advantages of this application will become apparent and readily understood from the following description of embodiments taken in conjunction with the accompanying drawings, in which:
Figure 1 is the first schematic flowchart of the voice interaction method of this application;
Figure 2 is the second schematic flowchart of the voice interaction method of this application;
Figure 3 is the third schematic flowchart of the voice interaction method of this application;
Figure 4 is the fourth schematic flowchart of the voice interaction method of this application;
Figure 5 is the fifth schematic flowchart of the voice interaction method of this application;
Figure 6 is the sixth schematic flowchart of the voice interaction method of this application;
Figure 7 is the seventh schematic flowchart of the voice interaction method of this application;
Figure 8 is the eighth schematic flowchart of the voice interaction method of this application;
Figure 9 is the ninth schematic flowchart of the voice interaction method of this application;
Figure 10 is the tenth schematic flowchart of the voice interaction method of this application;
Figure 11 is the eleventh schematic flowchart of the voice interaction method of this application.
Embodiments of the Invention
Embodiments of this application are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which identical or similar reference numerals denote identical or similar elements or elements with identical or similar functions throughout. The embodiments described below with reference to the drawings are exemplary, are intended only to explain the embodiments of this application, and should not be construed as limiting them.
Referring to Figure 1, this application provides a voice interaction method, including:
01: receiving a user voice request forwarded by a vehicle;
02: performing segmentation processing on the user voice request to obtain breakpoint positions and confidences corresponding to the breakpoint positions;
03: processing the user voice request according to the breakpoint positions and the confidences to obtain a segmentation result;
04: performing natural language understanding according to the segmentation result to complete the voice interaction.
This application also provides a server including a memory and a processor. The voice processing method of this application can be implemented by this server. Specifically, the memory stores a computer program, and the processor is configured to receive the user voice request forwarded by the vehicle, perform segmentation processing on the user voice request to obtain breakpoint positions and corresponding confidences, process the request according to the breakpoint positions and confidences to obtain a segmentation result, and perform natural language understanding according to the segmentation result to complete the voice interaction.
With the development and popularization of vehicle electronics, vehicles can interact with users by voice, that is, recognize the user's voice request and ultimately fulfill the intent it expresses. Human-vehicle voice interaction enriches the experience of drivers and passengers while driving. However, as users use voice interaction more often, they tend to issue voice requests containing multiple commands. In the related art, such a request cannot be split into multiple intents for recognition, so for a multi-intent request only one intent can be recognized and executed, which severely impairs the convenience of the voice interaction process and degrades the user experience.
For example, in one case the user issues the request "turn on the air conditioner and close the window". In the related art, because the two intents "turn on the air conditioner" and "close the window" cannot be separated, only the temporally earlier intent "turn on the air conditioner" is recognized; after it is executed, "close the window" can be neither recognized nor executed, so the interaction does not proceed smoothly and the user experience suffers. The user must then close the window manually or issue a second request "close the window" to fulfill the remaining need.
This application supports segmentation processing of voice requests containing multiple commands. As shown in Figure 2, in the above example the user issues the request "turn on the air conditioner and close the window", and after segmentation processing the output segmentation result is "turn on the air conditioner # close the window". This scheme can segment a multi-command voice request by its different intents, with no need for the user to re-issue the unrecognized part of the request.
The segmentation processing involved in this application is the process of splitting a user's multi-command voice request into multiple clauses according to the speech. A breakpoint position is a position at which the request should be split after segmentation processing; the content before and after a breakpoint generally expresses different intents. The confidence corresponding to a breakpoint position characterizes how credible that breakpoint is: the higher the confidence, the more credible the breakpoint and the more distinct the intents on either side of it. In the above example, for the request "打开空调关闭车窗" ("turn on the air conditioner and close the window"), the per-character breakpoint confidences are [1.0456085650090152e-11, 1.758101442206339e-10, 1.534110837496383e-12, 6.312194655855308e-10, 0.9969684481620789, 2.1762635096789218e-12, 2.6937018154232772e-12, 9.654541055503785e-12]; the confidence before the character "关" ("close") is the highest, so it is most likely to be taken as the initial breakpoint. The segmentation result is the request with the breakpoint inserted, that is, "turn on the air conditioner # close the window".
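A minimal sketch (assumed threshold rule and helper name, not the patent's actual implementation) of how the per-character confidences above turn into the segmentation result "turn on the air conditioner # close the window":

```python
# confidences[i] is the probability that a breakpoint belongs before character i;
# positions whose confidence clears the threshold become split points.
def segment(text, confidences, threshold=0.9):
    parts, start = [], 0
    for i, conf in enumerate(confidences):
        if conf > threshold and 0 < i < len(text):
            parts.append(text[start:i])
            start = i
    parts.append(text[start:])
    return "#".join(parts)

confidences = [1.0456085650090152e-11, 1.758101442206339e-10,
               1.534110837496383e-12, 6.312194655855308e-10,
               0.9969684481620789, 2.1762635096789218e-12,
               2.6937018154232772e-12, 9.654541055503785e-12]
print(segment("打开空调关闭车窗", confidences))  # → 打开空调#关闭车窗
```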
After receiving the user's voice request, the server can perform segmentation processing on it to obtain breakpoint positions and their confidences, and derive a segmentation result accordingly. Further, the segmentation result can be passed to natural language understanding, and the vehicle executes the control commands corresponding to the intent of each command in the user's request, completing the voice interaction process. With the voice interaction method of this application, every command in a user's multi-command voice request is recognized by the vehicle's voice assistant and the corresponding function is executed by the vehicle, with no need for a supplementary voice request.
In summary, in this application, in a scenario where the user issues a voice request containing multiple commands, the request can be segmented to obtain breakpoint positions and their corresponding confidences, a segmentation result derived from them, the segmentation result passed to natural language understanding, and the vehicle made to execute the control commands corresponding to each intent in the user's request, finally completing the voice interaction. The method can recognize each functional requirement in a multi-command request and understand all of its intents; the user can express multiple intents at once through a single voice request instead of issuing multiple requests, which makes expression more convenient, improves the convenience of voice interaction and the usage efficiency of in-vehicle applications, and improves the user experience.
Referring to Figure 2, step 02 includes:
0201: performing text vectorization processing on the user voice request to obtain a first sentence vector;
0202: performing segmentation processing on the first sentence vector to obtain initial breakpoint positions and first confidences corresponding to the initial breakpoint positions;
0203: determining pending clauses of the user voice request according to the initial breakpoint positions and the first confidences;
0204: performing secondary segmentation processing on the pending clauses to obtain breakpoint positions and corresponding confidences.
The processor is configured to perform text vectorization processing on the user voice request to obtain a first sentence vector; perform segmentation processing on the first sentence vector to obtain initial breakpoint positions and corresponding first confidences; determine pending clauses of the user voice request according to the initial breakpoint positions and the first confidences; and perform secondary segmentation processing on the pending clauses to obtain breakpoint positions and corresponding confidences.
Specifically, the received user voice request can be vectorized into a first sentence vector: each character is converted into a numeric value, so that a sentence of characters becomes a vector of the corresponding values. The first sentence vector serves as input to the segmentation model, which determines the initial breakpoint positions by judging whether a breakpoint should be inserted before each input character, and yields the first confidence corresponding to each initial breakpoint position. The initial breakpoint positions are the breakpoints obtained by performing segmentation processing on the first sentence vector after text vectorization, and the first confidence is the criterion for judging how likely a split at an initial breakpoint position is. Further, the pending clauses of the user voice request can be determined from the initial breakpoint positions and the first confidences and subjected to secondary segmentation processing.
In one example, when the user issues the voice request "打开空调关闭车窗" ("turn on the air conditioner and close the window"), the request is vectorized into a first sentence vector, and the model judges for each input character whether a breakpoint needs to be inserted before it, with a value of 1 where a breakpoint is added and 0 otherwise; segmentation processing yields the vector (0,0,0,0,1,0,0,0), that is, a breakpoint is added before "关" with very high probability, and before no other character. The segmentation model may be trained in a supervised manner based on a pretrained ALBERT model; the specific segmentation model and its training method are not limited here.
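The vectorization step in this example can be sketched as a character-to-id mapping (the vocabulary here is hypothetical; the patent does not specify the encoding):

```python
# Each character is converted to a numeric id so that the sentence becomes a
# vector the segmentation model can consume; unknown characters map to 0.
def vectorize(text, vocab):
    return [vocab.get(ch, 0) for ch in text]

vocab = {ch: i + 1 for i, ch in enumerate("打开空调关闭车窗")}
print(vectorize("打开空调关闭车窗", vocab))  # → [1, 2, 3, 4, 5, 6, 7, 8]
```

A real system would feed such vectors to the ALBERT-based segmentation model mentioned above rather than use them directly.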
In this way, the initial breakpoint positions and their corresponding confidences can be determined from the vectorization result of the user voice request, and the pending clauses determined accordingly, so that secondary segmentation processing can be performed on them.
Referring to Figure 3, step 0203 includes:
02031: if the first confidences corresponding to multiple initial breakpoint positions are greater than a first preset value, segmenting the user voice request according to those initial breakpoint positions to obtain multiple first clauses;
02032: determining the first clause with the largest number of characters among the multiple first clauses as the pending clause.
The processor is configured to segment the user voice request according to the multiple initial breakpoint positions whose first confidences are greater than the first preset value to obtain multiple first clauses, and to determine the first clause with the most characters as the pending clause.
Specifically, the pending clauses of the user voice request can be determined from the initial breakpoint positions and the first confidences. During breakpoint determination, when multiple initial breakpoint positions have first confidences greater than the first preset value, the request is segmented at all of those positions, yielding multiple first clauses. The first preset value can be any value between 0 and 1 close to 1, such as 0.8, 0.9, or 0.95, which is not limited here. Among the first clauses obtained, the one with the most characters is determined as the pending clause for subsequent secondary segmentation.
In one example, the user issues the request "打开空调关闭车窗播放歌手A的歌曲b" ("turn on the air conditioner, close the window, play song b by singer A"). Processing finds that the initial breakpoints whose confidences exceed the first preset value lie before "关" and before "播", and the output text is "turn on the air conditioner # close the window # play song b by singer A". The third first clause, "play song b by singer A", has the most characters and is determined as the pending clause for secondary segmentation in the subsequent process.
In this way, when the first segmentation pass yields multiple initial breakpoints whose confidences exceed the preset value, the clause with the most characters among the resulting clauses can be determined as the pending clause for subsequent secondary segmentation, improving the recall of multi-clause segmentation.
Referring to Figure 4, step 0203 includes:
02033: if only one initial breakpoint position has a first confidence greater than the first preset value, segmenting the user voice request according to that initial breakpoint position to obtain two first clauses;
02034: determining the two first clauses as the pending clauses.
The processor is configured to segment the user voice request according to the single initial breakpoint position whose first confidence is greater than the first preset value to obtain two first clauses, and to determine the two first clauses as the pending clauses.
Specifically, the pending clauses of the user voice request can be determined from the initial breakpoint positions and the first confidences. During breakpoint determination, if only one initial breakpoint position has a first confidence greater than the first preset value, the request is segmented at that position, yielding two first clauses. The first preset value can be any value between 0 and 1 close to 1, such as 0.8, 0.9, or 0.95, which is not limited here. Both first clauses are then determined as pending clauses for subsequent secondary segmentation, without comparing their character counts.
In one example, the user issues the request "打开空调关闭车窗" and the first preset value is set to 0.9. Processing yields the per-character confidences [1.0456085650090152e-11, 1.758101442206339e-10, 1.534110837496383e-12, 6.312194655855308e-10, 0.9969684481620789, 2.1762635096789218e-12, 2.6937018154232772e-12, 9.654541055503785e-12]; the only initial breakpoint above 0.9 is before "关", so the two clauses "turn on the air conditioner" and "close the window" are determined as pending clauses for secondary segmentation in the subsequent process.
In this way, when the first segmentation pass yields a single initial breakpoint whose confidence exceeds the preset value, the two resulting clauses can be determined as the pending clauses for subsequent secondary segmentation, improving the recall of multi-clause segmentation.
Referring to Figure 5, step 0203 includes:
02035: if none of the first confidences corresponding to the initial breakpoint positions is greater than the first preset value, segmenting the user voice request according to the initial breakpoint position with the largest first confidence to obtain two first clauses;
02036: determining the two first clauses as the pending clauses.
The processor is configured to segment the user voice request according to the initial breakpoint position with the largest first confidence if no initial breakpoint position has a first confidence greater than the first preset value, obtaining two first clauses, and to determine the two first clauses as the pending clauses.
Specifically, the pending clauses of the user voice request can be determined from the initial breakpoint positions and the first confidences. During breakpoint determination, it may happen that no initial breakpoint position has a first confidence greater than the first preset value. In that case, the request is segmented at the initial breakpoint position with the largest first confidence, yielding two first clauses. The first preset value can be any value between 0 and 1, such as 0.8, 0.9, or 0.95, which is not limited here. Both first clauses are then determined as pending clauses for subsequent secondary segmentation, without comparing their character counts.
In one example, the user issues the request "打开空调车窗" ("turn on the air conditioner and the window") and the first preset value is set to 0.9. Processing yields the per-character confidences [1.0456085650090152e-11, 1.758101442206339e-10, 1.534110837496383e-12, 6.312194655855308e-10, 0.6945016523749128, 2.1762635096789218e-12, 2.6937018154232772e-12, 9.654541055503785e-12], so no initial breakpoint exceeds 0.9. The position with the highest confidence, before "车", is taken as the initial breakpoint, and the two clauses "turn on the air conditioner" and "the window" are determined as pending clauses for further processing in the subsequent process.
In this way, when no initial breakpoint in the first segmentation pass has a confidence above the preset value, the highest-confidence position can be taken as the initial breakpoint and the two resulting clauses determined as the pending clauses for subsequent secondary segmentation, improving the recall of multi-clause segmentation.
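The three cases above (several initial breakpoints above the threshold, exactly one, or none) can be summarized in one sketch; the helper below is hypothetical and only illustrates the selection logic for pending clauses:

```python
# Choose pending clauses from first-pass breakpoint confidences:
# - several positions above the threshold: keep only the longest clause;
# - exactly one position above the threshold: both clauses are pending;
# - none above the threshold: fall back to the most confident position.
def pending_clauses(text, confidences, threshold=0.9):
    cuts = [i for i, c in enumerate(confidences) if c > threshold and 0 < i < len(text)]
    if not cuts:
        best = max(range(1, len(text)), key=lambda i: confidences[i])
        cuts = [best]
    bounds = [0] + cuts + [len(text)]
    clauses = [text[a:b] for a, b in zip(bounds, bounds[1:])]
    if len(clauses) > 2:
        return [max(clauses, key=len)]  # only the longest clause is kept
    return clauses  # two clauses: both are pending

print(pending_clauses("打开空调车窗", [0, 0, 0, 0, 0.69, 0]))  # → ['打开空调', '车窗']
```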
Referring to Figure 6, step 0204 includes:
02041: performing text vectorization processing on the pending clauses to obtain a second sentence vector;
02042: performing segmentation processing on the second sentence vector to obtain breakpoint positions and confidences corresponding to the breakpoint positions.
The processor is configured to perform text vectorization processing on the pending clauses to obtain a second sentence vector, and to perform segmentation processing on the second sentence vector to obtain breakpoint positions and corresponding confidences.
Specifically, the pending clauses obtained from the first segmentation pass can be vectorized into a second sentence vector: each character is converted into a numeric value, so that a sentence of characters becomes a vector of the corresponding values. The second sentence vector can be input into the segmentation model, which determines the breakpoint positions by judging whether a breakpoint should be inserted before each input character, and yields the confidence corresponding to each breakpoint position. The breakpoint positions result from segmenting the second sentence vector obtained by vectorizing the pending clauses, and the corresponding confidence is the criterion for judging how likely a split at a breakpoint position is. The segmentation model may be trained in a supervised manner based on a pretrained ALBERT model; the specific segmentation model and its training method are not limited here.
In this way, the breakpoint positions and their corresponding confidences can be determined from the vectorization result of the text requiring secondary segmentation, so that the segmentation result can be obtained from them.
Referring to Figures 7 and 8, step 03 includes:
0301: if it is determined according to the breakpoint positions and the confidences that the user voice request is not to be segmented, performing sentence-component analysis on the user voice request to establish a parse tree according to the analysis result;
0302: traversing the associated nodes of the parse tree to form a Cartesian product;
0303: traversing the entire parse tree from the root node of the parse tree to obtain the segmentation result.
The processor is configured to perform sentence-component analysis on the user voice request and establish a parse tree according to the analysis result if it is determined according to the breakpoint positions and the confidences that the user voice request is not to be segmented; traverse the associated nodes of the parse tree to form a Cartesian product; and traverse the entire parse tree from its root node to obtain the segmentation result.
Specifically, when, after two rounds of segmentation processing of a user voice request, the breakpoint positions and confidences indicate that the request needs no segmentation, sentence-component analysis is performed on the original user voice request and a parse tree is established according to the analysis result. Further, the associated nodes of the parse tree are traversed to form a Cartesian product, and the entire tree is traversed from the root node to finally obtain the segmentation result, that is, the breakpoint positions and confidences.
In one example, the user issues the request "打开副驾车窗和座椅加热" ("turn on the front passenger window and seat heating"). After two rounds of segmentation processing it is judged not to require segmentation, but semantically the request needs to be rewritten into two sentences, "turn on the front passenger window" and "turn on the front passenger seat heating", to express the functional requirements accurately. Sentence-component analysis of the request shows that the two clauses share the predicate "turn on" and the attributive "front passenger". A sentence-component parse tree for the request "turn on the front passenger window and seat heating" is established, as shown in Figure 3; the associated nodes can be traversed from the root node in pre-order, and the concrete content of each sentence component filled into the corresponding node of the tree. The traversal order may be changed according to the structure and content of the parse tree and is not limited here. Finally, a Cartesian product is taken at the conjunction "和" ("and"), as shown in expression I:
(I)
According to the Cartesian product of the sentence-component analysis results, each complete path in the parse tree of Figure 3 corresponds to one rewritten sentence, and the two clauses finally obtained are "turn on the front passenger window" and "turn on the front passenger seat heating".
In this way, for a user voice request determined after segmentation processing not to require segmentation, a parse tree for sentence-component analysis of the request can be established, and a Cartesian product computed over the results of traversing its nodes, so as to obtain the segmentation result.
Referring to Figures 7 and 9, step 03 further includes:
0304: if a second clause is obtained by segmenting the user voice request according to the breakpoint positions and the confidences, performing sentence-component analysis on the second clause to establish a parse tree according to the analysis result;
0305: traversing the associated nodes of the parse tree to form a Cartesian product;
0306: traversing the entire parse tree from the root node of the parse tree to obtain the segmentation result.
The processor is configured to perform sentence-component analysis on the second clause and establish a parse tree according to the analysis result if a second clause is obtained by segmenting the user voice request according to the breakpoint positions and the confidences; traverse the associated nodes of the parse tree to form a Cartesian product; and traverse the entire parse tree from its root node to obtain the segmentation result.
Specifically, when, after two rounds of segmentation processing of a user voice request, the breakpoint positions and confidences indicate that a resulting second clause needs no further segmentation, sentence-component analysis can be performed on that second clause. Further, a parse tree is established according to the analysis result, its associated nodes are traversed to form a Cartesian product, and the entire tree is traversed from the root node to finally obtain the segmentation result, that is, the breakpoint positions and confidences.
In this way, for the clauses obtained from the secondary segmentation of the user voice request, a parse tree for sentence-component analysis can be established, and a Cartesian product computed over the results of traversing its nodes, so as to obtain the segmentation result.
Referring to Figure 10, step 03 includes:
0307: if it is determined according to the breakpoint positions and the confidences that the user voice request is not to be segmented, matching the user voice request against preset-category voice requests with regular expressions to obtain a matching result;
0308: determining a target breakpoint position according to the matching result, the breakpoint positions and the confidences;
0309: segmenting the user voice request according to the target breakpoint position to obtain the segmentation result.
The processor is configured to match the user voice request against the preset-category voice requests with regular expressions to obtain a matching result if it is determined according to the breakpoint positions and the confidences that the user voice request is not to be segmented; determine a target breakpoint position according to the matching result, the breakpoint positions and the confidences; and segment the user voice request according to the target breakpoint position to obtain the segmentation result.
Specifically, for certain voice-interaction scenarios, preset categories of voice requests expressing special meanings, together with standard expressions for them, can be defined according to actual needs, for example requests to play a given singer's songs on demand. When secondary segmentation processing determines from the breakpoint positions and confidences that the user voice request is not to be segmented, the request is regex-matched against the preset-category voice requests to obtain a matching result, and the target breakpoint position is determined in combination with the breakpoint positions and confidences so that the request can be segmented to obtain the segmentation result. A preset-category voice request is a pre-defined voice request expressing a particular special meaning, and the regex-matching process determines whether the user's voice request hits a preset-category voice request.
In one example, there is a singer-on-demand scenario. If the user's request "播放歌手A的歌曲打开空调" ("play singer A's songs turn on the air conditioner") is judged not to require segmentation after secondary segmentation processing, it is regex-matched against the preset-category voice requests, and the matching result is a hit on the preset-category request "play singer A's songs". Combining this matching result with the breakpoint positions and confidences in the user's request, the target breakpoint position can be determined. Segmenting the request at the target breakpoint position yields the segmentation result "play singer A's songs # turn on the air conditioner".
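The regex-matching step in this example can be sketched as follows; the patterns and threshold are illustrative assumptions, not the patent's actual rule set:

```python
# Match the request against preset-category patterns and split at the tail of
# the matched span when the tail breakpoint's confidence clears the second
# preset value.
import re

PRESET_PATTERNS = [re.compile(r"播放歌手.的歌曲"),   # "play singer X's songs"
                   re.compile(r"我想听歌手.的歌")]    # "I want to listen to singer X's songs"

def split_on_preset(text, confidences, second_preset=0.8):
    for pattern in PRESET_PATTERNS:
        m = pattern.search(text)
        if m and m.end() < len(text) and confidences[m.end()] > second_preset:
            return text[:m.end()] + "#" + text[m.end():]
    return text  # no hit, or tail confidence too low: leave unsegmented

conf = [0.0] * 8 + [0.9, 0.0, 0.0, 0.0]  # high confidence before "打"
print(split_on_preset("播放歌手A的歌曲打开空调", conf))  # → 播放歌手A的歌曲#打开空调
```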
In this way, a user voice request determined not to require segmentation after secondary segmentation processing can be regex-matched against the preset-category voice requests to obtain a matching result, the target breakpoint position determined in combination with the breakpoint positions and their confidences, and the request segmented at the target breakpoint position to obtain the segmentation result.
Referring to Figure 11, step 03 further includes:
0310: if a second clause is obtained by segmenting the user voice request according to the breakpoint positions and the confidences, matching the second clause against preset-category voice requests with regular expressions to obtain a matching result;
0311: determining a target breakpoint position according to the matching result, the breakpoint positions and the confidences;
0312: segmenting the user voice request according to the target breakpoint position to obtain the segmentation result.
The processor is configured to match the second clause against the preset-category voice requests with regular expressions to obtain a matching result if a second clause is obtained by segmenting the user voice request according to the breakpoint positions and the confidences; determine a target breakpoint position according to the matching result, the breakpoint positions and the confidences; and segment the user voice request according to the target breakpoint position to obtain the segmentation result.
Specifically, for certain voice-interaction scenarios, preset categories of voice requests expressing special meanings, together with standard expressions for them, can be defined according to actual needs, for example requests to play a given singer's songs on demand. When secondary segmentation of the user voice request according to the breakpoint positions and confidences yields a second clause containing singer-on-demand content, that second clause is regex-matched against the preset-category voice requests to obtain a matching result, and the target breakpoint position is determined in combination with the breakpoint positions and confidences so that the request can be segmented to obtain the segmentation result. A preset-category voice request is a pre-defined voice request expressing a particular special meaning, and the regex-matching process determines whether the user's voice request hits a preset-category voice request.
In one example, there is a singer-on-demand scenario. When secondary segmentation of the user's request according to the breakpoint positions and confidences yields the second clause "播放歌手A的歌曲打开空调" ("play singer A's songs turn on the air conditioner"), that second clause is regex-matched against the preset-category voice requests, and the matching result is a hit on the preset-category request "play singer A's songs". Combining this matching result with the breakpoint positions and confidences in the user's request, the target breakpoint position can be determined. Segmenting the request at the target breakpoint position yields the segmentation result "play singer A's songs # turn on the air conditioner".
In this way, the clauses obtained from the secondary segmentation of the user voice request can be regex-matched against the preset-category voice requests to obtain a matching result, the target breakpoint position determined in combination with the breakpoint positions and their confidences, and the request segmented at the target breakpoint position to obtain the segmentation result.
Step 0308 includes:
if the matching result is a hit on a preset-category voice request, and the confidence of the breakpoint position at the head or tail of the sequence in the user voice request that matches the preset-category voice request is greater than a second preset value, determining the breakpoint position at the head or tail of that matched sequence as the target breakpoint position.
The processor is configured to determine the breakpoint position at the head or tail of the matched sequence in the user voice request as the target breakpoint position if the matching result is a hit on a preset-category voice request and the confidence of that breakpoint position is greater than a second preset value.
Specifically, for certain voice-interaction scenarios, when the content of a preset-category voice request appears in the user voice request, that content needs to be separated from the rest of the sentence. In the example above, when the user issues the request "播放歌手A的歌曲打开空调" ("play singer A's songs turn on the air conditioner"), the request is regex-matched against the preset-category voice requests and hits "play singer A's songs". Further, since the confidence of the breakpoint position at the tail of the matched sequence "播放歌手A的歌曲" in the user voice request is greater than the second preset value, the tail position, that is, the breakpoint before the character "打" (the start of "turn on"), is determined as the target breakpoint position. The second preset value can be any value between 0 and 1 close to 1, such as 0.8, 0.9, or 0.95, which is not limited here. Segmenting the request at the target breakpoint position yields the segmentation result "play singer A's songs # turn on the air conditioner".
In another example, the preset-category voice requests are still "play singer A's songs" and "I want to listen to singer A's songs". When the user's request, or a clause obtained after secondary segmentation, selects a specific song by that singer, that is, "播放歌手A的歌曲b" ("play song b by singer A"), the request or clause is regex-matched against the preset-category voice requests and hits "play singer A's songs". However, since the confidence of the breakpoint position at the tail of the matched sequence "播放歌手A的歌曲" is less than the second preset value, no breakpoint is added before the first character of the song name and no segmentation is needed. The final segmentation result is "play song b by singer A".
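The second-preset-value gate in these two examples can be reduced to a small sketch (hypothetical helper; the match end index and confidences are illustrative):

```python
# Split off the matched preset-category sequence only when the confidence of
# the breakpoint at its tail exceeds the second preset value.
def gate(text, match_end, tail_confidence, second_preset=0.8):
    if tail_confidence > second_preset and 0 < match_end < len(text):
        return text[:match_end] + "#" + text[match_end:]
    return text

# High tail confidence: split before "打" (the start of "turn on the AC").
print(gate("播放歌手A的歌曲打开空调", 8, 0.9))  # → 播放歌手A的歌曲#打开空调
# Low tail confidence: the song title "b" stays attached, no split.
print(gate("播放歌手A的歌曲b", 8, 0.1))        # → 播放歌手A的歌曲b
```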
In this way, based on the breakpoint position at the head or tail of the sequence matching a preset-category voice request in the user voice request or in a clause obtained after secondary segmentation, that head or tail breakpoint position can be determined as the target breakpoint position when a split is required.
It should be noted that, for any user voice request, the result of the secondary segmentation processing still undergoes component analysis and regular-expression matching to determine whether further segmentation is needed, so that the segmentation result receives a second pass of processing and the final segmentation result is more accurate.
The computer-readable storage medium of this application stores a computer program which, when executed by one or more processors, implements the above method.
In the description of this specification, reference to terms such as "the above", "specifically", "similarly" or "further" means that a specific feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of this application. In this specification, schematic references to these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, where no contradiction arises, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.
Any process or method description in a flowchart, or otherwise described herein, may be understood to represent a module, fragment or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of this application includes alternative implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functionality involved, as should be understood by those skilled in the art to which the embodiments of this application belong.
Although embodiments of this application have been shown and described above, it should be understood that the above embodiments are exemplary and are not to be construed as limiting this application; those of ordinary skill in the art may make changes, modifications, substitutions and variations to the above embodiments within the scope of this application.

Claims (13)

  1. A voice interaction method, wherein the voice interaction method comprises:
    receiving a user voice request forwarded by a vehicle;
    performing segmentation processing on the user voice request to obtain breakpoint positions and confidences corresponding to the breakpoint positions;
    processing the user voice request according to the breakpoint positions and the confidences to obtain a segmentation result;
    performing natural language understanding according to the segmentation result to complete the voice interaction.
  2. The voice interaction method according to claim 1, wherein the performing segmentation processing on the user voice request to obtain breakpoint positions and confidences corresponding to the breakpoint positions comprises:
    performing text vectorization processing on the user voice request to obtain a first sentence vector;
    performing segmentation processing on the first sentence vector to obtain initial breakpoint positions and first confidences corresponding to the initial breakpoint positions;
    determining pending clauses of the user voice request according to the initial breakpoint positions and the first confidences;
    performing secondary segmentation processing on the pending clauses to obtain the breakpoint positions and the corresponding confidences.
  3. The voice interaction method according to claim 2, wherein the determining pending clauses of the user voice request according to the initial breakpoint positions and the first confidences comprises:
    if the first confidences corresponding to multiple initial breakpoint positions are greater than a first preset value, segmenting the user voice request according to the multiple initial breakpoint positions whose first confidences are greater than the first preset value to obtain multiple first clauses;
    determining the first clause with the largest number of characters among the multiple first clauses as the pending clause.
  4. The voice interaction method according to claim 2, wherein the determining pending clauses of the user voice request according to the initial breakpoint positions and the first confidences comprises:
    if only one initial breakpoint position has a first confidence greater than the first preset value, segmenting the user voice request according to that initial breakpoint position to obtain two first clauses;
    determining the two first clauses as the pending clauses.
  5. The voice interaction method according to claim 2, wherein the determining pending clauses of the user voice request according to the initial breakpoint positions and the first confidences comprises:
    if none of the first confidences corresponding to the initial breakpoint positions is greater than the first preset value, segmenting the user voice request according to the initial breakpoint position with the largest first confidence to obtain two first clauses;
    determining the two first clauses as the pending clauses.
  6. The voice interaction method according to claim 2, wherein the performing secondary segmentation processing on the pending clauses to obtain the breakpoint positions and the corresponding confidences comprises:
    performing text vectorization processing on the pending clauses to obtain a second sentence vector;
    performing segmentation processing on the second sentence vector to obtain the breakpoint positions and the confidences corresponding to the breakpoint positions.
  7. The voice interaction method according to claim 1, wherein the processing the user voice request according to the breakpoint positions and the confidences to obtain a segmentation result comprises:
    if it is determined according to the breakpoint positions and the confidences that the user voice request is not to be segmented, performing sentence-component analysis on the user voice request to establish a parse tree according to the analysis result;
    traversing the associated nodes of the parse tree to form a Cartesian product;
    traversing the entire parse tree from the root node of the parse tree to obtain the segmentation result.
  8. The voice interaction method according to claim 1, wherein the processing the user voice request according to the breakpoint positions and the confidences to obtain a segmentation result comprises:
    if a second clause is obtained by segmenting the user voice request according to the breakpoint positions and the confidences, performing sentence-component analysis on the second clause to establish a parse tree according to the analysis result;
    traversing the associated nodes of the parse tree to form a Cartesian product;
    traversing the entire parse tree from the root node of the parse tree to obtain the segmentation result.
  9. The voice interaction method according to claim 1, wherein the processing the user voice request according to the breakpoint positions and the confidences to obtain a segmentation result comprises:
    if it is determined according to the breakpoint positions and the confidences that the user voice request is not to be segmented, matching the user voice request against preset-category voice requests with regular expressions to obtain a matching result;
    determining a target breakpoint position according to the matching result, the breakpoint positions and the confidences;
    segmenting the user voice request according to the target breakpoint position to obtain the segmentation result.
  10. The voice interaction method according to claim 1, wherein the processing the user voice request according to the breakpoint positions and the confidences to obtain a segmentation result comprises:
    if a second clause is obtained by segmenting the user voice request according to the breakpoint positions and the confidences, matching the second clause against preset-category voice requests with regular expressions to obtain a matching result;
    determining a target breakpoint position according to the matching result, the breakpoint positions and the confidences;
    segmenting the user voice request according to the target breakpoint position to obtain the segmentation result.
  11. The voice interaction method according to claim 9 or 10, wherein the determining a target breakpoint position according to the matching result, the breakpoint positions and the confidences comprises:
    if the matching result is a hit on a preset-category voice request, and the confidence of the breakpoint position at the head or tail of the sequence in the user voice request that matches the preset-category voice request is greater than a second preset value, determining the breakpoint position at the head or tail of that matched sequence as the target breakpoint position.
  12. A server, wherein the server comprises a memory and a processor, the memory stores a computer program, and when the computer program is executed by the processor, the method according to any one of claims 1-11 is implemented.
  13. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and when the computer program is executed by one or more processors, the method according to any one of claims 1-11 is implemented.
PCT/CN2023/136846 2022-12-06 2023-12-06 Voice interaction method, server and computer-readable storage medium WO2024120450A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211558626.5A CN115579009B (zh) 2022-12-06 2022-12-06 Voice interaction method, server and computer-readable storage medium
CN202211558626.5 2022-12-06

Publications (1)

Publication Number Publication Date
WO2024120450A1 true WO2024120450A1 (zh) 2024-06-13

Family

ID=84590707

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/136846 WO2024120450A1 (zh) 2022-12-06 2023-12-06 Voice interaction method, server and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN115579009B (zh)
WO (1) WO2024120450A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115579009B (zh) * 2022-12-06 2023-04-07 广州小鹏汽车科技有限公司 Voice interaction method, server and computer-readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150142449A1 (en) * 2012-08-02 2015-05-21 Bayerische Motoren Werke Aktiengesellschaft Method and Device for Operating a Speech-Controlled Information System for a Vehicle
CN110264997A (zh) * 2019-05-30 2019-09-20 北京百度网讯科技有限公司 Speech sentence-segmentation method, device and storage medium
EP3958577A2 (en) * 2021-06-22 2022-02-23 Guangzhou Xiaopeng Motors Technology Co., Ltd. Voice interaction method, voice interaction system, server and storage medium
CN115064169A (zh) * 2022-08-17 2022-09-16 广州小鹏汽车科技有限公司 Voice interaction method, server and storage medium
CN115083413A (zh) * 2022-08-17 2022-09-20 广州小鹏汽车科技有限公司 Voice interaction method, server and storage medium
CN115579009A (zh) * 2022-12-06 2023-01-06 广州小鹏汽车科技有限公司 Voice interaction method, server and computer-readable storage medium

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107305575B (zh) * 2016-04-25 2021-01-26 北京京东尚科信息技术有限公司 Sentence-break recognition method and device for a human-machine intelligent question answering system
CN107247706B (zh) * 2017-06-16 2021-06-25 中国电子技术标准化研究院 Text sentence-segmentation model building method, segmentation method, device and computer equipment
CN107578770B (zh) * 2017-08-31 2020-11-10 百度在线网络技术(北京)有限公司 Internet-phone speech recognition method, apparatus, computer device and storage medium
CN109785842B (zh) * 2017-11-14 2023-09-05 蔚来(安徽)控股有限公司 Speech recognition error correction method and speech recognition error correction system
CN108959575B (zh) * 2018-07-06 2019-09-24 北京神州泰岳软件股份有限公司 Method and device for mining enterprise association relationship information
CN111160003B (zh) * 2018-11-07 2023-12-08 北京猎户星空科技有限公司 Sentence segmentation method and device
CN109584876B (zh) * 2018-12-26 2020-07-14 珠海格力电器股份有限公司 Voice data processing method and device, and voice-controlled air conditioner
CN114694637A (zh) * 2020-12-30 2022-07-01 北大方正集团有限公司 Hybrid speech recognition method and apparatus, electronic device and storage medium
CN113268581B (zh) * 2021-07-20 2021-10-08 北京世纪好未来教育科技有限公司 Question generation method and device
CN114420102B (zh) * 2022-01-04 2022-10-14 广州小鹏汽车科技有限公司 Speech sentence-segmentation method and apparatus, electronic device and storage medium
CN115064170B (zh) * 2022-08-17 2022-12-13 广州小鹏汽车科技有限公司 Voice interaction method, server and storage medium


Also Published As

Publication number Publication date
CN115579009A (zh) 2023-01-06
CN115579009B (zh) 2023-04-07

Similar Documents

Publication Publication Date Title
WO2024120450A1 (zh) Voice interaction method, server and computer-readable storage medium
WO2020057413A1 (zh) Spam text recognition method and apparatus, computing device, and readable storage medium
WO2017088363A1 (zh) Method and device for screening valid entries of a pronunciation dictionary
CN111191450A (zh) Corpus cleaning method, corpus entry device, and computer-readable storage medium
CN111104803B (zh) Semantic understanding processing method, apparatus and device, and readable storage medium
EP4086893A1 (en) Natural language understanding method and device, vehicle and medium
WO2023130951A1 (zh) Speech sentence-segmentation method and apparatus, electronic device, and storage medium
JP2002287793A (ja) Command processing device, command processing method, and command processing program
CN114049894A (zh) Voice interaction method and device, vehicle and storage medium
US20230215425A1 (en) User-system dialog expansion
CN114550718A (zh) Hot-word speech recognition method, apparatus and device, and computer-readable storage medium
WO2024083128A1 (zh) Voice interaction method, server and computer-readable storage medium
CN114429136A (zh) Text error correction method
CN116486815A (zh) In-vehicle speech signal processing method and device
CN116246616A (zh) Voice interaction method, voice interaction device, server, and readable storage medium
CN111125346B (zh) Semantic resource updating method and system
CN114138953A (zh) Dialogue flowchart generation method and apparatus, device, and storage medium
CN113053363A (zh) Speech recognition method, speech recognition device, and computer-readable storage medium
Esteve et al. Stochastic finite state automata language model triggered by dialogue states
CN115579008B (zh) Voice interaction method, server and computer-readable storage medium
CN115691492A (zh) In-vehicle voice control system and method
CN114528822B (zh) Dialogue flow control method and apparatus for customer service robot, server, and medium
US11893996B1 (en) Supplemental content output
CN117496972B (zh) Audio recognition method, audio recognition apparatus, vehicle, and computer device
US11837229B1 (en) Interaction data and processing natural language inputs