CN118098229A - Voice control method, device, vehicle and storage medium - Google Patents


Info

Publication number: CN118098229A
Application number: CN202410166038.XA
Authority: CN (China)
Prior art keywords: voice control, voice, instruction, control instruction
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Inventors: 姜志海, 张礼元, 赵晨纹, 刘思宇, 张时干, 周伟朋, 赵旻, 蒋云峰, 张发贵
Current and original assignee: Chery Automobile Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Other languages: Chinese (zh)
Application filed by Chery Automobile Co Ltd

Landscapes

  • Navigation (AREA)

Abstract

The application discloses a voice control method, a voice control apparatus, a vehicle, and a storage medium, wherein the voice control method comprises the following steps: acquiring the voice content of a user; identifying one or more voice control instructions in the voice content; and, if the voice content comprises a plurality of voice control instructions, executing the plurality of voice control instructions sequentially according to a target execution order. This solves problems in the related art such as the inability to satisfy multiple voice-instruction intents of a user simultaneously, which results in a poor user experience.

Description

Voice control method, device, vehicle and storage medium
Technical Field
The present application relates to the field of vehicle-mounted voice technologies, and in particular, to a voice control method, a device, a vehicle, and a storage medium.
Background
Current vehicle-mounted voice control technology has evolved from single-intention natural voice recognition to multi-intention natural voice recognition, supporting vehicle control operations triggered by any single voice instruction from the user as well as by various combinations of voice instructions.
At present, automobile manufacturers support a user issuing at most three instructions spanning at most two functional domains. However, when the user issues many instructions at once, or the instructions span more functional domains, problems such as inaccurate instruction recognition and lack of execution feedback arise, so that the user's multiple instruction intents cannot be satisfied simultaneously, further worsening the user experience.
Disclosure of Invention
The application provides a voice control method, a voice control apparatus, a vehicle, and a storage medium, to solve problems in the related art such as the inability to satisfy multiple voice-instruction intents of a user simultaneously and the resulting poor user experience.
An embodiment of a first aspect of the present application provides a voice control method, including the following steps: acquiring the voice content of a user; identifying one or more voice control instructions in the voice content; and, if the voice content comprises a plurality of voice control instructions, executing the plurality of voice control instructions sequentially according to a target execution order.
Optionally, in one embodiment of the application, the voice control instructions include one or more of car control instructions, navigation instructions, music instructions, weather instructions, and telephone instructions.
Optionally, in one embodiment of the present application, identifying one or more voice control instructions in the voice content includes: acquiring a voice endpoint detection value for each voice control instruction in the voice content; if the voice endpoint detection value is smaller than a preset value, determining that the voice content comprises a plurality of voice control instructions; otherwise, determining that the voice content comprises one voice control instruction.
Optionally, in one embodiment of the present application, executing the plurality of voice control instructions includes: if a voice control instruction is a vehicle control instruction, processing the voice control instruction to obtain its semantic result, converting the semantic result into a vehicle control command, issuing the vehicle control command to a zone control unit, and executing the vehicle control command with the zone control unit; if a voice control instruction is a navigation instruction, a music instruction, a weather instruction, or a telephone instruction, transmitting the voice control instruction to the corresponding software for execution.
Optionally, in one embodiment of the present application, sequentially executing the plurality of voice control instructions according to the target execution order includes: identifying whether any of the plurality of voice control instructions is ambiguous; if an ambiguous voice control instruction exists, adjusting the ambiguous voice control instruction to be executed last; otherwise, executing the plurality of voice control instructions in the order in which they were issued.
Optionally, in one embodiment of the present application, after sequentially executing the plurality of voice control instructions according to the target execution order, the method includes: acquiring an execution result of each voice control instruction; and broadcasting the execution result of each voice control instruction after that instruction is executed, or broadcasting the execution results of all the voice control instructions after all of them have been executed.
Optionally, in one embodiment of the present application, after adjusting the ambiguous voice control instruction to be executed last, the method includes: guiding the user to issue an explicit voice control instruction; and/or guiding the user to complete the ambiguous voice control instruction.
An embodiment of a second aspect of the present application provides a voice control apparatus, including: the acquisition module is used for acquiring the voice content of the user; the recognition module is used for recognizing one or more voice control instructions in the voice content; and the execution module is used for sequentially executing the voice control instructions according to the target execution sequence if the voice content comprises the voice control instructions.
Optionally, in one embodiment of the application, the voice control instructions include one or more of car control instructions, navigation instructions, music instructions, weather instructions, and telephone instructions.
Optionally, in one embodiment of the present application, the recognition module is further configured to: acquire a voice endpoint detection value for each voice control instruction in the voice content; if the voice endpoint detection value is smaller than a preset value, determine that the voice content comprises a plurality of voice control instructions; otherwise, determine that the voice content comprises one voice control instruction.
Optionally, in one embodiment of the present application, the execution module is further configured to: if a voice control instruction is a vehicle control instruction, process the voice control instruction to obtain its semantic result, convert the semantic result into a vehicle control command, issue the vehicle control command to a zone control unit, and execute the vehicle control command with the zone control unit; if a voice control instruction is a navigation instruction, a music instruction, a weather instruction, or a telephone instruction, transmit the voice control instruction to the corresponding software for execution.
Optionally, in one embodiment of the present application, the execution module is further configured to: identify whether any of the plurality of voice control instructions is ambiguous; if an ambiguous voice control instruction exists, adjust the ambiguous voice control instruction to be executed last; otherwise, execute the plurality of voice control instructions in the order in which they were issued.
Optionally, in one embodiment of the present application, the apparatus further includes a broadcasting module configured to acquire an execution result of each voice control instruction after the plurality of voice control instructions are sequentially executed according to the target execution order, and to broadcast the execution result of each voice control instruction after that instruction is executed, or to broadcast the execution results of all the voice control instructions after all of them have been executed.
Optionally, in one embodiment of the present application, after the ambiguous voice control instruction is adjusted to be executed last, the apparatus is configured to: guide the user to issue an explicit voice control instruction; and/or guide the user to complete the ambiguous voice control instruction.
An embodiment of a third aspect of the present application provides a vehicle, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to perform the voice control method of the above embodiments.
An embodiment of the fourth aspect of the present application provides a computer-readable storage medium having stored thereon a computer program that is executed by a processor to perform the voice control method of the above embodiment.
Therefore, the application has at least the following beneficial effects:
The embodiment of the application can identify the voice control instructions in the voice content of the user and execute the plurality of voice control instructions according to the target execution order, so that the user's multiple instruction intents can be satisfied, bringing the user an extremely smooth voice control experience and improving the overall user experience. The technical problems in the related art that multiple voice-instruction intents of a user cannot be satisfied simultaneously and that the user experience is poor are thereby solved.
Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
fig. 1 is a flowchart of a voice control method according to an embodiment of the present application;
FIG. 2 is a flow chart of a voice control method according to one embodiment of the present application;
fig. 3 is a schematic diagram of a voice control apparatus according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a vehicle according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present application and should not be construed as limiting the application.
The following describes the voice control method, apparatus, vehicle, and storage medium of the embodiments of the present application with reference to the accompanying drawings. Aiming at the problem that the prior art cannot achieve the smooth experience of issuing operation commands to a machine while the user is still speaking, and that flow control is inevitably interrupted by the multiple rounds of dialogue generated during a streaming conversation, the application provides a voice control method. This solves problems in the related art such as the inability to satisfy multiple voice-instruction intents of a user simultaneously and the resulting poor user experience.
Specifically, fig. 1 is a schematic flow chart of a voice control method according to an embodiment of the present application.
As shown in fig. 1, the voice control method includes the steps of:
in step S101, the voice content of the user is acquired.
The voice content of the user may include one or more voice control instructions, for example, the voice content of the user is "open window, open air conditioner, adjust seat"; the voice content of the user can be acquired through the vehicle-mounted microphone.
In step S102, one or more voice control instructions in the voice content are identified.
The voice control instruction comprises one or more of a car control instruction, a navigation instruction, a music instruction, a weather instruction and a telephone instruction, and the car control instruction comprises adjustment of a car window, an air conditioner, a seat and the like.
It should be noted that the voice control instructions in the embodiment of the present application include instructions for voice-controllable domains such as car control, navigation, music, weather, and telephone; each domain supports service-landing (directly executable, first-round interaction) instructions, and the number of service-landing instructions within the same domain is not limited.
In an embodiment of the present application, identifying one or more voice control instructions in the voice content includes: acquiring a voice endpoint detection value for each voice control instruction in the voice content; if the voice endpoint detection value is smaller than a preset value, determining that the voice content comprises a plurality of voice control instructions; otherwise, determining that the voice content comprises one voice control instruction.
The preset value may be set according to a specific situation, for example, may be set to 600ms or 700ms, which is not limited specifically.
It can be understood that the embodiment of the present application determines, from the VAD (Voice Activity Detection, voice endpoint detection) value between voice control commands, whether the voice content of the user includes a plurality of voice control commands: if the VAD value is smaller than the preset value (for example, VAD < 600 ms), the voice content is determined to include a plurality of voice control commands; if the VAD value is greater than 600 ms, the voice content is determined to include only one voice control command.
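As an illustrative sketch (not the patent's own code), the VAD-gap check above can be expressed as a small function; the 600 ms threshold is the example preset value given in the text:

```python
VAD_THRESHOLD_MS = 600  # example preset value from the text above

def classify_utterance(vad_gaps_ms):
    """Classify voice content from the silences measured between commands.

    If every inter-command VAD gap is shorter than the preset value, the
    content is treated as one streaming utterance containing multiple
    voice control commands; otherwise it is a single conventional command.
    """
    if vad_gaps_ms and all(gap < VAD_THRESHOLD_MS for gap in vad_gaps_ms):
        return "streaming"  # multiple voice control commands
    return "conventional"  # a single voice control command
```

For example, gaps of 300 ms and 450 ms would classify as streaming, while a single 800 ms gap would classify as conventional.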
In step S103, if the voice content includes a plurality of voice control commands, the plurality of voice control commands are sequentially executed according to the target execution order.
Wherein the plurality of voice control instructions may be referred to as streaming instructions.
It can be understood that when a plurality of voice control instructions exist, the embodiment of the application sequentially executes the voice control instructions according to the target execution sequence, thereby realizing the execution of instructions while speaking and bringing extremely smooth voice instruction operation experience to users.
In an embodiment of the present application, sequentially executing the plurality of voice control instructions according to the target execution order includes: identifying whether any of the plurality of voice control instructions is ambiguous; if an ambiguous voice control instruction exists, adjusting the ambiguous voice control instruction to be executed last; otherwise, executing the plurality of voice control instructions in the order in which they were issued.
It can be understood that when an ambiguous voice control command exists among the plurality of voice control commands, the application adjusts it to be executed last, so that any secondary interaction during command execution does not affect the smoothness of command issuing; otherwise, the voice control commands are executed in the order in which they were issued. This solves the problem of command execution being interrupted by the multiple rounds of dialogue generated during a streaming conversation.
For example, if the voice control commands in the voice content are "set the air conditioner to 24 ℃", "make a call", and "open the window", then "make a call" is an ambiguous user command, and it is adjusted to be executed last to prevent a secondary dialogue. If the voice control commands in the voice content are "set the air conditioner to 24 ℃", "open the window", and "query today's weather", none of them is ambiguous, so they are executed in the order issued: first set the air conditioner to 24 ℃, then open the window, and finally query today's weather.
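The reordering just described can be sketched as follows; the `is_ambiguous` predicate and the ambiguous-command set are hypothetical stand-ins for the semantic layer's judgment:

```python
def order_commands(commands, is_ambiguous):
    """Keep the issuing order, but move ambiguous commands to the end so
    any secondary dialogue cannot interrupt the remaining commands."""
    clear = [c for c in commands if not is_ambiguous(c)]
    unclear = [c for c in commands if is_ambiguous(c)]
    return clear + unclear

# Example mirroring the text: "make a call" lacks a callee, so it runs last.
ambiguous = {"make a call"}
ordered = order_commands(
    ["set air conditioner to 24C", "make a call", "open window"],
    lambda c: c in ambiguous,
)
# ordered == ["set air conditioner to 24C", "open window", "make a call"]
```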
In an embodiment of the present application, after the ambiguous voice control instruction is adjusted to be executed last, the method includes: guiding the user to issue an explicit voice control instruction; and/or guiding the user to complete the ambiguous voice control instruction.
It can be understood that, after the ambiguous voice control command is adjusted to be executed last, the embodiment of the application can guide the user to issue an explicit voice control command, or guide the user to complete the ambiguous voice control command, so that the command is issued and executed unambiguously.
For example, if the voice control command is a navigation request without a destination, it is an ambiguous user voice command. The system can guide the user to issue an explicit voice control command, such as prompting the user with "please input a correct navigation command; the command needs to include a destination", or guide the user to complete the ambiguous voice control command, such as prompting the user with "please input a navigation destination".
In an embodiment of the present application, executing the plurality of voice control instructions includes: if a voice control instruction is a vehicle control instruction, processing the voice control instruction to obtain its semantic result, converting the semantic result into a vehicle control command, issuing the vehicle control command to a zone control unit, and executing the vehicle control command with the zone control unit; if a voice control instruction is a navigation instruction, a music instruction, a weather instruction, or a telephone instruction, transmitting the voice control instruction to the corresponding software for execution.
It can be understood that if the voice control instruction is a car control instruction, each voice control instruction is processed to obtain a semantic result, the semantic result is converted into a car control SOME/IP (Scalable service-Oriented MiddlewarE over IP) command, the command is sent to the zone control unit, and the zone control unit executes the car control instruction; if the voice control instruction is a navigation instruction, a music instruction, a weather instruction, or a telephone instruction, the voice control instruction is transmitted to the corresponding software for execution.
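The routing step can be sketched as below; the message shape, the `FakeZCU`/`FakeApp` interfaces, and the app registry are assumptions for illustration, not the patent's actual SOME/IP protocol:

```python
def dispatch(command, domain, zone_control, apps):
    """Route a recognized instruction by domain: car-control semantics are
    converted into a (hypothetical) SOME/IP-style message for a zone control
    unit; navigation/music/weather/phone go to the corresponding software."""
    if domain == "car_control":
        message = {"service": "car_control", "payload": command}  # assumed shape
        return zone_control.send(message)
    return apps[domain].handle(command)

class FakeZCU:  # stand-in for a zone control unit, for demonstration only
    def send(self, message):
        return f"ZCU executed {message['payload']}"

class FakeApp:  # stand-in for a domain application
    def __init__(self, name):
        self.name = name
    def handle(self, command):
        return f"{self.name} handled {command}"

apps = {"navigation": FakeApp("navigation"), "weather": FakeApp("weather")}
result = dispatch("open window", "car_control", FakeZCU(), apps)
# result == "ZCU executed open window"
```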
In an embodiment of the present application, after the plurality of voice control instructions are sequentially executed according to the target execution order, the method includes: acquiring an execution result of each voice control instruction; and broadcasting the execution result of each voice control instruction after that instruction is executed, or broadcasting the execution results of all the voice control instructions after all of them have been executed.
It can be understood that the embodiment of the application can acquire the execution result of each voice control instruction and broadcast it as soon as that instruction has been executed, or broadcast the execution results only after all the voice control instructions have been executed.
For example, if the plurality of voice control commands includes a command such as "query today's weather", whose execution result the user must hear via voice broadcast, the result is broadcast (for example, "today is sunny, with a level 1-2 northeast wind and a temperature of 23 ℃"); if none of the voice control commands requires a broadcast for the user to know the execution result, a unified broadcast, such as "OK", is made after all the voice control commands have been executed.
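The two broadcast strategies above can be sketched as one helper; `needs_tts` is a hypothetical predicate marking commands (such as weather queries) whose results must be spoken:

```python
def plan_broadcasts(results, needs_tts):
    """results: list of (command, result_text) pairs, in execution order.

    Speak each result whose command requires TTS; if none of the commands
    require it, make a single unified closing announcement instead.
    """
    spoken = [text for command, text in results if needs_tts(command)]
    if spoken:
        return spoken
    return ["OK"]  # unified ending broadcast once everything has executed

results = [("open window", "done"), ("query today's weather", "sunny, 23C")]
announcements = plan_broadcasts(results, lambda c: "weather" in c)
# announcements == ["sunny, 23C"]
```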
Specifically, the embodiment of the application distinguishes conventional voice commands from streaming voice commands by creating a streaming command (i.e., a plurality of voice control commands) recognition module, and responds quickly once the streaming voice command recognition module is entered. The streaming voice instruction module comprises a demand skill recognition module, a streaming end-sentence recognition module, a TTS (Text-To-Speech) broadcasting module, and a post-processing module.
The demand skill recognition module is used for recognizing whether a skill is a streaming skill to be executed; the streaming end-sentence recognition module is used for recognizing whether the current streaming voice is the end sentence of a voice instruction; the TTS broadcasting module is used for broadcasting feedback results after each voice instruction is issued and executed, including TTS broadcasts during streaming execution, the ending TTS broadcast, and secondary-interaction TTS broadcasts; and the post-processing module is used for handling post-processing skills such as secondary interactions or multi-round dialogues.
The matching vehicle-mounted system comprises a vehicle-mounted system processing module and an interface display module. The vehicle-mounted system processing module is used for processing semantic protocols, converting them into vehicle control information and transmitting vehicle control instructions such as SOME/IP signals, providing feedback on the execution of vehicle control instructions, and specially handling TTS skills for which certain instructions need no broadcast. The interface display module is used for displaying the voice instructions issued by the user; it occupies only one line, refreshes the executed instructions and their results from bottom to top, and presents the result types success, failure, and timeout, with the displayed execution result synchronized with the TTS broadcast.
In the prior art, the user must finish speaking before voice processing starts and the corresponding vehicle control instructions are issued, the waiting time from the user finishing speaking to the start of execution varies with the number of instructions issued, and instructions cannot be executed while the user is speaking. In contrast, the embodiment of the application improves the perceived response speed for a user's multiple consecutive instructions, enables instructions to be executed while the user is speaking, and thereby realizes a more fluent and faster man-machine conversation.
The following describes a voice control method according to an embodiment of the present application, as shown in fig. 2, including:
1. stream type voice instruction module
Whether a user's continuous voice commands form a streaming voice command is determined from the VAD value between the voice commands: if VAD < 600 ms, the commands are judged to be a streaming command; if VAD > 600 ms, they are judged to be a conventional voice command, and conventional voice command issuing is used.
2. Demand instruction skill module
After a streaming voice command is identified, it is compared against the predefined voice-controllable domain skills, including but not limited to car control, navigation, music, weather, and telephone. If the skill matches, processing proceeds to the next step, streaming end-sentence judgment; if not, streaming execution is skipped and the command is issued through the conventional voice command path.
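A sketch of this skill-comparison step, with the domain set assumed from the list above:

```python
# Predefined voice-controllable domains from the text; identifiers are assumed.
STREAMING_DOMAINS = {"car_control", "navigation", "music", "weather", "phone"}

def is_streaming_skill(domain):
    """A command only continues into streaming end-sentence judgment when its
    recognized domain matches a predefined voice-controllable domain."""
    return domain in STREAMING_DOMAINS
```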
3. Stream type end sentence instruction judging module
In a streaming dialogue, the streaming end sentence is the last voice control instruction issued by the user, or the instruction whose execution has been reordered to last. This module covers three cases.
1. The instruction is a streaming end-sentence instruction with a TTS broadcast. A TTS broadcast in this case means that a vehicle control instruction whose effect is not obvious to the user, such as turning on the air conditioner or turning on seat heating, needs a TTS broadcast. The user's voice instruction is issued, the voice assistant processes the semantics and sends the semantic result to the vehicle system; the vehicle system receives the semantic result, converts it into a car control SOME/IP instruction, and sends it to each ZCU (Zone Control Unit) of the vehicle for execution. After the operation is executed, the vehicle system processing module returns the success or failure of the operation to the voice assistant, and the voice assistant maps the result to a corresponding broadcast, which serves as the ending broadcast of the streaming instructions, for example: "OK, it has been turned on."
2. The instruction is a streaming end-sentence instruction without a TTS broadcast. No TTS broadcast in this case refers to a car control instruction whose effect is visually very obvious to the user, such as opening or adjusting a car window; the execution result of such an instruction can be presented without a voice broadcast. The execution process is as described in fig. 2. The only difference is that the vehicle system processing module feeds back the result receipt but does not broadcast a reply for such skills; instead, the voice assistant makes one unified ending TTS broadcast to inform the user of the overall execution status of the streaming instructions, for example: "OK."
3. The instruction is not a streaming end-sentence instruction. If the streaming voice command is one of the first N-1 commands, it needs to be processed by the next module, the TTS broadcasting module.
4. TTS broadcasting module
1. Skills strongly dependent on TTS broadcasting. After one of the first N-1 streaming instructions is entered, the voice instruction is executed as described in case 1 of module 3. For intents such as weather or vehicle condition queries, the execution result must be presented via a TTS broadcast for the user to know the feedback result; thus, for such non-end-sentence (first N-1) instructions, the TTS broadcast is produced during the execution of the skill.
2. Skills not dependent on TTS broadcasting. It is determined whether the command produces a multi-round dialogue. If not, and the execution result is visually perceivable, such as window control, the result need not be presented by voice broadcast. If a multi-round dialogue is produced, the instruction enters the post-processing module for deferred handling, and the other streaming instructions are processed first.
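The per-instruction announcement decision across modules 3 and 4 can be summarized in a short sketch; the dict field names are assumptions for illustration only:

```python
def tts_plan(instruction):
    """instruction: dict with assumed boolean fields 'needs_tts' (the result is
    only knowable via broadcast, e.g. a weather query) and 'multi_turn' (the
    instruction spawns a secondary interaction).

    Returns how the result is announced: immediately during skill execution,
    deferred to the post-processing module, or folded into the single unified
    ending broadcast.
    """
    if instruction["needs_tts"]:
        return "broadcast_now"   # spoken as part of the skill's execution
    if instruction["multi_turn"]:
        return "post_process"    # handled last by the post-processing module
    return "unified_ending"      # visually obvious result; one closing TTS
```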
5. Post-processing module
Post-processing is performed on the multi-round dialogues generated while processing the first N-1 streaming instructions; such multi-round dialogues cannot be covered by the unified TTS broadcast. A multi-round dialogue causes the voice assistant to interact with the user a second time, interrupting the current streaming process and execution. For example, when the voice command is a navigation request without a destination, the voice assistant judges that the voice information input by the user is a command with an ambiguous intent, and needs to gradually guide the user, through voice prompts, to issue an explicit command or to complete the command content.
According to the voice control method provided by the embodiment of the application, the voice control instructions in the voice content of the user can be identified and processed according to the target execution order, so that the user's multiple instruction intents can be satisfied, bringing the user an extremely smooth voice control experience and improving the overall user experience.
Next, a voice control apparatus according to an embodiment of the present application will be described with reference to the accompanying drawings.
Fig. 3 is a block diagram of a voice control apparatus according to an embodiment of the present application.
As shown in fig. 3, the voice control apparatus 10 includes: an acquisition module 100, an identification module 200 and an execution module 300.
The acquiring module 100 is configured to acquire voice content of a user; the recognition module 200 is used for recognizing one or more voice control instructions in the voice content; the execution module 300 is configured to sequentially execute the plurality of voice control instructions according to the target execution order if the voice content includes the plurality of voice control instructions.
In an embodiment of the present application, the voice control instructions include one or more of car control instructions, navigation instructions, music instructions, weather instructions, and phone instructions.
In an embodiment of the present application, the identification module 200 is further configured to: acquiring a voice endpoint detection value of each voice control instruction in voice content; if the voice endpoint detection value is smaller than the preset value, judging that the voice content comprises a plurality of voice control instructions, otherwise, judging that the voice content comprises one voice control instruction.
In an embodiment of the present application, the execution module 300 is further configured to: if the voice control instruction is a vehicle control instruction, processing each voice control instruction to obtain a semantic result of each voice control, converting the semantic result into the vehicle control instruction, and issuing the vehicle control instruction to the area control unit, and executing the vehicle control instruction by using the area control unit; if the voice control instruction is a navigation instruction, a music instruction, a weather instruction and a telephone instruction, the voice control instruction is transmitted to corresponding software, and the voice control instruction is executed.
In an embodiment of the present application, the execution module 300 is further configured to: identify whether an ambiguous voice control instruction exists among the plurality of voice control instructions; if an ambiguous voice control instruction exists, move it to the end of the execution order, otherwise execute the plurality of voice control instructions in the order in which they were issued.
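The reordering step can be sketched as a stable partition: clear instructions keep their issuing order and ambiguous ones are deferred. The ambiguity test itself is left as a caller-supplied predicate, since the disclosure does not fix how ambiguity is detected.

```python
def order_for_execution(instructions, is_ambiguous):
    """Keep the original issuing order, but defer ambiguous instructions
    to the end so the clear instructions are executed first."""
    clear = [i for i in instructions if not is_ambiguous(i)]
    unclear = [i for i in instructions if is_ambiguous(i)]
    return clear + unclear
```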
In an embodiment of the present application, the apparatus 10 further includes a broadcasting module.
The broadcasting module is configured to acquire an execution result of each voice control instruction after the plurality of voice control instructions have been sequentially executed according to the target execution order, and either broadcast the execution result of each voice control instruction as soon as that instruction completes, or broadcast the execution results of all the voice control instructions after all of them have completed.
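The two broadcasting modes can be sketched as follows; the function returns the announcements a text-to-speech module would speak. The mode names and the summary format are illustrative assumptions.

```python
def broadcast_results(results, mode="per_instruction"):
    """Return the announcements to be spoken: one per completed
    instruction, or a single summary after all instructions finish."""
    if mode == "per_instruction":
        # Announce each result as soon as its instruction completes.
        return list(results)
    # "summary" mode: one combined announcement after everything is done.
    return ["; ".join(results)]
```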
In an embodiment of the present application, after the ambiguous voice control instruction is moved to the end of the execution order, the method includes: guiding the user to issue a clear voice control instruction; and/or guiding the user to complete the ambiguous voice control instruction.
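The guidance step can be sketched as a single clarification round trip. Here `ask()` is a hypothetical stand-in for the prompt-and-listen cycle (TTS prompt followed by an ASR reply); the behavior on an empty reply is an assumption.

```python
def clarify_instruction(instruction, ask):
    """Guide the user to restate or complete a deferred ambiguous command.

    ask(prompt) returns the user's reply text, or "" if the user gives up.
    """
    reply = ask("Please complete the command: '" + instruction + "'")
    # Drop the command if no clarification is given.
    return reply if reply else None
```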
It should be noted that the foregoing explanation of the embodiments of the voice control method also applies to the voice control apparatus of this embodiment and will not be repeated here.
According to the voice control apparatus provided by the embodiment of the present application, the voice control instructions in the voice content of a user can be recognized, and the plurality of voice control instructions can be processed according to the target execution order, so that the intentions behind the user's multiple instructions can all be satisfied, providing the user with a smooth voice control experience and improving the user experience.
Fig. 4 is a schematic structural diagram of a vehicle according to an embodiment of the present application. The vehicle may include: a memory 401, a processor 402, and a computer program stored in the memory 401 and executable on the processor 402.
When executing the program, the processor 402 implements the voice control method provided in the above embodiments.
Further, the vehicle includes:
a communication interface 403 for communication between the memory 401 and the processor 402.
The memory 401 is configured to store a computer program executable on the processor 402.
The memory 401 may include high-speed RAM and may also include non-volatile memory, such as at least one disk storage device.
If the memory 401, the processor 402, and the communication interface 403 are implemented independently, they may be connected to each other by a bus and communicate with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. Buses may be divided into address buses, data buses, control buses, and so on. For ease of illustration, only one thick line is shown in fig. 4, but this does not mean that there is only one bus or one type of bus.
Alternatively, in a specific implementation, if the memory 401, the processor 402, and the communication interface 403 are integrated on one chip, the memory 401, the processor 402, and the communication interface 403 may communicate with each other through internal interfaces.
The processor 402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application.
The embodiment of the present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the voice control method as above.
In the description of this specification, a description referring to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or N embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of the different embodiments or examples, without contradiction.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present application, "N" means at least two, for example, two, three, etc., unless specifically defined otherwise.
Any process or method descriptions in flowcharts or otherwise described herein may be understood as representing modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present application includes additional implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those skilled in the art of the embodiments of the present application.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the N steps or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system. As in another embodiment, if implemented in hardware, they may be implemented by any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gate circuits for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gate circuits, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the methods of the above-described embodiments may be implemented by a program instructing related hardware; the program may be stored in a computer-readable storage medium, and when executed, the program performs one or a combination of the steps of the method embodiments.

Claims (10)

1. A voice control method, comprising the steps of:
Acquiring voice content of a user;
Identifying one or more voice control instructions in the voice content;
and if the voice content comprises a plurality of voice control instructions, sequentially executing the voice control instructions according to the target execution sequence.
2. The voice control method of claim 1, wherein the voice control instructions comprise one or more of car control instructions, navigation instructions, music instructions, weather instructions, and phone instructions.
3. The voice control method of claim 1, wherein the identifying one or more voice control instructions in the voice content comprises:
Acquiring a voice endpoint detection value of each voice control instruction in the voice content;
if the voice endpoint detection value is smaller than a preset value, judging that the voice content comprises a plurality of voice control instructions, otherwise, judging that the voice content comprises one voice control instruction.
4. The voice control method of claim 2, wherein the executing the plurality of voice control instructions comprises:
If the voice control instruction is the car control instruction, processing each voice control instruction to obtain a semantic result of each voice control instruction, converting the semantic result into the car control instruction, issuing the car control instruction to a zone control unit, and executing the car control instruction by the zone control unit;
and if the voice control instruction is the navigation instruction, the music instruction, the weather instruction, or the phone instruction, transmitting the voice control instruction to corresponding software and executing the voice control instruction.
5. The voice control method according to claim 1, wherein the sequentially executing the plurality of voice control instructions in the target execution order includes:
identifying whether an ambiguous voice control instruction exists among the plurality of voice control instructions;
and if an ambiguous voice control instruction exists, adjusting the ambiguous voice control instruction to be executed last, otherwise executing the plurality of voice control instructions in the instruction issuing order.
6. The voice control method according to claim 1 or 5, characterized by comprising, after sequentially executing the plurality of voice control instructions in the target execution order:
acquiring an execution result of the voice control instruction;
And broadcasting an execution result of the voice control instruction after the voice control instruction is executed, or broadcasting the execution result of all the voice control instructions after the execution of all the voice control instructions is completed.
7. The voice control method according to claim 5, characterized by comprising, after adjusting the ambiguous voice control instruction to be executed last:
guiding a user to issue a clear voice control instruction;
and/or guiding the user to complete the ambiguous voice control instruction.
8. A voice control apparatus, comprising:
The acquisition module is used for acquiring the voice content of the user;
the recognition module is used for recognizing one or more voice control instructions in the voice content;
And the execution module is used for sequentially executing the voice control instructions according to the target execution sequence if the voice content comprises the voice control instructions.
9. A vehicle, characterized by comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the program to implement the speech control method according to any one of claims 1-7.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the voice control method according to any one of claims 1-7.
CN202410166038.XA 2024-02-05 2024-02-05 Voice control method, device, vehicle and storage medium Pending CN118098229A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410166038.XA CN118098229A (en) 2024-02-05 2024-02-05 Voice control method, device, vehicle and storage medium


Publications (1)

Publication Number Publication Date
CN118098229A true CN118098229A (en) 2024-05-28

Family

ID=91143276




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination