CN110211579B

CN110211579B - Voice instruction recognition method, device and system

Info

Publication number: CN110211579B
Application number: CN201910350007.9A
Authority: CN
Inventors: 何奇; 乔克; 袁志伟
Original assignee: Beijing Moran Cognitive Technology Co Ltd
Current assignee: Beijing Moran Cognitive Technology Co Ltd
Priority date: 2019-04-28
Filing date: 2019-04-28
Publication date: 2021-12-24
Anticipated expiration: 2039-04-28
Also published as: CN110211579A

Abstract

The invention discloses a voice instruction recognition method, device and system. The method comprises the following steps: step 101, acquiring a first signal, the first signal being a user voice instruction superimposed on background noise; step 102, identifying the in-vehicle identity of the user based on the first signal; step 103, determining the exterior sound pickup device corresponding to the user's in-vehicle identity according to the correspondence between in-vehicle identities and exterior sound pickup devices, and acquiring from that device the second signal it collects; step 104, processing the first signal based on the second signal to obtain the user voice instruction with the background noise removed; and step 105, performing speech recognition and semantic recognition on the user voice instruction with the background noise removed. The method prevents voice instructions from going unrecognized or being misrecognized because of noise from outside the vehicle when a window is open, thereby improving the voice instruction recognition rate and the user experience.

Description

Voice instruction recognition method, device and system

Technical Field

Embodiments of the invention relate to the technical field of information processing, and in particular to a method, device and system for recognizing voice instructions.

Background

In recent years, intelligence has become the direction of future automotive development, and voice control, as one manifestation of vehicle intelligence, has received considerable attention. With voice control, a user does not need to operate anything by hand and can have the head unit perform specific operations by voice alone, such as turning the air conditioner on or off, setting or changing a navigation destination, or starting music playback, which greatly improves the user experience.

However, current voice control functions lack recognition accuracy and are easily affected by environmental noise. For example, when a window is open, tire noise, wind noise and other environmental noise can drown out the voice command issued by the user, so that the command is not recognized or is recognized incorrectly, which seriously impairs the user's use of the voice control function.

Some denoising methods exist in the prior art, but the noise signal they obtain deviates significantly from the actual noise. Their denoising precision is therefore insufficient, unrecognized or misrecognized voice commands cannot be avoided, and the user experience suffers.

Disclosure of Invention

To address these problems in the prior art, the invention provides a voice instruction recognition method, device and system.

The invention provides a voice instruction recognition method, which comprises the following steps:

step 101, acquiring a first signal, the first signal being a user voice instruction superimposed on background noise;

step 102, identifying the in-vehicle identity of the user based on the first signal;

step 103, determining the exterior sound pickup device corresponding to the user's in-vehicle identity according to the correspondence between in-vehicle identities and exterior sound pickup devices, and acquiring from that device the second signal it collects;

step 104, processing the first signal based on the second signal to obtain the user voice instruction with the background noise removed;

and step 105, performing speech recognition and semantic recognition on the user voice instruction with the background noise removed.
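The five steps can be read as a simple pipeline. The sketch below is a minimal illustration, assuming each step is supplied as a callable; the function and parameter names (`recognize_voice_command`, `identify_user`, `read_exterior`, and so on) are hypothetical and not taken from the patent, which does not specify an implementation.

```python
from typing import Callable, Dict, List

Signal = List[float]  # hypothetical representation: one channel of audio samples


def recognize_voice_command(
    first_signal: Signal,                         # step 101: signal from the in-vehicle pickup device
    identify_user: Callable[[Signal], str],       # step 102: returns an in-vehicle identity
    identity_to_device: Dict[str, str],           # stored identity -> exterior device correspondence
    read_exterior: Callable[[str], Signal],       # step 103: reads the named exterior device
    denoise: Callable[[Signal, Signal], Signal],  # step 104: removes background noise
    recognize: Callable[[Signal], str],           # step 105: speech and semantic recognition
) -> str:
    """Hypothetical orchestration of steps 101 to 105."""
    identity = identify_user(first_signal)
    device = identity_to_device[identity]
    second_signal = read_exterior(device)
    cleaned = denoise(first_signal, second_signal)
    return recognize(cleaned)
```

Passing each step in as a callable keeps the pipeline independent of any particular identity-recognition or denoising method, in line with the description's statement that other identification methods may be used.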

The invention provides a voice command recognition device, comprising:

a first signal acquisition unit, configured to acquire a first signal, the first signal being a user voice instruction superimposed on background noise;

a user in-vehicle identity recognition unit, configured to recognize the in-vehicle identity of the user based on the first signal;

a second signal acquisition unit, configured to determine the exterior sound pickup device corresponding to the user's in-vehicle identity according to the correspondence between in-vehicle identities and exterior sound pickup devices stored in the storage unit, and to acquire the second signal collected by that exterior sound pickup device;

a noise removing unit, configured to process the first signal based on the second signal to obtain the user voice instruction with the background noise removed;

a voice and semantic recognition unit, configured to perform speech recognition and semantic recognition on the user voice instruction with the background noise removed;

and a storage unit, configured to store the correspondence between in-vehicle identities and exterior sound pickup devices.

The invention provides a voice instruction recognition apparatus comprising a processor and a memory, the memory storing a computer program executable on the processor, wherein the computer program, when executed by the processor, implements the method described above.

The invention provides a computer-readable storage medium storing a computer program executable on a processor, wherein the computer program, when executed, implements the method described above.

The invention provides a voice command recognition system comprising the voice command recognition device described above, an in-vehicle sound pickup device and an exterior sound pickup device.

With the voice instruction recognition method and device of the invention, the exterior sound pickup device is selected according to the user's in-vehicle identity, so the noise it collects more closely matches the actual noise. Denoising is therefore more accurate and more effective, unrecognized or misrecognized voice instructions caused by exterior noise through an open window are avoided, and both the recognition rate and the user experience are improved.

Drawings

To illustrate the embodiments of the present invention or the prior-art technical solutions more clearly, the drawings used in describing them are briefly introduced below. These drawings show only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.

FIG. 1 illustrates a voice command recognition method according to an embodiment of the invention.

FIG. 2 illustrates a voice command recognition apparatus according to an embodiment of the invention.

Detailed Description

To make the objects, technical solutions and advantages of the present invention clearer, embodiments of the invention are described in detail below with reference to the accompanying drawings. The embodiments and their specific features describe the technical solutions of the invention in detail but do not limit them, and the embodiments and their technical features may be combined with one another where they do not conflict.

The voice command recognition method of the present invention is described below with reference to FIG. 1. It comprises steps 101 to 105 as set out above; preferred implementations of these steps are as follows.

Preferably, the method applies to the scenario in which a user issues a voice instruction while at least one vehicle window is open.

Preferably, in step 101, a first signal is acquired from an in-vehicle sound pickup device, wherein the in-vehicle sound pickup device is a microphone array or a plurality of microphones.

Preferably, in step 101, the voice instruction is either a voice instruction containing a wake-up word or a voice instruction issued in a wake-word-free scenario.

Preferably, in step 102, in one embodiment, the in-vehicle identity of the user is one of driver, front passenger, rear left passenger and rear right passenger. In this embodiment, if the vehicle is a left-hand-drive vehicle, the windows corresponding to the driver, the front passenger, the rear left passenger and the rear right passenger are the front left window, the front right window, the rear left window and the rear right window, respectively.

Preferably, in step 102, identifying the in-vehicle identity of the user specifically comprises identifying it based on microphone array beamforming. The microphone array forms a plurality of beams, each pointing toward one in-vehicle identity; it is determined which beam of the array collected the first signal, and the in-vehicle identity corresponding to that beam is identified as the user's in-vehicle identity. In a specific embodiment, the microphone array forms four beams pointing toward the driver, the front passenger, the rear left passenger and the rear right passenger, respectively. If the first signal is collected by the beam pointing toward the driver, the user's in-vehicle identity is identified as the driver; likewise, it is identified as the front passenger, the rear left passenger or the rear right passenger if the first signal is collected by the beam pointing toward that occupant.
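The description does not say how the array decides which beam "collected" the first signal. One common reading, assumed here, is to steer one delay-and-sum beam toward each seat and pick the beam with the highest output energy; delay-and-sum is a standard beamforming technique substituted for illustration, not one named in the patent. The sketch uses non-negative integer sample delays and plain Python lists; the steering delays and all function names are hypothetical, and a real array would derive its delays from its geometry and sample rate.

```python
from typing import Dict, List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]], delays: Sequence[int]) -> List[float]:
    """Average the channels after advancing channel m by delays[m] samples.

    If a talker's sound reaches microphone m delays[m] samples late, the
    advanced copies line up and add coherently; sound from other directions
    does not. Delays are assumed to be non-negative integers.
    """
    if len(channels) != len(delays):
        raise ValueError("one steering delay per channel is required")
    length = min(len(ch) for ch in channels) - max(delays)
    if length <= 0:
        return []
    m = len(channels)
    return [sum(ch[n + d] for ch, d in zip(channels, delays)) / m for n in range(length)]


def mean_energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude of a signal (0.0 for an empty signal)."""
    return sum(x * x for x in samples) / len(samples) if samples else 0.0


def identify_by_beam(channels: Sequence[Sequence[float]], steering: Dict[str, Sequence[int]]) -> str:
    """Return the in-vehicle identity whose beam output carries the most energy."""
    energies = {
        identity: mean_energy(delay_and_sum(channels, delays))
        for identity, delays in steering.items()
    }
    return max(energies, key=energies.get)
```

For a two-microphone array, a hypothetical steering table might be `{"driver": [0, 3], "front_passenger": [3, 0]}`; a four-beam array as in the embodiment would simply have four entries.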

The microphone array may be located centrally on the roof of the vehicle to facilitate the collection of voice commands from users seated in different seats.

Preferably, in step 102, identifying the in-vehicle identity of the user specifically comprises identifying it based on which of the plurality of microphones acquired the first signal. The microphones are arranged near different seats of the vehicle, and the user's in-vehicle identity corresponds to the seat the user occupies, so each microphone corresponds to a different in-vehicle identity and the user's in-vehicle identity can be determined from the microphone that acquired the first signal. In one embodiment, the microphone that acquired the first signal is the microphone in which the first signal has the highest strength.
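For the distributed-microphone variant, the embodiment above selects the microphone in which the first signal is strongest. A minimal sketch, assuming "strength" is measured as root-mean-square (RMS) amplitude over the captured samples (the description does not define the measure) and using hypothetical identity keys and function names:

```python
import math
from typing import Dict, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude, used here as the 'strength' of a signal."""
    return math.sqrt(sum(x * x for x in samples) / len(samples)) if samples else 0.0


def identify_by_microphone(mic_signals: Dict[str, Sequence[float]]) -> str:
    """Return the identity of the seat microphone with the strongest first signal.

    mic_signals maps an in-vehicle identity (one microphone per seat) to the
    samples that microphone captured for the same utterance.
    """
    if not mic_signals:
        raise ValueError("at least one microphone signal is required")
    return max(mic_signals, key=lambda identity: rms(mic_signals[identity]))
```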

Preferably, in step 103, the correspondence between in-vehicle identities and exterior sound pickup devices depends on the number of exterior sound pickup devices. Moreover, because the user's in-vehicle identities and the windows are in one-to-one correspondence as described above, the windows also correspond to the exterior sound pickup devices, in the same way as the in-vehicle identities do. In one embodiment, the number of exterior sound pickup devices equals the number of in-vehicle identities, and the in-vehicle identities (or windows) correspond one-to-one to the exterior sound pickup devices. For example, where the in-vehicle identities are driver, front passenger, rear left passenger and rear right passenger, four exterior sound pickup devices may be provided, one at the window near each of the driver, the front passenger, the rear left passenger and the rear right passenger; the devices for the driver and the front passenger may, for example, be mounted near the rearview mirrors.
In another embodiment, there may be fewer exterior sound pickup devices than in-vehicle identities, and an in-vehicle identity (or window) may then correspond to an exterior sound pickup device one-to-one or many-to-one. For example, where the in-vehicle identities are driver, front passenger, rear left passenger and rear right passenger, two exterior sound pickup devices may be provided, at the windows near the driver and near the front passenger. The driver and the passenger on the driver's side (for example, the rear left passenger in China), together with their windows, correspond to the device at the window near the driver; the front passenger and the passenger on the front passenger's side (for example, the rear right passenger in China), together with their windows, correspond to the device at the window near the front passenger.
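The two embodiments differ only in the lookup tables they store. A minimal sketch, assuming a left-hand-drive, four-seat vehicle; every label (`ext_front_left`, `ext_driver_side`, and so on) and the helper `exterior_device_for` are hypothetical. Composing identity-to-window with window-to-device reflects the statement that windows follow the same correspondence as in-vehicle identities.

```python
from typing import Dict

# Hypothetical labels for a left-hand-drive, four-seat vehicle.
IDENTITY_TO_WINDOW: Dict[str, str] = {
    "driver": "front_left",
    "front_passenger": "front_right",
    "rear_left": "rear_left",
    "rear_right": "rear_right",
}

# First embodiment: one exterior pickup device per window (one-to-one).
FOUR_DEVICES: Dict[str, str] = {
    "front_left": "ext_front_left",
    "front_right": "ext_front_right",
    "rear_left": "ext_rear_left",
    "rear_right": "ext_rear_right",
}

# Second embodiment: two devices, at the driver's and front passenger's windows;
# each rear window shares the device on its own side (many-to-one).
TWO_DEVICES: Dict[str, str] = {
    "front_left": "ext_driver_side",
    "rear_left": "ext_driver_side",
    "front_right": "ext_passenger_side",
    "rear_right": "ext_passenger_side",
}


def exterior_device_for(identity: str, window_to_device: Dict[str, str]) -> str:
    """Look up the exterior device for an in-vehicle identity via its window."""
    return window_to_device[IDENTITY_TO_WINDOW[identity]]
```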

Preferably, in step 103, the second signal collected by the vehicle exterior sound pickup device includes various noises outside the vehicle, such as road noise, tire noise, wind noise, environmental noise, and the like.

Preferably, the pickup device outside the vehicle is embodied as one or more microphones or a microphone array.

Preferably, in step 104, processing the first signal based on the second signal to obtain the user voice instruction with the background noise removed specifically comprises subtracting the second signal from the first signal.

Preferably, in step 104, processing the first signal based on the second signal specifically comprises attenuating the second signal by a preset coefficient to obtain a third signal, and subtracting the third signal from the first signal to obtain the user voice instruction with the background noise removed.
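Both forms of step 104 reduce to sample-wise subtraction, with plain subtraction as the special case of a coefficient of 1.0. The sketch below is a minimal illustration, assuming the first and second signals are already time-aligned, equally long and sampled at the same rate; the description does not address alignment or gain matching. The function name is hypothetical.

```python
from typing import List, Sequence


def remove_background_noise(
    first: Sequence[float], second: Sequence[float], coefficient: float = 1.0
) -> List[float]:
    """Subtract the exterior (second) signal from the interior (first) signal.

    coefficient=1.0 gives plain subtraction; a value below 1.0 first attenuates
    the second signal into the 'third signal' and subtracts that instead.
    Assumes both signals are time-aligned and share a sample rate.
    """
    if len(first) != len(second):
        raise ValueError("first and second signals must have the same length")
    third = [coefficient * x for x in second]
    return [f - t for f, t in zip(first, third)]
```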

Preferably, the value of the preset coefficient is determined according to a window opening state value, which represents how far the window is open. The preset coefficient is positively correlated with the window opening state value: the less the window is open, the smaller the window opening state value and the smaller the preset coefficient; conversely, the wider the window is open, the larger the window opening state value and the larger the preset coefficient. For example, the preset coefficient is 0.85 when the window opening state value is 1/4, 0.9 when it is 1/2, and 0.95 when it is 3/4. These values are merely examples and do not limit the positive correlation between the window opening state value and the preset coefficient in the present invention. In a specific implementation, the relationship may be expressed as a function so that the preset coefficient is calculated from the window opening state value; for the example above, preset coefficient = 0.2 × window opening state value + 0.8. Alternatively, a table recording the correspondence between preset coefficients and window opening state values may be set in advance, for example Table 1.

TABLE 1 Correspondence between window opening state values and preset coefficients

Window opening state value	Preset coefficient
1/4	0.85
1/2	0.9
3/4	0.95
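Both ways of obtaining the preset coefficient, the example linear function and a lookup in Table 1, are easy to express directly. A minimal sketch, assuming the window opening state value is a fraction in [0, 1] (0 closed, 1 fully open), clamping out-of-range values, and resolving untabulated values to the nearest table entry; the clamping and nearest-entry rules are assumptions, and the names are hypothetical.

```python
from typing import Mapping

# Table 1: window opening state value -> preset coefficient.
PRESET_TABLE: Mapping[float, float] = {0.25: 0.85, 0.5: 0.9, 0.75: 0.95}


def coefficient_from_formula(state: float) -> float:
    """Example linear relation from the text: 0.2 * state + 0.8.

    The state value is clamped to [0, 1]; the clamping is an assumption,
    not part of the described relation.
    """
    state = min(max(state, 0.0), 1.0)
    return 0.2 * state + 0.8


def coefficient_from_table(state: float, table: Mapping[float, float] = PRESET_TABLE) -> float:
    """Return the coefficient for the nearest tabulated state value (assumed rule)."""
    nearest = min(table, key=lambda key: abs(key - state))
    return table[nearest]
```

The formula reproduces every row of Table 1 (up to floating-point rounding), so the two approaches agree on the tabulated points and differ only in how they handle values between them.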

Preferably, the preset coefficient is determined according to either the maximum of the window opening state values of all windows or the window opening state value of the window corresponding to the user's in-vehicle identity.
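The two alternatives differ only in which state value is fed into the coefficient rule. A minimal sketch with hypothetical window labels and function name, assuming opening state values are stored per window:

```python
from typing import Mapping, Optional


def state_value_for_coefficient(
    window_states: Mapping[str, float], user_window: Optional[str] = None
) -> float:
    """Pick the window opening state value used to derive the preset coefficient.

    With user_window=None, use the largest opening among all windows (first
    alternative); otherwise use the opening of the user's own window (second).
    """
    if user_window is not None:
        return window_states[user_window]
    return max(window_states.values())
```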

Preferably, after speech recognition and semantic recognition are performed on the user's voice command in step 105, a corresponding operation may be performed in response to the command. For example, if the user's voice command is "navigate to Peking University", an instruction is sent to the navigation device to set the navigation destination.

Preferably, before step 103, the open/closed state of each window of the vehicle is acquired and it is determined whether the window corresponding to the user's in-vehicle identity is open; if so, step 103 is executed, and if not, step 106 is executed. Step 106: determine the exterior sound pickup devices corresponding to all open windows, and acquire from each of them the second signal it collects. Steps 104 and 105 are executed after step 106. Let N be the number of open windows and M the number of exterior sound pickup devices so determined, which is also the number of second signals obtained; then 1 ≤ N < total number of windows of the vehicle, 1 ≤ M < total number of windows, and M ≤ N. In this case, step 104 specifically comprises averaging the M second signals to obtain a fourth signal and subtracting the fourth signal from the first signal to obtain the user voice instruction with the background noise removed. In another embodiment, step 104 specifically comprises attenuating each second signal by its preset coefficient, averaging the results to obtain the fourth signal, and subtracting the fourth signal from the first signal. The preset coefficient for a given second signal is determined from the window opening state value of the window corresponding to the exterior sound pickup device that collected it.
When several in-vehicle identities or windows correspond to one exterior sound pickup device, for example when the driver and the passenger on the driver's side (the rear left passenger in China), and hence their windows, all correspond to the device at the window near the driver, the preset coefficient for the second signal collected by that device is determined from the average of the window opening state values of the windows corresponding to that device. Those skilled in the art will understand that if only one second signal is obtained in step 106 (M = 1), the averaging may be omitted; that is, the fourth signal is simply the second signal itself, or the second signal attenuated by its preset coefficient.
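The branch before step 103 and the averaging of step 106 can be sketched as follows. This is a minimal illustration, assuming opening state values in [0, 1] with 0.0 meaning closed, devices identified by hypothetical string labels, and time-aligned second signals. It reads "the windows corresponding to the device" as all windows mapped to it, with closed ones counted at 0.0, which is only one possible interpretation. None of the function names come from the patent.

```python
from typing import Callable, Dict, List, Mapping, Sequence


def select_exterior_devices(
    user_identity: str,
    window_states: Mapping[str, float],  # opening state value per window, 0.0 = closed
    identity_to_window: Mapping[str, str],
    window_to_device: Mapping[str, str],
) -> List[str]:
    """Step 103 if the user's window is open, otherwise step 106."""
    user_window = identity_to_window[user_identity]
    if window_states.get(user_window, 0.0) > 0.0:
        return [window_to_device[user_window]]  # step 103: the user's own device
    devices: List[str] = []
    for window, state in window_states.items():  # step 106: devices of all open windows
        device = window_to_device[window]
        if state > 0.0 and device not in devices:  # a shared device is listed once (M <= N)
            devices.append(device)
    return devices


def device_coefficient(
    device: str,
    window_states: Mapping[str, float],
    window_to_device: Mapping[str, str],
    coefficient_fn: Callable[[float], float],
) -> float:
    """Coefficient from the average opening of all windows mapped to the device."""
    states = [window_states.get(w, 0.0) for w, d in window_to_device.items() if d == device]
    if not states:
        raise ValueError(f"no window is mapped to device {device!r}")
    return coefficient_fn(sum(states) / len(states))


def fourth_signal(
    second_signals: Dict[str, Sequence[float]], coefficients: Mapping[str, float]
) -> List[float]:
    """Attenuate each second signal by its device's coefficient (default 1.0), then average."""
    m = len(second_signals)
    length = min(len(sig) for sig in second_signals.values())
    return [
        sum(coefficients.get(dev, 1.0) * sig[n] for dev, sig in second_signals.items()) / m
        for n in range(length)
    ]
```

Leaving a device out of `coefficients` gives the unattenuated averaging of the first embodiment, and with a single second signal (M = 1) the average is that signal itself, matching the note above. The fourth signal is then subtracted from the first signal as in step 104.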

The description above takes the in-vehicle identity as one of driver, front passenger, rear left passenger and rear right passenger only as an example, which should not be regarded as limiting the in-vehicle identity in the present invention. Those skilled in the art will appreciate that in-vehicle identities are not limited to these and may depend on the vehicle type, the seating layout of the occupants, the rated passenger capacity and so on. For example, if the vehicle is rated for 10 occupants, the in-vehicle identities may include the driver, the front passenger and ordinary passengers 1 to 8, where ordinary passengers 1 to 8 are distinguished by their seat numbers.

Identifying the user's in-vehicle identity by microphone array beamforming, or by which of the plurality of microphones acquired the first signal, is described above only as an example and should not be construed as limiting how the user's in-vehicle identity is identified in the present invention. Those skilled in the art will appreciate that other identification methods may also be used.

With this method, the exterior sound pickup device is selected according to the user's in-vehicle identity, so the noise it collects more closely matches the actual noise. Denoising is therefore more accurate and more effective, unrecognized or misrecognized voice commands caused by exterior noise through an open window are avoided, and both the recognition rate and the user experience are improved.

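The present invention also provides a voice command recognition apparatus. Referring to FIG. 2, the apparatus comprises the first signal acquisition unit, the user in-vehicle identity recognition unit, the second signal acquisition unit, the noise removing unit, the voice and semantic recognition unit and the storage unit described above.

The apparatus is suitable for recognizing voice commands that a user issues while at least one vehicle window is open.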

Preferably, the first signal acquiring unit acquires the first signal from an in-vehicle sound pickup device, where the in-vehicle sound pickup device is a microphone array or a plurality of microphones.

Preferably, the user in-vehicle identity recognition unit is specifically configured to identify the user's in-vehicle identity based on microphone array beamforming: the microphone array forms a plurality of beams, each pointing toward one in-vehicle identity, and the unit determines which beam of the array collected the first signal and identifies the in-vehicle identity corresponding to that beam as the user's in-vehicle identity. In a specific embodiment, the microphone array forms four beams pointing toward the driver, the front passenger, the rear left passenger and the rear right passenger, respectively, and the unit identifies the user as the driver, the front passenger, the rear left passenger or the rear right passenger according to which of these beams collected the first signal.

Preferably, the user in-vehicle identity recognition unit is specifically configured to identify the user's in-vehicle identity based on which of the plurality of microphones acquired the first signal. The microphones are arranged near different seats of the vehicle and the user's in-vehicle identity corresponds to the seat the user occupies, so each microphone corresponds to a different in-vehicle identity, and the unit can determine the user's in-vehicle identity from the microphone that collected the first signal.

Preferably, the correspondence between in-vehicle identities and exterior sound pickup devices depends on the number of exterior sound pickup devices.

Preferably, the noise removing unit is specifically configured to subtract the second signal from the first signal to obtain the user voice instruction with the background noise removed, or to attenuate the second signal by a preset coefficient to obtain a third signal and subtract the third signal from the first signal to obtain the user voice instruction with the background noise removed.

Preferably, the noise removing unit is further configured to determine a value of the preset coefficient according to a window opening state value, where the window opening state value represents a degree of window opening.

Preferably, the noise removing unit is further configured to determine the preset coefficient according to a maximum value of window opening state values of all windows or a window opening state value of a window corresponding to the in-vehicle identity of the user.

Preferably, the storage unit is further configured to store a correspondence table between preset coefficients and window opening state values.

Preferably, the storage unit is further configured to store the correspondence between the user's in-vehicle identities and the windows, and the correspondence between the windows and the exterior sound pickup devices.

Preferably, the second signal acquisition unit is further configured to acquire the open/closed state of each window of the vehicle and determine whether the window corresponding to the user's in-vehicle identity is open. If so, it determines the exterior sound pickup device corresponding to the user's in-vehicle identity according to the correspondence between in-vehicle identities and exterior sound pickup devices stored in the storage unit, and acquires the second signal collected by that device; if not, it determines, according to the correspondence between windows and exterior sound pickup devices stored in the storage unit, the exterior sound pickup devices corresponding to all open windows, and acquires the second signals collected by those devices. The noise removing unit is further configured to average the second signals to obtain a fourth signal and subtract the fourth signal from the first signal to obtain the user voice instruction with the background noise removed. In another embodiment, the noise removing unit is further configured to attenuate each second signal by its preset coefficient, average the results to obtain the fourth signal, and subtract the fourth signal from the first signal to obtain the user voice instruction with the background noise removed.
The number of open windows is N, and the number of exterior sound pickup devices so determined, and hence of second signals obtained, is M, where 1 ≤ N < total number of windows of the vehicle, 1 ≤ M < total number of windows, and M ≤ N. The preset coefficient for a given second signal is determined from the window opening state value of the window corresponding to the exterior sound pickup device that collected it. When several in-vehicle identities or windows correspond to one exterior sound pickup device, for example when the driver and the passenger on the driver's side (the rear left passenger in China), and hence their windows, all correspond to the device at the window near the driver, the preset coefficient for the second signal collected by that device is determined from the average of the window opening state values of the windows corresponding to that device.

Preferably, the device is used in a vehicle head unit.

The description above, in which the user in-vehicle identity recognition unit identifies the user's in-vehicle identity by microphone array beamforming or from which of the plurality of microphones collected the first signal, is only an example and should not be regarded as limiting that unit in the present invention. As those skilled in the art will appreciate, the unit may also use other methods to identify the user's in-vehicle identity.

With the voice instruction recognition device, the exterior sound pickup device is selected according to the user's in-vehicle identity, so the noise it collects more closely matches the actual noise. Denoising is therefore more accurate and more effective, unrecognized or misrecognized voice instructions caused by exterior noise through an open window are avoided, and both the recognition rate and the user experience are improved.

The invention also provides a speech instruction recognition apparatus comprising a processor and a memory, the memory having stored therein a computer program executable on the processor, the computer program, when executed by the processor, implementing the method as described above.

The invention also provides a computer-readable storage medium in which a computer program executable on a processor is stored, which computer program, when being executed, carries out the method as described above.

The invention also provides a voice command recognition system comprising the voice command recognition device described above, an in-vehicle sound pickup device and an exterior sound pickup device.

Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. The computer-readable storage medium may include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), a flash memory, an erasable programmable read-only memory (EPROM), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

Computer program code for carrying out operations of the present invention may be written in one or more programming languages, or a combination thereof.

The above description is only an example intended to aid understanding of the present invention and does not limit its scope. In a specific implementation, a person skilled in the art may change, add, or remove components of the apparatus according to the actual situation, and may change, add, or remove steps of the method, or change their order, without affecting the functions implemented by the method.

While embodiments of the invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions, and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents; all changes that come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims

1. A method for voice command recognition, the method comprising:

step 102, identifying the in-vehicle identity of the user based on the acquired first signal, and judging whether the window corresponding to the user's in-vehicle identity is open; if so, executing step 103; if not, executing step 106;

step 103, determining the exterior sound pickup device corresponding to the user's in-vehicle identity according to the correspondence between in-vehicle identities and exterior sound pickup devices, and acquiring from that exterior sound pickup device the second signal it collects; processing the first signal based on the second signal to obtain the user voice instruction with background noise removed; and performing voice recognition and semantic recognition on the denoised user voice instruction;

step 106, determining the exterior sound pickup devices corresponding to all windows that are open, and acquiring from each of them the second signal it collects; averaging these second signals to obtain a fourth signal, and processing the first signal based on the fourth signal to obtain the user voice instruction with background noise removed; and performing voice recognition and semantic recognition on the denoised user voice instruction.
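The branch in claim 1 (use the exterior pickup paired with the user's own window if it is open, otherwise average the exterior pickups of all open windows) can be sketched as follows. This is a minimal illustration, not the patentee's implementation: it assumes each seat has exactly one window, signals are equal-length, time-aligned frames of floats, and the seat and pickup names are hypothetical. The claim does not say what happens when no window is open; returning no reference in that case is an assumption.

```python
from typing import Dict, List, Optional

Signal = List[float]  # hypothetical representation: one frame of audio samples


def average_signals(signals: List[Signal]) -> Signal:
    """Sample-wise mean of equal-length frames (the 'fourth signal' of step 106)."""
    return [sum(samples) / len(signals) for samples in zip(*signals)]


def noise_reference(
    seat: str,
    window_open: Dict[str, bool],
    exterior_mic_for_seat: Dict[str, str],
    exterior_signals: Dict[str, Signal],
) -> Optional[Signal]:
    """Choose the exterior noise reference following the branch in claim 1."""
    if window_open.get(seat, False):
        # Step 103: the user's own window is open, so use its paired exterior pickup.
        return exterior_signals[exterior_mic_for_seat[seat]]
    # Step 106: average the exterior pickups paired with every open window.
    open_signals = [
        exterior_signals[exterior_mic_for_seat[s]]
        for s, is_open in window_open.items()
        if is_open
    ]
    # Assumption (not stated in the claim): with no window open there is no reference.
    return average_signals(open_signals) if open_signals else None


def denoise(first: Signal, reference: Optional[Signal]) -> Signal:
    """Subtract the reference sample by sample; pass the first signal through if none."""
    if reference is None:
        return list(first)
    return [a - b for a, b in zip(first, reference)]


# Example: the driver speaks with their own window closed and two other windows open.
window_open = {"driver": False, "front_passenger": True, "rear_left": True, "rear_right": False}
exterior_mic_for_seat = {
    "driver": "ext_fl", "front_passenger": "ext_fr",
    "rear_left": "ext_rl", "rear_right": "ext_rr",
}
exterior_signals = {
    "ext_fl": [0.125, 0.125], "ext_fr": [0.5, 1.0],
    "ext_rl": [0.25, 0.0], "ext_rr": [0.0, 0.0],
}
reference = noise_reference("driver", window_open, exterior_mic_for_seat, exterior_signals)
cleaned = denoise([1.0, 1.0], reference)  # reference is the mean of ext_fr and ext_rl
```

Keeping the selection of the noise reference separate from the subtraction mirrors the claim, where steps 103 and 106 differ only in which reference they feed into the same denoising and recognition stages.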

2. The method of claim 1, wherein the first signal is obtained from an in-vehicle sound pickup device, the in-vehicle sound pickup device being a microphone array or a plurality of microphones.

3. The method according to claim 2, wherein in step 102 the in-vehicle identity of the user is identified based on the first signal specifically by:

identifying the in-vehicle identity of the user by microphone array beamforming, or according to which microphone of the plurality of microphones acquired the first signal.
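For the plurality-of-microphones alternative in claim 3, one simple way to decide "which microphone acquired the first signal" is to pick the in-vehicle microphone with the highest frame energy. The sketch below assumes that criterion and a hypothetical one-microphone-per-seat layout; the claim itself does not specify how the capturing microphone is determined, and the beamforming alternative is not shown.

```python
from typing import Dict, List


def frame_energy(samples: List[float]) -> float:
    """Mean squared amplitude of one frame (0.0 for an empty frame)."""
    return sum(x * x for x in samples) / len(samples) if samples else 0.0


def identify_seat(mic_frames: Dict[str, List[float]], seat_for_mic: Dict[str, str]) -> str:
    """Treat the in-vehicle microphone with the highest frame energy as the one that
    acquired the first signal, and return the seat it is assigned to."""
    loudest_mic = max(mic_frames, key=lambda mic: frame_energy(mic_frames[mic]))
    return seat_for_mic[loudest_mic]


# Hypothetical layout: one in-vehicle microphone per seat.
seat_for_mic = {"mic_1": "driver", "mic_2": "front_passenger", "mic_3": "rear_left", "mic_4": "rear_right"}
frames = {"mic_1": [0.1, -0.1], "mic_2": [0.6, -0.5], "mic_3": [0.2, 0.0], "mic_4": [0.0, 0.05]}
speaker_seat = identify_seat(frames, seat_for_mic)  # mic_2 is loudest, so "front_passenger"
```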

4. The method of claim 1, wherein the correspondence between in-vehicle identities and exterior sound pickup devices depends on the number of exterior sound pickup devices.
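Claim 4 only states that the identity-to-pickup correspondence depends on how many exterior pickups are fitted. The sketch below shows one hypothetical way that could look for a left-hand-drive, four-seat car: one pickup per window, one per side, or a single shared pickup. The seat names, pickup names, and layouts are illustrative assumptions, not taken from the patent.

```python
from typing import Dict

# Hypothetical seat names for a left-hand-drive, four-seat car.
SEATS = ("driver", "front_passenger", "rear_left", "rear_right")


def build_correspondence(num_exterior_mics: int) -> Dict[str, str]:
    """Map each in-vehicle identity (seat) to an exterior pickup, depending on how many
    exterior pickups the vehicle has. The layouts below are illustrative only."""
    if num_exterior_mics == 4:
        # One exterior pickup next to each window.
        return {
            "driver": "ext_front_left",
            "front_passenger": "ext_front_right",
            "rear_left": "ext_rear_left",
            "rear_right": "ext_rear_right",
        }
    if num_exterior_mics == 2:
        # One pickup per side; seats on the same side share it.
        return {
            "driver": "ext_left",
            "rear_left": "ext_left",
            "front_passenger": "ext_right",
            "rear_right": "ext_right",
        }
    if num_exterior_mics == 1:
        # A single pickup serves every seat.
        return {seat: "ext_single" for seat in SEATS}
    raise ValueError(f"no layout defined for {num_exterior_mics} exterior pickups")
```

With fewer pickups, several seats share one reference, which is presumably less representative of the noise at each individual seat.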

5. The method according to claim 1, wherein in step 103 the first signal is processed based on the second signal to obtain the user voice instruction specifically by:

subtracting the second signal from the first signal to obtain the user voice instruction with background noise removed; or

attenuating the second signal by a preset coefficient to obtain a third signal, and subtracting the third signal from the first signal to obtain the user voice instruction with background noise removed.
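The two alternatives in claim 5 can be sketched as a literal sample-wise subtraction. This assumes the first and second signals are already time-aligned, equal-length frames, and that an "attenuating" coefficient lies in [0, 1]. Both are assumptions: a practical system would likely also need to align the two signals in time, and that step is not shown here.

```python
from typing import List, Optional


def remove_background_noise(
    first: List[float], second: List[float], coefficient: Optional[float] = None
) -> List[float]:
    """Claim 5 sketch on time-aligned, equal-length frames.

    coefficient=None: subtract the second signal directly (first alternative).
    coefficient in [0, 1]: attenuate the second signal into a third signal and
    subtract that instead (second alternative).
    """
    if len(first) != len(second):
        raise ValueError("frames must have the same length")
    if coefficient is None:
        reference = second
    else:
        if not 0.0 <= coefficient <= 1.0:
            raise ValueError("an attenuation coefficient must lie in [0, 1]")
        reference = [coefficient * s for s in second]  # the 'third signal'
    return [a - b for a, b in zip(first, reference)]


direct = remove_background_noise([1.0, 0.5], [0.5, 0.5])            # [0.5, 0.0]
attenuated = remove_background_noise([1.0, 0.5], [0.5, 0.5], 0.5)  # [0.75, 0.25]
```

The attenuated variant lets the system subtract less of the exterior reference when only part of the exterior noise is expected to reach the in-vehicle microphone.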

6. The method according to claim 5, wherein the value of the preset coefficient is determined from a window opening state value, which characterizes the degree to which a window is open.

7. The method according to claim 6, wherein the window opening state value is determined according to the window that is opened the widest among all open windows.
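Claims 6 and 7 leave the actual mapping open. The sketch below assumes opening degrees normalized to 0.0 (closed) through 1.0 (fully open), takes the widest-open window as the window opening state value, and maps it to the preset coefficient with a hypothetical linear rule. Under that rule, a wider opening means less attenuation of the exterior reference. The bounds `k_min` and `k_max` are illustrative assumptions.

```python
from typing import Dict


def window_opening_state_value(opening_degrees: Dict[str, float]) -> float:
    """Claim 7 sketch: the state value is the opening degree of the widest-open window.

    Opening degrees are assumed normalised to 0.0 (closed) .. 1.0 (fully open).
    """
    return max(opening_degrees.values(), default=0.0)


def preset_coefficient(state_value: float, k_min: float = 0.25, k_max: float = 1.0) -> float:
    """Claim 6 sketch with a hypothetical linear mapping: the wider the window is open,
    the less the exterior reference is attenuated."""
    v = min(max(state_value, 0.0), 1.0)  # clamp to the assumed range
    return k_min + (k_max - k_min) * v


state = window_opening_state_value({"driver": 0.5, "rear_left": 0.25, "rear_right": 0.0})
k = preset_coefficient(state)  # 0.25 + 0.75 * 0.5 = 0.625
```

A lookup table calibrated per vehicle model would be an equally valid reading of the claim. The linear rule is used here only because it is the simplest monotonic choice.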

8. A voice instruction recognition apparatus, characterized in that the apparatus comprises:

a user in-vehicle identity recognition unit, configured to identify the in-vehicle identity of the user based on the first signal acquired by the first signal acquisition unit;

a second signal acquisition unit, configured to judge whether the window corresponding to the user's in-vehicle identity is open and, if so, to determine the exterior sound pickup device corresponding to the user's in-vehicle identity according to the correspondence between in-vehicle identities and exterior sound pickup devices stored in the storage unit, and to acquire the second signal collected by that exterior sound pickup device;

the second signal acquisition unit being further configured, if the window corresponding to the user's in-vehicle identity is not open, to determine the exterior sound pickup devices corresponding to all open windows and to acquire the signals they collect;

the noise removing unit being further configured to average the signals acquired from the determined exterior sound pickup devices to obtain a fourth signal, and to process the first signal based on the fourth signal to obtain the user voice instruction with background noise removed;

9. The apparatus of claim 8, wherein the first signal acquisition unit acquires the first signal from an in-vehicle sound pickup device, the in-vehicle sound pickup device being a microphone array or a plurality of microphones.

10. The apparatus according to claim 9, wherein the user in-vehicle identity identifying unit is specifically configured to:

11. The apparatus of claim 8, wherein the correspondence between in-vehicle identities and exterior sound pickup devices depends on the number of exterior sound pickup devices.

12. The apparatus according to claim 8, wherein the noise removing unit is configured to process the first signal based on the second signal to obtain the user voice instruction with background noise removed, specifically by: subtracting the second signal from the first signal; or attenuating the second signal by a preset coefficient to obtain a third signal and subtracting the third signal from the first signal.

13. The apparatus according to claim 12, wherein the noise removing unit is further configured to determine the value of the preset coefficient according to a window opening state value, the window opening state value representing the degree to which a window is open.

14. The apparatus of claim 13, wherein the noise removing unit is further configured to determine the window opening state value according to the window that is opened the widest among all open windows.

15. The apparatus of any one of claims 8 to 14, wherein the apparatus is used in a vehicle head unit.

16. A voice instruction recognition apparatus, characterized in that it comprises a processor and a memory, the memory storing a computer program executable on the processor which, when executed by the processor, implements the method according to any one of claims 1-7.

17. A computer-readable storage medium, in which a computer program operable on a processor is stored, which computer program, when executed, implements the method of any one of claims 1-7.

18. A voice instruction recognition system, comprising the voice instruction recognition apparatus according to any one of claims 8 to 16, an in-vehicle sound pickup device, and an exterior sound pickup device.