CN114283798A - Radio receiving method of handheld device and handheld device

Info

Publication number: CN114283798A
Application number: CN202110799098.1A
Authority: CN
Inventors: 杨香斌
Original assignee: Hisense Visual Technology Co Ltd
Current assignee: Hisense Visual Technology Co Ltd
Priority date: 2021-07-15
Filing date: 2021-07-15
Publication date: 2022-04-05

Abstract

This embodiment provides a sound reception method of a handheld device, and the handheld device. When the controller of the handheld device receives the screen lifting signal sent by the screen lifting sensor, the controller starts the sound collector so that the sound collector collects the voice signal input by the user. The controller then calculates the sound source angle between the user and the handheld device according to the voice signal. If the sound source angle is within the preset sound source angle range, voice recognition processing is performed on the voice signal. If the sound source angle is not within the preset sound source angle range, voice recognition processing is not performed on the voice signal. In this way, the sound source angle between the user and the handheld device is calculated when the user picks up the handheld device, and whether the user has a voice interaction intention is judged according to that angle. This avoids mistaken sound reception and improves the user experience.

Description

Radio receiving method of handheld device and handheld device

Technical Field

The application relates to the technical field of voice interaction, and in particular to a sound reception method of a handheld device and the handheld device.

Background

With the development of voice interaction technology, more and more home terminal devices have a voice interaction function. By using the voice interaction function, the user can control the terminal devices to perform corresponding operations, such as starting, stopping and the like.

At present, voice interaction on a handheld device usually requires the user to press a key or speak a wake-up word to trigger the voice interaction function. To improve the user experience, a lift-to-wake mode has also been defined in addition to these two ways. For example, a user may pick up the remote control from a desktop and thereby trigger the voice interaction function of the remote control.

However, in the lift-to-wake mode, the user may simply be picking up the handheld device with no intention of voice interaction. If the user happens to be speaking at that moment, mistaken sound reception easily occurs, and the user experience is poor.

Disclosure of Invention

The application provides a sound reception method of a handheld device and the handheld device, to solve the following problem: in the lift-to-wake mode, the user may simply be picking up the handheld device with no intention of voice interaction, and if the user happens to be speaking at that moment, mistaken sound reception easily occurs, resulting in a poor user experience.

In a first aspect, the present embodiment provides a handheld device, including:

a sound collector configured to collect a voice signal input by a user;

a screen lifting sensor configured to detect whether the handheld device is lifted;

a controller configured to:

when a screen lifting signal sent by the screen lifting sensor is received, controlling the sound collector to start so that the sound collector collects a voice signal input by a user, wherein the screen lifting signal is used for indicating that the handheld device is lifted;

calculating a sound source angle between a user and the handheld device according to the voice signal, and executing voice recognition processing on the voice signal when the sound source angle is within a preset sound source angle range;

and when the sound source angle is not within the preset sound source angle range, not performing voice recognition processing on the voice signal.

In a second aspect, this embodiment provides a sound reception method for a handheld device, where the method is applied to a controller of the handheld device, and includes:

when a screen lifting signal sent by a screen lifting sensor is received, controlling a sound collector to start so that the sound collector collects a voice signal input by a user, wherein the screen lifting signal is used for indicating that the handheld device is lifted;

This embodiment provides a sound reception method of a handheld device, and the handheld device. When the controller of the handheld device receives the screen lifting signal sent by the screen lifting sensor, the controller starts the sound collector so that the sound collector collects the voice signal input by the user. The controller then calculates the sound source angle between the user and the handheld device according to the voice signal. If the sound source angle is within the preset sound source angle range, voice recognition processing is performed on the voice signal. If the sound source angle is not within the preset sound source angle range, voice recognition processing is not performed on the voice signal. In this way, the sound source angle between the user and the handheld device is calculated when the user picks up the handheld device, and whether the user has a voice interaction intention is judged according to that angle. This avoids mistaken sound reception and improves the user experience.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 illustrates a schematic diagram of the principles of voice interaction, in accordance with some embodiments;

FIG. 2 illustrates a radio reception system framework diagram of a handheld device, according to some embodiments;

FIG. 3 illustrates a schematic diagram of a method of computing an angle of a sound source of a handheld device according to some embodiments;

FIG. 4 illustrates a handheld device sound source angle scene schematic in accordance with some embodiments;

FIG. 5 illustrates a signaling diagram of a sound reception method for a handheld device, according to some embodiments.

Detailed Description

To make the purpose and embodiments of the present application clearer, the following will clearly and completely describe the exemplary embodiments of the present application with reference to the attached drawings in the exemplary embodiments of the present application, and it is obvious that the described exemplary embodiments are only a part of the embodiments of the present application, and not all of the embodiments.

It should be noted that the brief descriptions of the terms in the present application are only for the convenience of understanding the embodiments described below, and are not intended to limit the embodiments of the present application. These terms should be understood in their ordinary and customary meaning unless otherwise indicated.

The terms "first," "second," "third," and the like in the description and claims of this application and in the above-described drawings are used for distinguishing between similar or analogous objects or entities and not necessarily for describing a particular sequential or chronological order, unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances.

The terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to all elements expressly listed, but may include other elements not expressly listed or inherent to such product or apparatus.

The term "module" refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware or/and software code that is capable of performing the functionality associated with that element.

For clarity of explanation of the embodiments of the present application, a speech recognition network architecture provided by the embodiments of the present application is described below with reference to fig. 1.

Referring to fig. 1, fig. 1 is a schematic diagram of a voice recognition network architecture according to an embodiment of the present application. In fig. 1, the smart device is configured to receive input information and output a processing result of the information. The voice recognition service device is an electronic device on which a voice recognition service is deployed, the semantic service device is an electronic device on which a semantic service is deployed, and the business service device is an electronic device on which a business service is deployed. The electronic device may include a server, a computer, and the like. The voice recognition service, the semantic service (also referred to as a semantic engine), and the business service are web services that can be deployed on the electronic device: the voice recognition service recognizes audio as text, the semantic service performs semantic parsing on the text, and the business service provides specific services, such as the weather query service of Moji Weather or the music query service of QQ Music. In one embodiment, the architecture shown in fig. 1 may include multiple physical service devices on which different business services are deployed, and one or more functional services may also be aggregated in one or more physical service devices.

In some embodiments, the processing of information input to the smart device under the architecture shown in fig. 1 is described below by way of example. Taking a query statement input by voice as the input information, the processing may include the following three processes:

[ Speech recognition ]

After receiving a query statement input by voice, the smart device can upload the audio of the query statement to the voice recognition service device, so that the voice recognition service device recognizes the audio as text through the voice recognition service and returns the text to the smart device. In one embodiment, before uploading the audio of the query statement to the voice recognition service device, the smart device may perform denoising processing on the audio, where the denoising processing may include removing echo and environmental noise.

[ Semantic understanding ]

The smart device uploads the text of the query statement recognized by the voice recognition service to the semantic service device, and the semantic service device performs semantic parsing on the text through the semantic service to obtain the business field, intention, and the like of the text.

[ Semantic response ]

According to the semantic parsing result of the text of the query statement, the semantic service device issues a query instruction to the corresponding business service device to obtain the query result given by the business service. The smart device can obtain the query result from the semantic service device and output it. As an embodiment, the semantic service device may further send the semantic parsing result of the query statement to the smart device, so that the smart device outputs a feedback statement contained in the semantic parsing result.

It should be noted that the architecture shown in fig. 1 is only an example and does not limit the scope of the present application. In the embodiments of the present application, other architectures may also be adopted to implement similar functions. For example, all or part of the three processes may be completed by the smart device itself, which is not described in detail herein.

In some embodiments, the smart device shown in fig. 1 may be a display device, such as a smart television. The functions of the voice recognition service device may be implemented by cooperation of a sound collector and a controller provided on the display device, and the functions of the semantic service device and the business service device may be implemented by the controller of the display device or by a server of the display device.
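The three processes above can be sketched as a simple chain of calls. This is only an illustration: the function names, the `ParseResult` type, and the stub behaviour are hypothetical and do not come from the application, and real services would run on separate devices.

```python
from dataclasses import dataclass


@dataclass
class ParseResult:
    domain: str  # business field of the text, e.g. "weather"
    intent: str  # what the user wants to do, e.g. "query"


def recognize_speech(audio: bytes) -> str:
    """Stand-in for the speech recognition service: audio in, text out."""
    # Hypothetical stub: the "audio" here is just UTF-8 encoded text.
    return audio.decode("utf-8")


def parse_semantics(text: str) -> ParseResult:
    """Stand-in for the semantic service: text in, field and intention out."""
    domain = "weather" if "weather" in text else "unknown"
    return ParseResult(domain=domain, intent="query")


def run_business_service(parsed: ParseResult) -> str:
    """Stand-in for the business service that answers the query."""
    if parsed.domain == "weather":
        return "sunny"  # placeholder answer, not real data
    return "no result"


def handle_query(audio: bytes) -> str:
    text = recognize_speech(audio)       # [ Speech recognition ]
    parsed = parse_semantics(text)       # [ Semantic understanding ]
    return run_business_service(parsed)  # [ Semantic response ]
```

Calling `handle_query(b"what is the weather today")` walks through all three stages in order and returns the placeholder answer.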

However, in the lift-to-wake mode, the user may simply be picking up the handheld device with no intention of voice interaction. If the user happens to be speaking at that moment, mistaken sound reception easily occurs, resulting in a poor user experience.

To solve the above problem, the present application provides a sound reception system of a handheld device, whose framework is shown in fig. 2. The system includes at least a user layer, a sensing layer, and a system layer. The user layer is the layer at which the user performs input operations to input signals. The sensing layer is the layer provided with various sensors that sense the user's operations and convert them into related signals. The system layer is the layer that receives signals and performs related operations on other components according to the received signals. The system layer mainly consists of a main chip, namely a controller. The system layer may also include components such as a storage module and a power supply. The storage module can store information such as preset parameters.

It should be noted that the main chip in the embodiments of the present application may be disposed in the handheld device, or in another device controlled by the handheld device. For example, when the handheld device is a remote controller, the main chip may be disposed in the display device or in the remote controller. The embodiments of the present application are explained with the main chip disposed in the handheld device. It should also be noted that the handheld device provided in the present application may be any terminal such as a mobile phone, a wearable device, an AR (Augmented Reality)/VR (Virtual Reality) device, a tablet computer, a notebook computer, a UMPC (Ultra-Mobile Personal Computer), a netbook, or a PDA (Personal Digital Assistant), and the embodiments of the present application are not limited thereto.

In some embodiments, in the user layer, the information input by the user may include an operation of picking up the handheld device and a voice input.

Further, the sensing layer may include a sound collector and a screen lifting sensor. The sound collector is used for collecting voice signals input by a user, and the screen lifting sensor is used for detecting whether the handheld device is picked up. The sound collector can be a microphone, and the screen lifting sensor can be an acceleration sensor.

An acceleration sensor is a sensor capable of measuring acceleration. It is generally composed of a mass block, a damper, an elastic element, a sensitive element, an adjusting circuit, and the like. During acceleration, the sensor measures the inertial force acting on the mass block and obtains the acceleration value using Newton's second law. Depending on the sensitive element, common acceleration sensors include capacitive and inductive types, among others. The acceleration sensor used in the embodiments of the present application is not limited; for example, a three-axis acceleration sensor may be used.

If the handheld device is picked up by a user, the acceleration sensor detects the change in acceleration. The acceleration sensor generates a screen lifting signal according to the acceleration change and sends the screen lifting signal to the main chip. After receiving the screen lifting signal, the main chip controls the sound collector to start. Once started, the sound collector begins to collect the voice signal input by the user.
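As a rough illustration of how a screen lifting signal might be derived from acceleration changes, the sketch below flags a lift when the magnitude of a three-axis reading deviates from gravity by more than a threshold. The threshold value, the `is_lifted` function, and the `Controller` class are assumptions made for this example; the application does not specify a detection rule.

```python
import math

GRAVITY = 9.81  # m/s^2, magnitude measured by a sensor at rest


def is_lifted(samples, threshold=1.5):
    """Return True if any reading deviates from gravity by more than threshold.

    samples: iterable of (ax, ay, az) readings in m/s^2.
    threshold: hypothetical deviation in m/s^2 that counts as a lift.
    """
    for ax, ay, az in samples:
        magnitude = math.sqrt(ax * ax + ay * ay + az * az)
        if abs(magnitude - GRAVITY) > threshold:
            return True
    return False


class Controller:
    """Minimal stand-in for the main chip."""

    def __init__(self):
        self.collector_started = False

    def on_screen_lift_signal(self):
        # On a screen lifting signal, start the sound collector.
        self.collector_started = True


def process_readings(controller, samples):
    """Send a screen lifting signal to the controller if a lift is detected."""
    if is_lifted(samples):
        controller.on_screen_lift_signal()
```

A production detector would likely filter the signal and consider orientation as well; a single-sample threshold is used here only to keep the example short.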

In some embodiments, the handheld device may also include a display. After receiving the screen lifting signal sent by the acceleration sensor, the main chip controls the display to light up the screen. A prompt such as "Start voice interaction?" may be displayed on the user interface of the lit screen.

After collecting the voice signal input by the user, the sound collector sends the voice signal to the main chip. The main chip calculates the sound source angle between the user and the handheld device according to the voice signal. If the sound source angle between the user and the handheld device is within the preset sound source angle range, the user is considered to intend voice interaction, and voice recognition processing is performed on the collected voice signal. It should be noted that the voice recognition processing itself belongs to the prior art and is not described in detail in this application. The processes of intention judgment and intention execution based on the recognition result also belong to the prior art and are not described in detail in this application.

If the sound source angle between the user and the handheld device is not within the preset sound source angle range, the user is considered not to intend voice interaction, and voice recognition processing is not performed on the collected voice signal.

In some embodiments, the sound collector may be a microphone array including at least two microphones. The sound source angle between the user and the handheld device is calculated according to the voice signal as follows:

Fig. 3 is a schematic diagram of a sound source angle calculation method. In fig. 3, the far-field incident signal is the voice signal input by the user, and the wave front is the spherical surface formed as the sound wave propagates outward from the sound source. At a long distance, the wave front can be regarded as a plane, which also simplifies the calculation.

Fig. 3 shows p microphones, numbered 1, 2, ..., p-2, p-1, p. The calculation formula of the time difference with which the voice signal emitted by the sound source reaches each microphone is as follows:

wherein p is the number of microphones, d is the distance between two adjacent microphones, and θ_m is the sound source angle between the user and the handheld device. As shown in fig. 4, the sound source angle may specifically be the angle between the user and the central axis of the handheld device. The matrix can be constructed according to the formula:

wherein x_1 ... x_p are the X-coordinate values of the microphones, y_1 ... y_p are the Y-coordinate values of the microphones, and τ_12 ... τ_(p-1)p are the time differences with which the voice signal emitted by the sound source arrives at two adjacent microphones. From this matrix, p sound source angle values can be obtained. Finally, the final sound source angle value is calculated using the least square method. The specific calculation process is prior art and is not described in detail in this application. In addition to the above sound source localization method based on time difference of arrival, other sound source localization methods may be adopted in the embodiments of the present application, for example a method based on high-resolution spectral estimation. Its basic principle is to obtain the received energy at each angle by spatial spectrum scanning, and then to take the angle with the maximum received energy as the angle of arrival of the sound source.
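As a minimal sketch of the time-difference approach, and not the exact formula or matrix of this application, the following assumes a uniform linear array and a far-field plane wave, so that the delay between adjacent microphones is tau = d * cos(theta) / c. With this convention a source directly in front of the array gives an angle of 90°. The speed of sound `SPEED_OF_SOUND` and the function name `estimate_angle_deg` are assumptions made for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def estimate_angle_deg(delays, spacing):
    """Least-squares sound source angle from adjacent-microphone delays.

    delays: time differences in seconds, one per adjacent microphone pair.
    spacing: distance d in metres between two adjacent microphones.
    Assumes the far-field model tau = d * cos(theta) / c for every pair.
    """
    if not delays:
        raise ValueError("need at least one adjacent-pair delay")
    # Minimising sum((tau_i - d * cos(theta) / c) ** 2) over cos(theta)
    # gives cos(theta) = c * mean(tau) / d when the spacing is uniform.
    cos_theta = SPEED_OF_SOUND * sum(delays) / (len(delays) * spacing)
    # Clamp so that measurement noise cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))
```

With equal delays of zero, the estimate is 90°, which matches a user directly in front of the device. Noisy per-pair delays are averaged by the least-squares step before the angle is computed.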

The preset sound source angle range in the embodiments of the present application may be set by a user according to experience. For example, the preset sound source angle range is set to 90° ± 15°. If the calculated sound source angle is within the range of 90° ± 15°, it may be determined that the user intends to perform voice interaction. In this case, the user has probably picked up the handheld device and placed its microphone directly in front of the mouth, offset to the left or right by no more than 15°. If the calculated sound source angle is not within the range of 90° ± 15°, it may be determined that the user does not intend to perform voice interaction. In this case, the user has probably just picked up the handheld device without placing its microphone within 15° to the left or right of directly in front of the mouth.

For example, as shown in FIG. 4, if the user is at position B, the sound source angle is 90° and the user is directly in front of the handheld device. If the user is at position A, the included angle between the user and the central axis of the handheld device is 10°, so the sound source angle is 90° ± 10° (that is, 80° or 100°, depending on the side). This is within the preset sound source angle range, and it can be determined that the user intends to perform voice interaction. If the user is at position C, the included angle between the user and the central axis of the handheld device is 50°, so the sound source angle is 90° ± 50° (that is, 40° or 140°). This is not within the preset sound source angle range, and it can be determined that the user does not intend to perform voice interaction.

In some embodiments, if the sound source angle is determined to be within the preset sound source angle range, it is further determined whether the voice activity detection result is "ended". If the voice activity detection result is "not ended", the user is still inputting the voice signal. The sound collector continues to collect the voice signal input by the user, and the main chip finally performs voice recognition processing on the complete voice signal. If the voice activity detection result is "ended", the user has stopped inputting the voice signal, and the main chip performs voice recognition processing on the voice signal collected so far.

Illustratively, the user inputs the voice signal "turn on the television" after picking up the remote controller. The main chip of the remote controller determines that the sound source angle is within the preset sound source angle range, that is, the user has a voice interaction intention. If the user then continues with the voice signal "switch to CCTV-1", the voice activity detection result at this time is "not ended". The sound collector continues to collect the voice signal input by the user and obtains the complete voice signal "turn on the television and switch to CCTV-1". Finally, voice recognition processing is performed on this complete voice signal. If the user does not continue to input a voice signal after being determined to have a voice interaction intention, the voice activity detection result is "ended", and voice recognition processing is performed on the current voice signal "turn on the television".
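The positions A, B, and C can be checked against the example range of 90° ± 15° with a few lines of code. The function names are hypothetical, and treating the boundary values (75° and 105°) as inside the range is an assumption, since the text does not say whether the range is inclusive.

```python
def in_preset_range(angle_deg, center_deg=90.0, tolerance_deg=15.0):
    """Return True if angle_deg lies within center_deg +/- tolerance_deg."""
    # Inclusive bounds are an assumption of this sketch.
    return abs(angle_deg - center_deg) <= tolerance_deg


def angles_for_offset(offset_deg, center_deg=90.0):
    """Sound source angles for a user offset to either side of the central axis."""
    return (center_deg - offset_deg, center_deg + offset_deg)


# Offsets from the central axis taken from the FIG. 4 example.
positions = {"A": 10.0, "B": 0.0, "C": 50.0}
intent = {
    name: all(in_preset_range(a) for a in angles_for_offset(offset))
    for name, offset in positions.items()
}
```

Here `intent` maps A and B to `True` (voice interaction intended) and C to `False`, matching the conclusions of the example.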

In some embodiments, if the sound source angle is not within the preset sound source angle range, the collection of the voice signal input by the user is stopped regardless of whether the voice activity detection result is not finished or finished.
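The interplay between the sound source angle check and voice activity detection can be sketched as follows. The representation of the voice signal as text chunks paired with a "VAD ended" flag, and the names `collect_and_recognize` and `recognize`, are simplifications assumed for this example.

```python
def collect_and_recognize(chunks, angle_in_range, recognize):
    """Collect voice chunks until VAD reports the end, then recognize them.

    chunks: iterable of (chunk, vad_ended) pairs from the sound collector.
    angle_in_range: True if the sound source angle is within the preset range.
    recognize: callable performing voice recognition on the joined chunks.
    Returns the recognition result, or None if collection was stopped.
    """
    if not angle_in_range:
        # Outside the preset range: stop collecting, whatever VAD reports.
        return None
    collected = []
    for chunk, vad_ended in chunks:
        collected.append(chunk)
        if vad_ended:
            # The user has stopped speaking: recognize what has been collected.
            break
    return recognize(" ".join(collected))
```

For instance, with the chunks `("turn on the television", False)` followed by a second chunk flagged as ended, both chunks are joined and passed to `recognize` as one complete voice signal.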

In some embodiments, if the sound source angle is within the preset sound source angle range and the handheld device has a display, the result of performing the voice recognition process on the voice signal is finally displayed on the display.

In some embodiments, once the sound source angle is determined not to be within the preset sound source angle range, the display is controlled to turn off the screen.

In some embodiments, if the controller receives the screen lifting signal and starts the sound collector, but the sound collector does not collect a voice signal, the prompt "Please speak again" can be displayed on the display, and an alert sound can be played at the same time to remind the user. If the sound collector still does not collect a voice signal within a preset time period, it is determined that the user has no voice interaction intention, and the display is controlled to turn off the screen. If the sound collector collects a voice signal within the preset time period, it is determined that the user has a voice interaction intention, and voice recognition processing is performed on the collected voice signal.
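The preset time period described above behaves like a polling loop with a deadline. The sketch below is one possible shape under stated assumptions: `poll` is a hypothetical callable that returns the collected voice signal or `None`, and the clock and sleep functions are injectable so the logic can be exercised without real waiting.

```python
import time


def wait_for_voice(poll, timeout_s, clock=time.monotonic,
                   sleep=time.sleep, interval_s=0.05):
    """Poll the sound collector until a voice signal arrives or time runs out.

    Returns the voice signal, or None if nothing was collected in time.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        signal = poll()
        if signal is not None:
            return signal
        sleep(interval_s)
    return None


def handle_lift(poll, timeout_s, **kwargs):
    """Decide the next action after the screen lifting signal is received."""
    signal = wait_for_voice(poll, timeout_s, **kwargs)
    if signal is None:
        return "screen_off"  # no voice interaction intention
    return "recognize"       # perform voice recognition on the signal
```

Using `time.monotonic` as the default clock keeps the timeout unaffected by system clock adjustments.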

Based on the foregoing embodiments, the present application further provides a sound reception method for a handheld device. As shown in the signaling diagram of fig. 5, the method includes the following steps:

Step one: the user picks up the handheld device. After detecting that the handheld device has been picked up, the screen lifting sensor generates a screen lifting signal and sends it to the controller.

Step two: after receiving the screen lifting signal, the controller controls the sound collector to start. The sound collector collects the voice signal input by the user and sends the voice signal to the controller.

Step three: the controller calculates the sound source angle between the user and the handheld device according to the voice signal.

Step four: if the sound source angle is within the preset sound source angle range, voice recognition processing is performed on the voice signal.

Step five: if the sound source angle is not within the preset sound source angle range, voice recognition processing is not performed on the voice signal.

In some embodiments, the sound collector is a microphone array, the microphone array comprising at least two microphones. The method comprises the following steps of calculating the sound source angle between a user and the handheld device according to a voice signal:

The sound source angle between the user and the handheld device is calculated according to the distance between two adjacent microphones and the time difference with which the two adjacent microphones receive the voice signal, wherein the sound source angle is the angle between the user and the central axis of the handheld device.

In some embodiments, if the sound source angle is within the preset sound source angle range, it is further determined whether the voice activity detection result is "ended". If the voice activity detection result is "not ended", the sound collector can still continue to collect the voice signal input by the user. The controller continues to receive the voice signal and finally performs voice recognition processing on the complete voice signal. If the voice activity detection result is "ended", the sound collector can no longer collect a voice signal input by the user, and the controller performs voice recognition processing on the current voice signal.

In some embodiments, if the sound source angle is not within the preset sound source angle range, the sound collector is controlled to stop collecting the voice signal input by the user regardless of whether the voice activity detection result is over.

In some embodiments, the handheld device further comprises a display. When receiving the screen lifting signal sent by the screen lifting sensor, the controller also controls the display to light up the screen. After the screen is lit, if the sound source angle is within the preset sound source angle range, the result of the voice recognition processing performed on the voice signal is displayed on the display. If the sound source angle is not within the preset sound source angle range, the display is controlled to turn off the screen.
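Putting the method steps and the display behaviour together, a controller could be sketched as below. The class name, the injected `estimate_angle` and `recognize` callables, and the default range of 90° ± 15° are assumptions for illustration; the sketch gates recognition on the sound source angle and turns the screen off when the angle is out of range.

```python
class HandheldController:
    """Minimal sketch of the controller flow with a display."""

    def __init__(self, estimate_angle, recognize,
                 center_deg=90.0, tolerance_deg=15.0):
        self.estimate_angle = estimate_angle  # voice signal -> angle in degrees
        self.recognize = recognize            # voice signal -> recognized text
        self.center_deg = center_deg
        self.tolerance_deg = tolerance_deg
        self.screen_on = False
        self.collector_on = False
        self.displayed = None

    def on_screen_lift(self):
        # Screen lifting signal received: light the screen, start the collector.
        self.screen_on = True
        self.collector_on = True

    def on_voice_signal(self, signal):
        if not self.collector_on:
            return None
        angle = self.estimate_angle(signal)
        if abs(angle - self.center_deg) <= self.tolerance_deg:
            # Within the preset range: recognize and show the result.
            self.displayed = self.recognize(signal)
            return self.displayed
        # Outside the preset range: no recognition, stop collecting, screen off.
        self.collector_on = False
        self.screen_on = False
        return None
```

Injecting the angle estimator and recognizer keeps the flow independent of any particular sound source localization or recognition method.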

The same or similar contents in the embodiments of the present application may be referred to each other, and the related embodiments are not described in detail.

Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

The foregoing description, for purposes of explanation, has been presented in conjunction with specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed above. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and the practical application, to thereby enable others skilled in the art to best utilize the embodiments and various embodiments with various modifications as are suited to the particular use contemplated.

Claims

1. A handheld device, comprising:

a sound collector configured to collect a voice signal input by a user;

a controller configured to:

2. The handheld device of claim 1, wherein the sound collector comprises at least two microphones, and the sound source angle between the user and the handheld device is calculated according to the voice signal by the following specific steps:

according to the distance between two adjacent microphones and the time difference of the two adjacent microphones receiving the voice signals, calculating the sound source angle between the user and the handheld device, wherein the sound source angle is the angle between the user and the central axis of the handheld device.

3. The handheld device of claim 1, wherein the controller is further configured to: when the sound source angle is within the preset sound source angle range and a voice activity detection result is not ended, continue to receive the voice signal and perform voice recognition processing on the received complete voice signal, wherein when the voice activity detection result is not ended, the sound collector can still continue to collect the voice signal input by the user;

when the sound source angle is within the preset sound source angle range and the voice activity detection result is ended, perform voice recognition processing on the currently collected voice signal, wherein when the voice activity detection result is ended, the sound collector can no longer collect the voice signal input by the user.

4. The handheld device of claim 1, wherein the controller is further configured to:

when the sound source angle is not within the preset sound source angle range, stop collecting the voice signal input by the user, regardless of whether the voice activity detection result is ended or not ended.

5. The handheld device of claim 1, wherein the handheld device further comprises a display, and wherein upon receiving the screen lifting signal sent by the screen lifting sensor, the controller is further configured to: control the display to light up the screen.

6. The handheld device of claim 5, wherein when the sound source angle is within the preset sound source angle range, the controller is further configured to: display, on the display, a result of performing voice recognition processing on the voice signal.

7. The handheld device of claim 5, wherein when the sound source angle is not within the preset sound source angle range, the controller is further configured to: control the display to turn off the screen.

8. A sound reception method of a handheld device is applied to a controller of the handheld device and comprises the following steps:

9. The sound reception method of the handheld device according to claim 8, wherein the sound collector includes at least two microphones, and calculates a sound source angle between the user and the handheld device according to the voice signal, specifically including the steps of:

10. The sound reception method of the handheld device according to claim 8, the method further comprising: when the sound source angle is within the preset sound source angle range and a voice activity detection result is not ended, continuing to receive the voice signal and performing voice recognition processing on the received complete voice signal, wherein when the voice activity detection result is not ended, the sound collector can still continue to collect the voice signal input by the user;