CN113905323B - Perceived sound source height correction method suitable for a service robot when playing audio

Info

Publication number: CN113905323B
Application number: CN202111261650.8A
Authority: CN
Inventors: 林志斌; 刘晓峻; 卢晶; 狄敏
Original assignee: Jiangsu Province Nanjing University Of Science And Technology Electronic Information Technology Co ltd; Nanjing Nanda Electronic Wisdom Service Robot Research Institute Co ltd; Nanjing University
Current assignee: Jiangsu Province Nanjing University Of Science And Technology Electronic Information Technology Co ltd; Nanjing Nanda Electronic Wisdom Service Robot Research Institute Co ltd; Nanjing University
Priority date: 2021-10-28
Filing date: 2021-10-28
Publication date: 2024-01-23
Anticipated expiration: 2041-10-28
Also published as: CN113905323A

Abstract

The invention discloses a perceived sound source height correction method suitable for a service robot when playing audio. The method comprises the following steps: the local device of the service robot stores multiple head-related transfer functions (HRTFs) to form an HRTF database, the HRTFs covering auditory height information for different heights; the service robot acquires the height information of the listener engaged in the human-computer interaction through multimodal sensing; it matches an HRTF according to the listener's physiological height characteristics, fine-tunes the matched HRTF, convolves it with the local audio data, and outputs the result to the playback device of the service robot. The invention can effectively correct, in real time, the perceived sound height of a service robot during human-computer interaction, and addresses the differences in virtual sound images that arise across different service robots and different listeners.

Description

Perceived sound source height correction method suitable for a service robot when playing audio

Technical Field

The invention relates to a perceived sound source height correction method suitable for a service robot when playing audio, and belongs to the technical field of robot audio.

Background

Three-dimensional audio has mainly been used in fields such as film, games, and music. It can reconstruct a virtual sound image at any position in space and create immersive sound scenes, and is commonly found in movie theaters, home theaters, and similar settings.

With the rapid development of the service robot industry, customers' expectations for the comfort of the audio played by service robots have also risen. Audio playback is the final form in which an interactive robot's human-computer interaction is expressed and is what consumers notice most directly, so its quality directly affects how human-like the robot's audio interaction feels. How to effectively improve the quality of the audio played by a robot has therefore become an important problem; the perceived height of the played audio in particular is an important index for evaluating the quality of the robot's audio communication.

The head-related transfer function (HRTF) reflects how the outer ear, head, torso, and so on filter a sound signal depending on the direction from which it reaches the ear. It describes how a person's ear receives sound from a point in space. Certain spectral changes can be observed as the sound source moves along the elevation axis: there is a notch at 7 kHz whose frequency shifts upward as elevation increases, and a shallow peak at 12 kHz in the median plane that flattens out in the high-elevation region. The perception of elevation may therefore be linked to local maxima, or hot spots, at certain specific frequencies. If the spectral difference between a high-elevation sound source and a horizontal loudspeaker is applied to the original loudspeaker signal, the human auditory system perceives the resulting sound as almost elevated. HRTFs are studied in universities and laboratories at home and abroad, and the theory has been applied in fields such as aerospace, the military, games, and audio.
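As a rough illustration of the spectral-difference idea described above, the following Python sketch derives a magnitude-only gain from two impulse responses and applies it to a short signal. The function names, the FFT size, and the toy impulse responses are assumptions made for illustration and are not taken from the patent; the filtering is circular and ignores phase, so this is a sketch under those assumptions rather than a production filter.

```python
import numpy as np

def elevation_difference_gain(hrir_elevated, hrir_horizontal, n_fft=16, eps=1e-8):
    """Per-bin magnitude ratio between an elevated and a horizontal response."""
    mag_elevated = np.abs(np.fft.rfft(hrir_elevated, n_fft))
    mag_horizontal = np.abs(np.fft.rfft(hrir_horizontal, n_fft))
    return mag_elevated / (mag_horizontal + eps)  # eps avoids division by zero

def apply_gain(signal, gain):
    """Apply a magnitude-only gain in the frequency domain (circular filtering)."""
    n_fft = 2 * (len(gain) - 1)
    if len(signal) > n_fft:
        raise ValueError("signal must fit in one FFT block for this sketch")
    spectrum = np.fft.rfft(signal, n_fft)
    return np.fft.irfft(spectrum * gain, n_fft)[: len(signal)]

# Toy impulse responses (placeholders, not measured HRTF data).
hrir_horizontal = np.array([1.0, 0.5])
hrir_elevated = np.array([1.0, -0.3])
gain = elevation_difference_gain(hrir_elevated, hrir_horizontal)
signal = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
shaped = apply_gain(signal, gain)
```

With measured data, the two impulse responses would come from the same HRTF set at a high elevation and at the horizontal plane respectively.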

Disclosure of Invention

Object of the invention: to address the differences in virtual sound images that arise across different service robots and different listeners during human-computer interaction, the invention provides a perceived sound source height correction method suitable for a service robot when playing audio.

Technical scheme: to achieve the above object, the invention adopts the following technical scheme:

A perceived sound source height correction method suitable for a service robot when playing audio comprises the following steps:

Step 1, the local device of the service robot stores multiple head-related transfer functions (HRTFs) to form an HRTF database, wherein the HRTFs cover auditory height information for different heights.

Step 2, the service robot acquires the height information of the listener engaged in the human-computer interaction through multimodal sensing, and thereby obtains the listener's physiological height characteristics.

Step 3, the HRTFs in the HRTF database are matched against the listener's physiological height characteristics, and the corresponding HRTF is selected.

Step 4, the HRTF selected in step 3 is called and convolved with the local audio data, and the result is output to the playback device of the service robot.

Step 5, the service robot acquires the listener's physiological height characteristics again through multimodal sensing.

Step 6, the service robot performs, in real time, a height fine adjustment of the HRTF matched in step 3 according to the listener's physiological height characteristics obtained in step 5, and obtains the height-fine-tuned HRTF.

Step 7, the service robot performs a sound fine adjustment based on the height-fine-tuned HRTF obtained in step 6, and obtains the sound-fine-tuned HRTF.

Step 8, the sound-fine-tuned HRTF obtained in step 7 is called and convolved with the local audio data, and the result is output to the playback device of the service robot.

Step 9, steps 5 to 8 are repeated until the error between the HRTF before and after fine tuning is within a preset threshold range, and the optimal HRTF is obtained.

Step 10, the optimal HRTF obtained in step 9 is called and convolved with the local audio data, and the result is output to the playback device of the service robot.
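The iteration in steps 5 to 9 can be sketched as a loop that stops once the fine-tuned response changes by less than a threshold. The patent does not specify the error measure or the fine-tuning rule, so the relative L2 norm, the callables `measure_height` and `adjust`, and the toy data below are all hypothetical placeholders.

```python
import numpy as np

def refine_hrtf(hrir, measure_height, adjust, threshold=1e-3, max_iters=20):
    """Repeat measure -> fine-tune until the response changes by less than threshold."""
    iteration = 0
    for iteration in range(1, max_iters + 1):
        height = measure_height()        # step 5: re-acquire the listener's height
        updated = adjust(hrir, height)   # steps 6-7: height and sound fine tuning
        # Assumed error measure: relative L2 change before vs. after fine tuning.
        error = np.linalg.norm(updated - hrir) / max(np.linalg.norm(hrir), 1e-12)
        hrir = updated                   # step 8 would convolve and play with this
        if error < threshold:            # step 9: change is within the threshold
            break
    return hrir, iteration

# Toy stand-ins: a fixed height reading and an update that moves halfway to a target.
target = np.array([1.0, 0.5])
final_hrir, iterations = refine_hrtf(
    np.array([1.0, 0.0]),
    measure_height=lambda: 1.70,
    adjust=lambda h, height: h + 0.5 * (target - h),
)
```

The `max_iters` cap is a design choice of this sketch so that the loop terminates even if the fine tuning never settles.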

Preferably: the HRTFs in step 1 include HRTFs obtained by actual measurement, HRTFs obtained by model simulation and numerical calculation, HRTFs obtained by correction according to user feedback, and shared HRTFs.

Preferably: in step 2, the service robot acquires the height information of the listener engaged in the human-computer interaction through multimodal sensing as follows: the listener's height information is acquired using infrared, ultrasonic, or image sensing.

Preferably: the matching method in step 3 is to call the HRTF of the corresponding angle according to the physiological height information.

Preferably: the sound fine-tuning method in step 7 includes: controlling the time delays of the different playback loudspeakers of the service robot to improve the perceived elevation of the virtual sound source; and controlling the service robot to adjust sound-effect balance and reverberation.

Compared with the prior art, the invention has the following beneficial effects:

(1) The invention provides a correction method based on the physiological height characteristics of the listener, acquired by the robot during human-computer interaction. It lets the robot adaptively match the perceived sound source height to the listener, improves the human-computer interaction friendliness of the service robot, and avoids mechanically playing sound signals at the same perceived height for every listener.

(2) The service robot uses multimodal interaction to acquire the height information of the listener engaged in the human-computer interaction in real time.

(3) The perceived height of the audio played by the service robot is corrected in real time, so that the service robot is better suited to listeners of different heights and its human-computer interaction friendliness is improved.

Drawings

Fig. 1 shows the method of synthesizing a virtual sound source in the time domain.

Fig. 2 is a flow chart of the method of the invention.

Detailed Description

The invention is further illustrated below with reference to the accompanying drawings and specific embodiments. It should be understood that these embodiments merely illustrate the invention and do not limit its scope; after reading the invention, equivalent modifications made by those skilled in the art all fall within the scope defined by the appended claims.

A perceived sound source height correction method suitable for a service robot when playing audio, as shown in Fig. 2, comprises the following steps:

The HRTFs include HRTFs obtained by actual measurement using professional methods, HRTFs obtained by model simulation and numerical calculation, HRTFs obtained by correction according to user feedback, and HRTFs shared by other institutions or users.

The invention initially uses the published HRTF data measured by the CIPIC laboratory at the University of California, Davis, in the United States to synthesize the perceived playback height of the service robot.

As described above, the human brain determines the direction of a sound source in three-dimensional space from the spectral characteristics of the sound when it reaches the eardrum. The response of the structure of the human body to sound, in particular the response of the auricle, is called the "auricle effect". The "auricle effect" means that the human auditory system acts functionally as a filter that depends on the spatial direction of the sound and modifies the spectrum differently for sounds arriving from different directions.

The essence of using HRTF data to render a virtual sound source direction is to convolve the HRTF data with the sound signal to be processed. The process of synthesizing a virtual sound source in the time domain from a mono sound source signal is shown in Fig. 1. The sound source signal is convolved with the HRTF data of the left ear and of the right ear respectively, and the resulting signals are played back through loudspeakers to obtain a virtual sound source that carries directional information.
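The time-domain synthesis just described can be sketched in a few lines of Python. This is a minimal illustration, assuming the HRTF data is available as left and right head-related impulse responses (HRIRs) of equal length; the function name and the toy arrays are placeholders and are not measured data.

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right):
    """Convolve a mono signal with the left and right impulse responses."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    # Stacking requires both HRIRs to have the same length (assumed here).
    return np.stack([left, right], axis=0)

# Toy data: placeholder impulse responses, not measured HRTFs.
mono = np.array([1.0, 0.5, 0.25])
hrir_left = np.array([1.0, 0.0])
hrir_right = np.array([0.0, 0.5])
stereo = render_binaural(mono, hrir_left, hrir_right)
```

Each output channel has length `len(mono) + len(hrir) - 1`, which is the length of a full linear convolution.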

The service robot acquires the height information of the listener engaged in the human-computer interaction through multimodal sensing: the listener's height information can be obtained using infrared, ultrasonic, image, or similar sensing.

The matching method is to call the HRTF of the corresponding angle according to the physiological height information.
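The patent does not state how height maps to an angle, so the following Python sketch shows one plausible mapping: the elevation angle from the loudspeaker to the listener's ear level, followed by a nearest-neighbor lookup in the database. The geometry, the `ear_offset_m` default, the function names, and the database contents are all assumptions made for illustration.

```python
import math

def target_elevation_deg(listener_height_m, speaker_height_m, distance_m,
                         ear_offset_m=0.12):
    """Elevation angle from the loudspeaker to the listener's (assumed) ear level."""
    ear_height = listener_height_m - ear_offset_m  # assumed ear-to-crown offset
    return math.degrees(math.atan2(ear_height - speaker_height_m, distance_m))

def match_hrtf(elevation_deg, database):
    """Pick the entry whose stored elevation is closest to the requested one.

    database: dict mapping elevation in degrees to an HRTF entry.
    """
    nearest = min(database, key=lambda stored: abs(stored - elevation_deg))
    return nearest, database[nearest]

# Hypothetical database keyed by elevation angle; values stand in for HRTF data.
database = {-15.0: "hrtf_m15", 0.0: "hrtf_0", 15.0: "hrtf_15",
            30.0: "hrtf_30", 45.0: "hrtf_45"}
angle = target_elevation_deg(1.72, 1.0, 1.0)  # ear about 0.6 m above the speaker
key, entry = match_hrtf(angle, database)
```

A finer elevation grid, or interpolation between neighboring entries, would reduce the quantization error of the nearest-neighbor choice.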

To convolve the local audio data, standard or generic HRTF parameters are loaded, the local audio data to be played is convolved with them, and standard playback is carried out. The playback content includes a virtual sound source height effect based on the current HRTF direction.

The fine-tuning process can be carried out by playing the same sound source. The sound fine-tuning method includes: controlling the time delays of the different playback loudspeakers of the service robot to improve the perceived elevation of the virtual sound source; and controlling the service robot to adjust sound-effect balance, reverberation, and the like.

Step 10, the optimal HRTF obtained in step 9 is called and convolved with the local audio data, and the result is output to the playback device of the service robot as the appropriate playback data.

The foregoing is only a preferred embodiment of the invention. It should be noted that those skilled in the art can make various modifications and adaptations without departing from the principles of the invention, and such modifications and adaptations are also considered to fall within the scope of protection of the invention.
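The loudspeaker time-delay control mentioned above can be illustrated with a simple integer-sample delay. This is a minimal sketch; the function name, the sample rate, and the delay value are assumptions, and the patent does not say how the delays are chosen.

```python
import numpy as np

def delay_channel(signal, delay_s, sample_rate):
    """Delay a signal by a whole number of samples, keeping its original length."""
    n = int(round(delay_s * sample_rate))
    # Zero-pad the start and drop the samples pushed past the original length.
    return np.concatenate([np.zeros(n), signal])[: len(signal)]

# Toy example: delay one loudspeaker feed by 2 ms at a 1 kHz sample rate.
upper_feed = np.array([1.0, 2.0, 3.0, 4.0])
delayed_feed = delay_channel(upper_feed, delay_s=0.002, sample_rate=1000)
```

Sub-sample delays would need fractional-delay filtering, which this sketch leaves out.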

Claims

1. A perceived sound source height correction method suitable for a service robot when playing audio, characterized by comprising the following steps:

step 1, the local device of the service robot stores multiple head-related transfer functions (HRTFs) to form an HRTF database, wherein the HRTFs cover auditory height information for different heights;

step 2, the service robot acquires the height information of the listener engaged in the human-computer interaction through multimodal sensing, and thereby obtains the listener's physiological height characteristics;

the service robot acquires the height information of the listener engaged in the human-computer interaction through multimodal sensing as follows: the listener's height information is acquired using infrared, ultrasonic, or image sensing;

step 3, the HRTFs in the HRTF database are matched against the listener's physiological height characteristics, and the corresponding HRTF is selected;

step 4, the HRTF selected in step 3 is called and convolved with the local audio data, and the result is output to the playback device of the service robot;

step 5, the service robot acquires the listener's physiological height characteristics again through multimodal sensing;

step 6, the service robot performs, in real time, a height fine adjustment of the HRTF matched in step 3 according to the listener's physiological height characteristics obtained in step 5, and obtains the height-fine-tuned HRTF;

step 7, the service robot performs a sound fine adjustment based on the height-fine-tuned HRTF obtained in step 6, and obtains the sound-fine-tuned HRTF;

the sound fine-tuning method comprises: controlling the time delays of the different playback loudspeakers of the service robot to improve the perceived elevation of the virtual sound source; and controlling the service robot to adjust sound-effect balance and reverberation;

step 8, the sound-fine-tuned HRTF obtained in step 7 is called and convolved with the local audio data, and the result is output to the playback device of the service robot;

step 9, steps 5 to 8 are repeated until the error between the HRTF before and after fine tuning is within a preset threshold range, and the optimal HRTF is obtained;

2. The perceived sound source height correction method suitable for a service robot when playing audio according to claim 1, wherein: the HRTFs in step 1 include HRTFs obtained by actual measurement, HRTFs obtained by model simulation and numerical calculation, HRTFs obtained by correction according to user feedback, and shared HRTFs.

3. The perceived sound source height correction method suitable for a service robot when playing audio according to claim 2, wherein: the matching method in step 3 is to call the HRTF of the corresponding angle according to the physiological height information.