CN113905323A

CN113905323A - Perceptual sound source height correction method applicable to service type robot playing audio

Info

Publication number: CN113905323A
Application number: CN202111261650.8A
Authority: CN
Inventors: 林志斌 (Lin Zhibin); 刘晓峻 (Liu Xiaojun); 卢晶 (Lu Jing); 狄敏 (Di Min)
Original assignee: Jiangsu Province Nanjing University Of Science And Technology Electronic Information Technology Co ltd; Nanjing Nanda Electronic Wisdom Service Robot Research Institute Co ltd; Nanjing University
Current assignee: Jiangsu Province Nanjing University Of Science And Technology Electronic Information Technology Co ltd; Nanjing Nanda Electronic Wisdom Service Robot Research Institute Co ltd; Nanjing University
Priority date: 2021-10-28
Filing date: 2021-10-28
Publication date: 2022-01-07
Anticipated expiration: 2041-10-28
Also published as: CN113905323B

Abstract

The invention discloses a method for correcting the perceived height of a sound source when a service robot plays audio. The service robot stores a variety of head-related transfer functions (HRTFs) locally to form an HRTF database that covers auditory cues for different elevations. The robot acquires the height of the listener it is interacting with through multi-modal sensing, matches an HRTF to this physiological height characteristic, fine-tunes the matched HRTF, convolves local audio data with it, and outputs the result to the robot's playback device. The invention can effectively correct the perceived height of the robot's interaction sound in real time, and addresses the inconsistency of virtual sound images that arises when different service robots interact with different listeners.

Description

Perceptual sound source height correction method applicable to service type robot playing audio

Technical Field

The invention relates to a method for correcting the perceived height of a sound source when a service robot plays audio, and belongs to the technical field of robot acoustics.

Background

Three-dimensional audio has mainly been applied to film, games, music and similar fields. It can reconstruct virtual sound images at arbitrary spatial positions to create immersive sound scenes, and is commonly used in cinemas and home theaters.

With the rapid development of the robot service industry, customers increasingly expect comfortable audio playback from service robots. Played audio is the final form in which an interactive robot expresses human-computer interaction, so it receives the most direct attention from consumers and directly shapes the user experience of audio interaction with the robot. Effectively improving the quality of robot-played audio has therefore become an important problem. In particular, the perceived height of the audio played by the robot is an important index for evaluating the quality of the robot's audio interaction.

The head-related transfer function (HRTF) describes the filtering effect of the outer ear, head, torso and so on on sound signals arriving at the ear from different directions, and thus characterizes how the ear receives sound from a point in space. Several spectral changes are observed as a source moves along the elevation axis: a notch appears around 7 kHz and shifts upward in frequency as elevation increases, and in the median plane a shallow peak appears around 12 kHz that flattens at high elevations. The perception of elevation may therefore be associated with local maxima, or "hot spots", at particular frequencies. If the spectral difference between a high-elevation source and a horizontal loudspeaker is applied to the signal played by that horizontal loudspeaker, the auditory system perceives the sound as approximately elevated. HRTFs are studied by universities and laboratories in China and abroad, and the theory has been applied to aerospace, military, gaming, audio and other fields.
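The spectral-difference idea above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patent's implementation: HRTFs are taken to be available as short time-domain impulse responses (HRIRs) stored as lists of floats, a naive DFT is used instead of an FFT library to keep the example dependency-free, and the function names `magnitude_db`, `elevation_cue_db` and `deepest_notch_hz` are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def magnitude_db(impulse_response: Sequence[float], n_fft: int) -> List[float]:
    """Magnitude (dB) of a naive DFT for bins 0..n_fft//2 (fine for short HRIRs)."""
    if len(impulse_response) > n_fft:
        raise ValueError("n_fft must be at least the impulse-response length")
    spectrum = []
    for k in range(n_fft // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                  for n, x in enumerate(impulse_response))
        spectrum.append(20.0 * math.log10(max(abs(acc), 1e-12)))  # floor avoids log(0)
    return spectrum


def elevation_cue_db(hrir_high: Sequence[float], hrir_horizontal: Sequence[float],
                     n_fft: int = 256) -> List[float]:
    """Per-bin spectral difference (dB): high-elevation HRIR minus horizontal HRIR."""
    high = magnitude_db(hrir_high, n_fft)
    horizontal = magnitude_db(hrir_horizontal, n_fft)
    return [h - z for h, z in zip(high, horizontal)]


def deepest_notch_hz(hrir: Sequence[float], fs: float, lo_hz: float, hi_hz: float,
                     n_fft: int = 256) -> float:
    """Frequency of the lowest-magnitude DFT bin inside [lo_hz, hi_hz]."""
    spectrum = magnitude_db(hrir, n_fft)
    candidates = [(db, k * fs / n_fft) for k, db in enumerate(spectrum)
                  if lo_hz <= k * fs / n_fft <= hi_hz]
    if not candidates:
        raise ValueError("no DFT bin falls inside the requested band")
    return min(candidates)[1]
```

Feeding a high-elevation HRIR and a 0° HRIR from a database into `elevation_cue_db` gives a gain curve that, applied to a horizontal loudspeaker signal, approximates the elevation cue; `deepest_notch_hz` can be used to track the notch that the text places around 7 kHz as elevation changes.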

Disclosure of Invention

Purpose of the invention: to solve the problem of inconsistent virtual sound images when different service robots interact with different listeners, the invention provides a method for correcting the perceived height of a sound source when a service robot plays audio.

Technical solution: to achieve the above purpose, the invention adopts the following technical solution:

A method for correcting the perceived height of a sound source when a service robot plays audio comprises the following steps:

Step 1: multiple head-related transfer functions (HRTFs) are stored in the local device of the service robot to form an HRTF database; these HRTFs cover auditory height information for different elevations.

Step 2: the service robot acquires the height of the listener who is the subject of human-computer interaction through multi-modal sensing, obtaining the listener's physiological height characteristic.

Step 3: the HRTFs in the database are matched against the listener's physiological height characteristic, and the corresponding HRTF is selected.

Step 4: the HRTF selected in step 3 is called and convolved with the local audio data, and the result is output to the playback device of the service robot.

Step 5: the service robot again acquires the listener's physiological height characteristic through multi-modal sensing.

Step 6: based on the physiological height characteristic obtained in step 5, the service robot fine-tunes, in real time, the height of the HRTF matched in step 3, obtaining a height-fine-tuned HRTF.

Step 7: the service robot performs sound fine-tuning on the height-fine-tuned HRTF from step 6, obtaining a sound-fine-tuned HRTF.

Step 8: the sound-fine-tuned HRTF from step 7 is called and convolved with the local audio data, and the result is output to the playback device of the service robot.

Step 9: steps 5 to 8 are repeated until the difference between the fine-tuned HRTFs of successive iterations falls within a preset threshold, yielding the optimal HRTF.

Step 10: the optimal HRTF obtained in step 9 is called and convolved with the local audio data, and the result is output to the playback device of the service robot.
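The control flow of steps 2 to 10 can be sketched as a loop. This is a hedged sketch, not the patented implementation: the sensing, matching, fine-tuning and playback callbacks are hypothetical placeholders for robot-specific code, an HRTF is represented as a single impulse response, the "difference between successive fine-tuned HRTFs" is assumed to be the largest sample-wise difference, each iteration refines the most recent HRTF rather than restarting from the step-3 match, and `max_iterations` is an added safety cap that the source does not mention.

```python
from typing import Callable, List, Sequence

Hrir = List[float]  # one impulse response; a real system would keep left and right


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest sample-wise difference between two equal-length responses."""
    return max(abs(x - y) for x, y in zip(a, b))


def correct_perceived_height(
    sense_height: Callable[[], float],               # steps 2 and 5 (hypothetical)
    match_hrir: Callable[[float], Hrir],              # step 3 (hypothetical)
    refine_height: Callable[[Hrir, float], Hrir],     # step 6 (hypothetical)
    refine_sound: Callable[[Hrir], Hrir],             # step 7 (hypothetical)
    play: Callable[[Hrir], None],                     # convolve local audio and output
    threshold: float = 1e-3,
    max_iterations: int = 20,
) -> Hrir:
    current = match_hrir(sense_height())              # steps 2-3
    play(current)                                     # step 4
    for _ in range(max_iterations):                   # step 9: repeat steps 5-8
        height = sense_height()                       # step 5
        candidate = refine_sound(refine_height(current, height))  # steps 6-7
        play(candidate)                               # step 8
        converged = max_abs_diff(candidate, current) <= threshold
        current = candidate
        if converged:
            break
    play(current)                                     # step 10
    return current
```

Passing plain functions for the callbacks makes the loop testable without hardware; on a robot, `play` would wrap the convolution and audio output, and `sense_height` the infrared, ultrasonic or camera pipeline.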

Preferably, in step 1, the HRTFs include HRTFs obtained by actual measurement, HRTFs obtained by model simulation and numerical calculation, HRTFs corrected according to user feedback, and shared HRTFs.
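A database that mixes these four HRTF origins could record the origin of each entry alongside its direction. The following is a minimal sketch under assumptions: the class and field names are hypothetical, the preference order used to break ties between entries at the same angle (user-corrected, then measured, then simulated, then shared) is an illustrative choice not stated in the source, and the angular distance simply sums absolute elevation and azimuth differences without handling azimuth wrap-around.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class HrtfSource(Enum):
    MEASURED = "measured"
    SIMULATED = "simulated"
    USER_CORRECTED = "user_corrected"
    SHARED = "shared"


# Hypothetical tie-break order when two entries sit at the same angle.
SOURCE_PRIORITY = {
    HrtfSource.USER_CORRECTED: 0,
    HrtfSource.MEASURED: 1,
    HrtfSource.SIMULATED: 2,
    HrtfSource.SHARED: 3,
}


@dataclass
class HrtfEntry:
    elevation_deg: float
    azimuth_deg: float
    source: HrtfSource
    left_hrir: List[float]
    right_hrir: List[float]


@dataclass
class HrtfDatabase:
    entries: List[HrtfEntry] = field(default_factory=list)

    def add(self, entry: HrtfEntry) -> None:
        self.entries.append(entry)

    def nearest(self, elevation_deg: float, azimuth_deg: float = 0.0) -> HrtfEntry:
        """Closest entry by a simple angular distance, ties broken by source priority."""
        if not self.entries:
            raise LookupError("HRTF database is empty")
        return min(
            self.entries,
            key=lambda e: (
                abs(e.elevation_deg - elevation_deg) + abs(e.azimuth_deg - azimuth_deg),
                SOURCE_PRIORITY[e.source],
            ),
        )
```

Keeping the origin on each entry lets a user-corrected HRTF override a generic measured one at the same direction without deleting the original.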

Preferably, in step 2, the service robot acquires the listener's height through multi-modal sensing by means of infrared, ultrasonic or image sensing.
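As one illustration of the ultrasonic option, the listener's height can be estimated from an echo's time of flight. This sketch rests on assumptions not in the source: a single rangefinder mounted on the robot at a known height and tilted upward at a known angle so that its echo point is the top of the listener's head (a real ultrasonic beam is wide, so this is idealized), and the common linear approximation of the speed of sound in air. All function names are hypothetical; an infrared time-of-flight sensor would use the same geometry with the speed of light.

```python
import math


def speed_of_sound_m_s(temperature_c: float = 20.0) -> float:
    """Common linear approximation for the speed of sound in dry air."""
    return 331.3 + 0.606 * temperature_c


def ultrasonic_range_m(echo_time_s: float, temperature_c: float = 20.0) -> float:
    """Convert a round-trip echo time into a one-way distance."""
    return speed_of_sound_m_s(temperature_c) * echo_time_s / 2.0


def head_top_height_m(sensor_height_m: float, range_m: float, tilt_deg: float) -> float:
    """Height of the echo point for a sensor tilted tilt_deg above horizontal."""
    return sensor_height_m + range_m * math.sin(math.radians(tilt_deg))
```

For example, a 10 ms echo at 20 °C corresponds to about 1.72 m of range; with the sensor at 1.0 m tilted 30° upward, the estimated head-top height is about 1.86 m.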

Preferably, the matching in step 3 calls the HRTF for the angle that corresponds to the listener's physiological height information.
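The "corresponding angle" is not defined further in the text. One plausible reading, offered here only as an assumption, is the elevation of the point the sound should appear to come from (for example the robot's face) as seen from the listener's ear. The sketch below computes that angle; `vertex_to_ear_m` is a rough placeholder for converting stature to ear height, not an anthropometric value taken from the source, and both function names are hypothetical.

```python
import math


def ear_height_m(stature_m: float, vertex_to_ear_m: float = 0.12) -> float:
    """Rough ear height; vertex_to_ear_m is a placeholder, not a measured value."""
    return stature_m - vertex_to_ear_m


def source_elevation_deg(virtual_source_height_m: float, ear_height: float,
                         horizontal_distance_m: float) -> float:
    """Elevation of the virtual source seen from the listener's ear.

    Positive means above the ear and negative means below it.
    """
    return math.degrees(math.atan2(virtual_source_height_m - ear_height,
                                   horizontal_distance_m))
```

The resulting angle would then be snapped to the nearest elevation actually available in the HRTF database; a tall listener close to a short robot yields a strongly negative elevation, so a different HRTF is called than for a child at the same distance.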

Preferably, the sound fine-tuning in step 7 comprises controlling the time delays of the different playback speakers of the service robot to enhance the perceived elevation of the virtual sound source, and controlling the service robot to adjust sound-effect equalization and reverberation.
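The speaker delays mentioned above can be derived from geometry. The sketch below shows one common starting point, assumed rather than taken from the source: delay each speaker so that all wavefronts reach the listener's ear at the same time, given speaker and ear positions in metres. A designer might then deliberately let an upper speaker arrive slightly earlier, which by the precedence effect tends to pull the image toward it; that refinement is not shown. Function names, the 48 kHz sample rate and c = 343 m/s are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def alignment_delays_samples(speakers: Sequence[Point], ear: Point,
                             fs: int = 48000, c: float = 343.0) -> List[int]:
    """Per-speaker delay (in samples) so every wavefront reaches the ear together."""
    distances = [math.dist(s, ear) for s in speakers]
    farthest = max(distances)
    return [round((farthest - d) / c * fs) for d in distances]


def apply_delay(signal: Sequence[float], delay: int) -> List[float]:
    """Prepend `delay` zeros, i.e. an integer-sample delay line."""
    return [0.0] * delay + list(signal)
```

With speakers 1.0 m and 1.343 m from the ear, the nearer speaker is delayed by about 1 ms (48 samples at 48 kHz).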

Compared with the prior art, the invention has the following beneficial effects:

(1) The invention provides a correction method based on the listener's physiological height characteristic, which the robot acquires during human-computer interaction. The robot adaptively matches the perceived sound-source height to the listener, which makes the service robot friendlier in human-computer interaction and avoids mechanically playing every sound at the same perceived height.

(2) The service robot acquires the height of the listener in real time through multi-modal interaction.

(3) The perceived height of the audio played by the service robot is corrected in real time, so that the robot better suits listeners of different heights and its human-computer interaction becomes friendlier.

Drawings

Fig. 1 illustrates a method for synthesizing a virtual sound source in a time domain.

Fig. 2 is a flow chart of the present invention.

Detailed Description

The invention is further illustrated below with reference to the drawings and specific embodiments. It should be understood that these examples are given solely for illustration and are not intended to limit the scope of the invention; various equivalent modifications that occur to those skilled in the art upon reading the invention fall within the scope defined by the appended claims.

As shown in Fig. 2, a method for correcting the perceived height of a sound source when a service robot plays audio includes the following steps:

The HRTFs include HRTFs measured by professional methods, HRTFs obtained by model simulation and numerical calculation, HRTFs corrected according to user feedback, and HRTFs shared by other institutions or users.

The invention first uses the public HRTF data measured by the CIPIC laboratory at the University of California, Davis, USA, to synthesize the perceived playback height of the service robot.
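The public CIPIC HRTF database samples 1250 directions per subject (25 azimuths by 50 elevations, in interaural-polar coordinates) and stores 200-tap HRIRs at 44.1 kHz. The sketch below reproduces that direction grid and snaps a desired elevation to the nearest available one in the frontal median plane (azimuth 0°, elevations from -45° to 90°; larger CIPIC elevations lie behind the head). Restricting to the frontal range assumes the robot faces the listener; loading the actual data files is not shown, and the function name is hypothetical.

```python
from typing import List

# CIPIC direction grid (interaural-polar coordinates, degrees).
CIPIC_AZIMUTHS_DEG: List[float] = (
    [-80.0, -65.0, -55.0] + [float(a) for a in range(-45, 50, 5)] + [55.0, 65.0, 80.0]
)
CIPIC_ELEVATIONS_DEG: List[float] = [-45.0 + 5.625 * i for i in range(50)]


def nearest_frontal_elevation_index(target_deg: float) -> int:
    """Index of the closest CIPIC elevation in the frontal range [-45, 90] degrees."""
    target = min(max(target_deg, -45.0), 90.0)  # values above 90 would be behind the head
    frontal = [i for i, e in enumerate(CIPIC_ELEVATIONS_DEG) if e <= 90.0]
    return min(frontal, key=lambda i: abs(CIPIC_ELEVATIONS_DEG[i] - target))
```

A requested elevation of 10° therefore maps to the 11.25° grid point, and the returned index can be used to pick the matching HRIR pair for a subject.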

As described above, the human brain determines the direction of a sound source in three-dimensional space from the spectral characteristics of the sound reaching the eardrum. The response of the body's structures to sound, in particular that of the pinna, is called the "pinna effect". The pinna effect implies that the human auditory system is functionally equivalent to a direction-dependent filter that modifies the spectrum of sound arriving from different spatial directions.

The essence of realizing a virtual sound-source direction with HRTF data is to convolve the HRTF data with the sound signal to be processed. Fig. 1 shows the process of synthesizing a virtual sound source in the time domain from a monophonic source signal: the source signal is convolved with the left-ear and right-ear HRTF data respectively, and when the result is reproduced through loudspeakers, a virtual sound source carrying the direction information can be heard.
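The time-domain synthesis of Fig. 1 amounts to two linear convolutions of the same mono signal. The sketch below uses direct-form convolution for clarity; it assumes the HRTF data are available as left and right impulse responses, and the function names are illustrative. A real-time player would typically use FFT-based or partitioned convolution instead, and loudspeaker (rather than headphone) reproduction generally also needs crosstalk handling, which is not shown here.

```python
from typing import List, Sequence, Tuple


def convolve(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Direct-form linear convolution; output length is len(x) + len(h) - 1."""
    if not x or not h:
        return []
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


def synthesize_binaural(mono: Sequence[float], hrir_left: Sequence[float],
                        hrir_right: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Virtual source in the time domain: filter one mono signal with each ear's HRIR."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)
```

Feeding a unit impulse through `synthesize_binaural` simply returns the two HRIRs (zero-padded), which is a convenient sanity check before processing real audio.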

The service robot acquires the height of the listener through multi-modal sensing; this can be done by infrared, ultrasonic or image-based means.
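For the image-based option, a pinhole camera model turns the pixel row of the top of the listener's head into a height, provided the distance to the listener is known. This sketch assumes a level camera (optical axis horizontal) at a known height, image rows that increase downward, a focal length and principal point in pixels from calibration, and a depth value supplied by, for example, an infrared depth sensor. Detecting the head-top row is not shown, and the function name is hypothetical.

```python
def head_top_height_from_image_m(camera_height_m: float, focal_px: float,
                                 principal_row_px: float, head_top_row_px: float,
                                 depth_m: float) -> float:
    """Pinhole model with a level camera: rows above the principal point are higher."""
    if depth_m <= 0 or focal_px <= 0:
        raise ValueError("depth and focal length must be positive")
    return camera_height_m + (principal_row_px - head_top_row_px) * depth_m / focal_px
```

For a camera at 1.0 m with a 500-pixel focal length, a head top 200 rows above the principal point at 2 m depth gives an estimated height of 1.8 m.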

The matching calls the HRTF for the angle corresponding to the physiological height information.

The local audio data are first convolved with standard or generic HRTF parameters and played back as standard playback. The played content includes a virtual sound-source height effect based on the current HRTF direction.

The fine-tuning process can be carried out by repeatedly playing the same sound source. Sound fine-tuning comprises controlling the time delays of the different playback speakers of the service robot to enhance the perceived elevation of the virtual sound source, and controlling the service robot to adjust sound-effect equalization, reverberation and the like.
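The source does not specify its equalization or reverberation processing. As a stand-in, the sketch below shows two simple building blocks a fine-tuning stage could adjust: a feedback comb filter (a classic element of Schroeder-style reverberators, whose delay and gain set the decay) and a broadband gain in dB. Both are illustrative, not the patent's method, and the function names are hypothetical; a real equalizer would apply frequency-dependent gains.

```python
from typing import List, Sequence


def feedback_comb(x: Sequence[float], delay: int, gain: float) -> List[float]:
    """y[n] = x[n] + gain * y[n - delay]; |gain| < 1 keeps the filter stable."""
    if delay < 1 or not -1.0 < gain < 1.0:
        raise ValueError("need delay >= 1 and |gain| < 1")
    y: List[float] = []
    for n, xn in enumerate(x):
        feedback = gain * y[n - delay] if n >= delay else 0.0
        y.append(xn + feedback)
    return y


def apply_gain_db(x: Sequence[float], gain_db: float) -> List[float]:
    """Broadband gain; -6 dB roughly halves the amplitude."""
    scale = 10.0 ** (gain_db / 20.0)
    return [scale * v for v in x]
```

For a comb filter with delay D samples at sample rate fs, each pass attenuates the signal by 20·log10|g| dB, so the time to decay by 60 dB is about T60 = -3·D / (fs·log10|g|); raising g or D lengthens the reverberant tail.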

In step 10, the optimal HRTF obtained in step 9 is called and convolved with the local audio data, and the resulting playback data are output to the playback device of the service robot.

The above describes only preferred embodiments of the invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the invention, and these are also intended to fall within the scope of the invention.

Claims

1. A method for correcting the perceived height of a sound source when a service robot plays audio, characterized by comprising the following steps:

step 1, storing multiple head-related transfer functions (HRTFs) in the local device of the service robot to form an HRTF database, the HRTFs covering auditory height information for different elevations;

step 2, the service robot acquiring the height of the listener who is the subject of human-computer interaction through multi-modal sensing, to obtain the listener's physiological height characteristic;

step 3, matching HRTFs in the HRTF database against the listener's physiological height characteristic and selecting the corresponding HRTF;

step 4, calling the HRTF selected in step 3, convolving it with the local audio data, and outputting the result to the playback device of the service robot;

step 5, the service robot acquiring the listener's physiological height characteristic again through multi-modal sensing;

step 6, the service robot fine-tuning in real time the height of the HRTF matched in step 3 according to the physiological height characteristic obtained in step 5, to obtain a height-fine-tuned HRTF;

step 7, the service robot performing sound fine-tuning on the height-fine-tuned HRTF obtained in step 6, to obtain a sound-fine-tuned HRTF;

step 8, calling the sound-fine-tuned HRTF obtained in step 7, convolving it with the local audio data, and outputting the result to the playback device of the service robot;

step 9, repeating steps 5 to 8 until the difference between the fine-tuned HRTFs of successive iterations falls within a preset threshold, to obtain an optimal HRTF;

2. The method for correcting the perceived sound-source height when a service robot plays audio according to claim 1, wherein in step 1 the HRTFs include HRTFs obtained by actual measurement, HRTFs obtained by model simulation and numerical calculation, HRTFs corrected according to user feedback, and shared HRTFs.

3. The method for correcting the perceived sound-source height when a service robot plays audio according to claim 2, wherein in step 2 the service robot acquires the listener's height through multi-modal sensing by means of infrared, ultrasonic or image sensing.

4. The method for correcting the perceived sound-source height when a service robot plays audio according to claim 3, wherein the matching in step 3 calls the HRTF for the angle corresponding to the physiological height information.

5. The method for correcting the perceived sound-source height when a service robot plays audio according to claim 4, wherein the sound fine-tuning in step 7 comprises: controlling the time delays of the different playback speakers of the service robot to enhance the perceived elevation of the virtual sound source; and controlling the service robot to adjust sound-effect equalization and reverberation.