CN113905323B - Perception sound source height correction method suitable for service robot in audio playing - Google Patents


Publication number
CN113905323B
Authority
CN
China
Prior art keywords
related transfer
head related
hrtf
transfer function
height
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111261650.8A
Other languages
Chinese (zh)
Other versions
CN113905323A (en)
Inventor
林志斌
刘晓峻
卢晶
狄敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Province Nanjing University Of Science And Technology Electronic Information Technology Co ltd
Nanjing Nanda Electronic Wisdom Service Robot Research Institute Co ltd
Nanjing University
Original Assignee
Jiangsu Province Nanjing University Of Science And Technology Electronic Information Technology Co ltd
Nanjing Nanda Electronic Wisdom Service Robot Research Institute Co ltd
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Province Nanjing University Of Science And Technology Electronic Information Technology Co ltd, Nanjing Nanda Electronic Wisdom Service Robot Research Institute Co ltd, Nanjing University
Priority to CN202111261650.8A
Publication of CN113905323A
Application granted
Publication of CN113905323B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S7/302 Electronic adaptation of stereophonic sound system to listener position or orientation
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J11/00 Manipulators not otherwise provided for
    • B25J11/0005 Manipulators having means for high-level communication with users, e.g. speech generator, face recognition means
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J11/00 Manipulators not otherwise provided for
    • B25J11/008 Manipulators for service tasks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2420/00 Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/01 Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Stereophonic System (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention discloses a perceived sound source height correction method suitable for a service robot when playing audio. The method comprises the following steps: a local device of the service robot stores multiple head-related transfer functions (HRTFs) to form an HRTF database, the HRTFs covering auditory height information for different heights; the service robot acquires the height information of the listener taking part in the human-computer interaction through multimodal sensing; the robot matches an HRTF according to the listener's physiological height characteristics, fine-tunes the matched HRTF, convolves it with the local audio data, and outputs the result to the playback device of the service robot. The invention can effectively correct, in real time, the perceived sound height of a service robot's human-computer interaction audio, and solves the problem that virtual sound images differ across service robots and listeners during human-computer interaction.

Description

Perception sound source height correction method suitable for service robot in audio playing
Technical Field
The invention relates to a perceived sound source height correction method suitable for a service robot when playing audio, and belongs to the technical field of robot audio.
Background
Three-dimensional audio has so far been used mainly in film, games and music. It can reconstruct virtual sound images at arbitrary positions in space and create immersive sound scenes, and it is commonly found in cinemas, home theaters and similar settings.
With the rapid development of the service robot industry, customers' expectations for the listening comfort of audio played by service robots have also risen. Audio playback is the final form in which an interactive robot's human-computer interaction is expressed and the aspect consumers notice most directly, and its quality directly affects how natural the robot's audio interaction feels. How to effectively improve the quality of the audio played by the robot has therefore become an important problem. In particular, the perceived height of the played audio is an important index for evaluating the quality of the robot's audio communication.
The head-related transfer function (HRTF) reflects how the outer ear, head, torso and other body parts filter a sound signal depending on the direction from which it reaches the ear. It describes how a person's ear receives sound from a point in space. Certain spectral changes can be observed as the sound source moves along the elevation axis: there is a notch at 7 kHz whose frequency shifts upward as the elevation increases, and a shallow peak is seen at 12 kHz in the median plane, which flattens out in the high-elevation region. The perception of elevation can thus be linked to local maxima, or hot spots, at certain specific frequencies. If the spectral difference between a high-elevation sound source and a horizontal loudspeaker is applied to the signal of the original loudspeaker, the human auditory system perceives the resulting sound as approximately elevated. HRTFs are studied in universities and laboratories in China and abroad, and the theory has been applied in fields such as aerospace, the military, games and audio.
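The idea of applying the spectral difference between an elevated source and a horizontal loudspeaker can be sketched as a per-frequency magnitude ratio. The following is a minimal illustration only, not the patented implementation: the function name `elevation_gain`, the toy impulse responses and the FFT length are all assumptions made for the example.

```python
import numpy as np

def elevation_gain(hrir_horizontal, hrir_elevated, n_fft=8, eps=1e-6):
    """Magnitude ratio |H_elevated| / |H_horizontal| for each rFFT bin.

    Multiplying the spectrum of the horizontal loudspeaker signal by this
    gain imposes the spectral cues of the elevated direction (magnitude only).
    """
    mag_hor = np.abs(np.fft.rfft(hrir_horizontal, n_fft))
    mag_ele = np.abs(np.fft.rfft(hrir_elevated, n_fft))
    # eps guards against division by (near-)zero bins.
    return mag_ele / np.maximum(mag_hor, eps)

# Toy impulse responses (assumed, not measured data).
hrir_horizontal = np.array([1.0, 0.0, 0.0, 0.0])  # flat spectrum
hrir_elevated = np.array([1.0, 0.5, 0.0, 0.0])    # boosts low, cuts high frequencies

gain = elevation_gain(hrir_horizontal, hrir_elevated)
```

With measured data, the two impulse responses would come from the HRTF database entries for the horizontal and the elevated direction.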
Disclosure of Invention
Purpose of the invention: to solve the problem that virtual sound images differ across service robots and listeners during human-computer interaction, the invention provides a perceived sound source height correction method suitable for a service robot when playing audio.
Technical solution: to achieve the above purpose, the invention adopts the following technical solution:
A perceived sound source height correction method suitable for a service robot when playing audio comprises the following steps:
Step 1, a local device of the service robot stores multiple head-related transfer functions (HRTFs) to form an HRTF database, the HRTFs covering auditory height information for different heights.
Step 2, the service robot acquires, through multimodal sensing, the height information of the listener taking part in the human-computer interaction, and thereby obtains the listener's physiological height characteristics.
Step 3, the HRTFs in the HRTF database are matched against the listener's physiological height characteristics, and the corresponding HRTF is selected.
Step 4, the HRTF selected in step 3 is called and convolved with the local audio data, and the result is output to the playback device of the service robot.
Step 5, the service robot acquires the listener's physiological height characteristics again through multimodal sensing.
Step 6, the service robot fine-tunes, in real time, the height of the HRTF matched in step 3 according to the listener's physiological height characteristics acquired in step 5, obtaining a height-fine-tuned HRTF.
Step 7, the service robot performs sound fine-tuning on the height-fine-tuned HRTF obtained in step 6, obtaining a sound-fine-tuned HRTF.
Step 8, the sound-fine-tuned HRTF obtained in step 7 is called and convolved with the local audio data, and the result is output to the playback device of the service robot.
Step 9, steps 5 to 8 are repeated until the error between the HRTF before and after fine-tuning falls within a preset threshold range, which yields the optimal HRTF.
Step 10, the optimal HRTF obtained in step 9 is called and convolved with the local audio data, and the result is output to the playback device of the service robot.
Preferably: the multiple HRTFs in step 1 include HRTFs obtained through actual measurement, HRTFs obtained through model simulation and numerical calculation, HRTFs corrected according to user feedback, and shared HRTFs.
Preferably: in step 2, the service robot acquires the height information of the listener taking part in the human-computer interaction through multimodal sensing by using infrared, ultrasonic or image-based sensing.
Preferably: the matching method in step 3 is to call the HRTF of the corresponding angle according to the physiological height information.
Preferably: the sound fine-tuning method in step 7 comprises controlling the time delays of the different playback loudspeakers of the service robot to improve the perceived elevation of the virtual sound source, and controlling the service robot to adjust sound-effect equalization and reverberation.
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention provides a correction method based on the physiological height characteristics of the human-computer interaction listener as acquired by the robot. It allows the robot to adaptively match the perceived sound source height to the listener, improves the friendliness of the service robot's human-computer interaction, and avoids mechanically playing sound signals with the same perceived height to every listener.
(2) The service robot uses multimodal interaction to acquire the height information of the human-computer interaction listener in real time.
(3) The perceived height of the audio played by the service robot is corrected in real time, so that the robot adapts better to listeners of different heights and its human-computer interaction becomes friendlier.
Drawings
Fig. 1 shows the method of synthesizing a virtual sound source in the time domain.
Fig. 2 is a flow chart of the present invention.
Detailed Description
The invention is further illustrated below with reference to the drawings and specific embodiments. It should be understood that these embodiments merely illustrate the invention and do not limit its scope; after reading the invention, modifications of equivalent form made by those skilled in the art fall within the scope defined by the appended claims.
A perceived sound source height correction method suitable for a service robot when playing audio, as shown in Fig. 2, comprises the following steps:
Step 1, a local device of the service robot stores multiple head-related transfer functions (HRTFs) to form an HRTF database, the HRTFs covering auditory height information for different heights.
The multiple HRTFs include HRTFs obtained by actual measurement using professional methods, HRTFs obtained by model simulation and numerical calculation, HRTFs corrected according to user feedback, and HRTFs shared by other institutions or users.
The invention first uses the published HRTF data measured by the CIPIC laboratory of the University of California, Davis, in the United States, to synthesize the perceived playback height of the service robot.
As described above, the human brain discriminates the direction of a sound source in three-dimensional space from the spectral characteristics of the sound when it reaches the eardrum. The response of the structures of the human body to sound, in particular that of the auricle (pinna), is called the "auricle effect". The auricle effect means that the human auditory system functionally acts as a filter that depends on the spatial direction of the sound and modifies the spectrum of sounds arriving from different directions.
The essence of using HRTF data to realize a virtual sound source direction is to convolve the HRTF data with the sound signal to be processed. The process of synthesizing a virtual sound source in the time domain from a mono source signal is shown in Fig. 1: the source signal is convolved with the left-ear and right-ear HRTF data respectively, and the resulting signals are reproduced through loudspeakers to obtain a virtual sound source carrying direction information.
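The time-domain synthesis described above can be sketched in a few lines. This is a minimal illustration that assumes the HRTF data is available as left-ear and right-ear impulse responses (HRIRs) of equal length; the function name `render_virtual_source` and the toy data are hypothetical.

```python
import numpy as np

def render_virtual_source(mono, hrir_left, hrir_right):
    """Convolve a mono signal with the left- and right-ear impulse responses.

    Returns a two-channel array of shape (len(mono) + len(hrir) - 1, 2).
    Both impulse responses are assumed to have the same length.
    """
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=1)

# Toy data (assumed): a unit impulse and two 3-tap impulse responses.
mono = np.array([1.0, 0.0, 0.0, 0.0])
hrir_left = np.array([0.9, 0.3, 0.1])
hrir_right = np.array([0.0, 0.7, 0.2])

stereo = render_virtual_source(mono, hrir_left, hrir_right)
```

Because the input is a unit impulse, each output channel simply reproduces the corresponding impulse response, which makes the example easy to check by hand. For long audio, an FFT-based convolution would usually be preferred for speed.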
Step 2, the service robot acquires, through multimodal sensing, the height information of the listener taking part in the human-computer interaction, and thereby obtains the listener's physiological height characteristics.
The height information of the listener can be acquired by infrared, ultrasonic or image-based sensing, among other means.
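The patent does not say how readings from several sensing modes are combined. One possible approach, shown purely as an assumption, is to discard implausible readings and take the median of the rest; the function name, the sensor keys and the plausibility limits below are hypothetical.

```python
from statistics import median

def fuse_height_estimates(readings_m, min_m=0.5, max_m=2.5):
    """Combine per-sensor height readings (in meters) into one estimate.

    readings_m maps a sensor name to a reading, or to None when that sensor
    produced no measurement. Readings outside [min_m, max_m] are discarded
    (assumed plausibility limits). Returns None if nothing usable remains.
    """
    valid = [h for h in readings_m.values()
             if h is not None and min_m <= h <= max_m]
    if not valid:
        return None
    return median(valid)

# Example with two usable readings and one missing reading.
height = fuse_height_estimates(
    {"infrared": 1.70, "ultrasonic": 1.74, "image": None}
)
```

A single sensing mode is sufficient according to the text; the fusion step only matters when more than one is available.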
Step 3, the HRTFs in the HRTF database are matched against the listener's physiological height characteristics, and the corresponding HRTF is selected.
The matching method is to call the HRTF of the corresponding angle according to the physiological height information.
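The mapping from height to angle is not spelled out in the text. A minimal sketch, assuming the target elevation is the angle of the line from the robot's loudspeaker to the listener's ear height at a known horizontal distance, and assuming the database is indexed by a discrete elevation grid, could look like this. The grid values, function names and geometry are all assumptions.

```python
import math

# Hypothetical elevation grid in degrees; a real database defines its own.
ELEVATION_GRID_DEG = [-30, -20, -10, 0, 10, 20, 30, 40, 50, 60]

def target_elevation_deg(ear_height_m, speaker_height_m, distance_m):
    """Elevation of the listener's ears as seen from the loudspeaker."""
    return math.degrees(math.atan2(ear_height_m - speaker_height_m, distance_m))

def match_elevation(angle_deg, grid=ELEVATION_GRID_DEG):
    """Pick the grid elevation closest to the requested angle."""
    return min(grid, key=lambda e: abs(e - angle_deg))

# Listener ears at 1.6 m, loudspeaker at 1.0 m, listener 1.0 m away.
angle = target_elevation_deg(1.6, 1.0, 1.0)
selected = match_elevation(angle)
```

Here the computed angle is about 31 degrees, so the nearest grid entry, 30 degrees, is selected. Interpolating between neighboring HRTFs instead of picking the nearest one is an alternative design.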
Step 4, the HRTF selected in step 3 is called and convolved with the local audio data, and the result is output to the playback device of the service robot.
For this convolution, standard or generic HRTF parameters are loaded, the local audio data to be played is convolved with them, and standard playback is carried out. The playback content includes a virtual sound source height effect based on the current HRTF direction.
Step 5, the service robot acquires the listener's physiological height characteristics again through multimodal sensing.
Step 6, the service robot fine-tunes, in real time, the height of the HRTF matched in step 3 according to the listener's physiological height characteristics acquired in step 5, obtaining a height-fine-tuned HRTF.
Step 7, the service robot performs sound fine-tuning on the height-fine-tuned HRTF obtained in step 6, obtaining a sound-fine-tuned HRTF.
The fine-tuning can be carried out while playing the same sound source. The sound fine-tuning method comprises controlling the time delays of the different playback loudspeakers of the service robot to improve the perceived elevation of the virtual sound source, and controlling the service robot to adjust sound-effect equalization, reverberation and the like.
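Controlling the time delay of individual loudspeakers can be illustrated with integer sample delays. The sketch below is an assumption about one simple way to do this; the function name `delay_channels` and the array layout (samples by loudspeakers) are hypothetical, and fractional delays are not handled.

```python
import numpy as np

def delay_channels(signals, delays_samples):
    """Delay each loudspeaker channel by a whole number of samples.

    signals has shape (n_samples, n_speakers); delays_samples holds one
    non-negative integer per loudspeaker. The output is zero-padded so
    that the longest delay still fits.
    """
    n_samples, n_speakers = signals.shape
    out = np.zeros((n_samples + max(delays_samples), n_speakers),
                   dtype=signals.dtype)
    for ch, d in enumerate(delays_samples):
        out[d:d + n_samples, ch] = signals[:, ch]
    return out

# Two loudspeakers playing the same signal; the second is delayed by 2 samples.
signals = np.array([[1.0, 1.0],
                    [2.0, 2.0]])
delayed = delay_channels(signals, [0, 2])
```

At a sampling rate of 48 kHz, one sample corresponds to about 0.02 ms, so practical delays would typically span many samples.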
Step 8, the sound-fine-tuned HRTF obtained in step 7 is called and convolved with the local audio data, and the result is output to the playback device of the service robot.
Step 9, steps 5 to 8 are repeated until the error between the HRTF before and after fine-tuning falls within a preset threshold range, which yields the optimal HRTF.
Step 10, the optimal HRTF obtained in step 9 is called and convolved with the local audio data, and the result is output to the playback device of the service robot, so that suitable playback data is output.
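The loop formed by steps 5 to 9 can be sketched as a fixed-point iteration with a stopping threshold. The error measure is not defined in the text; the relative L2 difference used here is an assumption, as are the function names and the toy `adjust` rule, which merely stands in for the height and sound fine-tuning of steps 6 and 7.

```python
import numpy as np

def refine_hrtf(initial, measure_height, adjust, threshold=1e-3, max_iters=20):
    """Repeat measure -> adjust until the HRTF changes by less than threshold.

    measure_height() returns the current listener height (step 5).
    adjust(hrtf, height) returns the fine-tuned HRTF (steps 6 and 7).
    Returns the final HRTF and the number of iterations performed.
    """
    current = np.asarray(initial, dtype=float)
    iteration = 0
    for iteration in range(1, max_iters + 1):
        height = measure_height()
        updated = adjust(current, height)
        # Relative change between the HRTF before and after fine-tuning.
        error = np.linalg.norm(updated - current) / max(np.linalg.norm(current), 1e-12)
        current = updated  # step 8 would play audio convolved with `current`
        if error <= threshold:
            break
    return current, iteration

# Toy stand-ins (assumed): a fixed height and an adjustment that moves the
# filter halfway toward a fixed target on every pass.
target = np.array([1.0, 0.5])
best, n_iters = refine_hrtf(
    initial=[0.0, 0.0],
    measure_height=lambda: 1.7,
    adjust=lambda h, height: h + 0.5 * (target - h),
)
```

The `max_iters` cap is a safeguard added for the sketch so that the loop ends even if the threshold is never reached.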
The foregoing is only a preferred embodiment of the invention. It should be noted that those skilled in the art can make various modifications and adaptations without departing from the principles of the invention, and such modifications and adaptations are also regarded as falling within the scope of protection of the invention.

Claims (3)

1. A perceived sound source height correction method suitable for a service robot when playing audio, characterized by comprising the following steps:
step 1, a local device of the service robot stores multiple head-related transfer functions (HRTFs) to form an HRTF database, the HRTFs covering auditory height information for different heights;
step 2, the service robot acquires, through multimodal sensing, the height information of the listener taking part in the human-computer interaction, and thereby obtains the listener's physiological height characteristics;
the service robot acquires the height information of the listener through multimodal sensing by using infrared, ultrasonic or image-based sensing;
step 3, the HRTFs in the HRTF database are matched against the listener's physiological height characteristics, and the corresponding HRTF is selected;
step 4, the HRTF selected in step 3 is called and convolved with the local audio data, and the result is output to the playback device of the service robot;
step 5, the service robot acquires the listener's physiological height characteristics again through multimodal sensing;
step 6, the service robot fine-tunes, in real time, the height of the HRTF matched in step 3 according to the listener's physiological height characteristics acquired in step 5, obtaining a height-fine-tuned HRTF;
step 7, the service robot performs sound fine-tuning on the height-fine-tuned HRTF obtained in step 6, obtaining a sound-fine-tuned HRTF;
the sound fine-tuning method comprises: controlling the time delays of the different playback loudspeakers of the service robot to improve the perceived elevation of the virtual sound source; and controlling the service robot to adjust sound-effect equalization and reverberation;
step 8, the sound-fine-tuned HRTF obtained in step 7 is called and convolved with the local audio data, and the result is output to the playback device of the service robot;
step 9, steps 5 to 8 are repeated until the error between the HRTF before and after fine-tuning falls within a preset threshold range, which yields the optimal HRTF;
step 10, the optimal HRTF obtained in step 9 is called and convolved with the local audio data, and the result is output to the playback device of the service robot.
2. The perceived sound source height correction method suitable for a service robot when playing audio according to claim 1, characterized in that: the multiple HRTFs in step 1 include HRTFs obtained through actual measurement, HRTFs obtained through model simulation and numerical calculation, HRTFs corrected according to user feedback, and shared HRTFs.
3. The perceived sound source height correction method suitable for a service robot when playing audio according to claim 2, characterized in that: the matching method in step 3 is to call the HRTF of the corresponding angle according to the physiological height information.
CN202111261650.8A 2021-10-28 2021-10-28 Perception sound source height correction method suitable for service robot in audio playing Active CN113905323B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111261650.8A CN113905323B (en) 2021-10-28 2021-10-28 Perception sound source height correction method suitable for service robot in audio playing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111261650.8A CN113905323B (en) 2021-10-28 2021-10-28 Perception sound source height correction method suitable for service robot in audio playing

Publications (2)

Publication Number Publication Date
CN113905323A CN113905323A (en) 2022-01-07
CN113905323B true CN113905323B (en) 2024-01-23

Family

ID=79026680

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111261650.8A Active CN113905323B (en) 2021-10-28 2021-10-28 Perception sound source height correction method suitable for service robot in audio playing

Country Status (1)

Country Link
CN (1) CN113905323B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105979441A (en) * 2016-05-17 2016-09-28 南京大学 Customized optimization method for 3D sound effect headphone reproduction
CN108540925A (en) * 2018-04-11 2018-09-14 北京理工大学 A kind of fast matching method of personalization head related transfer function

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
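Virtual sound source localization based on HRTF; Zhang Zongshuai; Gu Yaping; Zhang Jun; Yang Xiaoping; Network New Media Technology (Issue 02); full text *
Head-related transfer function personalization method based on multidimensional physiological parameters; Huang Wanqiu; Zeng Xiangyang; Wang Lei; Journal of Northwestern Polytechnical University (Issue 02); full text *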

Also Published As

Publication number Publication date
CN113905323A (en) 2022-01-07

Similar Documents

Publication Publication Date Title
JP5894634B2 (en) Determination of HRTF for each individual
EP3311593B1 (en) Binaural audio reproduction
US10440496B2 (en) Spatial audio processing emphasizing sound sources close to a focal distance
JP4921470B2 (en) Method and apparatus for generating and processing parameters representing head related transfer functions
JP5499513B2 (en) Sound processing apparatus, sound image localization processing method, and sound image localization processing program
US20050147261A1 (en) Head relational transfer function virtualizer
CN113170271B (en) Method and apparatus for processing stereo signals
CN112005559B (en) Method for improving positioning of surround sound
KR20180102596A (en) Synthesis of signals for immersive audio playback
CN106664499A (en) Audio signal processing apparatus
JP2020506639A (en) Audio signal processing method and apparatus
CN112956210B (en) Audio signal processing method and device based on equalization filter
CN109587601A (en) The system that sound is movable into and out listener head using virtual acoustic system
EP2822301B1 (en) Determination of individual HRTFs
CN110225445A (en) A kind of processing voice signal realizes the method and device of three-dimensional sound field auditory effect
CN113905323B (en) Perception sound source height correction method suitable for service robot in audio playing
KR20160136716A (en) A method and an apparatus for processing an audio signal
US10999694B2 (en) Transfer function dataset generation system and method
Flanagan et al. Discrimination of group delay in clicklike signals presented via headphones and loudspeakers
JP7332745B2 (en) Speech processing method and speech processing device
US11218832B2 (en) System for modelling acoustic transfer functions and reproducing three-dimensional sound
US20230403528A1 (en) A method and system for real-time implementation of time-varying head-related transfer functions
Pausch Spatial audio reproduction for hearing aid research: System design, evaluation and application
Vorländer et al. 3D Sound Reproduction
CN117202001A (en) Sound image virtual externalization method based on bone conduction equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant