WO2024066751A1 - AR glasses, audio enhancement method and device therefor, and readable storage medium - Google Patents

AR glasses, audio enhancement method and device therefor, and readable storage medium

Info

Publication number
WO2024066751A1
WO2024066751A1 (application PCT/CN2023/111770)
Authority
WO
WIPO (PCT)
Prior art keywords
sound source
glasses
wearer
audio
coordinate system
Prior art date
Application number
PCT/CN2023/111770
Other languages
English (en)
French (fr)
Inventor
王文
李昱锋
李佳明
Original Assignee
歌尔股份有限公司 (Goertek Inc.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 歌尔股份有限公司 (Goertek Inc.)
Publication of WO2024066751A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • GPHYSICS
    • G02OPTICS
    • G02BOPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B27/00Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B27/01Head-up displays
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00Circuits for transducers, loudspeakers or microphones
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00Signal processing covered by H04R, not provided for in its groups

Definitions

  • the present invention relates to the field of augmented reality (AR) technology, and in particular to AR glasses and their audio enhancement method and device, and readable storage medium.
  • AR technology uses computers to generate virtual information with realistic sight, hearing, force, touch and movement, displays the virtual information in the real world, and allows people to interact with the virtual information.
  • AR glasses based on AR technology allow the wearer to see not only real-world objects, but also virtual objects, as if they existed in the real world.
  • AR glasses wearers can also pick up ambient sounds from the surrounding real world through multiple microphones, and can also use virtual audio to augment the real-world experience.
  • In a noisy environment with multiple real-world sound sources, AR glasses wearers may want to obtain the sound of only one or some of the sound sources and ignore the sound of the others.
  • Although directional microphones can be used to preferentially receive sound from real sound sources located in a specific direction and/or at a specific distance while eliminating noise from sound sources at other locations, the direction and/or distance of a directional microphone's maximum sensitivity does not necessarily correspond to the direction and/or distance the user is paying attention to. Therefore, there is a need to assist AR glasses wearers in better obtaining the sound of the sound sources they are interested in.
  • the embodiments of the present invention provide AR glasses and an audio enhancement method and device therefor, and a readable storage medium, intended to assist the wearer of the AR glasses in better obtaining the sound of the sound source of interest.
  • an audio enhancement method for AR glasses, comprising: detecting the distribution of sound sources in the real world around the wearer of the AR glasses using a microphone array; marking the sound source position of each detected sound source on the lenses of the AR glasses; locking a target sound source according to the eye gaze direction of the AR glasses wearer; extracting, from the audio signal received by the microphone array, the audio component related to the voiceprint features of the target sound source for enhancement processing; and
  • outputting the enhanced audio signal to the wearer of the AR glasses through in-ear headphones.
  • an audio enhancement device for AR glasses comprising:
  • a sound source distribution detection unit, used to detect the distribution of sound sources in the real world around the wearer of the AR glasses using a microphone array;
  • a sound source position marking unit, used to mark the sound source position of each detected sound source on the lenses of the AR glasses;
  • a target sound source locking unit, used to lock the target sound source according to the eye gaze direction of the AR glasses wearer;
  • An audio enhancement unit configured to extract an audio component related to the voiceprint feature of the target sound source from the audio signal received by the microphone array for enhancement processing
  • the audio output unit is used to output the enhanced audio signal to the wearer of the AR glasses through in-ear headphones.
  • AR glasses comprising a microphone array, an eye tracker, in-ear headphones, a memory and a processor, wherein a computer program is stored in the memory, and the computer program is loaded and executed by the processor to implement the above-mentioned audio enhancement method for AR glasses.
  • a readable storage medium stores one or more computer programs, and when the one or more computer programs are executed by a processor, the aforementioned audio enhancement method for AR glasses is implemented.
  • the AR glasses and their audio enhancement method and device, and readable storage medium provided by the embodiments of the present invention can achieve directional audio enhancement for a noisy environment with multiple sound sources in the real world by optimizing and amplifying the sounds of sound sources in the area near the direction of the AR glasses wearer's eye gaze, and suppressing and eliminating the sounds of sound sources in other areas. This can assist the AR glasses wearer in better obtaining the sounds of the sound sources they are concerned about, reduce interference from sounds from other sound sources, have a certain noise reduction effect, improve the user experience of the AR glasses wearer, and truly realize the role of AR glasses in augmented reality.
  • FIG. 1 is a schematic flow chart of an audio enhancement method for AR glasses provided in an embodiment of the present invention;
  • FIG. 2 is a schematic diagram of one arrangement of four microphones on AR glasses;
  • FIG. 3 is a schematic diagram of the positions of the four microphones shown in FIG. 2 when the AR glasses are worn;
  • FIG. 4 is a schematic diagram of the planar positions of the four microphones shown in FIG. 3;
  • FIG. 5 is a schematic diagram of the distribution of three sound sources around the wearer of the AR glasses;
  • FIG. 6 is a schematic diagram of the planar positions of the three sound sources shown in FIG. 5;
  • FIG. 7 is a schematic diagram of the waveforms of the sound signals picked up by the four microphones (MIC1 to MIC4) when sound source 1 sounds alone in the scene shown in FIG. 6;
  • FIG. 8 is a schematic diagram of the AR glasses wearer determining the location of a sound source by moving position in the scenario shown in FIG. 6;
  • FIG. 9 is a schematic diagram of the coordinate positions of the three sound sources around the AR glasses wearer in the world coordinate system;
  • FIG. 10 is a schematic diagram of the coordinate positions of the three sound sources shown in FIG. 9 converted into the camera coordinate system;
  • FIG. 11 is a schematic diagram of marking a sound source position on an AR glasses lens according to an embodiment of the present invention;
  • FIG. 12 is a schematic diagram of the principle of obtaining the eye gaze direction using an eye tracker;
  • FIG. 13 is a schematic diagram of distinguishably marking a target sound source on an AR glasses lens according to an embodiment of the present invention;
  • FIG. 14 is a schematic diagram of the structure of an audio enhancement device for AR glasses provided in an embodiment of the present invention;
  • FIG. 15 is a schematic diagram of the functional module structure of the AR glasses provided in an embodiment of the present invention.
  • FIG. 1 is a flow chart of the audio enhancement method for AR glasses provided by an embodiment of the present invention. Referring to FIG. 1, the method includes steps S110 to S150:
  • Step S110 Use a microphone array to detect the distribution of sound sources in the real world around the wearer of the AR glasses.
  • This step is based on the principle of binaural positioning and uses a microphone array composed of multiple microphones set on the AR glasses to detect the distribution of sound sources in the real world around the wearer of the AR glasses.
  • Binaural localization refers to the ability of binaural hearing to determine the direction of a sound source. When a sound source emits sound from different positions relative to the listener, the sound waves reach the listener's two ears with different intensities and at different times, and the auditory system can determine the location of the sound source from this information. Similarly, the position of a sound source can be determined by computation from the information captured by a microphone array composed of two or more microphones.
  • the following uses a microphone array consisting of four microphones as an example to illustrate the principle of determining the location of the sound source.
  • Figure 2 is a schematic diagram of the positions of four microphones on AR glasses. It is understood that the number of microphones set on AR glasses is not limited to four, and is not limited to the position setting method shown in Figure 2. Generally speaking, the more microphones that constitute the microphone array and the more symmetrical the position setting, the more accurately the sound source position can be determined.
  • Fig. 3 is a schematic diagram of the positions of the four microphones shown in Fig. 2 when the AR glasses are worn.
  • Fig. 4 is a schematic diagram of the planar positions of the four microphones shown in Fig. 3.
  • Figure 5 is a schematic diagram of the distribution of the three sound sources around the wearer of AR glasses.
  • Figure 6 is a schematic diagram of the planar position of the three sound sources shown in Figure 5.
  • the first direction line of each sound source can be obtained based on the sound signals of different intensities picked up by each microphone in the microphone array.
  • Figure 7 is a schematic diagram of the waveform of the sound signal picked up by the four microphones (MIC1-4) when the sound source 1 makes a sound alone in the scene shown in Figure 6.
  • the four microphones (MIC1-4) can pick up sound signals of different intensities.
  • the waveform of the sound signal picked up by the four microphones is shown in Figure 7. By comparing the intensities of the sound signals, the direction of the sound source can be located.
  • the sound signal strength picked up by MIC3 and MIC4 is greater than that of MIC1 and MIC2, and the approximate position of the sound source 1 can be preliminarily determined. Based on the slight difference between MIC3 and MIC4, the first direction line of the sound source 1 in the coordinate system of MIC3 and MIC4 can be accurately determined.
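As an illustration of the intensity comparison just described, the following sketch (a simplified assumption for illustration only, not the patent's exact algorithm; all function names are hypothetical) derives a coarse bearing by weighting each microphone's planar position by the RMS intensity it picks up:

```python
import math

def rms(samples):
    """Root-mean-square intensity of one microphone's samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def coarse_bearing(mic_positions, mic_samples):
    """Intensity-weighted average of microphone positions: the vector from
    the array centre to the weighted centre points roughly toward the
    loudest side, i.e. toward the sound source."""
    weights = [rms(s) for s in mic_samples]
    total = sum(weights)
    cx = sum(w * p[0] for w, p in zip(weights, mic_positions)) / total
    cy = sum(w * p[1] for w, p in zip(weights, mic_positions)) / total
    # Direction from the geometric centre of the array to the weighted centre.
    mx = sum(p[0] for p in mic_positions) / len(mic_positions)
    my = sum(p[1] for p in mic_positions) / len(mic_positions)
    return (cx - mx, cy - my)
```

In the FIG. 7 scenario, MIC3 and MIC4 receive stronger signals than MIC1 and MIC2, so the weighted centre shifts toward their side of the array, giving the approximate direction of sound source 1.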
  • a microphone array sound source localization algorithm can still be combined, such as a sound source localization algorithm based on arrival delay, a sound source localization algorithm based on high-resolution spectrum estimation, or a sound source localization algorithm based on beamforming, to obtain the first direction line of each sound source relative to the position of the wearer of the AR glasses.
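The arrival-delay (TDOA) family of localization algorithms mentioned above rests on estimating the lag between the signals of two microphones. A minimal pure-Python sketch of that lag estimate (a toy cross-correlation; production systems typically use GCC-PHAT or similar) might look like:

```python
def best_lag(a, b, max_lag):
    """Return the integer lag (in samples) of signal b relative to signal a
    that maximises their cross-correlation: a positive result means the
    sound reached microphone b later than microphone a."""
    def corr(lag):
        # Correlate a[i] with b[i + lag] over the overlapping region.
        total = 0.0
        for i in range(len(a)):
            j = i + lag
            if 0 <= j < len(b):
                total += a[i] * b[j]
        return total
    return max(range(-max_lag, max_lag + 1), key=corr)
```

Given the microphone spacing d and the speed of sound c, a delay of tau seconds corresponds to an incidence angle of roughly arcsin(c * tau / d) for a far-field source.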
  • the second direction line of each sound source can be obtained based on the same positioning algorithm or principle according to the sound signals of different intensities picked up by each microphone in the microphone array. Then, the position of each sound source can be determined according to the intersection position of the first direction line and the second direction line of each sound source.
  • FIG8 is a schematic diagram of the AR glasses wearer determining the location of the sound source by moving the position in the scene shown in FIG6. As shown in FIG8, when the AR glasses wearer is at position 1, the direction line 11 of the sound source 1 relative to position 1 can be obtained, and when the AR glasses wearer is at position 2, the direction line 12 of the sound source 1 relative to position 2 can be obtained. By calculating the intersection position of the two sets of direction lines 11 and 12, the location of the sound source 1 can be determined.
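The triangulation step in FIG. 8 amounts to intersecting the two direction lines obtained at position 1 and position 2. A plane-geometry sketch of that intersection (hypothetical helper, assuming the 2-D planar view of FIG. 6) is:

```python
def line_intersection(p1, d1, p2, d2):
    """Intersect two 2-D direction lines p1 + t*d1 and p2 + s*d2,
    solving p1 + t*d1 = p2 + s*d2 for t via 2-D cross products."""
    denom = d1[0] * d2[1] - d1[1] * d2[0]  # d1 x d2
    if abs(denom) < 1e-12:
        return None  # parallel direction lines: move again and re-measure
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    t = (rx * d2[1] - ry * d2[0]) / denom  # (p2 - p1) x d2 / (d1 x d2)
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])
```

For example, direction line 11 from position (0, 0) toward (2, 3) and direction line 12 from position (4, 0) toward (-2, 3) intersect at the sound source location (2, 3).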
  • The above method can be used to determine the position of each detected sound source, so as to learn the distribution of sound sources in the real world around the wearer of the AR glasses.
  • Step S120 marking the sound source position of each detected sound source on the lens of the AR glasses.
  • This step S120 may be specifically as follows:
  • A world coordinate system is established with the center of the head of the AR glasses wearer as the origin, and the coordinate position of each detected sound source in the world coordinate system can be represented by (x, y, z) coordinates.
  • Figure 9 is a schematic diagram of the coordinate positions of the three sound sources around the AR glasses wearer in the world coordinate system.
  • a camera coordinate system is established with the pupil of the AR glasses wearer as the coordinate origin.
  • the coordinate position of each detected sound source in the world coordinate system is converted to the coordinate position in the camera coordinate system.
  • the established camera coordinate system is a plane coordinate system, and the coordinate position of each detected sound source in the camera coordinate system can be represented by (x, y) coordinates.
  • Figure 10 is a schematic diagram of the coordinate positions of the three sound sources shown in Figure 9 converted into the camera coordinate system.
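The patent obtains the world-to-camera conversion from a camera calibration algorithm whose exact formula is not reproduced here. As a rough illustration only, a pinhole-style projection from the head-centred 3-D frame to the pupil-centred 2-D plane, under assumed parameters (`pupil_offset` and `focal` are hypothetical), could be sketched as:

```python
def world_to_camera(src_xyz, pupil_offset, focal=1.0):
    """Translate a source from the head-centred world frame into the
    pupil-centred frame, then project it onto the lens plane with a
    simple pinhole model: (x, y) = focal * (X/Z, Y/Z)."""
    X = src_xyz[0] - pupil_offset[0]
    Y = src_xyz[1] - pupil_offset[1]
    Z = src_xyz[2] - pupil_offset[2]
    if Z <= 0:
        return None  # source behind the wearer: not projectable onto the lens
    return (focal * X / Z, focal * Y / Z)
```

Applied to each (x, y, z) position from FIG. 9, this kind of projection yields the planar (x, y) camera coordinates of FIG. 10.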
  • the coordinate position of each detected sound source in the camera coordinate system is marked on the lens of the AR glasses using a first marking method.
  • Figure 11 is a schematic diagram of marking the location of the sound source on the lenses of AR glasses given in an embodiment of the present invention.
  • the marking method shown in Figure 11 is centered on the coordinate position of the sound source in the camera coordinate system, and displays a circular dotted frame with a set radius (for example, 2 cm), marking the location of each detected sound source on the lenses of the AR glasses.
  • The dotted frame can also be drawn in different line colors such as red or yellow, and the circular frame can be replaced by triangles, squares, ellipses, or other shapes.
  • Step S130 lock the target sound source according to the gaze direction of the AR glasses wearer.
  • This step S130 can be specifically as follows: using an eye tracker to obtain the eye gaze direction of the AR glasses wearer, and converting the eye gaze direction into a coordinate position in the camera coordinate system; when the coordinate distance between the eye gaze direction and a sound source in the camera coordinate system is less than a preset distance value, determining the sound source as a target sound source; and marking the target sound source on the lenses of the AR glasses with a second marking method different from the first marking method, thereby locking the target sound source.
  • the eye tracker uses eye-tracking technology to track eye movements, locates the pupil through image processing technology, obtains the coordinates of the pupil center, and calculates the person's gaze point through corresponding algorithms.
  • the eye tracking device can be used to obtain the eye gaze direction of the wearer of AR glasses.
  • Figure 12 is a schematic diagram of the principle of using an eye tracker to obtain the eye gaze direction. As shown in Figure 12, when the pupil moves from the origin O(0,0) of the eye coordinate system to the point E(ex,ey), the direction vector of the OE line can be calculated, and the direction vector is the eye gaze direction of the wearer of AR glasses.
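The direction-vector computation described for FIG. 12 can be sketched directly: the gaze direction is the (normalised) vector of the line OE from the eye-coordinate origin O(0, 0) to the tracked pupil position E(ex, ey). The function name is illustrative only:

```python
import math

def gaze_vector(ex, ey):
    """Unit direction vector of the line OE from the eye-coordinate
    origin O(0, 0) to the tracked pupil position E(ex, ey)."""
    n = math.hypot(ex, ey)
    if n == 0:
        return (0.0, 0.0)  # pupil at the origin: looking straight ahead
    return (ex / n, ey / n)
```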
  • The eye coordinate system (Eye Coordinates) can also be established with the pupil of the AR glasses wearer as the coordinate origin, so the eye coordinate system coincides with the camera coordinate system.
  • The eye coordinate system is a right-handed orthogonal coordinate system in which the eye is interpreted as looking toward the negative Z axis, as a camera does when taking a picture.
  • The eye gaze direction OE can be converted into the coordinate C(cx, cy) in the camera coordinate system by a conversion formula in which EX and EY are the length and width of the human eye; these two values usually take default statistical values.
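The conversion formula itself does not survive in this text, so the mapping below is only an assumed proportional normalisation of the pupil offset by the eye dimensions EX and EY; the patent's actual formula, and the default values chosen here, may differ:

```python
# Assumed default statistical eye dimensions (hypothetical values, in mm).
EX, EY = 30.0, 15.0

def gaze_to_camera(ex, ey, scale=1.0):
    """Map the pupil offset E(ex, ey) to camera coordinates C(cx, cy)
    by normalising with the assumed eye dimensions EX and EY."""
    return (scale * ex / EX, scale * ey / EY)
```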
  • When the gaze focus coordinates fall within the range of a certain sound source, the wearer of the AR glasses is judged to be looking in the direction of that sound source. That sound source is then determined as the target sound source, and all other unattended sound sources are determined as non-target sound sources. It can be understood that the accuracy of determining the target sound source can be improved by reducing the value of the preset distance m.
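The locking rule above, matching the gaze focus against each source's camera-frame position with the preset distance value m, can be sketched as follows (the dictionary layout and function name are assumptions for illustration):

```python
import math

def lock_target(gaze_c, sources, m):
    """Return the number of the sound source whose camera-frame position
    lies within distance m of the gaze focus gaze_c, or None if the
    wearer is not looking near any detected source."""
    for number, (sx, sy) in sources.items():
        if math.hypot(gaze_c[0] - sx, gaze_c[1] - sy) < m:
            return number
    return None
```

A smaller m makes the test stricter, which matches the observation that reducing m improves the accuracy of target selection.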
  • Once the target sound source is determined, it can be distinguishably marked on the lenses of the AR glasses to lock the target sound source.
  • FIG13 is a schematic diagram of a method of distinguishing and marking a target sound source on an AR glasses lens according to an embodiment of the present invention.
  • a solid circular frame is used to mark the location of sound source 1
  • a dotted circular frame is still used to mark the locations of the other two sound sources, that is, the dotted frame is transformed into a solid frame to distinguish and mark the sound source 1, thereby locking the sound source 1 as the target sound source.
  • different colors, different shapes, etc. can also be used to distinguish and mark the sound source from the unselected sound source.
  • Step S140 extracting audio components related to the voiceprint features of the target sound source from the audio signal received by the microphone array and performing enhancement processing.
  • This step S140 can be specifically as follows: according to the coordinate position of the target sound source in the camera coordinate system, searching the voiceprint database to obtain the voiceprint features of the target sound source; extracting the audio component related to the voiceprint features of the target sound source from the audio signal currently received by the microphone array; amplifying the gain of the extracted audio component, and/or reducing or turning off the gain of other unextracted audio components.
  • Voiceprint is a sound wave spectrum that carries speech information and is displayed by electroacoustic instruments. It is a biological feature composed of more than 100 characteristic dimensions such as wavelength, frequency, and intensity. It has the characteristics of stability, measurability, and uniqueness.
  • In order to obtain the voiceprint features of the target sound source, the voiceprint features must be detected after the above step S110 and before the present step S140.
  • In addition to locating each detected sound source, the sound signal must also be recorded so that the voiceprint features of each sound source can be extracted. The voiceprint features of each sound source are associated with its corresponding sound source position to establish a voiceprint database, so that the voiceprint of a specific sound source can subsequently be extracted from the mixed sound signal. The established voiceprint database can be saved in the form of an array indexed by sound source number.
  • the voiceprint features of the target sound source can be obtained by searching the voiceprint database according to the coordinate position of the target sound source in the camera coordinate system. For example, the sound source number is determined to be 1 according to the coordinate position of the target sound source in the camera coordinate system, and then the voiceprint features corresponding to the sound source number 1 are searched in the voiceprint database.
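The database lookup described here can be sketched as a mapping indexed by sound source number; the entry layout and the placeholder feature vectors below are hypothetical, standing in for whatever feature representation the system actually stores:

```python
# Toy voiceprint database: source number -> camera-frame position + features.
voiceprint_db = {
    1: {"position": (0.5, 0.5), "voiceprint": [0.12, 0.80, 0.33]},
    2: {"position": (-0.5, 0.2), "voiceprint": [0.91, 0.05, 0.44]},
}

def lookup_voiceprint(db, target_pos, tol=1e-6):
    """Find the source whose stored position matches the target's camera
    coordinates, then return that source's voiceprint features."""
    for number, entry in db.items():
        px, py = entry["position"]
        if abs(px - target_pos[0]) < tol and abs(py - target_pos[1]) < tol:
            return entry["voiceprint"]
    return None
```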
  • the audio signal of the microphone array can be processed according to the voiceprint characteristics of the target sound source, and the audio component related to the voiceprint characteristics of the target sound source can be extracted from the audio signal currently received by the microphone array.
  • the gain of the extracted audio component can be amplified according to the multiple set by the user, and the gain of other unextracted audio components can be reduced or turned off at the same time, so as to highlight the target sound source and weaken the non-target sound source.
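The gain stage just described, boosting the extracted target component while attenuating or switching off everything else before remixing, reduces to a per-sample weighted sum. A minimal sketch (parameter names assumed; real systems would apply this per frame in the frequency domain):

```python
def remix(target_component, residual, boost=4.0, cut=0.1):
    """Amplify the audio component extracted for the target sound source
    and attenuate (or, with cut=0.0, turn off) the remaining components,
    then sum them back into one output signal."""
    return [boost * t + cut * r for t, r in zip(target_component, residual)]
```

Setting `cut=0.0` corresponds to turning off the non-target components entirely, while `boost` corresponds to the user-set amplification multiple.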
  • Step S150 output the enhanced audio signal to the wearer of the AR glasses through an in-ear headset.
  • the speakers or audio interface of the AR glasses are arranged at the temples close to the human ears when they are normally worn.
  • the speakers need to be pulled out to become in-ear headphones for wearing, or external dual-channel in-ear headphones need to be connected through the audio interface for wearing.
  • In step S150, in order to better obtain the sound of the sound source of interest in the outside real world, the sound of the external environment must be isolated.
  • Accordingly, the wearer does not listen directly to sound played by a speaker; instead, the processed audio signal of the microphone array is delivered through in-ear headphones to achieve directional audio enhancement.
  • By adopting the audio enhancement method for AR glasses of the embodiment of the present invention, directional audio enhancement can be achieved, thereby assisting the AR glasses wearer to better obtain the sound of the sound source of interest, reducing interference from other sound sources, providing a certain noise reduction effect, improving the user experience of the AR glasses wearer, and truly realizing the augmented-reality role of AR glasses.
  • FIG. 14 is a schematic diagram of the structure of an audio enhancement device for AR glasses provided in an embodiment of the present invention.
  • the audio enhancement device for AR glasses in an embodiment of the present invention includes:
  • a sound source distribution detection unit 141 is used to detect the distribution of sound sources in the real world around the wearer of the AR glasses using a microphone array;
  • a sound source position marking unit 142 used to mark the sound source position of each detected sound source on the lens of the AR glasses;
  • a target sound source locking unit 143 is used to lock the target sound source according to the eye gaze direction of the AR glasses wearer
  • the audio enhancement unit 144 is used to extract the audio component related to the voiceprint feature of the target sound source from the audio signal received by the microphone array and perform enhancement processing;
  • the audio output unit 145 is used to output the enhanced audio signal to the wearer of the AR glasses through the in-ear headphones.
  • the above-mentioned sound source distribution detection unit 141 is specifically used for:
  • a first direction line of each detected sound source is obtained based on sound signals of different intensities picked up by each microphone in the microphone array;
  • a second direction line of each detected sound source is obtained based on sound signals of different intensities picked up by each microphone in the microphone array; the position of each detected sound source is determined based on the position of the intersection of the first direction line and the second direction line of each detected sound source.
  • the above-mentioned sound source position marking unit 142 is specifically used for:
  • a world coordinate system is established with the center of the head of the AR glasses wearer as the coordinate origin, and the coordinate position of each detected sound source in the world coordinate system is determined;
  • a camera coordinate system is established with the pupil of the AR glasses wearer as the coordinate origin, and the coordinate position of each detected sound source in the world coordinate system is converted into the coordinate position in the camera coordinate system according to the conversion formula obtained by the camera calibration algorithm; the coordinate position of each detected sound source in the camera coordinate system is marked on the lenses of the AR glasses using the first marking method.
  • the target sound source locking unit 143 is specifically used to:
  • An eye tracker is used to obtain the eye gaze direction of the wearer of the AR glasses, and the eye gaze direction is converted into a coordinate position in a camera coordinate system; when the coordinate distance between the eye gaze direction and a sound source in the camera coordinate system is less than a preset distance value, the sound source is determined as a target sound source; and the target sound source is distinguishably marked on the lenses of the AR glasses using a second marking method different from the first marking method, so as to lock the target sound source.
  • the audio enhancement device of the AR glasses of the embodiment of the present invention may further include:
  • the voiceprint feature extraction unit 146 is used to extract voiceprint features from each sound source detected by the sound source distribution detection unit 141, and associate the voiceprint features with the corresponding sound source positions to establish a voiceprint database.
  • the audio enhancement unit 144 is specifically used to:
  • the voiceprint database is searched to obtain the voiceprint features of the target sound source; the audio component related to the voiceprint features of the target sound source is extracted from the audio signal currently received by the microphone array; the gain of the extracted audio component is amplified, and/or the gain of other unextracted audio components is reduced or turned off.
  • FIG15 is a schematic diagram of the functional module structure of the AR glasses provided by the embodiment of the present invention.
  • The AR glasses provided by the embodiment of the present invention include: a microphone array, an eye tracker, in-ear headphones, a memory, and a processor, wherein the memory may be an internal memory such as a high-speed random access memory (RAM), or a non-volatile memory such as at least one magnetic disk storage device.
  • a computer program is stored in the memory, and the computer program is loaded and executed by the processor to implement the aforementioned audio enhancement method for AR glasses.
  • the AR glasses may also selectively include a communication module, etc.
  • Speakers, in-ear headphones, microphone arrays, eye trackers, memories, processors, communication modules, etc. may be interconnected through an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, or an EISA (Extended Industry Standard Architecture) bus, etc.
  • the bus may be divided into an address bus, a data bus, a control bus, etc. For ease of representation, only one bidirectional arrow is used in FIG15, but it does not mean that there is only one bus or one type of bus.
  • an embodiment of the present invention further proposes a readable storage medium, which stores one or more computer programs.
  • the one or more computer programs are executed by a processor, they implement the aforementioned audio enhancement method for AR glasses.
  • Readable storage media include permanent and non-permanent, removable and non-removable media, and can be implemented by any method or technology to store information. Information can be computer-readable instructions, data structures, modules of programs or other data. Examples of readable storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disk read-only memory (CD-ROM), digital versatile disk (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
  • The present invention may be provided as methods, devices, or computer program products. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Furthermore, the present invention may take the form of a computer program product implemented on one or more readable storage media containing a computer program.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Optics & Photonics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The embodiments of the present invention disclose AR glasses and an audio enhancement method and device therefor, and a readable storage medium. The audio enhancement method for the AR glasses includes: using a microphone array to detect the distribution of sound sources in the real world around the wearer of the AR glasses; marking the sound source position of each detected sound source on the lenses of the AR glasses; locking a target sound source according to the eye gaze direction of the AR glasses wearer; extracting, from the audio signal received by the microphone array, the audio component related to the voiceprint features of the target sound source for enhancement processing; and outputting the enhanced audio signal to the AR glasses wearer through in-ear headphones. With the solution of the embodiments of the present invention, directional audio enhancement can be achieved, thereby assisting the AR glasses wearer to better obtain the sound of the sound source of interest, reducing interference from other sound sources, providing a certain noise reduction effect, and truly realizing the augmented-reality role of AR glasses.

Description

AR glasses, audio enhancement method and device therefor, and readable storage medium
This application claims priority to Chinese patent application No. 202211211572.5, filed with the Chinese Patent Office on September 30, 2022, entitled "AR glasses and audio enhancement method and device therefor, and readable storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of augmented reality (Augmented Reality, AR) technology, and in particular to AR glasses, an audio enhancement method and device therefor, and a readable storage medium.
Background
AR technology uses computers to generate virtual information with realistic visual, auditory, force, tactile and motion sensations, presents the virtual information in the real world, and allows people to interact with it.
With AR glasses based on AR technology, the wearer can see not only real-world objects but also generated virtual objects, as if they existed in the real world. In addition to visual augmentation, the AR glasses wearer can also pick up ambient sounds from the surrounding real world through multiple microphones, and can use virtual audio to augment the real-world experience.
In a noisy environment with multiple real-world sound sources, the AR glasses wearer may wish to obtain the sound of only one or some of the sound sources and ignore the others. Although a directional microphone can preferentially receive sound from real sources located in a specific direction and/or at a specific distance while eliminating noise from sources at other locations, the direction and/or distance of the directional microphone's maximum sensitivity does not necessarily correspond to the direction and/or distance the user is paying attention to. There is therefore a need to assist AR glasses wearers in better obtaining the sound of the sound sources they are interested in.
Summary of the Invention
The embodiments of the present invention provide AR glasses, an audio enhancement method and device therefor, and a readable storage medium, intended to assist the wearer of the AR glasses in better obtaining the sound of the sound source of interest.
According to a first aspect of the present invention, an audio enhancement method for AR glasses is provided, comprising:
detecting the distribution of sound sources in the real world around the wearer of the AR glasses using a microphone array;
marking the position of each detected sound source on the lenses of the AR glasses;
locking a target sound source according to the eye gaze direction of the AR glasses wearer;
extracting, from the audio signal received by the microphone array, the audio component related to the voiceprint features of the target sound source for enhancement processing; and
outputting the enhanced audio signal to the wearer of the AR glasses through in-ear headphones.
According to a second aspect of the present invention, an audio enhancement device for AR glasses is provided, comprising:
a sound source distribution detection unit, configured to detect the distribution of sound sources in the real world around the wearer of the AR glasses using a microphone array;
a sound source position marking unit, configured to mark the sound source position of each detected sound source on the lenses of the AR glasses;
a target sound source locking unit, configured to lock a target sound source according to the eye gaze direction of the AR glasses wearer;
an audio enhancement unit, configured to extract, from the audio signal received by the microphone array, the audio component related to the voiceprint features of the target sound source for enhancement processing; and
an audio output unit, configured to output the enhanced audio signal to the wearer of the AR glasses through in-ear headphones.
According to a third aspect of the present invention, AR glasses are provided, comprising a microphone array, an eye tracker, in-ear headphones, a memory and a processor, wherein a computer program is stored in the memory, and the computer program is loaded and executed by the processor to implement the above audio enhancement method for AR glasses.
According to a fourth aspect of the present invention, a readable storage medium is provided, storing one or more computer programs which, when executed by a processor, implement the aforementioned audio enhancement method for AR glasses.
The technical solutions provided by the embodiments of the present invention can achieve the following beneficial effects:
For a noisy environment with multiple sound sources in the real world, the AR glasses and the audio enhancement method and device therefor and the readable storage medium provided by the embodiments of the present invention optimize and amplify the sound of sound sources in the area near the eye gaze direction of the AR glasses wearer and suppress and eliminate the sound of sound sources in other areas, thereby achieving directional audio enhancement. This assists the AR glasses wearer in better obtaining the sound of the sound source of interest, reduces interference from other sound sources, provides a certain noise reduction effect, improves the user experience of the AR glasses wearer, and truly realizes the augmented-reality role of AR glasses.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below illustrate only some embodiments of the present invention, and those of ordinary skill in the art may derive other drawings from them. In the drawings:
Fig. 1 is a schematic flowchart of the audio enhancement method for AR glasses provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of one placement of four microphones on the AR glasses;
Fig. 3 is a schematic diagram of the positions of the four microphones of Fig. 2 when the AR glasses are worn;
Fig. 4 is a schematic plan view of the positions of the four microphones of Fig. 3;
Fig. 5 is a schematic diagram of the distribution of three sound sources around the AR glasses wearer;
Fig. 6 is a schematic plan view of the positions of the three sound sources of Fig. 5;
Fig. 7 is a schematic waveform diagram of the sound signals picked up by the four microphones (MIC1-4) in the scenario of Fig. 6 when sound source 1 is sounding alone;
Fig. 8 is a schematic diagram of the AR glasses wearer determining sound source positions by moving, in the scenario of Fig. 6;
Fig. 9 is a schematic diagram of the coordinate positions, in the world coordinate system, of three sound sources around the AR glasses wearer;
Fig. 10 is a schematic diagram of the three sound sources of Fig. 9 converted to coordinate positions in the camera coordinate system;
Fig. 11 is a schematic diagram of marking sound source positions on the AR glasses lenses according to an embodiment of the present invention;
Fig. 12 is a schematic diagram of the principle of obtaining the eye gaze direction with an eye tracker;
Fig. 13 is a schematic diagram of distinctively marking the target sound source on the AR glasses lenses according to an embodiment of the present invention;
Fig. 14 is a schematic structural diagram of the audio enhancement apparatus for AR glasses provided by an embodiment of the present invention;
Fig. 15 is a schematic diagram of the functional module structure of the AR glasses provided by an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in more detail below with reference to the drawings. These embodiments are provided so that the present invention can be understood more thoroughly and its scope can be fully conveyed to those skilled in the art. Although the drawings show exemplary embodiments of the present invention, it should be understood that the present invention may be implemented in various forms and should not be limited by the embodiments set forth here.
An embodiment of the present invention provides an audio enhancement method for AR glasses. Fig. 1 is a schematic flowchart of the method. As shown in Fig. 1, it includes steps S110 to S150:
Step S110: use a microphone array to detect the distribution of sound sources in the real world around the AR glasses wearer.
This step uses the binaural localization principle and a microphone array formed by multiple microphones mounted on the AR glasses to detect the distribution of sound sources in the real world around the wearer.
Binaural localization refers to the ability of binaural hearing to judge the direction of a sound source. When a source sounds at different positions relative to the listener, the sound waves reach the two ears with different intensities and at different times, and the auditory system uses this information to determine the source's position. Similarly, a source's position can be computed from the source information captured by a microphone array of two or more microphones. The principle of determining the source position is explained below using a four-microphone array as an example.
Fig. 2 is a schematic diagram of one placement of four microphones on the AR glasses. It should be understood that the number of microphones on the AR glasses is not limited to four, nor to the placement shown in Fig. 2. In general, the more microphones the array contains and the more symmetrically they are placed, the more accurately the source position can be determined.
Fig. 3 shows the positions of the four microphones of Fig. 2 when the AR glasses are worn. Fig. 4 is a plan view of the positions of the four microphones of Fig. 3.
Suppose there are three sound sources in the real world around the AR glasses wearer, distributed as shown in Fig. 5. Fig. 5 shows the distribution of the three sources around the wearer; Fig. 6 is a plan view of their positions.
When the AR glasses wearer is at a first position, a first direction line to each source can be obtained from the sound signals of different intensities picked up by each microphone of the array.
Taking the scenario of Fig. 6 as an example, Fig. 7 shows the waveforms of the sound signals picked up by the four microphones (MIC1-4) when sound source 1 is sounding alone. The four microphones pick up signals of different intensities, as shown in Fig. 7, and the source direction can be located by comparing the intensities of these signals.
As can be seen in Fig. 7, the signals picked up by MIC3 and MIC4 are stronger than those of MIC1 and MIC2, which gives a rough estimate of source 1's location. From the small difference between MIC3 and MIC4, the first direction line of source 1 in the MIC3/MIC4 coordinate system can then be determined precisely.
Even when multiple sources are sounding simultaneously, microphone-array localization algorithms, for example algorithms based on time difference of arrival (TDOA), on high-resolution spectral estimation, or on beamforming, can still be combined to obtain the first direction line of each source relative to the wearer's position.
When the AR glasses wearer moves to a second position, a second direction line to each source can be obtained from the signals picked up by each microphone of the array, using the same localization algorithm or principle. The position of each source is then determined from the intersection of its first and second direction lines.
Fig. 8 illustrates the wearer determining source positions by moving, in the scenario of Fig. 6. As shown in Fig. 8, when the wearer is at position 1, direction line 11 of source 1 relative to position 1 can be obtained; when the wearer is at position 2, direction line 12 of source 1 relative to position 2 can be obtained. Computing the intersection of direction lines 11 and 12 yields the position of source 1.
With this method the position of every detected source can be determined, giving the distribution of sound sources in the real world around the AR glasses wearer.
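As a rough illustration of the TDOA idea mentioned above (a sketch, not the patent's implementation; the function name and signal layout are assumptions), the delay between two microphones can be estimated from the peak of the cross-correlation of their signals:

```python
import numpy as np

def estimate_tdoa(sig_a, sig_b, fs):
    """Estimate the time difference of arrival between two equal-length
    microphone signals from the peak of their cross-correlation.
    A negative result means the sound reached microphone A earlier."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    lag = np.argmax(corr) - (len(sig_b) - 1)  # peak offset in samples
    return lag / fs
```

A direction line then follows from the delay and the known microphone spacing, since the delay times the speed of sound gives the path-length difference between the two microphones.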
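The two-position triangulation described above can be sketched in the plane as the intersection of two bearing lines; `locate_source` and its argument layout are illustrative, not taken from the patent:

```python
import numpy as np

def locate_source(p1, theta1, p2, theta2):
    """Triangulate a source as the intersection of two direction lines:
    bearings theta1, theta2 (radians) measured from wearer positions p1, p2."""
    d1 = np.array([np.cos(theta1), np.sin(theta1)])
    d2 = np.array([np.cos(theta2), np.sin(theta2)])
    # Solve p1 + t1*d1 = p2 + t2*d2 for the scalars t1, t2
    A = np.column_stack([d1, -d2])
    b = np.asarray(p2, float) - np.asarray(p1, float)
    t1, _ = np.linalg.solve(A, b)
    return np.asarray(p1, float) + t1 * d1
```

Nearly parallel direction lines make the linear system ill-conditioned, which is one reason the wearer's movement between the two positions matters for localization accuracy.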
Step S120: mark the sound source position of each detected sound source on the lenses of the AR glasses. Step S120 may specifically be as follows.
First, a world coordinate system is established with the center of the AR glasses wearer's head as the coordinate origin, and each detected source can be represented by (x, y, z) coordinates in the world coordinate system. Take the scenario of three sound sources in the real world around the wearer as an example: Fig. 9 shows the coordinate positions of the three sources in the world coordinate system.
Next, a camera coordinate system is established with the wearer's pupil as the coordinate origin, and the coordinate position of each detected source in the world coordinate system is converted into a coordinate position in the camera coordinate system according to the conversion formula obtained from a camera calibration algorithm.
The camera coordinate system established here is a planar coordinate system, so each detected source can be represented by (x, y) coordinates in it. Fig. 10 shows the three sources of Fig. 9 converted to coordinate positions in the camera coordinate system.
Then, the coordinate position of each detected source in the camera coordinate system is marked on the lenses of the AR glasses using a first marking style.
The source positions can be marked on the lenses in various ways. Fig. 11 shows one way according to an embodiment of the present invention: centered on each source's coordinate position in the camera coordinate system, a dashed circle of a set radius (for example, 2 cm) is displayed, marking the position of every detected source on the lenses. The dashed frame may of course use eye-catching line colors such as red or yellow, and the circular frame may instead be designed as a triangle, square, ellipse, or other shape.
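The patent does not spell out the calibration-derived conversion formula itself; a standard pinhole-camera conversion of the kind a calibration algorithm yields might look like the following sketch, where `R`, `t` (extrinsics) and `fx`, `fy`, `cx`, `cy` (intrinsics) are assumed to come from the calibration step:

```python
import numpy as np

def world_to_camera(p_world, R, t, fx, fy, cx, cy):
    """Map a source's head-centred world coordinates (x, y, z) to planar
    camera coordinates via a rigid transform followed by pinhole projection."""
    p_cam = R @ np.asarray(p_world, float) + np.asarray(t, float)
    x, y, z = p_cam
    return np.array([fx * x / z + cx, fy * y / z + cy])
```

The returned (x, y) pair is what the first marking style would be drawn around on the lens.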
Step S130: lock onto a target sound source according to the eye gaze direction of the AR glasses wearer.
Step S130 may specifically be: obtain the wearer's eye gaze direction with an eye tracker and convert the gaze direction into a coordinate position in the camera coordinate system; when the coordinate distance in the camera coordinate system between the gaze direction and some source is smaller than a preset distance value, determine that source to be the target sound source; and distinctively mark the target source on the lenses of the AR glasses with a second marking style different from the first marking style, thereby locking onto the target source.
The eye tracker uses eye-tracking technology to follow the motion of the eye: image processing locates the pupil and yields the pupil-center coordinates, and a corresponding algorithm computes the person's gaze point.
The wearer's eye gaze direction can thus be obtained with the eye tracker. Fig. 12 illustrates the principle. As shown in Fig. 12, when the pupil moves from the origin O(0, 0) of the eye coordinate system to the point E(ex, ey), the direction vector of line OE can be computed; this direction vector is the wearer's eye gaze direction.
Note that the eye coordinate system (Eye Coordinates) may also be established with the wearer's pupil as the coordinate origin, in which case it coincides with the camera coordinate system. In computer graphics, the eye coordinate system is a right-handed orthonormal coordinate system, with the eye interpreted as looking down the coordinate system's negative Z axis while taking the picture.
For a camera display range of 1920 × 1080, when the pupil moves from the origin O(0, 0) of the eye coordinate system to the point E(ex, ey), the gaze direction OE can be converted into the camera-coordinate point C(cx, cy) by: cx = ex / EX × 1920, cy = ey / EY × 1080,
where EX and EY are the length and width of the human eye region, respectively; in general, default statistical values are used for these two quantities.
When √((cx - sx)² + (cy - sy)²) < m, where (sx, sy) denotes a source's coordinates in the camera coordinate system, that is, when the coordinate distance between the gaze direction and that source in the camera coordinate system is smaller than the preset distance value m, the gaze focal point can be considered to fall within that source's range and the wearer is gazing in that source's direction. That source is determined to be the target sound source, and all other, non-gazed sources are determined to be non-target sources. It will be appreciated that lowering m increases the precision with which the target source is determined.
Once the target source is determined, it can be distinctively marked on the lenses of the AR glasses, thereby locking onto it.
Fig. 13 shows one way of distinctively marking the target source on the lenses according to an embodiment of the present invention. Comparing Fig. 11 and Fig. 13, the position of source 1 is marked with a solid circle while the other two sources remain marked with dashed circles; that is, changing the dashed frame to a solid frame distinguishes source 1 and locks it as the target source. Different colors, shapes, and the like may of course also be used to distinguish it from the unselected sources.
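Putting the gaze mapping and the distance test together, a minimal sketch might look as follows (the function names and the source table are illustrative; EX and EY stand for the statistical eye extents mentioned above):

```python
import math

def gaze_to_camera(ex, ey, EX, EY, width=1920, height=1080):
    """Scale the pupil offset E(ex, ey) by the eye extents EX, EY to a
    point C(cx, cy) in the 1920x1080 camera display range."""
    return ex / EX * width, ey / EY * height

def lock_target(cx, cy, sources, m):
    """Return the id of the nearest source within distance m of the gaze
    point C(cx, cy), or None; `sources` maps id -> (sx, sy)."""
    best, best_d = None, m
    for sid, (sx, sy) in sources.items():
        d = math.hypot(cx - sx, cy - sy)
        if d < best_d:
            best, best_d = sid, d
    return best
```

Taking the nearest source inside the threshold (rather than the first one found) resolves the case where two marked sources both fall close to the gaze point.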
Step S140: extract, from the audio signal received by the microphone array, the audio component associated with the voiceprint features of the target sound source and enhance it.
Step S140 may specifically be: look up the voiceprint database according to the target source's coordinate position in the camera coordinate system to obtain the target source's voiceprint features; extract, from the audio signal currently received by the microphone array, the audio component associated with those voiceprint features; and amplify the gain of the extracted component and/or reduce or mute the gain of the other, unextracted components.
A voiceprint is the spectrum of sound waves carrying speech information as displayed by electro-acoustic instruments. It is a biometric characteristic composed of more than a hundred feature dimensions such as wavelength, frequency, and intensity, and it is stable, measurable, and unique.
To be able to obtain the target source's voiceprint features, after step S110 above and before this step S140, voiceprint features must be extracted for each detected source and associated with the corresponding source position to build a voiceprint database.
That is, besides localizing each detected source, its sound signal is also recorded; each source's voiceprint features are extracted from the recordings and associated with the corresponding source position to build the voiceprint database, so that a particular source's voiceprint can later be extracted from the mixed sound signal.
Various existing techniques and means can extract a target source's voiceprint in a multi-source environment. Extracting voiceprint features for each detected source is therefore not the subject of the present invention and can be implemented with a variety of existing technologies.
By associating each source's voiceprint features with its source position, the voiceprint database can be stored as an array indexed by source number. After the target source is locked, its voiceprint features can be retrieved from the database according to its coordinate position in the camera coordinate system. For example, the camera-coordinate position determines that the source number is 1, and the voiceprint features corresponding to source number 1 are then looked up in the database.
The microphone-array audio signal can then be processed according to the target source's voiceprint features, extracting from the currently received audio the component associated with those features. The gain of the extracted component is amplified by a user-set factor, and the gain of the other, unextracted components may simultaneously be reduced or muted, so as to emphasize the target source and attenuate the non-target sources.
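Assuming the voiceprint-matched component has already been separated out, the gain step reduces to a re-mix; the gain factors below are illustrative defaults, not values from the patent:

```python
import numpy as np

def enhance(mix, target, gain_target=4.0, gain_rest=0.25):
    """Amplify the target-source component and attenuate the remainder
    (set gain_rest=0.0 to mute the non-target sources entirely)."""
    rest = mix - target                      # everything not matched to the voiceprint
    out = gain_target * target + gain_rest * rest
    peak = np.max(np.abs(out))
    return out / peak if peak > 1.0 else out  # normalize to avoid clipping
```

The normalization at the end keeps the boosted signal within range before it is sent on to the in-ear earphones.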
Step S150: output the enhanced audio signal to the AR glasses wearer through in-ear earphones.
The speakers or audio jack of the AR glasses sit on the temples near the ears during normal wear. When using the audio enhancement method of the embodiments of the present invention, the speakers need to be pulled out and worn as in-ear earphones, or two-channel in-ear earphones are connected through the audio jack and worn.
In step S150, to better capture the sound of the real-world source they are attending to, the wearer needs to block out ambient sound: instead of listening directly to sound played by the speakers, they listen through the in-ear earphones to the processed microphone-array audio signal, achieving directional audio reinforcement.
As the steps above show, for a noisy real-world environment with multiple sound sources, the audio enhancement method for AR glasses of the embodiments amplifies the sound of sources in the region near the wearer's eye gaze direction and suppresses and cancels sound from other regions, achieving directional audio reinforcement. This helps the wearer better capture the sound of the source they are attending to, reduces interference from other sources, provides a certain noise-reduction effect, improves the wearer's user experience, and truly realizes the augmented-reality function of AR glasses.
Belonging to the same technical concept as the foregoing method, an embodiment of the present invention further provides an audio enhancement apparatus for AR glasses. Fig. 14 is a schematic structural diagram of the apparatus. As shown in Fig. 14, the apparatus includes:
a sound source distribution detection unit 141, configured to use a microphone array to detect the distribution of sound sources in the real world around the AR glasses wearer;
a sound source position marking unit 142, configured to mark the sound source position of each detected sound source on the lenses of the AR glasses;
a target sound source locking unit 143, configured to lock onto a target sound source according to the eye gaze direction of the AR glasses wearer;
an audio enhancement unit 144, configured to extract, from the audio signal received by the microphone array, the audio component associated with the voiceprint features of the target sound source and enhance it;
an audio output unit 145, configured to output the enhanced audio signal to the AR glasses wearer through in-ear earphones.
In some embodiments, the sound source distribution detection unit 141 is specifically configured to:
when the AR glasses wearer is at a first position, obtain a first direction line to each detected source from the sound signals of different intensities picked up by each microphone of the array; when the wearer is at a second position, obtain a second direction line to each detected source likewise; and determine the position of each detected source from the intersection of its first and second direction lines.
In some embodiments, the sound source position marking unit 142 is specifically configured to:
establish a world coordinate system with the center of the wearer's head as the coordinate origin and determine each detected source's coordinate position in the world coordinate system; establish a camera coordinate system with the wearer's pupil as the coordinate origin and convert each detected source's world-coordinate position into a camera-coordinate position according to the conversion formula obtained from a camera calibration algorithm; and mark, on the lenses of the AR glasses, each detected source's coordinate position in the camera coordinate system using a first marking style.
In some embodiments, the target sound source locking unit 143 is specifically configured to:
obtain the wearer's eye gaze direction with an eye tracker and convert the gaze direction into a coordinate position in the camera coordinate system; when the coordinate distance in the camera coordinate system between the gaze direction and some source is smaller than a preset distance value, determine that source to be the target sound source; and distinctively mark the target source on the lenses with a second marking style different from the first, thereby locking onto the target source.
In some embodiments, as shown in Fig. 14, the audio enhancement apparatus for AR glasses of the embodiments may further include:
a voiceprint feature extraction unit 146, configured to extract voiceprint features for each source detected by the sound source distribution detection unit 141 and associate the voiceprint features with the corresponding source positions to build a voiceprint database.
In some embodiments, the audio enhancement unit 144 is specifically configured to:
look up the voiceprint database according to the target source's camera-coordinate position to obtain the target source's voiceprint features; extract, from the audio signal currently received by the microphone array, the audio component associated with those features; and amplify the gain of the extracted component and/or reduce or mute the gain of the other, unextracted components.
For the implementation of each module or unit of the audio enhancement apparatus of the embodiments, refer to the method embodiments above; details are not repeated here.
Belonging to the same technical concept as the foregoing audio enhancement method for AR glasses, an embodiment of the present invention further provides AR glasses. Fig. 15 is a schematic diagram of their functional module structure. Referring to Fig. 15, the AR glasses of the embodiment include a microphone array, an eye tracker, in-ear earphones, a memory, and a processor. The memory may be internal memory, such as high-speed random-access memory (RAM), or non-volatile memory, such as at least one disk memory. The memory stores a computer program that is loaded and executed by the processor to implement the foregoing audio enhancement method for AR glasses.
At the hardware level, the AR glasses may optionally further include a communication module and the like. The speakers, in-ear earphones, microphone array, eye tracker, memory, processor, communication module, and so on may be interconnected through an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one double-headed arrow is shown in Fig. 15, but this does not mean there is only one bus or one type of bus.
Finally, an embodiment of the present invention further provides a readable storage medium storing one or more computer programs that, when executed by a processor, implement the foregoing audio enhancement method for AR glasses.
Readable storage media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of readable storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
Those skilled in the art should understand that the solutions of the present invention may be provided as methods, apparatuses, or computer program products. The present invention may therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Furthermore, the present invention may take the form of a computer program product implemented on one or more readable storage media containing a computer program.
It should also be noted that the terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, article, or device that includes the element.
The above are merely embodiments of the present invention and are not intended to limit it. Various modifications and variations of the present invention are possible for those skilled in the art. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the scope of the claims of the present invention.

Claims (10)

  1. An audio enhancement method for AR glasses, characterized by comprising:
    using a microphone array to detect the distribution of sound sources in the real world around the AR glasses wearer;
    marking the sound source position of each detected sound source on the lenses of the AR glasses;
    locking onto a target sound source according to the eye gaze direction of the AR glasses wearer;
    extracting, from the audio signal received by the microphone array, the audio component associated with the voiceprint features of the target sound source and performing enhancement processing on it;
    outputting the enhanced audio signal to the AR glasses wearer through in-ear earphones.
  2. The method according to claim 1, wherein using a microphone array to detect the distribution of sound sources in the real world around the AR glasses wearer comprises:
    when the AR glasses wearer is at a first position, obtaining a first direction line to each detected sound source from the sound signals of different intensities picked up by each microphone of the microphone array;
    when the AR glasses wearer is at a second position, obtaining a second direction line to each detected sound source from the sound signals of different intensities picked up by each microphone of the microphone array;
    determining the position of each detected sound source from the intersection of its first direction line and second direction line.
  3. The method according to claim 1, wherein marking the sound source position of each detected sound source on the lenses of the AR glasses comprises:
    establishing a world coordinate system with the center of the AR glasses wearer's head as the coordinate origin, and determining the coordinate position of each detected sound source in the world coordinate system;
    establishing a camera coordinate system with the pupil of the AR glasses wearer as the coordinate origin, and converting the coordinate position of each detected sound source in the world coordinate system into a coordinate position in the camera coordinate system according to a conversion formula obtained from a camera calibration algorithm;
    marking, on the lenses of the AR glasses, the coordinate position of each detected sound source in the camera coordinate system.
  4. The method according to claim 3, wherein locking onto a target sound source according to the eye gaze direction of the AR glasses wearer comprises:
    obtaining the eye gaze direction of the AR glasses wearer with an eye tracker, and converting the eye gaze direction into a coordinate position in the camera coordinate system;
    when the coordinate distance in the camera coordinate system between the eye gaze direction and some sound source is smaller than a preset distance value, determining that sound source to be the target sound source;
    distinctively marking the target sound source on the lenses of the AR glasses, thereby locking onto the target sound source.
  5. The method according to any one of claims 1 to 4, further comprising: extracting voiceprint features for each detected sound source, and associating the voiceprint features with the corresponding sound source positions to build a voiceprint database.
  6. The method according to claim 5, wherein extracting, from the audio signal received by the microphone array, the audio component associated with the voiceprint features of the target sound source and performing enhancement processing on it comprises:
    looking up the voiceprint database according to the coordinate position of the target sound source in the camera coordinate system to obtain the voiceprint features of the target sound source;
    extracting, from the audio signal currently received by the microphone array, the audio component associated with the voiceprint features of the target sound source;
    amplifying the gain of the extracted audio component, and/or reducing or muting the gain of the other, unextracted audio components.
  7. An audio enhancement apparatus for AR glasses, characterized by comprising:
    a sound source distribution detection unit, configured to use a microphone array to detect the distribution of sound sources in the real world around the AR glasses wearer;
    a sound source position marking unit, configured to mark the sound source position of each detected sound source on the lenses of the AR glasses;
    a target sound source locking unit, configured to lock onto a target sound source according to the eye gaze direction of the AR glasses wearer;
    an audio enhancement unit, configured to extract, from the audio signal received by the microphone array, the audio component associated with the voiceprint features of the target sound source and perform enhancement processing on it;
    an audio output unit, configured to output the enhanced audio signal to the AR glasses wearer through in-ear earphones.
  8. The apparatus according to claim 7, further comprising:
    a voiceprint feature extraction unit, configured to extract voiceprint features for each sound source detected by the sound source distribution detection unit, and to associate the voiceprint features with the corresponding sound source positions to build a voiceprint database.
  9. AR glasses, comprising a microphone array, an eye tracker, in-ear earphones, a memory, and a processor, wherein the memory stores a computer program that is loaded and executed by the processor to implement the audio enhancement method for AR glasses according to any one of claims 1 to 6.
  10. A readable storage medium storing one or more computer programs which, when executed by a processor, implement the audio enhancement method for AR glasses according to any one of claims 1 to 6.
PCT/CN2023/111770 2022-09-30 2023-08-08 AR glasses, audio enhancement method and apparatus therefor, and readable storage medium WO2024066751A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211211572.5 2022-09-30
CN202211211572.5A CN116126132A (zh) 2022-09-30 2022-09-30 AR glasses, audio enhancement method and apparatus therefor, and readable storage medium

Publications (1)

Publication Number Publication Date
WO2024066751A1 true WO2024066751A1 (zh) 2024-04-04

Family

ID=86305192

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/111770 WO2024066751A1 (zh) 2023-08-08 AR glasses, audio enhancement method and apparatus therefor, and readable storage medium

Country Status (2)

Country Link
CN (1) CN116126132A (zh)
WO (1) WO2024066751A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116126132A (zh) * 2022-09-30 2023-05-16 歌尔科技有限公司 AR glasses, audio enhancement method and apparatus therefor, and readable storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160080874A1 (en) * 2014-09-16 2016-03-17 Scott Fullam Gaze-based audio direction
CN112669578A (zh) * 2020-12-18 2021-04-16 上海影创信息科技有限公司 Sound-source-based alert method and system for objects of interest in the peripheral-vision region
US20210174823A1 (en) * 2019-12-10 2021-06-10 Spectrum Accountable Care Company System for and Method of Converting Spoken Words and Audio Cues into Spatially Accurate Caption Text for Augmented Reality Glasses
CN112986914A (zh) * 2021-02-10 2021-06-18 中国兵器装备集团自动化研究所 Individual-soldier helmet and target sound source localization and voiceprint recognition method therefor
US20210281956A1 (en) * 2020-03-09 2021-09-09 International Business Machines Corporation Hearing Assistance Device with Smart Audio Focus Control
CN113596673A (zh) * 2021-07-14 2021-11-02 宁波旗芯电子科技有限公司 Directional sound production method and apparatus for AR glasses speakers, and sound production device
CN116126132A (zh) * 2022-09-30 2023-05-16 AR glasses, audio enhancement method and apparatus therefor, and readable storage medium

Also Published As

Publication number Publication date
CN116126132A (zh) 2023-05-16

Similar Documents

Publication Publication Date Title
US10959037B1 (en) Gaze-directed audio enhancement
US10979845B1 (en) Audio augmentation using environmental data
US10154360B2 (en) Method and system of improving detection of environmental sounds in an immersive environment
US11068668B2 (en) Natural language translation in augmented reality(AR)
Andreopoulou et al. Identification of perceptually relevant methods of inter-aural time difference estimation
US11477592B2 (en) Methods and systems for audio signal filtering
CN106782584A (zh) Audio signal processing device and method, and electronic device
CN206349145U (zh) Audio signal processing device
US10880669B2 (en) Binaural sound source localization
WO2024066751A1 (zh) AR glasses, audio enhancement method and apparatus therefor, and readable storage medium
JP2017092732A (ja) Hearing support system and hearing support device
WO2019105238A1 (zh) Method for reconstructing speech signal, terminal, and computer storage medium
CN106302974B (zh) Information processing method and electronic device
US9866983B2 (en) Sound image direction sense processing method and apparatus
Ahrens et al. A head-mounted microphone array for binaural rendering
Li et al. Multiple active speaker localization based on audio-visual fusion in two stages
JP2018152834A (ja) Method and device for controlling audio signal output in a virtual auditory environment
CN115586492A (zh) AR glasses and sound source virtual repositioning method and apparatus therefor
US20230421984A1 (en) Systems and methods for dynamic spatial separation of sound objects
WO2022178852A1 (zh) Assisted listening method and apparatus
CN116913328B (zh) Audio processing method, electronic device, and storage medium
US20230421983A1 (en) Systems and methods for orientation-responsive audio enhancement
US11689878B2 (en) Audio adjustment based on user electrical signals
CN116320891A (zh) Audio processing method and wearable device
WO2023250171A1 (en) Systems and methods for orientation-responsive audio enhancement

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23869997

Country of ref document: EP

Kind code of ref document: A1