CN116312620A - Audio processing method, head-mounted display device, and computer-readable storage medium - Google Patents

Audio processing method, head-mounted display device, and computer-readable storage medium

Info

Publication number
CN116312620A
CN116312620A (application CN202310091631.8A)
Authority
CN
China
Prior art keywords
audio
information
user
key
mounted display
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310091631.8A
Other languages
Chinese (zh)
Inventor
赵冠博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Goertek Inc
Original Assignee
Goertek Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Goertek Inc filed Critical Goertek Inc
Priority to CN202310091631.8A priority Critical patent/CN116312620A/en
Publication of CN116312620A publication Critical patent/CN116312620A/en
Pending legal-status Critical Current

Classifications

    • A61B 5/165: Devices for evaluating the psychological state; evaluating the state of mind, e.g. depression, anxiety
    • A61B 5/02438: Detecting, measuring or recording pulse rate or heart rate with portable devices, e.g. worn by the patient
    • A61B 5/163: Evaluating the psychological state by tracking eye movement, gaze, or pupil change
    • A61B 5/168: Evaluating attention deficit, hyperactivity
    • A61B 5/6803: Sensors in head-worn items, e.g. helmets, masks, headphones or goggles
    • A61B 5/7267: Classification of physiological signals or data, e.g. using neural networks, involving training the classification device
    • G02B 27/017: Head-up displays, head mounted
    • G02B 2027/0178: Head-mounted displays, eyeglass type
    • G06F 3/011: Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F 2203/011: Emotion or mood input determined on the basis of sensed human body parameters such as pulse, heart rate or beat, temperature of skin, facial expressions, iris, voice pitch, brain activity patterns
    • G10L 21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L 25/30: Speech or voice analysis techniques using neural networks
    • G10L 25/51: Speech or voice analysis techniques specially adapted for comparison or discrimination
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Public Health (AREA)
  • Molecular Biology (AREA)
  • Veterinary Medicine (AREA)
  • General Health & Medical Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Surgery (AREA)
  • Biophysics (AREA)
  • Pathology (AREA)
  • Biomedical Technology (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Medical Informatics (AREA)
  • Psychiatry (AREA)
  • Human Computer Interaction (AREA)
  • Developmental Disabilities (AREA)
  • Artificial Intelligence (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Social Psychology (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychology (AREA)
  • Child & Adolescent Psychology (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Educational Technology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Theoretical Computer Science (AREA)
  • Cardiology (AREA)
  • Physiology (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Optics & Photonics (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)

Abstract

The application discloses an audio processing method, a head-mounted display device, and a computer-readable storage medium. The audio processing method includes: dynamically acquiring external environmental audio information, and identifying, through a converged audio-recognition neural network model, whether preset key audio information exists in the environmental audio information; if the key audio information exists, performing compensation adjustment of acoustic parameters on the key audio information to obtain key audio enhancement information; and outputting the key audio enhancement information. While the user is wearing the head-mounted display device, the method effectively prompts the user with key information from the external environment so that the user can discern the external situation in time.

Description

Audio processing method, head-mounted display device, and computer-readable storage medium
Technical Field
The present application relates to the technical field of wearable devices, and in particular to an audio processing method, a head-mounted display device, and a computer-readable storage medium.
Background
VR (Virtual Reality) and AR (Augmented Reality) devices are currently being rapidly developed and popularized as head-mounted display devices. As the immersion offered by VR/AR devices keeps improving, users pay less attention to external environmental elements while wearing a head-mounted display device, for example becoming less sensitive to external sounds. However, in some application scenarios the user still wants to clearly hear certain key external sounds while wearing the device, such as an alarm (for example, a triggered fire alarm), a knock on the door, or another person calling the user in a home scenario, a platform announcement in a bus scenario, or a car horn in a walking scenario. In other words, while wearing the head-mounted display device for an immersive experience, the user also needs to attend to external environmental elements, because in many application scenarios these elements carry key information, even danger warnings. Yet current head-mounted display devices that deliver immersive experiences almost completely isolate the user's hearing from the outside world, so the user cannot obtain key external environmental information in time, which undoubtedly causes great inconvenience.
Therefore, how to effectively prompt the user with key external environmental information while the user is immersed in the head-mounted display device, so that the user is not left unable to discern what is happening in the external environment in time, is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
The main objective of the present application is to provide an audio processing method, a head-mounted display device, and a computer-readable storage medium, aiming to solve the technical problem that, while a head-mounted display device is in use, key external environmental information cannot be effectively prompted to the user, so that the user cannot discern the external situation in time.
To achieve the above object, the present application provides an audio processing method applied to a head-mounted display device, the method including:
dynamically acquiring external environmental audio information, and identifying whether preset key audio information exists in the environmental audio information through a converged audio-recognition neural network model;
if the key audio information exists, compensating and adjusting acoustic parameters of the key audio information to obtain key audio enhancement information;
and outputting the key audio enhancement information.
Optionally, the step of performing compensation adjustment of acoustic parameters on the key audio information includes:
acquiring a current concentration coefficient of the user for the augmented reality environment presented in the head-mounted display device;
determining a current environmental audio loss level associated with the current concentration coefficient, wherein a higher current concentration coefficient is associated with a higher current environmental audio loss level;
querying a preset loss-level mapping relation to obtain a sound source spatial position deviation value and/or an audio intensity loss value mapped by the current environmental audio loss level;
and performing compensation adjustment of acoustic parameters on the key audio information according to the mapped sound source spatial position deviation value and/or audio intensity loss value.
Optionally, the step of performing compensation adjustment of acoustic parameters on the key audio information according to the mapped sound source spatial position deviation value and/or audio intensity loss value includes:
determining a beam phase displacement of the key audio information according to the mapped sound source spatial position deviation value, and/or determining a beam amplitude loss of the key audio information according to the mapped audio intensity loss value;
determining audio parameter compensation information of the key audio information according to the beam phase displacement and/or the beam amplitude loss;
and performing compensation adjustment of acoustic parameters on the key audio information according to the audio parameter compensation information, so as to compensate the beam phase displacement and/or beam amplitude loss of the key audio information.
Optionally, before the step of determining the current environmental audio loss level associated with the current concentration coefficient, the method further includes:
playing preset virtual test audio, wherein the sound source spatial position of the virtual test audio is a preset spatial azimuth;
outputting a preset guiding interface that guides the user to judge the sound source spatial position of the virtual test audio;
acquiring azimuth information input by the user in response to the preset guiding interface, comparing the azimuth information with the preset spatial azimuth, and determining the key audio resolution of the user according to the comparison result;
determining the user's perceptual sensitivity to the key audio information according to the key audio resolution, and selecting, according to the perceptual sensitivity, a concentration mapping gradient matching the perceptual sensitivity from a preset mapping gradient database, wherein the concentration mapping gradient includes a plurality of concentration coefficients and an environmental audio loss level associated with each concentration coefficient;
the step of determining the current environmental audio loss level associated with the current concentration coefficient then includes:
determining, according to the matched concentration mapping gradient, the current environmental audio loss level associated with the current concentration coefficient.
Optionally, the step of acquiring the current concentration coefficient of the user for the augmented reality environment presented in the head-mounted display device includes:
detecting current user physiological characteristic information and device usage state information, wherein the user physiological characteristic information includes at least one of pupil size, blink frequency, heart rate, respiration rate, and body temperature, and the device usage state information includes at least one of the usage duration, motion state, power consumption rate, and currently running application program of the head-mounted display device;
and determining, according to the user physiological characteristic information and the device usage state information, the current concentration coefficient of the user for the augmented reality environment presented in the head-mounted display device.
Optionally, the step of determining the current concentration coefficient of the user for the augmented reality environment presented in the head-mounted display device according to the user physiological characteristic information and the device usage state information includes:
acquiring a preset concentration-recognition neural network model;
and inputting the user physiological characteristic information and the device usage state information into the concentration-recognition neural network model, and predicting the current concentration coefficient of the user for the augmented reality environment presented in the head-mounted display device.
Optionally, the method further comprises:
acquiring demand-identification audio information corresponding to at least one application scenario;
associating a plurality of pieces of the demand-identification audio information with a key audio tag to obtain a key audio sample set, and associating a plurality of pieces of environmental noise information with an interference audio tag to obtain an interference audio sample set, wherein the environmental noise information contains no demand-identification audio information;
training a preset neural network model with the key audio sample set and the interference audio sample set to obtain the converged audio-recognition neural network model.
Optionally, after the step of outputting the key audio enhancement information, the method further comprises:
and displaying the sound source spatial position corresponding to the key audio information on a display interface of the head-mounted display device by marking it on a radar chart or an azimuth scale.
The present application also provides a head-mounted display device. The head-mounted display device is a physical device and includes a memory, a processor, and a program implementing the audio processing method, stored on the memory and executable on the processor; when executed by the processor, the program implements the steps of the audio processing method as described above.
The present application also provides a computer-readable storage medium having stored thereon a program that implements an audio processing method, the program being executed by a processor to implement the steps of the audio processing method as described above.
The present application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of an audio processing method as described above.
According to the technical solution of the present application, whether preset key audio information exists in the environmental audio information is identified through a converged audio-recognition neural network model, so that audio information important to the user in the current scenario is recognized and the key audio information is captured. If the key audio information exists, compensation adjustment of acoustic parameters is performed on it to obtain key audio enhancement information, which is then output. In this way, once key audio information is found in the environmental audio information, it is acoustically compensated: its volume is enhanced, which helps draw the user's attention to the prompted key audio information, and/or its corresponding sound source spatial position is calibrated, so that the user can clearly and accurately perceive the key audio information and locate where it comes from. Thus, while the user is immersed in the head-mounted display device, key external environmental information is effectively prompted to the user, the user remains able to discern the environment in time, and hidden environmental dangers are reduced.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the technical solutions of the embodiments of the present application or of the prior art, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, those skilled in the art can obtain other drawings from these drawings without inventive effort.
Fig. 1 is a schematic flowchart of a first embodiment of the audio processing method of the present application;
Fig. 2 is a schematic flowchart of a second embodiment of the audio processing method of the present application;
Fig. 3 is a schematic flowchart of a third embodiment of the audio processing method of the present application;
Fig. 4 is a schematic view of a scene in which a user performs sound source localization on key audio information while wearing a head-mounted display device;
Fig. 5 is a schematic view of a scene in which the sound source spatial position of key audio information is visually displayed in an embodiment of the present application;
Fig. 6 is a preset guiding interface in an embodiment of the present application;
Fig. 7 is a schematic structural diagram of the hardware running environment related to the head-mounted display device in this embodiment.
The implementation, functional features and advantages of the present application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
In order to make the above objects, features, and advantages of the present application more comprehensible, the embodiments are described in detail below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of protection of the present application.
In this embodiment, the head-mounted display device includes, but is not limited to, a mixed reality (Mixed Reality, MR) device (e.g., MR glasses or an MR helmet), an augmented reality (Augmented Reality, AR) device (e.g., AR glasses or an AR helmet), a virtual reality (Virtual Reality, VR) device (e.g., VR glasses or a VR helmet), an extended reality (Extended Reality, XR) device, or some combination thereof.
In some application scenarios, a user may wish to clearly hear certain key external sounds while wearing the head-mounted display device, such as an alarm (e.g., a triggered fire alarm), a knock, or another person calling the user in a home scenario, a platform announcement in a bus scenario, or a car horn in a walking scenario. That is, while wearing the head-mounted display device for an immersive experience, the user also needs to attend to external environmental elements, because in many application scenarios these elements carry key information, even danger warnings. Yet current head-mounted display devices that deliver immersive experiences almost completely isolate the user's hearing from the outside world, so the user cannot obtain key external environmental information in time, which undoubtedly causes great inconvenience.
Example 1
Based on this, referring to fig. 1, the present embodiment provides an audio processing method, which includes:
Step S10, dynamically acquiring external environmental audio information, and identifying whether preset key audio information exists in the environmental audio information through a converged audio-recognition neural network model;
in this embodiment, the environmental audio information refers to sound information generated by the external environment during the process of wearing the head-mounted display device by the user. I.e. the ambient audio information is distinguishable from the sound generated by the head mounted display device itself.
In one embodiment, the environmental audio information may be captured by a microphone on the head mounted display device. In another embodiment, the environmental audio information may be obtained by receiving environmental audio information sent from other terminal devices (e.g., a smart watch, a cell phone, or a smart speaker) communicatively coupled to the head mounted display device.
It should be noted that a plurality of operation modes may be configured in the system of the head-mounted display device, including an immersion mode and a smart mode. The user may trigger the head-mounted display device to enter the immersion mode through the function key corresponding to that mode. After entering the immersion mode, the device completely isolates external environmental audio information and does not execute step S10 or the subsequent steps S20 and S30 of this embodiment, thereby maximizing the immersion of the experience. Likewise, the user may trigger the device to enter the smart mode through the corresponding function key. After entering the smart mode, the device executes step S10 while in use, dynamically acquiring external environmental audio information and identifying, through the converged audio-recognition neural network model, whether preset key audio information exists in it, and then executes the following steps S20 and S30.
Thus, in an application scenario with a high safety factor or a low need to be aware of external environmental elements (for example, a relatively quiet place, or a VR activity venue supervised by dedicated staff), the user can switch the device into the immersion mode by touching the corresponding key to maximize immersion. In a scenario with a low safety factor or a high need to be aware of external elements (for example, a fire alarm, a knock, or another person calling the user in a home scenario, a platform announcement in a bus scenario, or a car horn in a walking scenario), the user can switch the device into the smart mode, so that key external environmental information is effectively prompted and the environment can be discerned in time, while the immersion of using the head-mounted display device is preserved as much as possible. In other words, the user can select the most suitable operation mode for the current application scenario of the head-mounted display device to match actual usage needs, which brings a better product experience.
In this embodiment, the environmental audio information may be input into the converged audio-recognition neural network model, which identifies whether preset key audio information exists in it. Key audio information refers to audio that matters to the user, such as an alarm (e.g., a triggered fire alarm), a knock, or another person calling the user in a home scenario, a platform announcement in a bus scenario, or a car horn in a walking scenario.
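By way of illustration only, the following minimal sketch shows one shape this recognition step could take. The frame length, sample rate, feature extraction, and decision threshold are assumptions rather than part of the disclosure, and `model` stands in for the converged audio-recognition neural network model.

```python
import numpy as np

FRAME_SECONDS = 1.0    # assumed analysis window
SAMPLE_RATE = 16_000   # assumed microphone sample rate

def capture_ambient_frame() -> np.ndarray:
    """Stand-in for the HMD microphone; returns one frame of ambient samples."""
    return np.zeros(int(FRAME_SECONDS * SAMPLE_RATE), dtype=np.float32)

def key_audio_present(model, frame: np.ndarray) -> bool:
    """Run toy spectral features through the trained model and threshold the score."""
    features = np.abs(np.fft.rfft(frame))  # magnitude spectrum as stand-in features
    return float(model(features)) > 0.5    # assumed decision threshold
```

In a real pipeline the frame would come from the microphone stream, and the model would be the trained classifier described in Example 2.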
Step S20, if the key audio information exists, compensating and adjusting acoustic parameters of the key audio information to obtain key audio enhancement information;
In this embodiment, because the user concentrates on the augmented reality content while using the head-mounted display device, even if the key audio information in the environmental audio information is passed through, the user may fail to perceive it due to immersion in that content and thus ignore external key audio information (the sensitivity of the human sensory system is affected by where subjective attention is focused). Therefore, this embodiment performs compensation adjustment of acoustic parameters on the key audio information to obtain key audio enhancement information. For example, compensating the beam amplitude of the key audio information raises its volume, which helps draw the user's attention to the prompted key audio information; compensating its beam phase displacement presents the corresponding sound source spatial position more accurately, which helps the user clearly distinguish where the key sound was produced and avoid danger in time. As those skilled in the art know, existing techniques such as beamforming, adaptive filtering, and adaptive volume may be used to perform this compensation adjustment, so that the user receives a timely prompt of the key audio information even while concentrating on the augmented reality content.
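As a rough illustration (not the patented implementation), the sketch below applies the two compensations just described: an amplitude gain for the beam amplitude loss, and a fractional delay, realised as linear phase in the frequency domain, for the beam phase displacement. The parameter values would come from the loss-level mapping discussed later; everything here is an assumption.

```python
import numpy as np

def compensate(frame: np.ndarray, sample_rate: int,
               gain_db: float, delay_s: float) -> np.ndarray:
    """Boost beam amplitude and apply a phase (delay) correction in one pass."""
    spectrum = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(frame.size, d=1.0 / sample_rate)
    spectrum *= 10 ** (gain_db / 20)                   # amplitude compensation
    spectrum *= np.exp(-2j * np.pi * freqs * delay_s)  # linear-phase delay
    return np.fft.irfft(spectrum, n=frame.size)
```

A per-ear pair of such calls, with slightly different delays, is what would shift the perceived source position.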
Step S30, outputting the key audio enhancement information.
In this embodiment, the key audio enhancement information may be played through the earphones corresponding to the left and right ears in the head-mounted display device, thereby outputting the key audio enhancement information.
According to the technical solution of this embodiment, whether preset key audio information exists in the environmental audio information is identified through the converged audio-recognition neural network model, so that audio information important to the user in the current scenario is recognized and the key audio information is captured. If it exists, compensation adjustment of acoustic parameters is performed on it to obtain key audio enhancement information, which is then output. Acoustic compensation under these conditions enhances the volume of the key audio information, which helps shift the user's attention to the prompted key audio information, and/or calibrates the corresponding sound source spatial position, so that the user can clearly and accurately perceive the key audio information and distinguish its sounding position. In this way, while immersed in the head-mounted display device, the user is effectively prompted with key external environmental information, is not left unable to discern things in time, and hidden environmental dangers are reduced.
Illustratively, after the step of outputting the key audio enhancement information, the method further comprises:
and step A10, displaying the sound source space position corresponding to the key audio information on a display interface of the head-mounted display device in a mode of marking on a radar chart or an azimuth scale.
It should be noted that binaural localization ability, that is, sound source resolving power, quantitatively represents how accurately a listener can judge the azimuth of a sound source relative to their own position, i.e., the minimum azimuth angle they can resolve. For example, if the 360 degrees around the user are divided into four equal parts and the user can only determine which quarter the azimuth of the sound source falls into, without subdividing further, the minimum azimuth angle the user can resolve is 90 degrees. If the 360 degrees are divided into eight equal parts and the user can only determine which eighth the azimuth falls into, the minimum resolvable azimuth angle is 45 degrees (as shown in fig. 6).
However, because the user is immersed and overly focused while using the head-mounted display device, external auditory stimuli are largely ignored, the minimum resolvable azimuth angle grows, and localization accuracy drops, so the user may misjudge the sound source spatial position corresponding to key audio information, causing unnecessary trouble and potential danger. For example, as shown in fig. 4, if the minimum resolvable azimuth is α1 under normal conditions (i.e., when not focused on the augmented reality content presented by the head-mounted display device) and α2 when focused on that content, then α1 < α2. If B is a danger message (e.g., a falling weight on a construction site, or a horn in a walking scenario), it may be misperceived as B' in the focused scenario.
Based on this, in this embodiment the sound source spatial position corresponding to the key audio information is displayed on the display interface of the head-mounted display device, marked on a radar chart or an azimuth scale, so that the sound source position is visualized: on top of the enhanced auditory cue, the sound source spatial position information is also prompted to the user visually and is therefore received more easily. Fig. 5 is a schematic view of a scene in which the sound source spatial position of key audio information is visually displayed in the embodiment of the present application; the visualized display interface is shown in the right diagram of fig. 5. The interface can display graphics such as a radar chart or an azimuth scale (the latter may comprise a head-orientation angle scale and a sound source position scale) and mark the sound source spatial position corresponding to the key audio information on it, giving the user a visual prompt. The user can thus distinguish the sounding position corresponding to the key audio information more clearly and accurately, so that key external environmental information is effectively prompted during immersive use of the head-mounted display device and unnecessary trouble and potential danger are reduced.
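A minimal sketch of such an overlay marker follows; the eight-sector scale and the label format are illustrative assumptions.

```python
def azimuth_sector(azimuth_deg: float, sectors: int = 8) -> int:
    """Quantize an azimuth in degrees to a sector index on the scale."""
    width = 360.0 / sectors
    return int((azimuth_deg % 360.0) // width)

def overlay_label(azimuth_deg: float, sectors: int = 8) -> str:
    """Text for the display overlay marking where the key sound came from."""
    width = 360.0 / sectors
    s = azimuth_sector(azimuth_deg, sectors)
    return f"key sound: sector {s} ({s * width:.0f}-{(s + 1) * width:.0f} deg)"
```

For instance, `overlay_label(100.0)` yields "key sound: sector 2 (90-135 deg)", which a renderer would then draw onto the radar chart or azimuth scale.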
In one possible implementation, referring to fig. 2, the step of performing compensation adjustment of acoustic parameters on the key audio information includes:
Step S21, acquiring a current concentration coefficient of the user for the augmented reality environment presented in the head-mounted display device;
In this embodiment, the current concentration coefficient characterizes how concentrated the user currently is on the augmented reality environment presented in the head-mounted display device: the greater the current concentration coefficient, the higher the concentration.
In one embodiment, this step may specifically be: acquiring a concentration coefficient input by the user and taking it as the current concentration coefficient of the user for the augmented reality environment presented in the head-mounted display device.
In another embodiment, this step may specifically be: acquiring the target application program currently run by the head-mounted display device, querying a preset application mapping coefficient table for the concentration coefficient mapped by the target application program, and taking that coefficient as the current concentration coefficient. The application mapping coefficient table stores a one-to-one mapping between application programs and concentration coefficients. Applications include, but are not limited to, VR games, VR movies, VR shopping, photographing, music playback, settings (for parameters such as sound or image), information notifications, weather, voice calls, and the like. As those skilled in the art will appreciate, different applications usually map to different concentration coefficients; for example, the coefficients mapped by VR games or VR movie viewing are usually higher than those mapped by music playback, settings, or information notifications.
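For illustration, the per-application lookup could be as simple as the table below; the application names and coefficient values are assumptions.

```python
# Assumed application-to-concentration mapping table (values illustrative).
APP_CONCENTRATION = {
    "vr_game": 0.85,
    "vr_movie": 0.75,
    "music_playback": 0.30,
    "settings": 0.15,
    "notifications": 0.10,
}

def concentration_for_app(running_app: str, default: float = 0.5) -> float:
    """Query the mapped coefficient, falling back to a neutral default."""
    return APP_CONCENTRATION.get(running_app, default)
```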
Step S22, determining a current environmental audio loss level associated with the current concentration coefficient, wherein a higher current concentration coefficient is associated with a higher current environmental audio loss level;
In this embodiment, the current environmental audio loss level characterizes the degree to which the user's perception of environmental audio is degraded by concentrating on the augmented reality environment presented in the head-mounted display device: the greater the current environmental audio loss level, the greater the loss of the environmental audio information.
As those skilled in the art will appreciate, the more the user concentrates on the augmented reality content of the head-mounted display device, the less sensitively the user perceives external environmental audio information and the more easily external key audio information is ignored. Thus, the higher the current concentration coefficient, the higher the associated current environmental audio loss level. As an example, the current environmental audio loss level associated with the current concentration coefficient may be obtained by querying a preset concentration coefficient mapping table, which contains a plurality of different concentration coefficients and the environmental audio loss level mapped by each.
Step S23, querying a preset loss-level mapping relation to obtain the sound source spatial position deviation value and/or audio intensity loss value mapped by the current environmental audio loss level;
In this embodiment, the loss-level mapping relation contains a plurality of different environmental audio loss levels and the sound source spatial position deviation value and/or audio intensity loss value mapped by each. It is readily understood that the higher the current environmental audio loss level, the higher the mapped sound source spatial position deviation value and/or audio intensity loss value. One possible form of such a table is sketched below.
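A minimal sketch, assuming three loss levels with invented deviation and loss values:

```python
# Assumed loss-level mapping relation: each level maps to a sound source
# spatial position deviation (degrees) and an audio intensity loss (dB).
LOSS_LEVEL_MAP = {
    "low":    {"position_deviation_deg": 5.0,  "intensity_loss_db": 3.0},
    "medium": {"position_deviation_deg": 15.0, "intensity_loss_db": 6.0},
    "high":   {"position_deviation_deg": 30.0, "intensity_loss_db": 12.0},
}

def query_loss_mapping(level: str) -> dict:
    """Step S23: look up the deviation and loss mapped by the current level."""
    return LOSS_LEVEL_MAP[level]
```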
Step S24, performing compensation adjustment of acoustic parameters on the key audio information according to the mapped sound source spatial position deviation value and/or audio intensity loss value.
Illustratively, the step of performing compensation adjustment of acoustic parameters on the key audio information according to the mapped sound source spatial position deviation value and/or audio intensity loss value includes:
Step B10, determining a beam phase displacement of the key audio information according to the mapped sound source spatial position deviation value, and/or determining a beam amplitude loss of the key audio information according to the mapped audio intensity loss value;
Step B20, determining audio parameter compensation information of the key audio information according to the beam phase displacement and/or the beam amplitude loss;
Step B30, performing compensation adjustment of acoustic parameters on the key audio information according to the audio parameter compensation information, so as to compensate the beam phase displacement and/or beam amplitude loss of the key audio information.
In this embodiment, the larger the sound source spatial position deviation value, the larger the beam phase displacement of the key audio information; correspondingly, the larger the audio intensity loss value, the larger the beam amplitude loss. It is easy to understand that the audio parameter compensation information includes phase displacement compensation information for correcting the sound source spatial position deviation and/or beam amplitude compensation information for correcting the audio intensity loss, so that compensation adjustment of acoustic parameters can be performed on the key audio information according to this information, accurately compensating the beam phase displacement and/or beam amplitude loss of the key audio information.
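One way to turn the queried values into the concrete gain and delay used by a compensation routine (such as the `compensate()` sketch after step S20) is shown below. The Woodworth-style head model and the mirror-the-loss gain rule are assumptions, not the patented method.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s
HEAD_RADIUS = 0.0875    # m, assumed average head radius

def deviation_to_delay(position_deviation_deg: float) -> float:
    """Approximate interaural delay for an azimuth deviation (Woodworth model)."""
    theta = math.radians(position_deviation_deg)
    return HEAD_RADIUS * (theta + math.sin(theta)) / SPEED_OF_SOUND

def loss_to_gain_db(intensity_loss_db: float) -> float:
    """Compensating gain that simply mirrors the estimated intensity loss."""
    return intensity_loss_db
```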
In this way, by acquiring the current concentration coefficient of the user for the augmented reality environment presented in the head-mounted display device, determining the associated current environmental audio loss level (the higher the coefficient, the higher the level), querying the preset loss-level mapping relation for the mapped sound source spatial position deviation value and/or audio intensity loss value, and performing compensation adjustment of acoustic parameters on the key audio information accordingly, the volume of the key audio information is raised by compensating its beam amplitude, which helps draw the user's attention to the prompted key audio information, and/or the sound source spatial position corresponding to the key audio information is presented more accurately by compensating its beam phase displacement, which helps the user clearly and accurately locate where the key sound comes from. Key external environmental information can thus be effectively prompted to the user while the head-mounted display device is used immersively, so that the user is not left unable to discern the external environment in time.
In one possible implementation, before the step of determining the current environmental audio loss level associated with the current concentration coefficient, the method further includes:
Step C10, playing preset virtual test audio, wherein the sound source spatial position of the virtual test audio is a preset spatial azimuth;
In this embodiment, the virtual test audio carries sounding azimuth information (i.e., a sound source spatial position) relative to the user. Note that the virtual test audio is simulated by the head-mounted display device: the internal earphones corresponding to the left and right ears play the audio with a controlled time difference. Because the distances from a sound source to the two ears differ, a sound signal reaches the two ears at slightly different times, and this time difference, together with the loudness difference between the ears, is what allows a listener to judge the spatial position of a sound source. By reproducing such interaural time and loudness differences, the device makes the user perceive audio arriving from the preset spatial azimuth, thereby simulating stereo sound that carries sound source spatial position information.
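A minimal rendering sketch for such virtual test audio is given below; the far-field ITD model, ear spacing, tone frequency, and sample rate are all assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s
EAR_SPACING = 0.175     # m, assumed distance between the ears

def render_test_audio(azimuth_deg: float, sr: int = 48_000,
                      duration_s: float = 1.0) -> np.ndarray:
    """Stereo tone whose interaural time difference encodes the preset azimuth."""
    t = np.arange(int(sr * duration_s)) / sr
    tone = 0.2 * np.sin(2 * np.pi * 440.0 * t)               # 440 Hz test tone
    itd = EAR_SPACING * np.sin(np.radians(azimuth_deg)) / SPEED_OF_SOUND
    shift = int(round(abs(itd) * sr))                        # delay in samples
    delayed = np.concatenate([np.zeros(shift), tone[:tone.size - shift]])
    left, right = (delayed, tone) if azimuth_deg >= 0 else (tone, delayed)
    return np.stack([left, right], axis=1)                   # (samples, 2)
```

Here a positive azimuth means the source is to the user's right, so the left-ear channel is the delayed one; a loudness difference could be layered on in the same way.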
Step C20, outputting a preset guiding interface for guiding the user to judge the sound source spatial position of the virtual test audio;
In one embodiment, the preset guiding interface may guide the user to directly input a spatial azimuth range for the sounding position of the virtual test audio, for example an azimuth range of 30 to 60 degrees to the user's front right.
In another embodiment, the preset guiding interface may include an azimuth selection ring divided into several equal sectors, with the user's position preset at its center; the user selects one of the sectors by touch to input the spatial azimuth information of the sounding position of the virtual test audio. As shown in fig. 6, which depicts a preset guiding interface in an embodiment of the present application, the interface displays an azimuth selection ring equally divided into eight sectors of 45 degrees each (i.e., the minimum azimuth angle currently resolvable is 45 degrees). By listening to the virtual test audio played by the head-mounted display device, the user judges which sector (shown darkened) the sound source spatial position points to, and completes the input by touching that sector.
Step C30, acquiring azimuth information input by the user in response to the preset guiding interface, comparing the azimuth information with the preset spatial azimuth, and determining the key audio resolution of the user according to the comparison result;
Note that the key audio resolution characterizes the user's ability to identify the sound source spatial position of key audio information.
In this embodiment, the smaller the deviation between the input azimuth information and the preset spatial azimuth in the comparison result, the higher the user's key audio resolution; conversely, the larger the deviation, the lower the key audio resolution.
Note that several virtual test audios may be played in sequence, each with a different sound source spatial position, so as to test the user's key audio resolution for different sounding directions in turn. One way to score such a test is sketched below.
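As an illustration, the scoring below takes the largest wrap-around angular error over all trials as the key audio resolution; this particular rule is an assumption.

```python
def key_audio_resolution(answers_deg, presets_deg):
    """Largest angular error between answered and preset azimuths, in degrees."""
    errors = [abs((a - p + 180.0) % 360.0 - 180.0)
              for a, p in zip(answers_deg, presets_deg)]
    return max(errors)  # smaller means finer sound source localization

# e.g. key_audio_resolution([50, 350], [45, 10]) -> 20.0
```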
Step C40, determining the user's perceptual sensitivity to the key audio information according to the key audio resolution, and selecting, according to the perceptual sensitivity, a concentration mapping gradient matching it from a preset mapping gradient database, wherein the concentration mapping gradient includes a plurality of concentration coefficients and an environmental audio loss level associated with each concentration coefficient;
Here, the smaller the perceptual sensitivity, the larger the matched concentration mapping gradient; the greater the perceptual sensitivity, the smaller the gradient.
To facilitate understanding, consider an example in which the mapping gradient database stores, from small to large, a first mapping gradient and a second mapping gradient. In the first mapping gradient, the associated environmental audio loss level is low when the concentration coefficient lies in [0.1, 0.35), medium when it lies in [0.35, 0.65), and medium-high when it lies in [0.65, 0.9). In the second mapping gradient, the associated level is medium in [0.1, 0.35), medium-high in [0.35, 0.65), and high in [0.65, 0.9). These examples are given only to aid understanding and do not limit the application; further simple variations based on the technical concept or principle of the embodiments fall within the scope of protection of the present application.
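Encoded directly, the two example gradients look as follows; only the breakpoints from the text are used, and the handling of coefficients outside [0.1, 0.9) is an assumption.

```python
# Breakpoints are (lower bound of the concentration range, associated level).
FIRST_GRADIENT  = [(0.10, "low"),    (0.35, "medium"),      (0.65, "medium-high")]
SECOND_GRADIENT = [(0.10, "medium"), (0.35, "medium-high"), (0.65, "high")]

def loss_level(concentration: float, gradient) -> str:
    """Return the loss level whose range contains the concentration coefficient."""
    level = gradient[0][1]            # fallback below the first breakpoint
    for lower, lvl in gradient:
        if concentration >= lower:
            level = lvl
    return level
```

With this encoding, `loss_level(0.5, FIRST_GRADIENT)` returns "medium" while `loss_level(0.5, SECOND_GRADIENT)` returns "medium-high", matching the example above.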
The step of determining the current environmental audio loss level associated with the current concentration coefficient then includes:
Step C50, determining, according to the matched concentration mapping gradient, the current environmental audio loss level associated with the current concentration coefficient.
In this embodiment, it is easy to understand that the larger the concentration mapping gradient, the larger the current environmental audio loss level associated with the current concentration coefficient (because, for the same concentration coefficient, a larger gradient is associated with a larger environmental audio loss level).
In this embodiment, preset virtual test audio whose sound source spatial position is a preset spatial azimuth is played, a preset guiding interface is output to guide the user to judge that position, the azimuth information the user inputs in response is compared with the preset spatial azimuth, and the user's key audio resolution is determined from the comparison result, so that the user's sound source localization ability is detected accurately. The user's perceptual sensitivity to key audio information is then determined from the key audio resolution, and a concentration mapping gradient matching that sensitivity is selected from the preset mapping gradient database. Since the database holds multiple concentration mapping gradients, each matched one-to-one with a different perceptual sensitivity, the selected gradient reflects the user's personal situation, and the current environmental audio loss level determined from it better matches the user's actual condition. This in turn allows the subsequent audio parameter compensation information to be determined in a more personalized way, close to the user's individual needs, compensating the beam phase displacement and/or beam amplitude loss of the key audio information. Key external environmental information can thus be effectively prompted to the user while preserving, as far as possible, the immersion of using the head-mounted display device, on the basis that the user can discern the environment in time.
In one embodiment, the step of acquiring the current concentration coefficient of the user for the augmented reality environment presented in the head-mounted display device includes:
Step D10, detecting current user physiological characteristic information and device usage state information, wherein the user physiological characteristic information includes at least one of pupil size, blink frequency, heart rate, respiration rate, and body temperature, and the device usage state information includes at least one of the usage duration, motion state, power consumption rate, and currently running application program of the head-mounted display device;
Step D20, determining, according to the user physiological characteristic information and the device usage state information, the current concentration coefficient of the user for the augmented reality environment presented in the head-mounted display device.
Illustratively, in one possible implementation, the step of determining the current concentration coefficient of the user for the augmented reality environment presented in the head-mounted display device according to the user physiological characteristic information and the device usage state information includes:
Step E10, acquiring a preset concentration-recognition neural network model;
Step E20, inputting the user physiological characteristic information and the device usage state information into the concentration-recognition neural network model, and predicting the current concentration coefficient of the user for the augmented reality environment presented in the head-mounted display device.
In this embodiment, it will be understood by those skilled in the art that the physiological characteristic information of the user, such as pupil size and heart rate, may reflect the emotion of the user, and when the emotion of the user is relatively rough (e.g., playing VR games or viewing VR videos is prone to being excited and stressed, etc.), the concentration of the user on the augmented reality environment presented in the head-mounted display device (or called immersion) is often reflected relatively high, and when the emotion of the user is relatively flat, the concentration of the user on the augmented reality environment presented in the head-mounted display device is often reflected relatively low. While user physiological characteristic information such as blink frequency, heart rate, respiration rate, and body temperature may reflect the user's physical or psychological level of activity, when the level of activity is high, it tends to reflect that the user's concentration in the augmented reality environment presented in the head mounted display device is relatively high, and when the level of activity is low, it tends to reflect that the user's concentration in the augmented reality environment presented in the head mounted display device is relatively low.
In addition, in the present embodiment, it is easy to understand that a longer use duration of the head-mounted display device tends to reflect a higher concentration of the user on the augmented reality environment presented in it. An active motion state of the head-mounted display device indicates a high degree of physical activity, which likewise tends to reflect relatively high concentration. The greater the power consumption rate of the head-mounted display device, the more complex the operating environment of the application program it is running tends to be, and the more the presented augmented reality environment tends to hold the user's attention (for example, the power consumption rate of a VR game or VR video application is greater than that of music playback, settings, or information notifications). Correspondingly, different currently running application programs are typically associated with different degrees of user attention to the presented augmented reality environment; for example, VR games or VR videos usually demand more focused attention from the user than music playback, settings, or information notifications.
Therefore, this embodiment detects the current user physiological characteristic information and device use state information, where the user physiological characteristic information includes at least one of pupil size, blink frequency, heart rate, respiratory rate and body temperature, and the device use state information includes at least one of the use duration, motion state, power consumption rate and currently running application program of the head-mounted display device. By integrating these multiple factors, the user's degree of attention to the augmented reality environment presented in the head-mounted display device is judged, so that the user's current concentration factor is measured accurately.
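As an illustration of steps E10 to E20, the sketch below assumes a small PyTorch multilayer perceptron whose nine-dimensional input packs the physiological and device-state features listed above. The feature order, network shape, and untrained weights are assumptions; a real model would first be trained on labeled concentration data.

```python
# Minimal inference sketch for a concentration recognition model (assumed MLP).
import torch
import torch.nn as nn

class ConcentrationNet(nn.Module):
    def __init__(self, n_features: int = 9):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 32), nn.ReLU(),
            nn.Linear(32, 1), nn.Sigmoid(),  # concentration coefficient in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Assumed feature order: pupil size (mm), blink frequency (per min), heart rate
# (bpm), respiratory rate (per min), body temperature (C), use duration (min),
# motion level, power consumption rate (W), running-application category id.
features = torch.tensor([[4.2, 12.0, 88.0, 16.0, 36.8, 45.0, 0.7, 9.5, 3.0]])

model = ConcentrationNet()
model.eval()
with torch.no_grad():
    concentration = model(features).item()  # a value in (0, 1)
```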
Example Two
In another embodiment of the present application, for content that is the same as or similar to the first embodiment, reference may be made to the description above, which is not repeated here. On this basis, referring to fig. 3, the method further includes:
step S40, obtaining demand identification audio information corresponding to at least one application scene;
In this embodiment, the types of application scenes may include, but are not limited to, a bus-riding scene, a home scene, a walking scene, and the like. The demand identification audio information corresponding to the bus-riding scene may be a station announcement. The demand identification audio information corresponding to the home scene may be a fire alarm, a knock at the door, or another person calling out to the user. The demand identification audio information corresponding to the walking scene may be a car horn.
Step S50, associating a plurality of pieces of the demand identification audio information with a key audio tag to obtain a key audio sample set, and associating a plurality of pieces of environmental noise information with an interference audio tag to obtain an interference audio sample set, wherein the environmental noise information does not contain the demand identification audio information;
and step S60, training a preset neural network model through the key audio sample set and the interference audio sample set to obtain a converged audio recognition neural network model.
According to this embodiment, the demand identification audio information corresponding to at least one application scene is obtained, a plurality of pieces of demand identification audio information are associated with the key audio tag to obtain a key audio sample set, a plurality of pieces of environmental noise information are associated with the interference audio tag to obtain an interference audio sample set, and the preset neural network model is trained with the key audio sample set and the interference audio sample set. In this way, a converged audio recognition neural network model can be trained accurately and efficiently, which facilitates subsequently using it to accurately identify whether the preset key audio information is present in the environmental audio information.
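A hedged training sketch for steps S40 to S60 follows: demand identification clips carry the key audio tag (1), ambient noise clips carry the interference audio tag (0), and a small classifier is trained until the loss stops improving. The 64-dimensional clip features, model size, and convergence test are illustrative assumptions.

```python
# Sketch of training the audio recognition model on the two labeled sample sets.
import torch
import torch.nn as nn

def make_dataset(key_feats: torch.Tensor, noise_feats: torch.Tensor):
    """Stack key audio (label 1) and interference (label 0) samples."""
    x = torch.cat([key_feats, noise_feats])
    y = torch.cat([torch.ones(len(key_feats), 1), torch.zeros(len(noise_feats), 1)])
    return x, y

# Placeholder per-clip features (e.g., log-mel statistics) standing in for
# real demand identification audio and environmental noise recordings.
key_feats, noise_feats = torch.randn(100, 64), torch.randn(100, 64)
x, y = make_dataset(key_feats, noise_feats)

model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

prev_loss = float("inf")
for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    if abs(prev_loss - loss.item()) < 1e-5:  # crude convergence criterion
        break
    prev_loss = loss.item()
```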
Example Three
The embodiment of the invention also provides an audio processing device, which comprises:
the recognition module is used for dynamically collecting external environment audio information and recognizing, through a converged audio recognition neural network model, whether preset key audio information exists in the environment audio information;
the compensation module is used for carrying out compensation adjustment on acoustic parameters of the key audio information to obtain key audio enhancement information;
and the output module is used for outputting the key audio enhancement information.
Optionally, the compensation module is further configured to:
acquiring a current concentration factor of a user on an augmented reality environment presented in the head-mounted display device;
determining a current environmental audio loss level associated with the current concentration factor, wherein the higher the current concentration factor is, the higher the associated current environmental audio loss level is;
inquiring to obtain a sound source space position deviation value and/or an audio intensity loss value of the current environment audio loss level mapping from a preset loss level mapping relation;
and compensating and adjusting acoustic parameters of the key audio information according to the mapped sound source space position deviation value and/or the audio intensity loss value.
Optionally, the compensation module is further configured to:
determining the beam phase displacement of the key audio information according to the mapped sound source space position deviation value and/or determining the beam amplitude loss of the key audio information according to the mapped audio intensity loss value;
determining audio parameter compensation information of the key audio information according to the beam phase displacement and/or the beam amplitude loss;
and according to the audio parameter compensation information, carrying out compensation adjustment on acoustic parameters on the key audio information so as to compensate beam phase displacement and/or beam amplitude loss of the key audio information.
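As a simplified numeric illustration of this compensation step, the sketch below treats the beam phase displacement as a time delay and the beam amplitude loss as a decibel gain applied to a mono signal. Real beam compensation is direction- and frequency-dependent, so this is an assumption-laden sketch rather than the disclosed algorithm.

```python
# Toy compensation: a delay approximates the phase displacement, a gain
# approximates recovery of the amplitude loss.
import numpy as np

def compensate(signal: np.ndarray, sr: int, phase_shift_s: float, gain_db: float) -> np.ndarray:
    """Delay the signal by phase_shift_s seconds and boost it by gain_db."""
    shift = int(round(phase_shift_s * sr))
    delayed = np.concatenate([np.zeros(shift), signal])[: len(signal)]
    return delayed * 10.0 ** (gain_db / 20.0)

sr = 48_000
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)   # 1-second test tone
enhanced = compensate(tone, sr, phase_shift_s=0.0005, gain_db=6.0)
```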
Optionally, the compensation module is further configured to:
playing preset virtual test audio, wherein the sound source space position of the virtual test audio is a preset space position;
outputting a preset guiding interface for guiding a user to judge the sound source space position of the virtual test audio;
acquiring azimuth information input by a user in response to the preset guide interface, comparing the azimuth information with the preset space azimuth, and determining the key audio resolution of the user according to a comparison result;
determining the perception sensitivity of a user to the key audio information according to the key audio resolution, and selecting a concentration mapping gradient matched with the perception sensitivity from a preset mapping gradient database according to the perception sensitivity, wherein the concentration mapping gradient comprises a plurality of concentration coefficients and an environmental audio loss level associated with each concentration coefficient;
The step of determining the current environmental audio loss level associated with the current concentration factor comprises:
and determining the current environmental audio loss level associated with the current concentration coefficient according to the matched concentration mapping gradient.
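The two mappings at work here — from concentration coefficient to environmental audio loss level through the matched concentration mapping gradient, and from loss level to deviation and intensity values through the preset loss level mapping relation — can be illustrated with hypothetical tables; the numbers below are assumptions, not presets from this disclosure.

```python
# Hypothetical tables; the real gradients and loss level mappings are presets.

# Matched gradient: (concentration coefficient upper bound, loss level).
matched_gradient = [(0.3, 1), (0.6, 2), (0.9, 3), (1.0, 4)]

# Loss level -> (sound source position deviation in degrees, intensity loss in dB).
LOSS_LEVEL_MAP = {1: (2.0, 1.5), 2: (5.0, 3.0), 3: (10.0, 6.0), 4: (15.0, 9.0)}

def loss_level_for(concentration: float) -> int:
    """Higher concentration falls into a higher loss level band."""
    for upper_bound, level in matched_gradient:
        if concentration <= upper_bound:
            return level
    return matched_gradient[-1][1]

deviation_deg, intensity_loss_db = LOSS_LEVEL_MAP[loss_level_for(0.72)]  # level 3
```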
Optionally, the compensation module is further configured to:
detecting current user physiological characteristic information and device use state information, wherein the user physiological characteristic information comprises at least one of pupil size, blink frequency, heart rate, respiratory rate and body temperature, and the device use state information comprises at least one of the use duration, motion state, power consumption rate and currently running application program of the head-mounted display device;
and determining the current concentration coefficient of the user to the augmented reality environment presented in the head-mounted display device according to the physiological characteristic information of the user and the use state information of the device.
Optionally, the compensation module is further configured to:
acquiring a preset concentration recognition neural network model;
inputting the physiological characteristic information of the user and the equipment use state information into the concentration recognition neural network model, and predicting to obtain the current concentration coefficient of the user for the augmented reality environment presented in the head-mounted display equipment.
Optionally, the audio processing device further comprises a training module, wherein the training module is used for:
acquiring demand identification audio information corresponding to at least one application scene;
associating a plurality of pieces of the demand identification audio information with a key audio tag to obtain a key audio sample set, and associating a plurality of pieces of environmental noise information with an interference audio tag to obtain an interference audio sample set, wherein the environmental noise information does not contain the demand identification audio information;
training a preset neural network model through the key audio sample set and the interference audio sample set to obtain a converged audio recognition neural network model.
Optionally, the output module is further configured to:
and displaying the sound source space position corresponding to the key audio information on a display interface of the head-mounted display device by marking it on a radar chart or an azimuth scale.
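As a toy illustration of the azimuth scale variant (the radar chart rendering is UI-specific and omitted), the sketch below reduces a hypothetical source position in head coordinates to a horizontal azimuth and marks it on a text scale; the coordinate convention and scale width are assumptions.

```python
# Reduce a source position to an azimuth and mark it on a 0..360 degree scale.
import math

def azimuth_deg(x: float, y: float) -> float:
    """Azimuth of a source at (x, y) in head coordinates; 0 degrees is ahead."""
    return math.degrees(math.atan2(x, y)) % 360.0

def azimuth_scale(angle: float, width: int = 36) -> str:
    """Render the angle as a caret marker on a fixed-width text scale."""
    pos = int(angle / 360.0 * (width - 1))
    return "".join("^" if i == pos else "-" for i in range(width))

angle = azimuth_deg(1.0, 1.0)             # source to the front-right: 45 degrees
print(f"{angle:5.1f} deg  {azimuth_scale(angle)}")
```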
The audio processing device provided by the embodiment of the present invention adopts the audio processing method of the first or second embodiment, and can solve the technical problem that, during use of the head-mounted display device, the user cannot be effectively prompted with external key environment information and therefore cannot distinguish the external environment condition in time. Compared with the prior art, the beneficial effects of the audio processing device provided by this embodiment of the present invention are the same as those of the audio processing method provided by the above embodiments, and the other technical features of the audio processing device are the same as those disclosed in the method of the previous embodiments, which are not repeated here.
Example Four
An embodiment of the present invention provides a head-mounted display device, including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can execute the audio processing method in the first embodiment.
Referring now to fig. 7, a schematic diagram of a head-mounted display device suitable for implementing embodiments of the present disclosure is shown. The head-mounted display device in the embodiments of the present disclosure may be, for example, a headset or a head-mounted display. The head-mounted display device includes, but is not limited to, a mixed reality (Mixed Reality, MR) device (e.g., MR glasses or an MR helmet), an augmented reality (Augmented Reality, AR) device (e.g., AR glasses or an AR helmet), a virtual reality (Virtual Reality, VR) device (e.g., VR glasses or a VR helmet), an extended reality (Extended Reality, XR) device, or some combination thereof. The head-mounted display device shown in fig. 7 is only an example and should not impose any limitation on the functionality or scope of use of the disclosed embodiments.
As shown in fig. 7, the head-mounted display device may include a processing device 1001 (e.g., a central processing unit, a graphics processor, etc.), which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage device 1003 into a random access memory (RAM) 1004. The RAM 1004 also stores various programs and data required for the operation of the head-mounted display device. The processing device 1001, the ROM 1002, and the RAM 1004 are connected to each other by a bus 1005. An input/output (I/O) interface 1006 is also connected to the bus 1005.
In general, the following devices may be connected to the I/O interface 1006: input devices 1007 including, for example, a touch screen, a touchpad, a keyboard, a mouse, an image sensor, a microphone, an accelerometer, a gyroscope, and the like; output devices 1008 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, and the like; storage devices 1003 including, for example, a magnetic tape, a hard disk, and the like; and a communication device 1009. The communication device 1009 may allow the head-mounted display device to communicate wirelessly or by wire with other devices to exchange data. While fig. 7 shows a head-mounted display device with various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided; more or fewer devices may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 1009, installed from the storage device 1003, or installed from the ROM 1002. When the computer program is executed by the processing device 1001, the above-described functions defined in the method of the embodiments of the present disclosure are performed.
By adopting the audio processing method of the above embodiments, the head-mounted display device provided by the present invention can solve the technical problem that, during use of the head-mounted display device, the user cannot be effectively prompted with external key environment information and therefore cannot distinguish the external environment condition in time. Compared with the prior art, the beneficial effects of the head-mounted display device provided by this embodiment of the present invention are the same as those of the audio processing method provided by the above embodiments, and its other technical features are the same as those disclosed in the method of the previous embodiments, which are not repeated here.
It should be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the description of the above embodiments, particular features, structures, materials, or characteristics may be combined in any suitable manner in any one or more embodiments or examples.
Example Five
An embodiment of the present invention provides a computer-readable storage medium having computer-readable program instructions stored thereon for performing the audio processing method in the above-described embodiment.
The computer-readable storage medium according to the embodiments of the present invention may be, for example, a USB flash drive, but is not limited thereto; it may be any electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this embodiment, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable storage medium may be transmitted using any appropriate medium, including but not limited to: electrical wire, optical fiber cable, RF (radio frequency), and the like, or any suitable combination of the foregoing.
The above-described computer-readable storage medium may be included in the head-mounted display device described above, or may exist separately without being assembled into the head-mounted display device.
The computer-readable storage medium carries one or more programs that, when executed by the head-mounted display device, cause the head-mounted display device to: dynamically acquire external environment audio information, and identify, through a converged audio recognition neural network model, whether preset key audio information exists in the environment audio information; if the key audio information exists, perform compensation adjustment of acoustic parameters on the key audio information to obtain key audio enhancement information; and output the key audio enhancement information.
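Tying the carried program together, a minimal end-to-end sketch might look as follows; every component function is a placeholder standing in for the corresponding module described above, not an API from this disclosure.

```python
# End-to-end placeholder pipeline: capture, detect, compensate, output.
import numpy as np

def capture_block(sr: int = 48_000, seconds: float = 0.5) -> np.ndarray:
    return np.zeros(int(sr * seconds))        # stub for a microphone read

def detect_key_audio(block: np.ndarray) -> bool:
    return bool(np.abs(block).mean() > 0.01)  # stub for the trained model

def compensate(block: np.ndarray) -> np.ndarray:
    return block * 2.0                        # stub for parameter compensation

def output_enhanced(block: np.ndarray) -> None:
    pass                                      # stub for playback / UI overlay

block = capture_block()
if detect_key_audio(block):                   # only key audio is enhanced
    output_enhanced(compensate(block))
```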
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented in software or hardware. In some cases, the name of a module does not constitute a limitation on the module itself.
The computer-readable storage medium provided by the present invention stores computer-readable program instructions for executing the above audio processing method, and can solve the technical problem that, during use of the head-mounted display device, the user cannot be effectively prompted with external key environment information and therefore cannot distinguish the external environment condition in time. Compared with the prior art, the beneficial effects of the computer-readable storage medium provided by this embodiment of the present invention are the same as those of the audio processing method provided by the first or second embodiment, and are not repeated here.
Example Six
Embodiments of the invention also provide a computer program product comprising a computer program which, when executed by a processor, implements the steps of the audio processing method described above.
The computer program product provided by the present application can solve the technical problem that, during use of the head-mounted display device, external key environment information cannot be effectively prompted to the user, so that the user cannot distinguish the external environment condition in time. Compared with the prior art, the beneficial effects of the computer program product provided by this embodiment of the present invention are the same as those of the audio processing method provided by the first or second embodiment, and are not repeated here.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the scope of the claims; all equivalent structures or equivalent processes using the description and drawings of the present application, whether applied directly or indirectly in other related technical fields, are likewise included in the scope of the claims.

Claims (10)

1. An audio processing method, wherein the audio processing method is applied to a head-mounted display device, the method comprising:
dynamically acquiring external environment audio information, and identifying whether preset key audio information exists in the environment audio information through a converged audio recognition neural network model;
if the key audio information exists, compensating and adjusting acoustic parameters of the key audio information to obtain key audio enhancement information;
and outputting the key audio enhancement information.
2. The audio processing method of claim 1, wherein the step of performing compensation adjustment of acoustic parameters on the key audio information comprises:
acquiring a current concentration factor of a user on an augmented reality environment presented in the head-mounted display device;
determining a current environmental audio loss level associated with the current concentration factor, wherein the higher the current concentration factor is, the higher the associated current environmental audio loss level is;
inquiring to obtain a sound source space position deviation value and/or an audio intensity loss value of the current environment audio loss level mapping from a preset loss level mapping relation;
and compensating and adjusting acoustic parameters of the key audio information according to the mapped sound source space position deviation value and/or the audio intensity loss value.
3. The audio processing method according to claim 2, wherein the step of performing compensation adjustment of acoustic parameters on the key audio information according to the mapped sound source spatial position deviation value and/or the audio intensity loss value comprises:
determining the beam phase displacement of the key audio information according to the mapped sound source space position deviation value and/or determining the beam amplitude loss of the key audio information according to the mapped audio intensity loss value;
determining audio parameter compensation information of the key audio information according to the beam phase displacement and/or the beam amplitude loss;
and according to the audio parameter compensation information, carrying out compensation adjustment on acoustic parameters on the key audio information so as to compensate beam phase displacement and/or beam amplitude loss of the key audio information.
4. The audio processing method of claim 2, wherein prior to the step of determining the current environmental audio loss level associated with the current concentration factor, the method further comprises:
playing preset virtual test audio, wherein the sound source space position of the virtual test audio is a preset space position;
outputting a preset guiding interface for guiding a user to judge the sound source space position of the virtual test audio;
acquiring azimuth information input by a user in response to the preset guide interface, comparing the azimuth information with the preset space azimuth, and determining the key audio resolution of the user according to a comparison result;
determining the perception sensitivity of a user to the key audio information according to the key audio resolution, and selecting a concentration mapping gradient matched with the perception sensitivity from a preset mapping gradient database according to the perception sensitivity, wherein the concentration mapping gradient comprises a plurality of concentration coefficients and an environmental audio loss level associated with each concentration coefficient;
the step of determining the current environmental audio loss level associated with the current concentration factor comprises:
and determining the current environmental audio loss level associated with the current concentration coefficient according to the matched concentration mapping gradient.
5. The audio processing method of claim 2, wherein the step of obtaining the current concentration factor of the user for the augmented reality environment presented in the head-mounted display device comprises:
detecting current user physiological characteristic information and device use state information, wherein the user physiological characteristic information comprises at least one of pupil size, blink frequency, heart rate, respiratory rate and body temperature, and the device use state information comprises at least one of the use duration, motion state, power consumption rate and currently running application program of the head-mounted display device;
and determining the current concentration coefficient of the user to the augmented reality environment presented in the head-mounted display device according to the physiological characteristic information of the user and the use state information of the device.
6. The audio processing method of claim 5, wherein the step of determining a current concentration factor of the user for the augmented reality environment presented in the head-mounted display device based on the user physiological characteristic information and the device usage state information comprises:
acquiring a preset concentration recognition neural network model;
inputting the physiological characteristic information of the user and the equipment use state information into the concentration recognition neural network model, and predicting to obtain the current concentration coefficient of the user for the augmented reality environment presented in the head-mounted display equipment.
7. The audio processing method of claim 1, wherein the method further comprises:
acquiring demand identification audio information corresponding to at least one application scene;
associating a plurality of pieces of the demand identification audio information with a key audio tag to obtain a key audio sample set, and associating a plurality of pieces of environmental noise information with an interference audio tag to obtain an interference audio sample set, wherein the environmental noise information does not contain the demand identification audio information;
training a preset neural network model through the key audio sample set and the interference audio sample set to obtain a converged audio recognition neural network model.
8. The audio processing method according to any one of claims 1 to 7, wherein after the step of outputting the key audio enhancement information, the method further comprises:
and displaying the sound source space position corresponding to the key audio information on a display interface of the head-mounted display device by marking it on a radar chart or an azimuth scale.
9. A head-mounted display device, the head-mounted display device comprising:
at least one processor; the method comprises the steps of,
A memory communicatively coupled to the at least one processor; wherein,,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the audio processing method according to any one of claims 1 to 8.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a program for implementing an audio processing method, the program being executed by a processor to implement the steps of the audio processing method according to any one of claims 1 to 8.