WO2022143322A1 - Augmented reality interaction method and electronic device - Google Patents

Augmented reality interaction method and electronic device

Info

Publication number
WO2022143322A1
Authority
WO
WIPO (PCT)
Prior art keywords
real
virtual object
pose information
electronic device
sound source
Prior art date
Application number
PCT/CN2021/140320
Other languages
English (en)
French (fr)
Inventor
马家威
张文洋
李龙华
张国荣
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to EP21914061.3A (published as EP4254353A4)
Publication of WO2022143322A1
Priority to US18/344,299 (published as US20230345196A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/006 Mixed reality
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S7/302 Electronic adaptation of stereophonic sound system to listener position or orientation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/012 Head tracking input arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/03 Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/0304 Detection arrangements using opto-electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S7/302 Electronic adaptation of stereophonic sound system to listener position or orientation
    • H04S7/303 Tracking of listener position or orientation
    • H04S7/304 For headphones
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/165 Management of the audio stream, e.g. setting of volume, audio stream path
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/11 Positioning of individual sound objects, e.g. moving airplane, within a sound field
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S7/302 Electronic adaptation of stereophonic sound system to listener position or orientation
    • H04S7/303 Tracking of listener position or orientation

Definitions

  • the present application relates to the field of terminal technologies, and in particular, to an augmented reality (AR) interaction method and electronic device.
  • AR technology can integrate the virtual world with the real world, but little attention has been paid to improving the realism of virtual objects. As a result, the virtual characters generated by current AR technology are not very lifelike, which leads to a poor user experience.
  • the present application provides an AR interaction method and electronic device, which are used to improve the authenticity of virtual objects and improve user experience.
  • an embodiment of the present application provides an AR interaction method, and the method includes the following steps:
  • first, the electronic device determines the pose information of a real object and/or the position of a real sound source in the real scene, where the pose information of the real object is used to represent the position and attitude of the real object;
  • then, the electronic device determines the pose information of a virtual object according to the pose information of the real object and/or the position of the real sound source, where the pose information of the virtual object is used to represent the position and attitude of the virtual object;
  • finally, the electronic device generates an AR image including the virtual object according to the pose information of the virtual object and displays the AR image; and/or the electronic device generates 3D audio data of the virtual object according to the pose information of the real object and the pose information of the virtual object and plays the 3D audio data.
  • the virtual object can perceive the pose of the real object and the position of the real sound source, and make corresponding action or sound responses based on this information, so that the virtual object behaves like a real object. The method therefore realizes virtual-real interaction that combines the vision and hearing of the virtual object and the real object, which can improve the intelligence level of the virtual object and thereby improve the experience of virtual-real interaction.
  • the electronic device may also establish a world coordinate system with the electronic device as the coordinate origin; in this design, the pose information of the real object is used to represent the position and attitude of the real object in the world coordinate system, the position of the real sound source is the position of the real sound source in the world coordinate system, and the pose information of the virtual object is used to represent the position and attitude of the virtual object in the world coordinate system.
  • in this way, the accuracy of the pose information of the real object, the position of the real sound source, and the pose information of the virtual object determined by the electronic device can be improved.
  • the electronic device may establish a world coordinate system with the electronic device as the origin of coordinates through the following steps:
  • the electronic device acquires a first real scene image collected by the camera and acquires the attitude information of the electronic device measured by the inertial measurement unit; then, according to the first real scene image and the attitude information of the electronic device, the electronic device establishes the world coordinate system using a set coordinate-system construction algorithm, such as a simultaneous localization and mapping (SLAM) algorithm.
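  • As an illustrative sketch only (the function names and the Euler-angle convention are assumptions, not taken from this application), anchoring a world coordinate system at the electronic device means that a camera-frame point only needs to be rotated by the device attitude reported by the inertial measurement unit, since the device itself is the origin:

```python
# Minimal sketch (not the SLAM implementation of this application): anchor a world
# frame at the electronic device and express a camera-frame point in that frame,
# using the device attitude measured by the IMU. Names and conventions are assumed.
import numpy as np
from scipy.spatial.transform import Rotation as R

def world_from_device_rotation(attitude_deg):
    # attitude_deg = (roll, pitch, yaw) reported by the IMU -- assumed axis order.
    return R.from_euler("xyz", attitude_deg, degrees=True).as_matrix()

def camera_point_to_world(p_camera, R_world_from_device):
    # The device itself is the coordinate origin, so no translation term is needed.
    return R_world_from_device @ np.asarray(p_camera, dtype=float)

R_wd = world_from_device_rotation([0.0, 10.0, 90.0])      # example IMU reading
print(camera_point_to_world([0.2, 0.0, 1.5], R_wd))       # a feature point 1.5 m ahead
```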
  • the electronic device may determine the pose information of the real object in the real scene through the following steps:
  • the electronic device acquires a second real scene image collected by the camera and identifies a key part of the real object in the second real scene image; then, the electronic device determines the pose information of the key part (for example, the head) of the real object in the second real scene image using simultaneous localization and mapping (SLAM) point cloud collision; finally, the electronic device uses the pose information of the key part of the real object as the pose information of the real object.
  • the electronic device can use the pose information of the key parts of the real object as the pose information of the real object.
  • the electronic device may determine the position of the real sound source in the real scene by performing the following steps:
  • the electronic device acquires the real scene sound information collected by the microphone and determines the positional relationship between the real sound source and the microphone through sound source localization; then, the electronic device determines the position of the real sound source in the real scene according to that positional relationship, for example, determines the position of the real sound source in the world coordinate system according to the positional relationship between the real sound source and the microphone.
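  • For illustration, a hypothetical two-microphone, far-field sketch of such sound source localization is given below (the application does not specify the algorithm; the spacing, sample rate and helper names are assumptions): the delay between the channels gives the bearing of the real sound source relative to the microphone pair.

```python
# Hypothetical sketch of sound source localization with two microphones: estimate the
# inter-channel delay by cross-correlation, then convert it to a bearing (far-field
# assumption). The application itself does not prescribe this particular method.
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def estimate_delay_s(left, right, sample_rate):
    """Lag (in seconds) at which the two microphone channels best align."""
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)
    return lag / sample_rate

def bearing_rad(delay_s, mic_spacing_m):
    """Angle of arrival relative to the broadside of the microphone pair."""
    return np.arcsin(np.clip(SPEED_OF_SOUND * delay_s / mic_spacing_m, -1.0, 1.0))

# A 0.2 ms delay across microphones 15 cm apart puts the source roughly 27 degrees off-axis.
print(np.degrees(bearing_rad(0.2e-3, 0.15)))
```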
  • the electronic device may determine the pose information of the virtual object according to the pose information of the real object and/or the position of the real sound source, together with at least one of the following: the action of the real object identified from the pose information of the real object, the voice command issued by the real sound source, the image model, action features and action response package in the virtual object model corresponding to the virtual object, and the user's touch operation on the virtual object through the display screen.
  • the pose information of the real object determined by the electronic device may be regarded as the pose information of the real object seen by the virtual object, and the position of the real sound source may be regarded as the position of the real sound source heard by the virtual object.
  • in this way, the electronic device can determine the pose of the virtual object by using the pose of the real object and the position of the real sound source as perceived by the virtual object, so that the virtual object can respond to the perceived pose of the real object and position of the real sound source, just as a real object would.
  • the electronic device may also consider other factors, so as to improve the flexibility of determining the pose information of the virtual object.
  • the electronic device may generate 3D audio data of the virtual object according to the pose information of the real object and the pose information of the virtual object through the following steps:
  • the electronic device determines the distances between the virtual object and the two ears of the real object according to the pose information of the real object and the pose information of the virtual object; then, according to the distances between the virtual object and the ears of the real object, it calculates the interaural volume difference and time difference of the real object; finally, the electronic device generates the 3D audio data according to the interaural volume difference and time difference of the real object and the original sound data of the virtual object.
  • in this way, the electronic device can generate 3D audio data according to the interaural volume difference and time difference of the real object, so that the realism of the 3D sound produced by playing the 3D audio data can be improved.
  • the original sound data is set or determined according to at least one of the following:
  • the pose information of the real object, the pose information of the virtual object, the action of the real object recognized from the pose information of the real object, the sound command issued by the real sound source, the image model, action features and action response package in the virtual object model corresponding to the virtual object, and the user's touch operation on the virtual object through the display screen.
  • the electronic device may further filter the 3D audio data according to the pose information of the real object, so as to simulate the reflection and refraction of the sound during transmission, thus making the 3D sound heard by the real object more realistic.
  • an embodiment of the present application further provides an electronic device, including a unit or a module for performing each step of the above-mentioned first aspect.
  • the present application provides an electronic device, comprising at least one processing element and at least one storage element, wherein the at least one storage element is used for storing programs and data, and the at least one processing element is used for executing the programs stored in the storage element, so that each design provided in the first aspect of the present application can be implemented.
  • the electronic device may further include components such as a display screen, a camera, and an audio circuit.
  • an embodiment of the present application further provides a computer storage medium, where a software program is stored in the storage medium, and when the software program is read and executed by one or more processors, it can implement the method provided by each design of the first aspect.
  • the embodiments of the present application further provide a computer program containing instructions, when the instructions are run on a computer, the computer can execute the method provided by each design in the above-mentioned first aspect.
  • an embodiment of the present application further provides a chip system, where the chip system includes a processor for supporting an electronic device to implement the functions involved in each design in the first aspect above.
  • the chip system further includes a memory for storing necessary program instructions and data of the electronic device.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • FIG. 1 is a schematic diagram of virtual-real audio-visual interaction provided by an embodiment of the present application
  • FIG. 2 is a structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 3 is a software architecture diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 4 is an implementation block diagram of an AR interaction method provided by an embodiment of the present application.
  • FIG. 5A is a schematic diagram of a world coordinate system of an electronic device according to an embodiment of the present application.
  • FIG. 5B is a simulation diagram of a SLAM point cloud collision result provided by an embodiment of the application.
  • FIG. 6 is a schematic diagram of a virtual-real audio-visual interaction provided by an embodiment of the present application.
  • FIG. 7 is an AR interaction flowchart provided by an embodiment of the present application.
  • FIG. 8 is another AR interaction flowchart provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of an AR interaction example provided by an embodiment of the present application.
  • FIG. 10 is a flowchart for implementing an AR interaction service provided by an embodiment of the present application.
  • FIG. 11 is another AR interaction flowchart provided by an embodiment of the present application.
  • FIG. 12 is still another AR interaction flowchart provided by the embodiment of the present application.
  • FIG. 13 is a structural diagram of another electronic device provided by an embodiment of the application.
  • FIG. 14 is a structural diagram of still another electronic device provided by an embodiment of the present application.
  • the present application provides an AR interaction method and electronic device, which are used to improve the authenticity of virtual objects and improve user experience.
  • the method and the electronic device are based on the same technical concept. Since the principles by which the method and the electronic device solve the problem are similar, the implementations of the electronic device and the method can refer to each other, and repeated descriptions are not provided again.
  • the electronic device can be a mobile phone, a tablet computer, a notebook computer, a netbook, a vehicle-mounted device, a smart home device (such as a smart TV), a business intelligent terminal (including a video phone, a conference desktop smart terminal, etc.), a personal digital assistant (PDA), an augmented reality (AR)/virtual reality (VR) device, etc.
  • the specific form of the electronic device is not limited in this application.
  • 3D display uses the parallax of the user's eyes to construct a 3D look and feel. Similarly, the user's two ears can use the volume difference and time difference with which sound from the same sound source reaches them to perform the most basic sound source localization.
  • the volume difference is the volume difference between the same sound reaching both ears of the user.
  • the user can perceive whether the position of the sound source is on the left or right according to the volume difference.
  • the user cannot accurately locate the sound source by relying on the volume difference alone. Therefore, the user also needs to locate the sound source through the time difference; this is the Haas effect.
  • the condition for realizing the Haas effect is that the time difference between the same sound reaching the ears of the user needs to be within 30ms, otherwise two sound images will be generated in the human brain to form an echo.
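  • The two cues described above can be put into numbers (the distances below are illustrative, not taken from this application): the time difference follows from the difference in path length divided by the speed of sound, and the volume difference from the inverse-distance fall-off of sound level.

```python
# Illustrative calculation of the binaural cues described above. Values are examples.
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def binaural_cues(d_left_m, d_right_m):
    time_difference_ms = abs(d_left_m - d_right_m) / SPEED_OF_SOUND * 1000.0
    volume_difference_db = abs(20.0 * np.log10(d_right_m / d_left_m))  # 1/r level fall-off
    return time_difference_ms, volume_difference_db

# A source 2.00 m from the left ear and 2.15 m from the right ear:
itd_ms, ild_db = binaural_cues(2.00, 2.15)
print(f"time difference ~{itd_ms:.2f} ms, volume difference ~{ild_db:.1f} dB")
# ~0.44 ms and ~0.6 dB -- well within the 30 ms Haas limit, so one fused sound image is heard.
```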
  • a virtual object is a lifelike thing in the virtual world, generated by a computer and other equipment using AR technology, that provides the senses of sight, hearing, force, touch, movement, and so on.
  • the virtual objects can be characters, animals, scenes, still lifes, and texts, etc.; they can generate various animation effects, such as posture changes, sound, motion expressions, interaction with real objects, and so on.
  • Each virtual object is generally generated by a computer according to a corresponding virtual object model.
  • Each virtual object model can not only set the image of the corresponding virtual object, but also set its sound characteristics, action characteristics, etc., as well as content such as sound response packets and action response packets.
  • the sound response of the virtual object is a sound made in response to a real object in the real world, which is also called the auditory response;
  • the action response of the virtual object is an action made in response to a real object in the real world, which is also called the visual response.
  • the virtual object can react (action response and/or sound response) to the real object based on the listening mode.
  • the monitoring mode may specifically include: visual monitoring and/or auditory monitoring.
  • visual monitoring: the virtual object can respond according to the position and posture information of real objects within its field of view (the field of view of the electronic device's camera).
  • auditory monitoring: the virtual object can react according to the sound of the real scene (the real scene sound information obtained through the microphone).
  • Real objects are things that actually exist in the real world.
  • the real object may also be a person, an animal, a scene, a still life, or the like.
  • when a real object emits sound, it can serve as a real sound source.
  • "multiple" refers to two or more; "at least one" means one or more.
  • AR technology uses a computer to generate a realistic virtual world (virtual environment), which can immerse the user in the virtual world and realize the natural interaction between the user and the virtual world.
  • at present, AR technology mainly focuses on how to improve the integration of the virtual world and the real world, but pays little attention to improving the realism of virtual objects in the virtual world.
  • the first real object may directly respond (action response and/or sound response) when seeing or hearing the second real object.
  • the second reality object can also make an action response and/or a sound response according to the action response and/or sound response of the first reality object .
  • in order to improve the realism of the virtual object, in the embodiments of the present application the virtual object can, like a real object, realize the above audio-visual interaction with the real object, as shown in FIG. 1.
  • the real object can not only see and hear the virtual object, but the virtual object can also see and/or hear the real object and make corresponding action responses and/or sound responses, so that the real object or the user can perceive that the virtual object has audio-visual capabilities.
  • virtual objects can see and/or hear real objects just as real objects do, and make corresponding action responses and/or sound responses, so that the realism of the virtual objects can be improved, enabling users to immerse themselves in the virtual world and ultimately improving the user experience.
  • the AR interaction method provided in this embodiment of the present application may be applied to the electronic device as shown in FIG. 2 , and the structure of the electronic device will be described below first.
  • Figure 2 shows one possible structure of an electronic device.
  • the electronic device 200 includes: a communication unit 201 , a processor 202 , a memory 203 , a display unit 204 , an input unit 205 , an audio circuit 206 , a sensor 207 , a camera 208 and other components. Each component of the electronic device 200 will be described in detail below with reference to FIG. 2 .
  • the communication unit 201 is used to realize the functions of the electronic device 200 and realize data communication with other devices.
  • the communication unit 201 may include a wireless communication module 2011 and a mobile communication module 2012 .
  • the electronic device 200 also needs to cooperate with components such as an antenna, a modem processor and a baseband processor in the processor 202 to implement a communication function.
  • the wireless communication module 2011 can provide applications on electronic devices including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field communication technology (near field communication, NFC), infrared technology (infrared, IR) and other wireless communication solutions.
  • the wireless communication module 2011 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 2011 receives electromagnetic waves via an antenna, performs signal frequency modulation and filtering processing on the electromagnetic waves, and sends the processed signals to the processor 202 .
  • the wireless communication module 2011 can also receive the signal to be sent from the processor 202, perform frequency modulation, amplify, and radiate out the electromagnetic wave through the antenna. Through the wireless communication module 2011, the electronic device 200 can be connected to some peripheral devices, such as connecting to an access point, connecting to a wireless headset or a wireless audio system.
  • the mobile communication module 2012 can provide mobile communication solutions including 2G/3G/4G/5G etc. applied on the electronic device.
  • the mobile communication module 2012 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), and the like.
  • the mobile communication module 2012 can receive electromagnetic waves through the antenna, filter, amplify, etc. the received electromagnetic waves, and transmit them to the modulation and demodulation processor for demodulation.
  • the mobile communication module 2012 can also amplify the signal modulated by the modulation and demodulation processor, and then convert it into electromagnetic waves and radiate it out through the antenna.
  • at least part of the functional modules of the mobile communication module 2012 may be provided in the processor 202 .
  • at least part of the functional modules of the mobile communication module 2012 may be provided in the same device as at least part of the modules of the processor 202 .
  • the electronic device 200 can establish a wireless connection with a base station in a mobile communication system according to the mobile communication module 2012 , and receive services of the mobile communication system through the mobile communication module 2012 .
  • the communication unit 201 may further include a communication interface for physically connecting the electronic device 200 with other devices.
  • the communication interface may be connected with the communication interface of the other device through a cable to realize data transmission between the electronic device 200 and the other device.
  • the electronic device 200 can be connected to devices such as earphones, audio devices, and the like.
  • the memory 203 can be used to store software programs and data.
  • the processor 202 executes various functions and data processing of the electronic device 200 by running software programs and data stored in the memory 203 .
  • the software program may include an AR application program implementing the AR interaction method.
  • the memory 203 may mainly include a program storage area and a data storage area.
  • the storage program area may store an operating system, various software programs, and the like; the storage data area may store user input or data created by the electronic device 200 in the process of running the software programs. The operating system may be an Android system or another operating system.
  • the memory 203 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
  • the AR application that implements the AR interaction method may be stored in the program storage area, and data such as the virtual object model, the position and attitude of the virtual object, and the position and attitude of the real object may be stored in the data storage area.
  • the input unit 205 can be used to receive character information and signals input by the user.
  • the input unit 205 may include a touch panel 2051 and other input devices (eg, function keys).
  • the touch panel 2051 also referred to as a touch screen, can collect user touch operations on or near it, generate corresponding touch information and send it to the processor 202, so that the processor 202 executes commands corresponding to the touch information.
  • the touch panel 2051 can be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave.
  • the display unit 204 is used for presenting a user interface and realizing human-computer interaction.
  • the display unit 204 can display information input by the user or information provided to the user, as well as various menus of the electronic device 200, various main interfaces (including icons of various applications), windows of various applications, Images captured by cameras, AR images containing virtual objects generated by AR applications, etc.
  • the display unit 204 may include a display panel 2041.
  • the display panel 2041 is also called a display screen.
  • the touch panel 2051 can cover the display panel 2041. Although in FIG. 2 the touch panel 2051 and the display panel 2041 are shown as two independent components of the electronic device 200, in this embodiment of the present application the touch panel 2051 may be integrated with the display panel 2041 (i.e., a touch display screen) to implement the input and output functions of the electronic device 200.
  • the processor 202 is the control center of the electronic device 200. It connects the various components using various interfaces and lines, and executes the various functions of the electronic device 200 and processes data by running or executing the software programs and/or modules stored in the memory 203 and calling the data stored in the memory 203, thereby realizing the various services of the electronic device 200.
  • the processor 202 may run an AR application program stored in the memory 203 to implement the AR interaction method provided by the embodiments of the present application.
  • the processor 202 may include one or more processing units.
  • the processor 202 may integrate an application processor, a modem processor, a baseband processor, a graphics processor (graphics processing unit, GPU), etc., wherein the application processor mainly processes an operating system, a user interface, an application program, and the like,
  • the modem processor mainly handles wireless communication. It can be understood that, the above-mentioned modulation and demodulation processor may not be integrated into the processor 202.
  • the audio circuit 206 may provide an audio interface between the user and the electronic device 200 .
  • the audio circuit 206 can transmit the electrical signal converted from the received audio data to the speaker 2061, and the speaker 2061 converts it into a sound signal for output.
  • the microphone 2062 converts the collected sound signals into electrical signals, which are received by the audio circuit 206 and then converted into audio data for further processing such as transmission or storage.
  • the electronic device 200 can collect the sound signal emitted by the sound source through the microphone 2062, so that the sound source localization can be performed according to the collected sound signal.
  • the electronic device 200 may also output the 3D audio data through the speaker 2061 after generating the 3D audio data.
  • the electronic device 200 may also include one or more sensors 207, such as light sensors, motion sensors, ultrasonic sensors, and other sensors.
  • the electronic device 200 can implement various functions according to the real-time sensor data collected by the sensor 207 .
  • the motion sensor may include an inertial measurement unit (inertial measurement unit, IMU).
  • IMU inertial measurement unit
  • the IMU is a device for measuring the attitude information of the electronic device 200 .
  • the posture information of the electronic device 200 is used to represent the motion posture of the electronic device 200 , and may specifically include: posture angles and accelerations of the electronic device 200 on three direction axes.
  • the IMU may include three acceleration sensors and three gyroscopes. Each accelerometer is used to measure the acceleration of one direction axis, and each gyroscope is used to measure the attitude angle of one direction axis.
  • the electronic device 200 may further include at least one camera 208 to capture images of real scenes.
  • the cameras 208 include a front camera located on the front of the electronic device 200 and a rear camera located on the back of the electronic device 200 .
  • the structure of the electronic device 200 shown in FIG. 2 does not constitute a limitation on the electronic devices applicable to the present application; an electronic device applicable to the embodiments of the present application may include more or fewer components than shown, combine certain components, or arrange the components differently.
  • the software system of the electronic device provided by this application may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture.
  • the embodiments of the present application take an Android system with a layered architecture as an example to illustrate the software structure of an electronic device.
  • FIG. 3 shows a block diagram of a software structure of an electronic device provided by an embodiment of the present application.
  • the software structure of the electronic device can be a layered architecture, for example, the software can be divided into several layers, and each layer has a clear role and division of labor. Layers communicate with each other through software interfaces.
  • the Android system is divided into four layers, which are, from top to bottom, an application layer, a framework layer (framework, FWK), an Android runtime (Android runtime) and a system library, and a kernel layer.
  • the application layer can include a series of applications. As shown in FIG. 3, the application layer may include a camera application, a voice assistant application, an AR application, a music application, a video application, a map application, a third-party application, and the like.
  • the third-party applications may include a WeChat application, an iQIYI application, and the like.
  • the framework layer provides an application programming interface (API) and a programming framework for applications in the application layer.
  • the application framework layer can include some predefined functions. As shown in FIG. 3, the application framework layer may include: system service (System Service), view system (View System), web page service (Web Service), phone manager, resource manager and so on.
  • the system service may include a window manager service (WMS) and an activity manager service (activity manager service, AMS).
  • the system service may further include a system-level service—the AR management service. Each service in the system service is described below.
  • the window management service provides window management services for windows, specifically controlling the display and hiding of all windows and the position of windows on the display screen.
  • the window management service can specifically be responsible for the following functions: 1. Allocate a display plane (surface) for each window; 2. Manage the display order, size, and position of each surface; 3. Adjust the transparency, stretch factor, position, and size of a window by calling management functions (such as the surface control function SurfaceControl.Transaction) to realize window animation effects; 4. Through the window management service, provide the user with an appropriate window to display or handle a message.
  • the activity management service provides management services for the activities in the application.
  • the activity management service may, but is not limited to, be responsible for the following functions: 1. Unified scheduling of the life cycle of all applications' activities; 2. Start or end the process of the application; 3. Start and schedule the life cycle of the service; 4. Register the broadcast receiver (Broadcast Receiver), and receive and distribute broadcasts (Broadcast); 5. Query the current operating status of the system; 6. Schedule tasks (task).
  • the AR management service is used to implement the AR interaction method provided by the embodiments of this application, and provide AR interaction services.
  • the AR management service may, but is not limited to, be responsible for the following functions: 1. Construct a world coordinate system; 2. Determine the position and attitude information of a real object in the world coordinate system; 3. Identify the position of a real sound source in the world coordinate system; 4. Determine the position and attitude information of a virtual object in the world coordinate system; 5. Generate 3D audio data according to at least one of the position and attitude information of the real object, the position of the real sound source, and the position and attitude information of the virtual object; 6. Generate an AR image containing the virtual object according to at least one of the position and attitude information of the real object, the position of the real sound source, and the position and attitude information of the virtual object.
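  • Purely as an interface outline (all names below are invented for illustration and do not reflect an actual framework API), the six responsibilities of the AR management service could be grouped as follows:

```python
# Hypothetical outline of the AR management service's responsibilities; the real
# service lives in the Android framework layer, and these method names are assumed.
class ARManagementService:
    def build_world_coordinate_system(self, scene_image, device_attitude): ...
    def compute_real_object_pose(self, scene_image): ...          # position + attitude in world frame
    def locate_real_sound_source(self, scene_sound): ...          # position in world frame
    def compute_virtual_object_pose(self, real_object_pose, sound_source_position): ...
    def generate_3d_audio(self, real_object_pose, virtual_object_pose, original_sound): ...
    def generate_ar_image(self, virtual_object_pose, scene_image=None): ...
```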
  • the view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on. View systems can be used to build applications.
  • An interface can consist of one or more controls.
  • the interface including the SMS notification icon may include controls for displaying text and controls for displaying pictures.
  • a web service is an API that can be called through a web page.
  • the phone manager is used to provide the communication function of the electronic device. For example, the management of call status (including connecting, hanging up, etc.).
  • the resource manager provides various resources for the application, such as localization strings, icons, pictures, layout files, video files and so on.
  • Android runtime includes core library (Kernel Library) and virtual machine.
  • Android runtime is responsible for scheduling and management of the Android system.
  • the core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of the Android system, which provides the Android system with input/output services (Input/Output Service) and kernel services (Kernel Service).
  • the application layer and framework layer can run in virtual machines.
  • the virtual machine executes the java files of the application layer and the framework layer as binary files.
  • the virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, safety and exception management, and garbage collection.
  • a system library can include multiple functional modules. For example: virtual object model library, media library (media library), image processing library, etc.
  • Virtual object model library for managing multiple virtual object models. Each virtual object model is used to generate a virtual object. Each virtual object model can not only set the image of the corresponding virtual object, but also set its sound characteristics, action characteristics, etc., as well as auditory response packets, visual response packets and other contents.
  • the media library supports playback and recording of audio and video in multiple formats, and supports opening of still images in multiple formats.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer at least includes display drivers, sensor drivers, processor drivers, camera drivers, audio drivers, etc., which are used to drive the hardware in the hardware layer.
  • the hardware layer can include various sensors, displays, processors, input devices, memory, cameras, etc.
  • the embodiments of the present application provide an AR interaction method, which can realize the virtual-real audio-visual interaction as shown in FIG. 1 .
  • the solution will be described in detail below with reference to FIG. 4 based on the hardware structure of the electronic device shown in FIG. 2 and the software structure of the electronic device shown in FIG. 3 .
  • the software implementing the AR interaction method in the electronic device can be divided into the following modules: a real object position and attitude calculation module, a storage module, a virtual object model library, and an AR synthesis module .
  • the method also needs to be implemented in cooperation with some devices inside or outside the electronic device.
  • these devices can be divided into two categories: acquisition devices and output devices.
  • the collection device may include: a camera, a motion sensor (only the IMU in the motion sensor is used as an example for description), and a microphone;
  • the output device may include: a display screen, a speaker or an earphone.
  • the camera is used to shoot a real scene to obtain an image of the real scene (hereinafter referred to as the real scene image).
  • the IMU is used to measure the posture information of the electronic device, wherein the posture information of the electronic device is used to characterize the motion posture of the electronic device and may include the attitude angle and acceleration of the electronic device on three direction axes. Optionally, the three direction axes are pairwise orthogonal and form a world coordinate system.
  • the microphone is used to collect the sound in the real scene to obtain the sound information of the real scene.
  • the real object position and attitude calculation module specifically provides the following functions: acquiring the real scene image collected by the camera and acquiring the attitude information of the electronic device measured by the IMU; constructing, according to the real scene image and the attitude information of the electronic device, a world coordinate system with the electronic device as the coordinate origin; identifying the real object in the real scene image, and determining the position of the real object in the world coordinate system (hereinafter referred to as the position of the real object) and its attitude information; and determining the physical position of the real sound source and converting it into the position of the real sound source in the world coordinate system.
  • the posture information of the real object is used to represent the motion posture of the real object, which may include: the attitude angle and acceleration of the real object on the three direction axes of the world coordinate system.
  • the physical position of the real sound source can represent the positional relationship between the real sound source and the microphone (i.e., the electronic device), for example, in which direction the real sound source lies relative to the electronic device and the distance between the real sound source and the electronic device.
  • the real object position and attitude calculation module can use a simultaneous localization and mapping (SLAM) algorithm or another algorithm to construct the world coordinate system according to the obtained real scene image and the attitude information of the electronic device.
  • the real object position and attitude calculation module may take a corner of the display screen of the electronic device, the center point of one side of the display screen, the center point of the display screen, or a set position on the display screen as the coordinate origin, which is not limited in this embodiment of the present application.
  • the world coordinate system may be as shown in FIG. 5A , where the origin of the coordinates is the center point of one side of the display screen, the virtual world can be displayed in the display screen, and the real world can be displayed outside the display screen.
  • the real object position and attitude calculation module can determine the position and attitude information of the real object in various ways. Due to the large volume of the real object, in order to ensure the accuracy, the real object position and attitude calculation module can use the position and attitude information of the key parts in the real object as the position and attitude information of the real object.
  • the real object position and attitude calculation module can identify the key part of the real object in the real scene image, collide the key part of the real object with the SLAM point cloud, and calculate the position and attitude information of the key part according to the collision result.
  • the SLAM point cloud collision result is shown in FIG. 5B , where (a) in FIG. 5B is a real object in a real scene image, and (b) in FIG. 5B is a simulation diagram of the SLAM point cloud collision result.
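  • A rough sketch of the point-cloud collision idea (the helper below is an assumption about how such a collision could be computed, not the application's implementation): cast a ray from the camera through the pixel of the detected key part and take the nearest SLAM point close to that ray as the key part's 3D position.

```python
# Rough sketch of "colliding" a detected key part with the SLAM point cloud.
import numpy as np

def key_part_position(pixel_uv, K, point_cloud, max_ray_dist=0.05):
    """pixel_uv: (u, v) of the detected key part; K: 3x3 camera intrinsics;
    point_cloud: (N, 3) SLAM points expressed in the camera frame."""
    ray = np.linalg.inv(K) @ np.array([pixel_uv[0], pixel_uv[1], 1.0])
    ray /= np.linalg.norm(ray)
    along = point_cloud @ ray                                   # projection onto the ray
    perp = np.linalg.norm(point_cloud - np.outer(along, ray), axis=1)
    hits = np.where((perp < max_ray_dist) & (along > 0))[0]     # points the ray "collides" with
    if hits.size == 0:
        return None
    return point_cloud[hits[np.argmin(along[hits])]]            # nearest colliding point
```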
  • alternatively, the real object position and posture calculation module can identify the key part of the real object in the real scene image, determine the feature information of the key part of the real object through image recognition technology, and then determine the position and attitude information of the real object according to the feature information of the key part.
  • the key part may be the head
  • the real object position and posture calculation module may identify the characteristic information of the human or animal head (for example, including eyes, nose, ears, mouth).
  • the real object position and posture calculation module may adopt various methods to identify the key parts of the real object in the real scene image.
  • the real object position and posture calculation module may recognize the head of the real object through face recognition technology or skeleton feature recognition technology.
  • the real object position and posture calculation module may use a 3D physical recognition technology to identify key parts of the real object in the real scene image.
  • the real object position and attitude calculation module can determine the positional relationship between the real sound source and the microphone (that is, the physical position of the real sound source) through conventional sound source localization technology, and finally convert the positional relationship between the real sound source and the microphone into the position of the real sound source in the world coordinate system.
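  • Because the microphone sits on the device and the world coordinate system has the device at its origin, this conversion reduces to one rotation plus the microphone's offset on the device body; a minimal sketch under those assumptions (offset and rotation values are placeholders):

```python
# Minimal sketch of converting the mic-relative sound source position into the world
# coordinate system.
import numpy as np

def sound_source_to_world(p_source_in_mic, R_world_from_device, mic_offset_on_device):
    return R_world_from_device @ (np.asarray(p_source_in_mic) + np.asarray(mic_offset_on_device))

R_wd = np.eye(3)                                   # device currently aligned with the world axes
print(sound_source_to_world([0.0, 0.5, 2.0], R_wd, [0.0, -0.07, 0.0]))
```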
  • the storage module is used for storing the position and attitude information of the real object calculated by the real object position and attitude calculation module, and for storing the position of the real sound source.
  • the virtual object model library is used to store at least one virtual object model.
  • Each virtual object model may include, but is not limited to, at least one of the following: an image model, a sound feature, an action feature, a sound response package, and an action response package.
  • the image model is used to set the image of the virtual object.
  • the sound feature is used to set the sound emitted by the virtual object, and may specifically include: volume, timbre, tone, and the like.
  • the action feature is used to set actions that the virtual object can perform, and may specifically include: action type, action range, and the like.
  • the sound response package contains the corresponding relationship between the sensing information of multiple virtual objects and the original sound to be emitted.
  • the action response packet contains the correspondence between the sensing information of multiple virtual objects and the actions to be made.
  • the sensing information of the virtual object may include, but is not limited to, the following two categories: visual sensing information and auditory sensing information.
  • the visual sensing information includes: the position and posture information of the real object calculated by the real object position and attitude calculation module, the action of the real object, and the like.
  • the auditory sensing information includes: the position of the real sound source, the sound command issued by the real sound source, etc.
  • the sensing information of the virtual object may further include touch operations performed by the user on the virtual object through the display screen, and these touch operations may also be referred to as tactile sensing information.
  • the voice command issued by the real sound source may be obtained by performing voice recognition on the real scene sound information through voice recognition technology.
  • the following takes a virtual object model in which the virtual object is a person as an example to describe the virtual object model in detail:
  • the image model can be specifically set: gender, height, weight, body proportions, facial features, clothing features, and other visual features (position of facial spots) and so on.
  • the sound feature can specifically set the volume, timbre, and tone, so that the sound feature emitted by the virtual object conforms to the image of the virtual object.
  • the action feature can specifically set the basic actions that the virtual object can perform, such as walking, running, jumping, shaking hands, waving, and so on. Any one of these basic actions can independently become an action response of the virtual object, or can be repeated multiple times or combined with other basic actions to form an action response of the virtual object.
  • the sound response package can include: when the sensing information of the virtual object indicates that the real object walks in, the corresponding original sound is "Hello"; when the sensing information of the virtual object indicates that the real object leaves, the corresponding original sound is "Goodbye"; when the user taps the virtual object through the display screen, the corresponding original sound is laughter.
  • the action response package may include: when the sensing information of the virtual object indicates that the real object walks in, the corresponding action is walking over, reaching out, and shaking hands; when the sensing information of the virtual object indicates that the real object leaves, the corresponding action is raising a hand and waving; when the user taps the virtual object through the display screen, the corresponding action is shaking.
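  • In code, such response packages amount to lookup tables from a sensed event to a response; a minimal sketch built from the examples above (the event keys are invented labels):

```python
# Minimal sketch of the sound and action response packages described above.
SOUND_RESPONSE_PACKAGE = {
    "real_object_approaches": "Hello",
    "real_object_leaves": "Goodbye",
    "user_taps_virtual_object": "<laughter>",
}

ACTION_RESPONSE_PACKAGE = {
    "real_object_approaches": ["walk_over", "reach_out", "shake_hands"],
    "real_object_leaves": ["raise_hand", "wave"],
    "user_taps_virtual_object": ["shake"],
}

def respond(sensed_event):
    """Return the (original sound, action sequence) the virtual object should produce."""
    return SOUND_RESPONSE_PACKAGE.get(sensed_event), ACTION_RESPONSE_PACKAGE.get(sensed_event)

print(respond("real_object_approaches"))   # ('Hello', ['walk_over', 'reach_out', 'shake_hands'])
```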
  • the AR synthesis module is used to synthesize AR images and 3D audio data through AR technology, display the AR images through a display screen, and play the 3D audio data through speakers, headphones, and the like.
  • the AR synthesis module may specifically include an AR image generation module and a 3D sound generation module.
  • the AR image generation module can synthesize AR images through the following steps:
  • the AR image generation module determines the position and posture information of the virtual object.
  • the AR image generation module may determine the position and posture information of the virtual object in various ways.
  • the AR image generation module determines the position and posture information of the virtual object in real time according to at least one of the following:
  • the posture information of the virtual object is used to represent the motion posture of the virtual object, which may include: the attitude angle and acceleration of the virtual object on the three direction axes of the world coordinate system.
  • the position and posture information of the virtual object may be set by the electronic device, or set by the user, or set for the virtual object model.
  • the AR image generation module can synthesize an AR image according to the position and posture information of the virtual object. For this process, reference may be made to the traditional AR image synthesis technology, which will not be repeated here. Optionally, in this step, the AR image generation module may further synthesize the AR image by using a real scene image.
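  • As a simplified sketch of this step (occlusion and lighting handling are omitted, and the helper names are assumptions), the virtual object's pose can be projected into the real scene image through the camera model and the rendered object overlaid at that pixel position:

```python
# Simplified sketch of AR image synthesis: project the virtual object's world position
# into the camera image, then alpha-blend a pre-rendered sprite of the object there.
import numpy as np

def project_to_screen(p_world, K, R_cam_from_world, t_cam):
    p_cam = R_cam_from_world @ np.asarray(p_world) + t_cam
    uvw = K @ p_cam
    return uvw[:2] / uvw[2]                      # pixel position of the virtual object's anchor

def compose_ar_image(scene_rgb, sprite_rgba, top_left_uv):
    """Alpha-blend the sprite onto the scene image (assumes the sprite fits inside it)."""
    u, v = int(top_left_uv[0]), int(top_left_uv[1])
    h, w = sprite_rgba.shape[:2]
    alpha = sprite_rgba[..., 3:4] / 255.0
    region = scene_rgb[v:v + h, u:u + w].astype(float)
    scene_rgb[v:v + h, u:u + w] = (alpha * sprite_rgba[..., :3] + (1 - alpha) * region).astype(np.uint8)
    return scene_rgb
```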
  • the 3D sound generation module can also specifically synthesize 3D audio data through the following steps:
  • the 3D sound generation module may determine the original sound data of the virtual object according to at least one of the following:
  • the 3D sound generation module generates 3D audio data according to the position and attitude information of the real object, the position and attitude information of the virtual object, and the original sound data.
  • step B2 may specifically include the following steps:
  • the 3D sound generation module calculates the distances between the virtual object and the two ears of the real object according to the position and attitude information of the real object and the position and attitude information of the virtual object;
  • the 3D sound generation module calculates the volume difference and the time difference between the ears of the real object according to the distance between the virtual object and the ears of the real object;
  • the 3D sound generation module generates 3D audio data according to the obtained binaural volume difference and time difference of the real object and the original sound data.
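  • A hedged end-to-end sketch of these three sub-steps (the ear spacing, axis conventions and the 1/r gain law are assumptions for illustration): derive the two ear positions from the real object's head pose, compute the per-ear distances to the virtual object, and apply the resulting gain and delay to the original mono sound data.

```python
# End-to-end sketch: ear positions from the head pose, per-ear distances to the virtual
# object, then a per-ear gain (volume difference) and delay (time difference) applied
# to the original mono sound data to obtain a stereo approximation of 3D sound.
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def ear_positions(head_position, head_yaw_rad, ear_spacing=0.18):
    right_axis = np.array([np.cos(head_yaw_rad), np.sin(head_yaw_rad), 0.0])
    return (head_position - right_axis * ear_spacing / 2,    # left ear
            head_position + right_axis * ear_spacing / 2)    # right ear

def render_3d_audio(mono, sample_rate, virtual_pos, head_pos, head_yaw_rad):
    left_ear, right_ear = ear_positions(np.asarray(head_pos, dtype=float), head_yaw_rad)
    dist = np.array([np.linalg.norm(virtual_pos - left_ear),
                     np.linalg.norm(virtual_pos - right_ear)])
    gains = dist.min() / dist                                  # farther ear is quieter (1/r)
    delays = np.round((dist - dist.min()) / SPEED_OF_SOUND * sample_rate).astype(int)
    out = np.zeros((len(mono) + delays.max(), 2))
    for ch in range(2):                                        # 0 = left, 1 = right
        out[delays[ch]:delays[ch] + len(mono), ch] = gains[ch] * mono
    return out

fs = 48_000
beep = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)            # 1 s of "original sound data"
stereo = render_3d_audio(beep, fs, np.array([1.0, 0.3, 0.0]), np.array([0.0, 0.0, 0.0]), 0.0)
```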
  • the 3D sound generation module may further filter the 3D audio data according to the pose information of the real object, so as to further improve the authenticity of the 3D sound.
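  • The text does not say which filter is applied; one common choice (an assumption here, not something the application specifies) is a gentle low-pass on the ear facing away from the source, mimicking how the head shadows higher frequencies:

```python
# Assumed example of such filtering: a simple one-pole low-pass that could be applied
# to the channel of the ear facing away from the virtual sound source.
import numpy as np

def one_pole_lowpass(samples, sample_rate, cutoff_hz):
    a = np.exp(-2.0 * np.pi * cutoff_hz / sample_rate)
    out = np.empty_like(samples, dtype=float)
    prev = 0.0
    for i, x in enumerate(samples):
        prev = (1.0 - a) * x + a * prev
        out[i] = prev
    return out

# e.g. applied to the far (left) channel of the stereo buffer from the previous sketch:
# stereo[:, 0] = one_pole_lowpass(stereo[:, 0], fs, 4_000)
```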
  • the display screen in the output device is used for displaying AR images.
  • the speaker or earphone in the output device is used to play the 3D audio data, so that the user can hear the 3D sound and perceive the location of the virtual sound source (ie, the virtual object) through sound source localization.
  • the electronic device can acquire the real scene image, the attitude information of the electronic device, and the real scene sound information in real time or periodically, so that the audio-visual responses of the virtual object can be updated in real time or periodically and the virtual object can quickly make corresponding audio-visual responses to the sensed real world, thereby improving the realism of the virtual object.
  • the real object position and attitude calculation module, the storage module, and the AR synthesis module in the above AR interaction solution can be implemented by the AR management service in the system service of the framework layer of the electronic device, and the virtual object model library in this solution can be implemented by the virtual object model library located in the system library.
  • the interaction between each module in this solution and the acquisition device and the output device can be realized through the client, such as the AR application in the application layer.
  • this solution can be realized by combining the background system service and the foreground client.
  • the AR interaction solution provided by the embodiments of the present application can realize virtual and real interaction combining visual and auditory, thereby bringing a better immersive experience to users of AR applications.
  • the virtual character can perceive the position and posture of the real object, as well as the position of the real sound source, and make corresponding action and sound responses based on this information, thereby raising the intelligence level of the virtual object and improving the virtual-real interaction experience.
  • the solution can be implemented with a server-client architecture, which gives the client greater operability and helps developers build a rich variety of immersive AR applications.
  • the present application also provides multiple embodiments, which can respectively implement different virtual-real audio-visual interactions shown in FIG. 6 .
  • virtual-real audio-visual interactions such as the virtual watching reality, reality hearing the virtual, and the virtual hearing reality can be realized respectively.
  • the virtual watching reality: the virtual object can make an action response or a sound response according to the position and attitude information of a real object within its field of view;
  • reality hearing the virtual: according to the position and attitude information of the real object and the position and attitude information of the virtual object, the virtual object can emit 3D sound, so that the real object can hear the 3D sound and locate the virtual object through sound source localization;
  • the virtual hearing reality: the virtual object can determine the physical position of the real sound source according to the real scene sound information collected by the microphone, and make an action response or a sound response based on the position of the real sound source.
  • the AR interaction method provided in this embodiment will be described in detail by taking a real object as a human as an example.
  • the electronic device acquires a real scene image collected by a camera, and acquires attitude information of the electronic device collected by a motion sensor (IMU).
  • the electronic device constructs a world coordinate system with the electronic device as the origin of coordinates according to the real scene image and the posture information of the electronic device.
  • the electronic device may use a SLAM algorithm to construct the world coordinate system.
  • after constructing the world coordinate system, the electronic device may keep using it until it becomes invalid (for example, because the electronic device has moved), or may update it periodically; the length of the update period is not limited in this application and may be equal to or greater than the acquisition period of the real scene image.
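A sketch of this reuse-or-rebuild policy is given below. The hypothetical `build_world_frame` helper stands in for the SLAM-based construction step, and the motion threshold and timing logic are illustrative assumptions.

```python
import time

class WorldFrameManager:
    """Keeps the SLAM-built world coordinate frame, rebuilding it only when needed."""

    def __init__(self, update_period_s=None, motion_threshold=0.05):
        self.update_period_s = update_period_s     # None: reuse until invalidated
        self.motion_threshold = motion_threshold   # assumed displacement threshold (m)
        self.frame = None
        self.built_at = 0.0

    def get_frame(self, scene_image, device_pose, device_displacement):
        expired = (self.update_period_s is not None
                   and time.time() - self.built_at > self.update_period_s)
        moved = device_displacement > self.motion_threshold
        if self.frame is None or expired or moved:
            # build_world_frame is a placeholder for the SLAM construction step.
            self.frame = build_world_frame(scene_image, device_pose)
            self.built_at = time.time()
        return self.frame
```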
  • S703 The electronic device determines whether the distance between the real object and the electronic device is greater than a set threshold, and if so, executes S704; otherwise, executes S705.
  • face recognition depends on facial detail, so when the real object is far from the electronic device the face in the real scene image may be too unclear for an accurate result; skeletal feature recognition is based on the whole skeleton of the human body, so even if the real object is far away, the accuracy of the recognition result is not affected.
  • the electronic device may judge the distance between the real object and the electronic device in various ways.
  • the electronic device may detect the distance between the real object and the electronic device through an internal infrared sensor, an ultrasonic sensor, and the like.
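The distance-dependent choice between skeletal-feature recognition and face recognition (S703-S705) could look like the sketch below; the 3 m threshold and the two detector helpers are hypothetical placeholders, since the patent does not fix a concrete threshold or model.

```python
def recognize_head(scene_image, distance_to_subject, threshold_m=3.0):
    """S703-S705: choose the recognition technique according to distance.

    detect_head_by_skeleton / detect_head_by_face stand in for the skeletal-feature
    and face-recognition models; only the branching policy is illustrated here.
    """
    if distance_to_subject > threshold_m:
        # Far away: facial detail is unreliable, the whole-body skeleton still works.
        return detect_head_by_skeleton(scene_image)
    # Close by: facial landmarks give a more precise head region.
    return detect_head_by_face(scene_image)
```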
  • S706 The electronic device collides the head of the real object in the real scene image with the SLAM point cloud, and calculates the position and attitude information of the head of the real object in the world coordinate system from the collision value, for example as shown in FIG. 5B.
  • the specific process of this step can refer to the traditional SLAM point cloud collision method, which will not be repeated here.
  • the electronic device may also determine the position and attitude information of the head of the real object in the world coordinate system by other methods. For example, when the distance between the real object and the electronic device is less than or equal to the set threshold, the electronic device may determine feature information of the head of the real object (such as feature information of at least one of the eyes, nose, ears, and mouth) through image recognition technology, and then determine the position and posture information of the real object according to that head feature information.
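One way to picture the point-cloud collision of S706 is to cast a ray through the detected head pixel and take the nearest SLAM point lying close to that ray, as sketched below. The tolerance value and the simplified nearest-point test are assumptions for illustration, not the collision method actually used.

```python
import numpy as np

def head_position_from_point_cloud(head_uv, K_inv, T_cam_to_world, cloud, max_ray_dist=0.10):
    """Return a world position for the head by 'colliding' its pixel ray with the point cloud.

    K_inv is the inverse camera intrinsic matrix; cloud is an (N, 3) array of
    world-space SLAM points.
    """
    # Ray through the head pixel, expressed in world coordinates.
    ray_cam = K_inv @ np.array([head_uv[0], head_uv[1], 1.0])
    ray_world = T_cam_to_world[:3, :3] @ (ray_cam / np.linalg.norm(ray_cam))
    origin = T_cam_to_world[:3, 3]

    # Distance of every cloud point to the ray; the closest hit is the collision value.
    rel = cloud - origin
    along = rel @ ray_world
    perp = np.linalg.norm(rel - np.outer(along, ray_world), axis=1)
    hits = np.where((perp < max_ray_dist) & (along > 0))[0]
    if hits.size == 0:
        return None
    return cloud[hits[np.argmin(along[hits])]]  # nearest point along the ray
```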
  • the visual sensing information of the virtual object includes: the position and attitude information of the real object calculated by the real object position and attitude calculation module, and the action of the real object.
  • the electronic device may repeatedly execute the above S703-S707 in the update cycle of the real scene image (ie, execute S703-S707 frame by frame), so that the virtual object can respond according to the real-time visual sensing information.
  • the electronic device can construct a world coordinate system with the electronic device as the coordinate origin, based on existing SLAM technology, from the real scene image captured by the camera and the attitude information of the electronic device collected by the IMU; the camera's field of view is used as the field of view of the virtual object, so that the position and attitude information of the real object within the virtual object's field of view can be determined in the world coordinate system, and the virtual object can then make corresponding actions according to that position and attitude information.
  • for example, when a real person appears within the visible range of the virtual object, the gaze of the virtual object adaptively follows the real person, and/or the virtual object follows the real person's movement, as sketched below.
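A toy version of this gaze-and-follow behaviour, assuming the virtual object's pose is reduced to a position and a yaw angle and that the follow speed and stopping distance are freely chosen, might look like:

```python
import numpy as np

def gaze_follow_yaw(virtual_pos, real_head_pos):
    """Yaw angle (about the vertical axis) that points the virtual object at the real person."""
    d = real_head_pos - virtual_pos
    return np.arctan2(d[1], d[0])

def update_virtual_pose(virtual_obj, real_head_pos, follow_speed=0.5, dt=0.033):
    """Turn toward the real person and optionally walk after them (speeds are assumptions)."""
    virtual_obj.yaw = gaze_follow_yaw(virtual_obj.position, real_head_pos)
    to_person = real_head_pos - virtual_obj.position
    dist = np.linalg.norm(to_person)
    if dist > 1.0:                                    # keep an assumed 1 m distance
        virtual_obj.position += follow_speed * dt * to_person / dist
    return virtual_obj
```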
  • the AR interaction method provided in this embodiment is described in detail by taking a real object as a human as an example.
  • the electronic device can update the position and attitude information of the head of the real object in the world coordinate system and the position and attitude information of the virtual object in the world coordinate system with the update cycle of the real scene image, which specifically includes steps S8011 and S8012.
  • S8011 The electronic device updates the position and attitude information of the head of the real object in the world coordinate system. For the specific process, refer to the methods described in S701-S706 in the embodiment shown in FIG. 7, which will not be repeated here.
  • the electronic device updates the position and attitude information of the virtual object in the world coordinate system.
  • the position and attitude information of the virtual object in the world coordinate system may be set by the electronic device, or set by the user, or may be determined by the electronic device in real time.
  • the electronic device may refer to step A1 performed by the AR image generation module in the embodiment shown in FIG. 4 to determine the position and posture information of the virtual object in the world coordinate system.
  • the electronic device may perform the following steps S802-S804 when the AR application sets the visual monitoring of the virtual object (that is, the virtual object performs a sound response according to the visual sensing information).
  • S802 The electronic device determines the original sound data of the virtual object. Similar to the position and attitude information of the virtual object in the world coordinate system determined in S8012, the original sound data of the virtual object may be set by the electronic device, set by the user, or determined by the electronic device in real time.
  • the electronic device may refer to step B1 performed by the 3D sound generation module in the embodiment shown in FIG. 4 to determine the original sound data of the virtual object.
  • S803 The electronic device generates 3D audio data according to the position and attitude information of the head of the real object in the world coordinate system, the position and attitude information of the virtual object in the world coordinate system, and the original sound data of the virtual object; this may specifically include steps S8031-S8034.
  • S8031 According to the position and posture information of the head of the real object in the world coordinate system and the position and posture information of the virtual object in the world coordinate system, the electronic device respectively determines the distances between the virtual object and the two ears of the real object (i.e., the distance between the virtual object and the left ear of the real object, and the distance between the virtual object and the right ear of the real object).
  • S8032 The electronic device calculates a volume difference and a time difference between the ears of the real object according to the distance between the virtual object and the ears of the real object.
  • S8033 The electronic device generates 3D audio data according to the obtained binaural volume difference and time difference of the real object and the original sound data.
  • S8034 The electronic device may filter the 3D audio data according to the posture information of the head of the real object, so as to further improve the realism of the 3D sound. Since sound may undergo various reflections and refractions while travelling through the transmission medium, filtering the 3D audio data can simulate these reflections and refractions, making the 3D sound heard by the real object more realistic.
  • the electronic device plays the 3D audio data through a speaker or an earphone. Real objects can hear 3D sound, and localize the sound source according to the 3D sound to perceive the location of the virtual object.
  • the electronic device can update the position and posture information of the virtual object and the real object in real time, so that the 3D audio data can be synthesized according to the positional and postural relationship between the two, which makes the emitted 3D sound more realistic.
  • a virtual musical instrument in a virtual world can produce 3D sound through the above method, and a person in the real world can determine the specific position of the virtual musical instrument in the virtual world through the 3D sound produced by the headset.
  • in a virtual world containing flowers, birds, fish, insects and landscapes, each virtual object can emit 3D sound through the above method, so that people in the real world can use the 3D sound played by their headphones to determine which virtual objects exist in the virtual world, where they are located, and so on, which makes the experience more immersive.
  • a virtual object (a virtual cat) in the virtual world can emit 3D sound (a meow) through the above method, and a person in the real world can determine from the 3D sound played by the headset that the virtual cat is in the middle of the sofa rather than on the coffee table, so that what the person hears through the headphones matches what they see on the electronic device's display.
  • the electronic device can implement the AR interaction method shown in FIG. 8 through, but not limited to, the following functions: an original sound calculation function, a pose calculation function, and a 3D sound synthesis function.
  • the 3D sound synthesis function can call the original sound calculation function to obtain the original sound data of the virtual object; and call the pose calculation function to update the position and attitude information of the virtual object and the position and attitude information of the real object in real time; 3D audio data is generated according to the original sound data of the virtual object, the position and attitude information of the virtual object, and the position and attitude information of the real object.
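A sketch of how the three functions could be chained is shown below; `compute_original_sound` and `compute_poses` are hypothetical names for the original-sound and pose calculation functions, and `render_3d_audio` refers to the earlier binaural sketch.

```python
def synthesize_3d_sound(ar_context):
    """3D-sound-synthesis function: calls the original-sound and pose functions, then renders."""
    mono, sample_rate = compute_original_sound(ar_context)      # original sound calculation
    virtual_pose, real_pose = compute_poses(ar_context)         # pose calculation (updated in real time)
    return render_3d_audio(mono, sample_rate,                   # 3D sound synthesis
                           virtual_pose.position,
                           real_pose.head_position,
                           real_pose.head_yaw)
```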
  • S1101 The electronic device constructs a world coordinate system with the electronic device as the coordinate origin. For the specific process, refer to S701-S702 in the embodiment shown in FIG. 7, which will not be repeated here.
  • the electronic device collects real-time scene sound information through a microphone, and determines the position of the real sound source in the world coordinate system through sound source localization, which may specifically include steps S11021 and S11022.
  • the electronic device collects real-time scene sound information through a microphone, and determines the physical location of the real sound source through sound source localization.
  • the physical position of the real sound source represents the positional relationship between the real sound source and the microphone (that is, the electronic device), for example, in which direction of the electronic device the real sound source lies and the distance between the real sound source and the electronic device.
  • S11022 The electronic device converts the physical position of the real sound source into the position of the real sound source in the world coordinate system.
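S11022 can be pictured as a rigid-body transform: the localization step yields a direction and rough distance relative to the device, and composing that with the device's pose in the world coordinate system gives the source's world position. The azimuth/elevation parameterization below is only an assumption about how the physical position might be expressed; the localization itself is taken as given.

```python
import numpy as np

def source_world_position(azimuth_rad, elevation_rad, distance_m, T_device_to_world):
    """Convert a sound source position relative to the device into world coordinates."""
    # Direction in the device frame from the estimated azimuth/elevation.
    direction = np.array([
        np.cos(elevation_rad) * np.cos(azimuth_rad),
        np.cos(elevation_rad) * np.sin(azimuth_rad),
        np.sin(elevation_rad),
    ])
    p_device = distance_m * direction
    # Rigid transform into the world coordinate system built around the device.
    return T_device_to_world[:3, :3] @ p_device + T_device_to_world[:3, 3]
```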
  • S1103 In the case where the AR application sets auditory monitoring for the virtual object (that is, the virtual object performs an action response and/or a sound response according to auditory sensing information), the electronic device updates the action response and/or sound response of the virtual object according to the position of the real sound source in the world coordinate system.
  • the auditory sensing information of the virtual object includes: the position of the real sound source in the world coordinate system, the voice command issued by the real sound source, and the like.
  • when a real sound source in the real world makes a sound, the virtual object in the virtual world can identify the position of the real sound source and make an action response or a sound response according to that position. For example, when a person in the real world blows, the seeds of a dandelion in the virtual world that is close to the real sound source can scatter and drift as if in the wind; when a person in the real world shouts, a flock of birds in the virtual world close to the real sound source can fly into the distance. A proximity-based sketch follows.
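The sketch below triggers an auditory action response for virtual objects near the real sound source. The 2 m radius and the `on_heard` hook are illustrative assumptions; in the patent, the concrete response comes from each virtual object model's action response package.

```python
import numpy as np

def auditory_responses(virtual_objects, source_world_pos, reaction_radius=2.0):
    """Trigger the action response of virtual objects near the real sound source."""
    triggered = []
    for obj in virtual_objects:
        if np.linalg.norm(obj.position - source_world_pos) < reaction_radius:
            obj.on_heard(source_world_pos)   # e.g. dandelion seeds scatter, birds fly off
            triggered.append(obj)
    return triggered
```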
  • Embodiment 1 and Embodiment 3 provided in this application may be implemented independently, or may be implemented in combination, which is not limited in this application.
  • the present application also provides an AR interaction method, as shown in FIG. 12 , the method includes the following steps:
  • the electronic device determines pose information of a real object and/or a position of a real sound source in a real scene, where the pose information of the real object is used to represent the position and pose of the real object.
  • the electronic device may determine the pose information of the real object through the following steps:
  • the electronic device acquires the second real scene image collected by the camera and identifies the key parts of the real object in the second real scene image; then, according to the simultaneous localization and mapping (SLAM) point cloud collision technique, the electronic device determines the pose information of the key parts of the real object in the second real scene image, and uses that pose information as the pose information of the real object.
  • the electronic device may determine the location of the real sound source through the following steps:
  • the electronic device acquires the real scene sound information collected by the microphone and determines the positional relationship between the real sound source and the microphone through sound source localization; the electronic device then determines the position of the real sound source in the real scene according to that positional relationship.
  • S1202 The electronic device determines the pose information of the virtual object according to the pose information of the real object and/or the position of the real sound source, where the pose information of the virtual object is used to represent the position and attitude of the virtual object.
  • a world coordinate system with the electronic device as a coordinate origin may be established before the electronic device executes S1201.
  • the electronic device may establish the world coordinate system through the following steps: acquiring a first real scene image collected by the camera and the attitude information of the electronic device measured by the inertial measurement unit, and establishing the world coordinate system according to the first real scene image and the attitude information of the electronic device.
  • in this case, the pose information of the real object is specifically used to represent the position and attitude of the real object in the world coordinate system; the position of the real sound source is the position of the real sound source in the world coordinate system; and the pose information of the virtual object is specifically used to represent the position and attitude of the virtual object in the world coordinate system.
  • the electronic device may determine the pose information of the virtual object according to the pose information of the real object and/or the position of the real sound source, together with at least one of the following: the action of the real object identified from the pose information of the real object, the voice command issued by the real sound source, the image model, action features and action response package in the virtual object model corresponding to the virtual object, and the user's touch operation on the virtual object through the display screen.
  • the electronic device generates an AR image including the virtual object according to the pose information of the virtual object; displays the AR image; and/or according to the pose information of the real object and the virtual object pose information, generate 3D audio data of the virtual object; play the 3D audio data.
  • the electronic device may generate the 3D audio data of the virtual object through the following steps:
  • the electronic device respectively determines the distance between the ears of the virtual object and the real object according to the pose information of the real object and the pose information of the virtual object;
  • the electronic device calculates a volume difference and a time difference between the ears of the real object according to the distance between the virtual object and the ears of the real object;
  • the electronic device generates 3D audio data according to the volume difference and time difference of the real object and the original sound data of the virtual object.
  • the original sound data is preset, or is determined according to at least one of the following: the position and posture information of the real object, the pose information of the virtual object, the action of the real object identified from the pose information of the real object, the voice command issued by the real sound source, the image model, action features and action response package in the virtual object model corresponding to the virtual object, and the user's touch operation on the virtual object through the display screen.
  • before playing the 3D audio data, the electronic device can also filter the 3D audio data according to the pose information of the real object, so as to simulate the reflections and refractions of the 3D audio data during transmission and make the 3D sound heard by the real object more realistic; a rough illustrative sketch follows.
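The patent does not specify the filter, so the sketch below is only a rough illustration: it smooths the signal and adds one faint delayed copy, with the amount depending on how far the listener's head is turned away from the virtual source. The smoothing kernel, echo delay and echo gain are all assumed values.

```python
import numpy as np

def filter_for_head_pose(stereo, sample_rate, head_yaw, source_yaw):
    """Illustrative post-filter: more smoothing and a faint echo when the head faces away."""
    facing_away = abs(np.angle(np.exp(1j * (head_yaw - source_yaw))))  # 0..pi
    k = int(1 + 8 * facing_away / np.pi)            # moving-average length grows with the angle
    kernel = np.ones(k) / k
    out = np.vstack([np.convolve(ch, kernel, mode="same") for ch in stereo])

    echo_delay = int(0.02 * sample_rate)            # assumed 20 ms single reflection
    echo = np.zeros_like(out)
    if 0 < echo_delay < out.shape[1]:
        echo[:, echo_delay:] = 0.2 * out[:, :-echo_delay]
    return out + echo
```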
  • the embodiments of the present application provide an AR interaction method, in which the electronic device can determine the pose information of the virtual object according to the pose information of the real object or the position of the real sound source, so that an AR image can be generated and displayed;
  • in addition, 3D sound data of the virtual object can be generated and played according to the pose information of the real object and the pose information of the virtual object.
  • with this solution, the virtual object can perceive the pose of the real object and the position of the real sound source, and make corresponding action or sound responses based on this information, so that the virtual object behaves like a real object. The method therefore realizes virtual-real interaction that combines the vision and hearing of the virtual object and the real object, which can raise the intelligence level of the virtual object and improve the virtual-real interaction experience.
  • the present application also provides an electronic device, which is used to implement the methods provided by the above embodiments.
  • the electronic device 1300 may include: a determination unit 1301 , an AR synthesis unit 1302 , a display unit 1303 , and an audio unit 1304 .
  • the display unit 1303 is used to present a user interface (for example, an AR image) to realize human-computer interaction.
  • the display unit 1303 includes a display panel, which is also called a display screen.
  • the audio unit 1304 is used to collect sound signals and play audio data. Similar to the audio circuit 206 in the electronic device shown in FIG. 2 , the audio unit 1304 may include a speaker and a microphone.
  • a determination unit 1301, configured to determine the pose information of a real object and/or the position of a real sound source in a real scene, where the pose information of the real object is used to characterize the position and pose of the real object; and to determine the pose information of the virtual object according to the pose information of the real object and/or the position of the real sound source, where the pose information of the virtual object is used to represent the position and attitude of the virtual object;
  • an AR synthesis unit 1302, configured to generate an AR image including the virtual object according to the pose information of the virtual object; and/or to generate 3-dimensional (3D) audio data of the virtual object according to the pose information of the real object and the pose information of the virtual object;
  • a display unit 1303, configured to display the AR image
  • the audio unit 1304 is used for playing the 3D audio data.
  • in one implementation, the determining unit 1301 is further configured to establish a world coordinate system with the electronic device as the coordinate origin; in this case:
  • the pose information of the real object is specifically used to represent the position and pose of the real object in the world coordinate system
  • the position of the real sound source is the position of the real sound source in the world coordinate system
  • the pose information of the virtual object is specifically used to represent the position and pose of the virtual object in the world coordinate system.
  • the determining unit 1301 when establishing a world coordinate system with the electronic device as a coordinate origin, is specifically configured to:
  • acquire a first real scene image collected by the camera, acquire the attitude information of the electronic device measured by the inertial measurement unit, and establish the world coordinate system according to the first real scene image and the attitude information of the electronic device.
  • the determining unit 1301 when determining the pose information of the real object in the real scene, is specifically configured to:
  • acquire a second real scene image collected by the camera and identify the key parts of the real object in the second real scene image;
  • determine the pose information of the key parts of the real object in the second real scene image according to the simultaneous localization and mapping (SLAM) point cloud collision technique;
  • use the pose information of the key parts of the real object as the pose information of the real object.
  • the determining unit 1301 when determining the position of the real sound source in the real scene, is specifically configured to:
  • acquire the real scene sound information collected by the microphone, determine the positional relationship between the real sound source and the microphone through sound source localization, and determine the position of the real sound source in the real scene according to that positional relationship.
  • the AR synthesis unit 1302 when determining the pose information of the virtual object according to the pose information of the real object and/or the position of the real sound source, is specifically configured to:
  • the pose information of the virtual object is determined according to the pose information of the real object and/or the position of the real sound source, and at least one of the following: the action of the real object identified by the pose information of the real object , the sound command issued by the real sound source, the image model, action feature, action response package in the virtual object model corresponding to the virtual object, and the user's touch operation on the virtual object through the display screen.
  • the AR synthesis unit 1302 when generating the 3D audio data of the virtual object according to the pose information of the real object and the pose information of the virtual object, is specifically configured to:
  • determine, according to the pose information of the real object and the pose information of the virtual object, the distances between the virtual object and the two ears of the real object;
  • calculate the interaural volume difference and time difference of the real object according to those distances;
  • generate the 3D audio data according to the volume difference and time difference of the real object and the original sound data of the virtual object.
  • the original sound data is set, or determined according to at least one of the following:
  • the position and posture information of the real object, the pose information of the virtual object, the action of the real object identified from the pose information of the real object, the voice command issued by the real sound source, the image model, action features and action response package in the virtual object model corresponding to the virtual object, and the user's touch operation on the virtual object through the display screen.
  • the AR synthesis unit 1302 is further configured to:
  • before the audio unit 1304 plays the 3D audio data, filter the 3D audio data according to the pose information of the real object.
  • each functional unit in the embodiments of the present application may be integrated into one processing unit, may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium.
  • in essence, or for the part that contributes to the prior art, or for all or part of the technical solutions, the technical solutions of the present application can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
  • an embodiment of the present application further provides an electronic device, which is used to implement the AR interaction method provided by the above embodiments, and has the functions of the electronic device as shown in FIG. 13 .
  • the electronic device 1400 may include: a processor 1401 , a memory 1402 , a camera 1403 , a display screen 1404 , and an audio circuit 1405 .
  • the electronic device 1400 may also have various peripheral or internal hardware shown in FIG. 2 .
  • the processor 1401 is connected with other components.
  • the processor 1401 and other components may be connected to each other by a bus.
  • the bus may be a peripheral component interconnect standard (peripheral component interconnect, PCI) bus or an extended industry standard architecture (extended industry standard architecture, EISA) bus or the like.
  • the bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of presentation, only one thick line is shown in FIG. 14, but it does not mean that there is only one bus or one type of bus.
  • the processor 1401 is configured to implement the AR interaction method provided by the above embodiments, including:
  • determining the pose information of a real object and/or the position of a real sound source in a real scene, where the pose information of the real object is used to characterize the position and pose of the real object;
  • determining the pose information of the virtual object according to the pose information of the real object and/or the position of the real sound source, where the pose information of the virtual object is used to characterize the position and pose of the virtual object;
  • generating an AR image containing the virtual object according to the pose information of the virtual object and displaying the AR image through the display screen 1404; and/or generating 3D audio data of the virtual object according to the pose information of the real object and the pose information of the virtual object, and playing the 3D audio data through the audio circuit 1405.
  • the memory 1402 is used to store program instructions and data.
  • the program instructions may include program code, which includes instructions for computer operation.
  • the memory 1402 may include a random access memory (RAM), and may also include a non-volatile memory, such as at least one disk memory.
  • the processor 1401 executes the program stored in the memory 1402, and implements the above functions through the above components, thereby finally implementing the methods provided by the above embodiments.
  • the embodiments of the present application further provide a computer program, when the computer program runs on a computer, the computer can execute the methods provided by the above embodiments.
  • the embodiments of the present application further provide a computer storage medium, where a computer program is stored in the computer storage medium, and when the computer program is executed by a computer, the computer executes the methods provided by the above embodiments.
  • an embodiment of the present application further provides a chip, where the chip is used to read a computer program stored in a memory to implement the methods provided by the above embodiments.
  • the embodiments of the present application provide a chip system, where the chip system includes a processor for supporting a computer apparatus to implement the functions involved in the electronic device in the methods provided by the above embodiments.
  • the chip system further includes a memory for storing necessary programs and data of the computer device.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • the present application provides an AR interaction method and electronic device.
  • the electronic device can determine the pose information of the virtual object according to the pose information of the real object or the position of the real sound source, so that the AR image can be generated and displayed.
  • in addition, the electronic device can generate and play the 3D sound data of the virtual object according to the pose information of the real object and the pose information of the virtual object.
  • through this solution, the virtual object can perceive the pose of the real object and the position of the real sound source, and make corresponding action or sound responses based on this information, so that the virtual object behaves like a real object; the method therefore realizes virtual-real interaction combining the vision and hearing of virtual and real objects, which raises the intelligence level of the virtual object and improves the interaction experience.
  • the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Graphics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present application provides an augmented reality interaction method and an electronic device. In this solution, the electronic device can determine the pose information of a virtual object according to the pose information of a real object or the position of a real sound source, so that an AR image can be generated and displayed; in addition, 3D sound data of the virtual object can be generated and played according to the pose information of the real object and the pose information of the virtual object. Clearly, with this solution the virtual object can perceive the pose of the real object and the position of the real sound source, and make corresponding action or sound responses based on this information, so that the virtual object behaves like a real object. The method therefore realizes virtual-real interaction combining the vision and hearing of the virtual object and the real object, which can raise the intelligence level of the virtual object and improve the interaction experience.

Description

一种增强现实交互方法及电子设备
相关申请的交叉引用
本申请要求在2020年12月31日提交中国专利局、申请号为202011627072.0、申请名称为“一种增强现实交互方法及电子设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及终端技术领域,特别涉及一种增强现实(augmented reality,AR)交互方法及电子设备。
背景技术
AR技术能够实现虚拟世界和现实世界的融合,但是如何提高虚拟对象的真实度缺很少被关注。因此,目前AR技术生成的虚拟角色真实度不高,导致用户体验不高。
发明内容
本申请提供一种AR交互方法及电子设备,用于提高虚拟对象的真实度,提高用户体验。
第一方面,本申请实施例提供了一种AR交互方法,该方法包括以下步骤:
电子设备确定现实场景中现实对象的位姿信息和/或现实声源的位置,其中,所述现实对象的位姿信息用于表征所述现实对象的位置和姿态;然后,所述电子设备根据所述现实对象的位姿信息和/或现实声源的位置,确定虚拟对象的位姿信息,其中,所述虚拟对象的位姿信息用于表征所述虚拟对象的位置和姿态;最后,所述电子设备根据所述虚拟对象的位姿信息,生成包含所述虚拟对象的AR图像;显示所述AR图像;和/或所述电子设备根据所述现实对象的位姿信息和所述虚拟对象的位姿信息,生成所述虚拟对象的3维3D音频数据;播放所述3D音频数据。
通过该方案,虚拟对象可以感知到现实对象的位姿,以及现实声源的位置,并基于以上信息做出相应的动作响应或声音响应,从而使虚拟对象可以像现实对象一样。因此,该方法实现虚拟对象与现实对象的视觉听觉相结合的虚实交互,可以提升虚拟对象的智能水平,进而提高虚拟交互的体验。
在一种可能的设计中,所述电子设备还可以建立以所述电子设备为坐标原点的世界坐标系;在该设计中,所述现实对象的位姿信息具体用于表征所述现实对象在所述世界坐标系中的位置和姿态;所述现实声源的位置为所述现实声源在所述世界坐标系中的位置;所述虚拟对象的位姿信息具体用于表征所述虚拟对象在所述世界坐标系中的位置和姿态。
通过该设计,可以提高所述电子设备确定的现实对象的位姿信息、现实声源的位置,虚拟对象的位姿信息的准确性。
在一种可能的设计中,所述电子设备可以通过以下步骤,建立以所述电子设备为坐标原点的世界坐标系:
所述电子设备获取摄像头采集的第一现实场景图像,以及获取惯性测量单元测量的所述电子设备的姿态信息;然后所述电子设备根据所述第一现实场景图像,以及所述电子设备的姿态信息,采用设定的坐标系构建算法(例如同步定位与地图构建(simultaneous localization and mapping,SLAM)算法)建立以所述世界坐标系。
在一种可能的设计中,所述电子设备可以通过以下步骤,确定现实场景中的现实对象的位姿信息:
所述电子设备获取摄像头采集的第二现实场景图像,识别所述第二现实场景图像中的现实对象的关键部位;然后,所述电子设备根据同步定位与地图构建SLAM点云碰撞技术,确定所述第二现实场景图像中所述现实对象的关键部位(例如头部)的位姿信息;最后,所述电子设备将所述现实对象的关键部位的位姿信息作为所述现实对象的位姿信息。
通过该设计,当所述现实对象的体积较大时,所述电子设备可以将现实对象的关键部位的位姿信息作为所述现实对象的位姿信息。
在一种可能的设计中,所述电子设备可以通过如下步骤,确定现实场景中现实声源的位置:
所述电子设备获取麦克风采集的现实场景声音信息,通过声源定位确定所述现实声源与所述麦克风之间的位置关系;然后,所述电子设备可以根据所述现实声源与所述麦克风之间的位置关系,确定所述现实场景中现实声源的位置。例如,根据所述现实声源与所述麦克风之间的位置关系,确定所述现实声源在所述世界坐标系中的位置。
在一种可能的设计中,所述电子设备可以根据所述现实对象的位姿信息和/或现实声源的位置,以及以下至少一项确定所述虚拟对象的位姿信息:通过所述现实对象的位姿信息识别的所述现实对象的动作,所述现实声源发出的声音指令,所述虚拟对象对应的虚拟对象模型中的形象模型、动作特征、动作响应包,以及用户通过显示屏对所述虚拟对象的触控操作。
其中,电子设备确定的现实对象的位姿信息可以视为所述虚拟对象看到的现实对象的位姿信息,所述现实声源的位置可以视为所述虚拟对象的听到的现实声源的位置。通过该设计中,电子设备可以利用虚拟对象感知到的现实对象的位姿和现实声源的位置,确定虚拟对象的位置,从而实现虚拟对象可以根据感知到现实对象的位姿,以及现实声源的位置,做出相应的动作响应,从而使虚拟对象可以像现实对象一样。另外,在确定所述动作响应时,所述电子设备还可以考虑一下其他因素,从而提高确定所述虚拟对象的位姿信息的灵活性。
在一种可能的设计中,所述电子设备可以通过以下步骤,根据所述现实对象的位姿信息和所述虚拟对象的位姿信息,生成所述虚拟对象的3D音频数据:
所述电子设备根据所述现实对象的位姿信息和所述虚拟对象的位姿信息,分别确定所述虚拟对象与所述现实对象的双耳之间的距离;然后,所述电子设备根据所述虚拟对象与所述现实对象的双耳之间的距离,计算所述现实对象的双耳的音量差和时间差;最后,所述电子设备根据所述现实对象的音量差和时间差,以及所述虚拟对象的原始声音数据,生成3D音频数据。
通过该设计,所述现实设备可以根据所述现实对象的音量差和时间差,生成3D音频数据,从而可以提高播放所述3D音频数据发出的3D声音的真实性。
在一种可能的设计中,所述原始声音数据为设定的,或者为根据以下至少一项确定的:
所述现实对象的位置以及姿态信息,所述虚拟对象的位姿信息,通过所述现实对象的位姿信息识别的所述现实对象的动作,所述现实声源发出的声音指令,所述虚拟对象对应的虚拟对象模型中的形象模型、动作特征、动作响应包,以及用户通过显示屏对所述虚拟对象的触控操作。
在一种可能的设计中,所述电子设备在播放所述3D音频数据之前,还可以根据所述现实对象的位姿信息,对所述3D音频数据进行滤波,从而可以模拟所述3D音频数据在传输过程中的反射和折射,从而使现实对象听到的3D声音更加真实。
第二方面,本申请实施例还提供了一种电子设备,包括用于执行上述第一方面各个步骤的单元或模块。
第三方面,本申请提供一种电子设备,包括至少一个处理元件和至少一个存储元件,其中所述至少一个存储元件用于存储程序和数据,所述至少一个处理元件用于执行所述存储器中存储的程序,以使得本申请第一方面中提供的各个设计均可以实现。可选的,所述电子设备还可以包括显示屏、摄像头、音频电路等部件。
第四方面,本申请实施例中还提供一种计算机存储介质,该存储介质中存储软件程序,该软件程序在被一个或多个处理器读取并执行时可实现第一方面中各个设计所提供的方法。
第五方面,本申请实施例还提供一种包含指令的计算机程序,当该指令在计算机上运行时,使得计算机可以执行上述第一方面中各个设计所提供的所述的方法。
第六方面,本申请实施例还提供了一种芯片***,该芯片***包括处理器,用于支持电子设备实现上述第一方面中各个设计所涉及的功能。在一种可能的设计中,所述芯片***还包括存储器,所述存储器,用于保存电子设备必要的程序指令和数据。该芯片***,可以由芯片构成,也可以包含芯片和其他分立器件。
附图说明
图1为本申请实施例提供的虚实视听交互示意图;
图2为本申请实施例提供的一种电子设备的结构图;
图3为本申请实施例提供的一种电子设备的软件架构图;
图4为本申请实施例提供的一种AR交互方法的实现框图;
图5A为本申请实施例提供的一种电子设备的世界坐标系示意图;
图5B为本申请实施例提供的一种SLAM点云碰撞结果仿真图;
图6为本申请实施例提供的一种虚实视听交互示意图;
图7为本申请实施例提供的一种AR交互流程图;
图8为本申请实施例提供的另一种AR交互流程图;
图9为本申请实施例提供的一种AR交互实例示意图;
图10为本申请实施例提供的一种AR交互业务实现流程图;
图11为本申请实施例提供的又一种AR交互流程图;
图12为本申请实施例提供的再一种AR交互流程图;
图13为本申请实施例提供的另一种电子设备的结构图;
图14为本申请实施例提供的再一种电子设备的结构图。
具体实施方式
本申请提供一种AR交互方法及电子设备,用于提高虚拟对象的真实度,提高用户体验。其中,方法和电子设备是基于同一技术构思的,由于方法及电子设备解决问题的原理相似,因此电子设备与方法的实施可以相互参见,重复之处不再赘述。
以下,对本申请中的部分用语进行解释说明,以便于本领域技术人员理解。
1)、电子设备,为具有数据连通功能、数据计算和处理功能的设备或装置。例如,所述电子设备可以为手机、平板电脑、笔记本电脑、上网本、车载设备,智能家居设备(如智能电视),以及商务智能终端(包括:可视电话、会议桌面智能终端等)、个人数字助理(personal digital assistant,PDA)、增强现实(augmented reality,AR)\虚拟现实(virtual reality,VR)设备等,本申请对所述电子设备的具体形态不作限定。
2)、3维(3dimensions,3D)声音,能够使听者实现声源定位的声音。我们知道3D显示是利用用户双眼的视差,构建出一个3D的观感。与视差的概念类似的,用户的双耳可以利用同一声源发出的声音到达双耳的音量差和时间差来进行最基本的声源定位。
其中,音量差为同一声音到达用户双耳的音量差别。用户可以根据该音量差,感知出声源的位置位于左侧还是右侧。单纯靠音量差用户无法实现准确地声源定位,因此,用户还需要通过时间差进行定位,也就是哈斯效应(HAAS effect)。一般而言,实现哈斯效应的条件为同一声音到达用户的双耳的时间差需要在30ms以内,否则会在人脑内产生两个声像,形成回声(echo)。
3)、虚拟对象,是使用AR技术利用计算机等设备生成的一种逼真的,具有视、听、力、触、动等感觉的虚拟世界中的事物。示例性的,虚拟对象可以为人物、动物、景物、静物,还可以文字等;其可以产生各种动画效果,例如,姿势变化、发出声音、动作表情、与现实对象互动等。
每个虚拟对象一般是计算机根据对应的虚拟对象模型生成的。每个虚拟对象模型不仅可以对对应的虚拟对象的形象进行设定,还可以设置其声音特征、动作特征等,以及声音响应包、动作响应包等内容。
其中,虚拟对象的声音响应即为了对现实世界中的现实对象做出反应而发出声音,又称为听觉响应;虚拟对象的动作响应即为了对现实世界中的现实对象做出反应而做出动作,又称为视觉响应。
具体的,虚拟对象可以基于监听模式对现实对象做出反应(动作响应和/或声音响应)。所述监听模式可以具体包括:视觉监听和/或听觉监听。其中,视觉监听,即虚拟对象可以根据在其视野范围(电子设备的摄像头视野范围)内现实对象的位置与姿态信息,做出反应。听觉监听,即虚拟对象可以根据听到的现实场景声音(通过麦克风获取的现实场景声音信息)做出反应。
4)、现实对象,与虚拟对象对应的,为现实世界中现实存在的事物。示例性的,现实对象也可以为人物、动作、景物、景物等。当现实对象发出声音的情况下,其可以作为现实声源。
5)、多个,是指两个或两个以上。至少一个是指一个和多个。
6)、“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。字符“/”一般表示前后关联对象是一种“或”的关系。
另外,需要理解的是,在本申请的描述中,“第一”、“第二”等词汇,仅用于区分描述的目的,而不能理解为指示或暗示相对重要性,也不能理解为指示或暗示顺序。
下面结合附图,对本申请实施例进行详细说明。
AR技术是利用计算机生成一种逼真的虚拟世界(虚拟环境),能够使用户沉浸到该虚拟世界中,实现用户和虚拟世界的自然交互。目前,AR技术主要关注如何提升虚拟世界和现实世界的融合,却对提高虚拟世界中虚拟对象的真实度缺少关注。
在现实世界中,第一现实对象可以在看到或听到第二现实对象时,会直接做出相应的反应(动作响应和/或声音响应)。当然在第二现实对象能够做出动作响应和/或声音响应的情况下,该第二现实对象也可以根据第一现实对象的动作响应和/或声音响应,做出动作响应和/或声音响应。
因此,为了提高虚拟对象的真实度,在本申请实施例中,虚拟对象可以像现实对象一样,与现实对象实现上述视听交互,参阅图1所示。这样,现实对象不仅能够看到和听到虚拟对象,虚拟对象也可以看到和/或听到现实对象,并能够做出一些相应的动作响应和/或声音响应,以便现实对象或用户能够感知到虚拟对象能够具有视听功能。
通过本申请实施例提供的方案,虚拟对象可以像现实对象一样能够看到和/或听到现实对象,并做出一些相应的动作响应和/或声音响应,从而可以提高虚拟对象的真实度,使用户可以沉浸到虚拟世界中,最终提高用户体验。
本申请实施例提供的AR交互方法可以适用于如图2所示的电子设备中,下面先对该电子设备的结构进行说明。图2示出了电子设备的一种可能的结构。参阅图2所示,电子设备200中包含:通信单元201、处理器202、存储器203、显示单元204、输入单元205、音频电路206、传感器207、摄像头208等部件。下面结合图2对所述电子设备200的各个构成部件进行具体的介绍。
通信单元201用于实现所述电子设备200的功能,实现与其他设备的数据通信。可选的,所述通信单元201中可以包含无线通信模块2011和移动通信模块2012。除了所述通信单元201,所述电子设备200还需要配合天线、处理器202中的调制解调处理器和基带处理器等部件实现通信功能。
无线通信模块2011可以提供应用在电子设备上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络),蓝牙(Bluetooth,BT),全球导航卫星***(global navigation satellite system,GNSS),调频(frequency modulation,FM),近距离无线通信技术(near field communication,NFC),红外技术(infrared,IR)等无线通信的解决方案。无线通信模块2011可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块2011经由天线接收电磁波,将电磁波进行信号调频以及滤波处理,将处理后的信号发送到处理器202。无线通信模块2011还可以从处理器202接收待发送的信号,对其进行调频、放大,经天线转为电磁波辐射出去。通过所述无线通信模块2011,所述电子设备200可以连接一些周边设备,例如连接接入点、连接无线耳机或无线音响等。
移动通信模块2012可以提供应用在电子设备上的包括2G/3G/4G/5G等移动通信的解决方案。移动通信模块2012可以包括至少一个滤波器,开关,功率放大器,低噪声放大器(low noise amplifier,LNA)等。移动通信模块2012可以由天线接收电磁波,并对接收的 电磁波进行滤波,放大等处理,传送至调制解调处理器进行解调。移动通信模块2012还可以对经调制解调处理器调制后的信号放大,经天线转为电磁波辐射出去。在一些实施例中,移动通信模块2012的至少部分功能模块可以被设置于处理器202中。在一些实施例中,移动通信模块2012的至少部分功能模块可以与处理器202的至少部分模块被设置在同一个器件中。
所述电子设备200可以根据所述移动通信模块2012与移动通信***中的基站建立无线连接,并通过所述移动通信模块2012接受移动通信***的服务。
所述通信单元201中还可以包括通信接口,用于所述电子设备200与其他设备实现物理连接。所述通信接口可以与所述其他设备的通信接口通过电缆连接,实现所述电子设备200和其他设备之间的数据传输。示例性的,通过所述通信接口所述电子设备200可以连接耳机,音响等设备。
所述存储器203可用于存储软件程序以及数据。所述处理器202通过运行存储在所述存储器203的软件程序以及数据,从而执行所述电子设备200的各种功能以及数据处理。在本申请实施例中,所述软件程序可以包括实现AR交互方法的AR应用程序。
可选的,所述存储器203可以主要包含存储程序区和存储数据区。其中,存储程序区可存储操作***、各种软件程序等;存储数据区可存储用户输入或者所述电子设备200在运行软件程序过程中创建的数据等。其中,所述操作***可以为
等。此外,所述存储器203可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他易失性固态存储器件。例如,在本申请实施例中,实现控制方法的AR应用程序可以存储在存储程序区中,而虚拟对象模型、虚拟对象的位置和姿态、现实对象的位置和姿态等数据可以存储在存储数据区中。
所述输入单元205可用于接收用户输入的字符信息以及信号。可选的,输入单元205可包括触控面板2051以及其他输入设备(例如功能键)。其中,所述触控面板2051,也称为触摸屏,可收集用户在其上或附近的触摸操作,生成相应的触摸信息发送给处理器202,以使处理器202执行该触摸信息对应的命令。触控面板2051可以采用电阻式、电容式、红外线以及表面声波等多种类型实现。
所述显示单元204用于呈现用户界面,实现人机交互。例如,所述显示单元204可以显示由用户输入的信息,或提供给用户的信息,以及所述电子设备200的各种菜单、各个主界面(包含各种应用的图标),各个应用的窗口,摄像头拍摄的图像,AR应用生成的包含虚拟对象的AR图像等内容。
所述显示单元204可以包括显示面板2041,所述显示面板2041又称为显示屏,可以采用液晶显示屏(liquid crystal display,LCD)、有机发光二极管(organic light-emitting diode,OLED)等形式来配置。
需要说明的是,所述触控面板2051可覆盖所述显示面板2041,虽然在图2中,所述触控面板2051与所述显示面板2041是作为两个独立的部件来实现所述电子设备200的输入和输入功能,但是在本申请实施例中,可以将所述触控面板2051与所述显示面板2041集成(即触摸显示屏)而实现所述电子设备200的输入和输出功能。
所述处理器202是所述电子设备200的控制中心,利用各种接口和线路连接各个部件,通过运行或执行存储在所述存储器203内的软件程序和/或模块,以及调用存储在所述存储器203内的数据,执行所述电子设备200的各种功能和处理数据,从而实现所述电子设备 200的多种业务。例如,所述处理器202可以运行存储在所述存储器203中的AR应用程序,实现本申请实施例提供的AR交互方法。
可选的,所述处理器202可包括一个或多个处理单元。所述处理器202可集成应用处理器、调制解调处理器、基带处理器,图形处理器(graphics processing unit,GPU)等,其中,应用处理器主要处理操作***、用户界面和应用程序等,调制解调处理器主要处理无线通信。可以理解的是,上述调制解调处理器也可以不集成到所述处理器202中。
所述音频电路206(包括扬声器2061,麦克风2062)可提供用户与所述电子设备200之间的音频接口。音频电路206可将接收到的音频数据转换后的电信号,传输到所述扬声器2061,由所述扬声器2061转换为声音信号输出。另一方面,所述麦克风2062将收集的声音信号转换为电信号,由所述音频电路206接收后转换为音频数据,以进行传输或存储等进一步处理。在本申请实施例中,电子设备200中可以通过麦克风2062采集声源发出的声音信号,从而可以根据采集的声音信号进行声源定位。另外,所述电子设备200还可以在生成3D音频数据后,通过所述扬声器2061输出。
所述电子设备200还可以包括一种或多种传感器207,比如光传感器、运动传感器、超声波传感器以及其他传感器。所述电子设备200可以根据所述传感器207采集的实时传感器数据,实现各种功能。
其中,所述运动传感器中可以包含惯性测量单元(inertial measurement unit,IMU)。所述IMU用于测量所述电子设备200的姿态信息的装置。其中,所述电子设备200的姿态信息用于表征所述电子设备200的运动姿态,具体可以包括:所述电子设备200在三个方向轴上的姿态角及加速度。可选的,所述IMU中可以包含三个加速度传感器和三个陀螺仪。每个加速度传感器用于测量一个方向轴的加速度,每个陀螺仪用于测量一个方向轴的姿态角。
所述电子设备200内部还可以包括至少一个摄像头208,以采集现实场景的图像。当电子设备200包括多个摄像头208的情况下,这些摄像头208中包含位于电子设备200正面的前置摄像头,还包含位于电子设备200背面的后置摄像头。
本领域技术人员可以理解,图2中示出的电子设备200的结构并不构成对本申请适用的电子设备的限定,本申请实施例适用的电子设备可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。
本申请提供的电子设备的软件***可以采用分层架构、事件驱动架构,微核架构、微服务架构,或云架构。本申请实施例以分层架构的安卓(Android)***为例,示例性说明电子设备的软件结构。
图3示出了本申请实施例提供的电子设备的软件结构框图。如图3所示,电子设备的软件结构可以是分层架构,例如可以将软件分成若干个层,每一层都有清晰的角色和分工。层与层之间通过软件接口通信。在一些实施例中,将Android***分为四层,从上至下分别为应用程序层,框架层(framework,FWK),安卓运行时(Android runtime)和***库,以及内核层。
应用程序层可以包括一系列应用程序。如图3中所示,应用程序层可以包括相机应用、语音助手应用、AR应用、音乐应用、视频应用、地图应用,以及第三方应用程序等。其中,第三方应用程序可以包括微信应用、爱奇异应用等。
框架层为应用程序层中的应用程序提供应用编程接口(application programming interface,API)和编程框架。应用程序框架层可以包括一些预先定义的函数。如图3所示,应用程序框架层可以包括:***服务(System Service)、视图***(View System)、网页服务(Web Service),电话管理器,资源管理器等。
其中,***服务中可以包含窗口管理服务(window manager service,WMS)、活动管理服务(activity manager service,AMS)。其中,在本申请实施例中,所述***服务中还可以包含一个***级服务——AR管理服务。下面分别对***服务中的各个服务进行说明。
窗口管理服务,为窗口(window)提供窗口管理服务,具体控制所有窗口的显示、隐藏以及窗口在显示屏中的位置。窗口管理服务具体可以负责以下功能:1、为每个窗口分配显示平面(surface);2、管理surface的显示顺序、尺寸、位置;3、通过调用管理函数(例如surface控制函数(SurfaceControl.Transaction)),调节窗口的透明度、拉伸系数、位置和尺寸,实现窗口的动画效果;4、与输入***相关,例如当电子设备接收到一个触摸事件时,电子设备可以通过窗口管理服务为用户提供一个合适的窗口来显示或处理这个消息。
活动管理服务,为应用中的活动(activity)提供管理服务。所述活动管理服务可以但不限于负责以下功能:1、统一调度所有应用的活动的生命周期;2、启动或结束应用的进程;3、启动并调度服务的生命周期;4、注册广播接收器(Broadcast Receiver),并接收和分发广播(Broadcast);5、查询***当前运行状态;6、调度任务(task)。
AR管理服务,用于实现本申请实施例提供的AR交互方法,提供AR交互服务。所述AR管理服务可以但不限于负责以下功能:1、构建世界坐标系;2、确定现实对象在该世界坐标系中的位置和姿态信息;3、识别现实声源在该世界坐标系中的位置;4、确定虚拟对象在该世界坐标系中的位置和姿态信息;5、根据现实对象的位置和姿态信息,现实声源的位置,以及虚拟对象的位置和姿态信息中的至少一项,生成3D音频数据;6、根据现实对象的位置和姿态信息,现实声源的位置,以及虚拟对象的位置和姿态信息中的至少一项,生成包含虚拟对象的AR图像。
视图***中包括可视控件,例如显示文字的控件,显示图片的控件等。视图***可用于构建应用程序。界面可以由一个或多个控件组成的。例如,包括短信通知图标的界面,可以包括显示文字的控件以及显示图片的控件。
网页服务(Web Service),为能够通过网页进行调用的API。电话管理器用于提供电子设备的通信功能。例如通话状态的管理(包括接通,挂断等)。资源管理器为应用程序提供各种资源,比如本地化字符串,图标,图片,布局文件,视频文件等等。
Android runtime包括核心库(Kernel Library)和虚拟机。Android runtime负责安卓***的调度和管理。其中,核心库包含两部分:一部分是java语言需要调用的功能函数,另一部分是安卓***的核心库,用于为安卓***提供输入/输出服务(Input/Output Service)和核心服务(Kernel Service)。应用程序层和框架层可以运行在虚拟机中。虚拟机将应用程序层和框架层的java文件执行为二进制文件。虚拟机用于执行对象生命周期的管理,堆栈管理,线程管理,安全和异常的管理,以及垃圾回收等功能。
***库可以包括多个功能模块。例如:虚拟对象模型库、媒体库(media libraries),图像处理库等。
虚拟对象模型库,用于可以管理多个虚拟对象模型。每个虚拟对象模型用于生成一个 虚拟对象。每个虚拟对象模型不仅可以对对应的虚拟对象的形象进行设定,还可以设置其声音特征、动作特征等,以及听觉响应包、视觉响应包等内容。
媒体库支持多种格式的音频、视频的回放和录制,以及支持打开多种格式的静态图像等。媒体库可以支持多种音视频编码格式,例如:MPEG4,H.264,MP3,AAC,AMR,JPG,PNG等。
内核层是硬件和软件之间的层。内核层至少包含显示驱动,传感器驱动、处理器驱动、摄像头驱动,音频驱动等,用于驱动硬件层中的硬件。
硬件层可以包括各类传感器、显示屏、处理器、输入设备、内存、摄像头等。
为了提高AR技术中虚拟对象的真实度,本申请实施例提供了一种AR交互方法,该方法可以实现如图1所示的虚实视听交互。下面基于图2所示的电子设备的硬件结构和图3所示的电子设备的软件结构,参考图4对该方案进行详细说明。
参阅图4所示,按照电子设备的逻辑功能的不同,所述电子设备中实现AR交互方法的软件可以划分为以下模块:现实对象位置姿态计算模块、存储模块、虚拟对象模型库、AR合成模块。另外,该方法还需要配合一些电子设备内部或外部的其他器件实现,如图4中所示,这些器件可以分为两类:采集器件和输出器件。其中,采集器件可以包含:摄像头、运动传感器(以下仅以运动传感器中的IMU为例进行说明),麦克风;所述输出器件可以包括:显示屏,扬声器或耳机。
所述摄像头用于对现实场景进行拍摄,得到现实场景的图像(以下简称现实场景图像)。所述IMU用于测量所述电子设备的姿态信息,其中,所述电子设备的姿态信息用于表征所述电子设备的运动姿态,其中可以包括:所述电子设备在三个方向轴上的姿态角及加速度。可选的,所述三个方向轴中两两正交,用于构成世界坐标系。所述麦克风用于采集现实场景中的声音,得到现实场景声音信息。
所述现实对象位置姿态计算模块,具体包括以下功能:获取摄像头采集的现实场景图像和获取IMU测量的电子设备的姿态信息;根据获取的现实场景图像和电子设备的姿态信息,构建以所述电子设备为坐标原点的世界坐标系;识别所述现实场景图像中的现实对象,并确定所述现实对象在所述世界坐标系中的位置(以下简称为所述现实对象的位置)以及姿态信息;获取麦克风采集的现实场景声音信息;根据所述现实场景声音信息计算现实声源的物理位置;将所述现实声源的物理位置转换为该现实声源在该世界坐标系中的位置(以下简称为所述现实声源的位置)。
其中,与所述电子设备的姿态信息类似的,所述现实对象的姿态信息用于表征所述现实对象的运动姿态,其中可以包括:所述现实对象在世界坐标系的三个方向轴上的姿态角及加速度。所述现实声源的物理位置能够表征所述现实声源与所述麦克风(即所述电子设备)之间的位置关系,例如所述现实声源位于所述电子设备的哪个方向,所述现实声源与所述电子设备之间的距离。
可选的,所述现实对象位置姿态计算模块可以根据获取的现实场景图像和电子设备的姿态信息,利用同步定位与地图构建(simultaneous localization and mapping,SLAM)算法或其它算法,构建所述世界坐标系。需要说明的是,所述现实对象位置姿态计算模块可以以所述电子设备显示屏的一个角为坐标原点,或者以显示屏的一个边的中心点为坐标原点,又或者以显示屏的中心点为坐标原点,再或者以显示屏中设定位置为坐标原点,本申请实 施例对此不作限定。示例性的,所述世界坐标系可以如图5A所示,其中,坐标原点为显示屏一条边的中心点,在显示屏中能够显示虚拟世界,显示屏外为现实世界。
另外,所述现实对象位置姿态计算模块可以通过多种方式,确定现实对象的位置以及姿态信息。由于现实对象的体积较大,为了保证准确性,所述现实对象位置姿态计算模块可以将现实对象中关键部位的位置和姿态信息作为现实对象的位置和姿态信息。
例如,所述现实对象位置姿态计算模块可以识别所述现实场景图像中的现实对象中的关键部位,并将该现实对象中的关键部位与SLAM点云碰撞,根据碰撞值计算该关键部位的位置与姿态信息。示例性的,SLAM点云碰撞结果如图5B所示,其中,图5B中的(a)为现实场景图像中的现实对象,图5B中的(b)为SLAM点云碰撞结果仿真图。
又例如,所述现实对象位置姿态计算模块可以识别所述现实场景图像中的现实对象中的关键部位,并通过图像识别技术确定所述现实对象中关键部位的特征信息;最后根据所述关键部位的特征信息,确定所述现实对象的位置与姿态信息。示例性的,当该现实对象为人或动物时,关键部位可以为头部,所述现实对象位置姿态计算模块可以识别人或动物头部(例如包括眼睛、鼻子、耳朵、嘴巴)的特征信息。
还需要说明的是,在本申请实施例中,所述现实对象位置姿态计算模块可以采用多种方式,识别现实场景图像中的现实对象的关键部位。示例性的,当所述现实对象为人或动物时,所述现实对象位置姿态计算模块可以通过脸部识别技术或骨骼特征识别技术,识别出现实对象的头部。又例如,所述现实对象位置姿态计算模块可以采用3D物理识别技术,识别现实场景图像中的现实对象的关键部位。
所述现实对象位置姿态计算模块可以通过传统的声源定位技术,确定所述现实声源与所述麦克风之间的位置关系(即所述现实声源的物理位置),并最终将该现实声源与所述麦克风之间的位置关系转换为所述现实声源在世界坐标系中的位置。
所述存储模块用于存储所述现实对象位置姿态计算模块计算处的现实对象的位置以及姿态信息;以及存储现实声音的位置。
所述虚拟对象模块库用于存储至少一个虚拟对象模型。每个虚拟对象模型可以但不限于包含以下至少一项:形象模型、声音特征、动作特征、声音响应包,动作响应包。其中,形象模型用于对虚拟对象的形象进行设定。所述声音特征用于对虚拟对象发出的声音进行设定,具体可以包括:音量、音色、音调等。动作特征用于对虚拟对象能够做出的动作进行设定,具体可以包括:动作类型、动作幅度等。声音响应包包含多个虚拟对象的感应信息与需要发出的原始声音之间的对应关系。动作响应包包含多个虚拟对象的感应信息与需要做出的动作之间的对应关系。
其中,虚拟对象的感应信息中可以但不限于包括以下两类:视觉类感应信息、听觉类感应信息。其中,视觉类感应信息包含:现实对象位置姿态计算模块计算的现实对象的位置以及姿态信息、现实对象的动作等。听觉类感应信息包含:现实声源的位置,现实声源发出的声音指令等。可选的,所述虚拟对象的感应信息中还可以包括用户通过显示屏对虚拟对象进行的触控操作,这些触控操作还可以称为触觉类感应信息。可选的,所述现实声音发出的声音指令可以通过语音识别技术对现实场景声音信息进行语音识别而得到的。
示例性的,下面以虚拟对象为人的虚拟对象模型为例,对虚拟对象模型进行具体说明:
形象模型可以具体设定:性别、身高、体重、身材比例、五官特征,衣物特征,以及其他视觉特征(面部斑点的位置)等等。
声音特征可以具体设定音量、音色、音调,使该虚拟对象发出的声音特征符合虚拟对象的形象。
动作特征可以具体设定虚拟对象能够执行的基础动作,例如,走路、跑步、跳、握手、挥手等。这些基础动作中的任一个可以独立成为虚拟对象的一个动作响应,也可以多次重复或与其他基础动作进行组合形成虚拟对象的一个动作响应。
声音响应包中可以包含:在虚拟对象的感应信息表征现实对象走进时,其对应的原始声音为“您好”;在虚拟对象的感应信息表征现实对象离开时,其对应的原始声音为“再见”;在用户通过显示屏对虚拟对象进行点击的触控操作时,其对应的原始声音为笑声。
动作响应包中可以包括:在虚拟对象的感应信息表征现实对象走进时,其对应的动作为走进-伸手-握手;在虚拟对象的感应信息表征现实对象离开时,其对应的动作为抬手-挥手;在用户通过显示屏对虚拟对象进行点击的触控操作时,其对应的动作为抖动。
所述AR合成模块用于通过AR技术合成AR图像和3D音频数据,并将AR图像通过显示屏显示,将3D音频数据通过扬声器、耳机等播放。具体的,所述AR合成模块中具体可以包括AR图像生成模块和3D声音生成模块。
具体地,所述AR图像生成模块可以具体通过以下步骤合成AR图像:
A1:所述AR图像生成模块确定虚拟对象的位置以及姿态信息。
可选的,所述AR图像生成模块可以通过多种方式确定所述虚拟对象的位置以及姿态信息。
可选的,所述AR图像生成模块根据以下至少一项,实时确定虚拟对象的位置以及姿态信息:
现实对象的位置以及姿态信息、现实对象的动作、现实声源的位置,现实声源发出的声音指令、虚拟对象模型中的形象模型、动作特征、动作响应包,以及用户通过显示屏对虚拟对象的触控操作。其中,与所述现实对象的姿态信息类似的,所述虚拟对象的姿态信息用于表征所述虚拟对象的运动姿态,其中可以包括:所述虚拟对象在所述世界坐标系的三个方向轴上的姿态角及加速度。
又例如,所述虚拟对象的位置以及姿态信息可以是电子设备设定的,或者用户设定的,或者为虚拟对象模型设定的。
A2:所述AR图像生成模块可以根据虚拟对象的位置以及姿态信息,合成AR图像。其中,该过程可以参考传统的AR图像合成技术,此处不再赘述。可选的,在本步骤中,所述AR图像生成模块还可以利用现实场景图像合成该AR图像。
另外,所述3D声音生成模块还可以具体通过以下步骤合成3D音频数据:
B1:所述3D声音生成模块可以根据以下至少一项,确定虚拟对象的原始声音数据:
现实对象的位置以及姿态信息、现实对象的动作、现实声源的位置,现实声源发出的声音指令、声音特征、声音响应包,以及用户通过显示屏对虚拟对象的触控操作。
B2:所述3D声音生成模块根据所述现实对象的位置以及姿态信息,所述虚拟对象的位置以及姿态信息,以及所述原始声音数据,生成3D音频数据。
示例性的,步骤B2中可以具体包括以下步骤:
C1:所述3D声音生成模块根据所述现实对象的位置以及姿态信息,所述虚拟对象的位置以及姿态信息,计算虚拟对象与现实对象双耳之间的距离;
C2:所述3D声音生成模块根据虚拟对象与现实对象双耳之间的距离,计算现实对象 双耳的音量差和时间差;
C3:所述3D声音生成模块根据得到的现实对象双耳的音量差和时间差,以及所述原始声音数据,生成3D音频数据。可选的,所述3D声音生成模块在将所述3D音频数据播放之前,还可以根据现实对象的姿态信息,对所述3D音频数据进行滤波,以进一步提升3D声音的真实度。
所述输出器件中的显示屏用于显示AR图像。所述输出器件中的所述扬声器或耳机用于播放所述3D音频数据,这样,用户即可听到3D声音,并通过声源定位感知虚拟声源(即虚拟对象)所在的位置。
注意的是,所述电子设备可以实时或周期性地获取现实场景图像、电子设备的姿态信息,以及现实场景声音信息,以便可以实时或周期性地更新虚拟对象的视听反应,从而使虚拟对象能够根据感应到的现实世界快速地做出相应的视听反应,进而提高虚拟对象的真实度。
还需要说明的是,当所述电子设备的软件结构如图3所示时,以上AR交互方案中的现实对象位置姿态计算模块、存储模块、AR合成模块可以通过电子设备框架层中***服务内的AR管理服务实现,本方案中的虚拟对象模型库可以通过位于***库中的虚拟对象模型实现。另外,本方案中各个模块与采集器件和输出器件的交互可以通过客户端实现,例如应用程序层中的AR应用实现。显然,该方案可以通过后台***服务和前台客户端相结合的方式实现。
通过本申请实施例提供的AR交互方案,能够实现视觉听觉相互结合的虚实交互,从而给AR应用的用户带来更好的沉浸式体验。另外,通过结合各种识别技术,使虚拟角色可以感知到现实对象的位置及姿态,以及现实声源的位置,并基于以上信息做出相应的动作响应和声音响应,从而可以提升虚拟对象的智能水平,从而提高虚实交互的体验。另外,该方案可以采用服务端和客户端的架构实现,使客户端的可操作性更高,从而有利于开发者开发出丰富多彩的沉浸式AR应用。
基于图2和图3所示的电子设备,以及图4所示的AR交互方案,本申请还提供了多个实施例,能够分别实现图6所示的不同虚实视听交互。如图6所示,在以下实施例中,能够分别实现虚拟看现实、现实听虚拟、虚拟听现实等虚实视听交互。其中,虚拟看现实,即虚拟对象可以根据在其视野范围内的现实对象的位置与姿态信息,做出动作响应或声音响应;现实听虚拟,即虚拟对象可以根据现实对象的位置与姿态信息,虚拟对象的位置与姿态信息,发出3D声音,以便现实对象能够听到3D声音,从而通过声源定位安置该虚拟对象所在的位置;虚拟听现实,即虚拟对象可以根据通过麦克风采集的现实场景声音信息,确定现实声源的物理位置,并根据该现实声源的位置,做出动作响应或声音响应。
实施例一:视觉交互-虚拟看现实
下面参阅图7所示的AR交互流程图,以现实对象为人为例,对本实施例提供的AR交互方法进行详细说明。
S701:电子设备获取摄像头采集的现实场景图像,并获取通过运动传感器(IMU)采集的电子设备的姿态信息。
S702:所述电子设备根据现实场景图像、所述电子设备的姿态信息,构建以所述电子 设备为坐标原点的世界坐标系。示例性的,所述电子设备可以利用SLAM算法构建所述世界坐标系。
需要说明的是,所述电子设备可以在构建世界坐标系之后,后续直接使用该世界坐标系,直至该世界坐标系失效(例如,电子设备移动);或者周期性更新所述世界坐标系,该更新周期的长度本申请不作限定,该更新周期可以等于或大于现实场景图像的采集周期。
S703:所述电子设备判断现实对象与所述电子设备之间的距离是否大于设定阈值,若是,则执行S704;否则执行S705。
由于脸部识别技术需要看脸部细节,因此若现实对象与电子设备距离较远,那么现实场景图像中脸部细节无法展示不清楚,有可能导致识别结果不准确。而骨骼特征识别技术是基于人体整个骨骼特征实现的,即使现实对象与电子设备距离较远,也依然不会影响识别结果的准确性。
另外,所述电子设备可以采用多种方式判断所述现实对象与所述电子设备之间的距离。示例性的,所述电子设备可以通过内部的红外传感器、超声波传感器等期间检测所述现实对象与所述电子设备之间距离。
S704:当所述现实对象与所述电子设备之间的距离大于所述设定阈值时,所述电子设备采用骨骼特征识别技术,识别所述现实对象的头部。
S705:当所述现实对象与所述电子设备之间的距离小于或等于所述设定阈值时,所述电子设备采用脸部识别技术,识别所述现实对象的头部。
S706:所述电子设备将现实场景图像中的所述现实对象的头部与SLAM点云碰撞,根据碰撞值计算所述现实对象的头部在所述世界坐标系中的位置与姿态信息,例如图5B所示。本步骤的具体过程可以参考传统的SLAM点云碰撞方法,此处不再赘述。
可选的,所述电子设备还可以通过其他方法,确定所述现实对象的头部在所述世界坐标系中的位置与姿态信息。例如,当所述现实对象与所述电子设备之间的距离小于或等于所述设定阈值时,所述电子设备可以通过图像识别技术确定所述现实对象中头部的特征信息(例如眼睛、鼻子、耳朵、嘴巴中至少一项的特征信息);最后根据头部的特征信息,确定所述现实对象的位置与姿态信息。
S707:在AR应用设置虚拟对象视觉监听(即虚拟对象根据视觉类感应信息进行动作响应和/或声音响应)的情况下,所述电子设备根据现实对象的头部在世界坐标系中的位置与姿态信息,更新虚拟对象的动作响应或声音响应。
其中,所述虚拟对象的视觉类感应信息中包含:现实对象位置姿态计算模块计算的现实对象的位置以及姿态信息、现实对象的动作。
可选的,所述电子设备可以以现实场景图像的更新周期重复执行上述S703-S707(即按帧执行S703-S707),以实现虚拟对象可以根据实时的视觉类感应信息做出反应。
本步骤的具体过程可以参考图4所示的实施例中AR合成模块生成AR图像和生成3D音频数据的过程,此处不再赘述。
在本申请实施例中,电子设备可以根据摄像头拍摄的现实场景图像与IMU采集的电子设备的姿态信息,基于已有的SLAM技术构建以所述电子设备为坐标原点的世界坐标系;并将摄像头视野范围作为虚拟对象的视野范围,从而确定在虚拟对象的视野范围内的现实对象在该世界坐标系的位置与姿态信息,进而可以根据该现实对象的位置与姿态信息,使虚拟对象做出相应的动作。例如,真实的人在虚拟对象的可视范围内出现时,虚拟对象的 目光适应性追随的真实的人,和/或,跟随着真实的人运动。
实施例二:视觉交互-现实听虚拟
下面参阅图8所示的AR交互流程图,以现实对象为人为例,对本实施例提供的AR交互方法进行详细说明。
S801:电子设备可以以现实场景图像的更新周期更新现实对象的头部在世界坐标系中的位置与姿态信息,以及虚拟对象在世界坐标系中的位置与姿态信息,具体包括步骤S8011和S8012。
S8011:所述电子设备更新所述现实对象的头部在世界坐标系中的位置与姿态信息,具体过程可以参考图7所示的实施例中的S701-S706中描述的方法,此处不在赘述。
S8012:所述电子设备更新所述虚拟对象在世界坐标系中的位置与姿态信息。可选的,所述虚拟对象在世界坐标系中的位置与姿态信息可以是电子设备设定的,或者用户设定的,还可以是电子设备实时确定的。例如,所述电子设备可以参考图4所示的实施例中的AR图像生成模块执行的步骤A1,确定所述虚拟对象在世界坐标系中的位置以及姿态信息。
所述电子设备可以在AR应用设置虚拟对象的视觉监听(即虚拟对象根据视觉类感应信息进行声音响应)的情况下,执行以下步骤S802-S804。
S802:所述电子设备确定虚拟对象的原始声音数据。与S8012所述电子设备确定虚拟对象在世界坐标系中的位置与姿态信息类似的,所述虚拟对象的原始声音数据可以为电子设备设定的,或者用户设定的,还可以是电子设备实时确定的。例如,所述电子设备可以参考图4所示的实施例中3D声音生成模块执行的步骤B1,确定所述虚拟对象的原始声音数据。
S803:所述电子设备根据所述现实对象的头部在世界坐标系中的位置与姿态信息、虚拟对象在世界坐标系中的位置与姿态信息,以及虚拟对象的原始声音数据,生成3D音频数据,具体可以包括步骤S8031-S8034。
S8031:所述电子设备根据所述现实对象的头部在世界坐标系中的位置以及姿态信息,所述虚拟对象在世界坐标系中的位置以及姿态信息,分别确定虚拟对象与现实对象双耳之间的距离(即虚拟对象与现实对象的左耳之间的距离、虚拟对象与现实对象的右耳之间的距离)。
S8032:所述电子设备根据虚拟对象与现实对象双耳之间的距离,计算现实对象的双耳的音量差和时间差。
S8033:所述电子设备根据得到的现实对象双耳的音量差和时间差,以及所述原始声音数据,生成3D音频数据。
S8034:所述电子设备可以根据现实对象的头部的姿态信息,对所述3D音频数据进行滤波,以进一步提升3D声音的真实度。由于声音在传输媒介中传输的过程中,可以会经过各种反射、折射,因此,通过对所述3D音频数据进行滤波,可以模拟所述3D音频数据在传输过程中的反射和折射,从而使现实对象听到的3D声音更加真实。
S804:所述电子设备通过扬声器或耳机播放所述3D音频数据。现实对象可以听到3D声音,并根据3D声音进行声源定位从而感知虚拟对象的所在的位置。
在本申请实施例中,电子设备可以实时更新虚拟对象和现实对象的位置与姿态信息,从而可以根据二者之间的位置与姿态关系,合成3D音频数据,以使发出的3D声音更具有 真实度。例如,虚拟世界中的虚拟乐器可以通过以上方法发出3D声音,现实世界中的人可以通过耳机发出的3D声音,确定虚拟乐器在虚拟世界中的具***置。又例如,虚拟世界中存在多个虚拟人物,某个虚拟人物通过以上方法发出3D声音后,现实世界中的人可以通过耳机发出的3D声音判断声音来自哪个虚拟人物。再例如,在存在花鸟鱼虫山水的虚拟世界中,各个虚拟对象可以通过以上方法发出3D声音,这样现实世界中的人可以通过耳机发出的3D声音,确定虚拟世界中具体存在的虚拟对象,以及虚拟对象的位置等等,使现实世界中的人更加身临其境。
再例如,参阅图9所示,虚拟世界中的虚拟对象(虚拟猫)可以通过以上方法发送3D声音(猫叫声),现实世界中的人可以通过耳机发出的3D声音,确定虚拟猫位于沙发中间,而非茶几上,从而保证人通过耳机听到的与通过电子设备显示屏看到的一致。
参阅图10所示的AR交互业务实现流程图所示,所述电子设备可以但不限于通过以下函数实现图8所示的AR交互方法:原始声音计算函数、位姿计算函数、3D声音合成函数。其中,所述3D声音合成函数可以调用原始声音计算函数,得到虚拟对象的原始声音数据;并调用位姿计算函数,实时更新虚拟对象的位置及姿态信息、现实对象的位置及姿态信息;最终可以根据虚拟对象的原始声音数据、虚拟对象的位置及姿态信息、现实对象的位置及姿态信息,生成3D音频数据。
实施例三:听觉交互-虚拟听现实
下面参阅图11所示的AR交互流程图,对本申请实施例提供的AR交互方法进行详细说明。
S1101:电子设备构建以所述电子设备为坐标原点的世界坐标系,具体过程可以参考图7所示的实施例中的S701-S702中的描述,此处不再赘述。
S1102:所述电子设备通过麦克风实时采集现实场景声音信息,并通过声源定位确定现实声源在世界坐标系中的位置,具体可以包括步骤S11021和S11022。
S11021:所述电子设备通过麦克风实时采集现实场景声音信息,通过声源定位确定现实声源的物理位置。
其中,所述现实声源的物理位置能够表征所述现实声源与所述麦克风(即所述电子设备)之间的位置关系,例如所述现实声源位于所述电子设备的哪个方向,所述现实声源与所述电子设备之间的距离。
S11022:所述电子设备将所述现实声源的物理位置转换为所述现实声源在世界坐标系中的位置。
S1103:在AR应用设置虚拟对象听觉监听(即虚拟对象根据听觉类感应信息进行动作响应和/或声音响应)的情况下,所述电子设备根据现实声源在世界坐标系中的位置,更新虚拟对象的动作响应和/或声音响应。
其中,所述虚拟对象的听觉类感应信息中包含:现实声源在世界坐标系中的位置,现实声源发出的声音指令等。
本步骤的具体过程可以参考图4所示的实施例中AR合成模块生成AR图像和生成3D音频数据的过程,此处不再赘述。
通过本申请实施例,在现实世界中的现实声源发出声音时,虚拟世界中的虚拟对象可以识别现实声源的位置,并根据现实声源的位置做出动作响应或声音响应。例如,在现实 世界中的人发出吹气的声音时,虚拟世界中距离现实声源的位置较近的蒲公英的种子可以散开并做出随风飘荡的动作。又例如,在现实世界中的人发出呼喊声时,虚拟世界中距离现实声源的位置较近的鸟群可以做出飞向远方的动作。
需要说明的是,本申请提供的实施例一和实施例三可以单独实现,也可以结合实现,本申请对此不作限定。
示例性的,基于以上实施例,本申请还提供了一种AR交互方法,参阅图12所示,该方法包括以下步骤:
S1201:电子设备确定现实场景中现实对象的位姿信息和/或现实声源的位置,其中,所述现实对象的位姿信息用于表征所述现实对象的位置和姿态。
在一种实施方式中,所述电子设备可以通过以下步骤,确定所述现实对象的位姿信息:
所述电子设备获取摄像头采集的第二现实场景图像,识别所述第二现实场景图像中的现实对象的关键部位;所述电子设备根据同步定位与地图构建SLAM点云碰撞技术,确定所述第二现实场景图像中所述现实对象的关键部位的位姿信息;所述电子设备将所述现实对象的关键部位的位姿信息作为所述现实对象的位姿信息。
在一种实施方式中,所述电子设备可以通过以下步骤,确定所述现实声源的位置:
所述电子设备获取麦克风采集的现实场景声音信息,通过声源定位确定所述现实声源与所述麦克风之间的位置关系;所述电子设备根据所述现实声源与所述麦克风之间的位置关系,确定所述现实场景中现实声源的位置。
以上确定过程的详细描述可以参考以上实施例中的描述,此处不再赘述。
S1202:所述电子设备根据所述现实对象的位姿信息和/或现实声源的位置,确定虚拟对象的位姿信息,其中,所述虚拟对象的位姿信息用于表征所述虚拟对象的位置和姿态。
在一种实施方式中,所述电子设备在执行S1201之前,还可以建立以所述电子设备为坐标原点的世界坐标系。示例性的,所述电子设备可以通过以下步骤建立所述世界坐标系:获取摄像头采集的第一现实场景图像,以及获取惯性测量单元测量的所述电子设备的姿态信息;根据所述第一现实场景图像,以及所述电子设备的姿态信息,建立以所述世界坐标系。
在该实施方式下,为了提高S1201和S1202中所述电子设备确定的各项信息的准确性,所述现实对象的位姿信息具体用于表征所述现实对象在所述世界坐标系中的位置和姿态;所述现实声源的位置为所述现实声源在所述世界坐标系中的位置;所述虚拟对象的位姿信息具体用于表征所述虚拟对象在所述世界坐标系中的位置和姿态。
可选的,所述电子设备可以根据所述现实对象的位姿信息和/或现实声源的位置,以及以下至少一项确定所述虚拟对象的位姿信息:通过所述现实对象的位姿信息识别的所述现实对象的动作,所述现实声源发出的声音指令,所述虚拟对象对应的虚拟对象模型中的形象模型、动作特征、动作响应包,以及用户通过显示屏对所述虚拟对象的触控操作。
S1203:所述电子设备根据所述虚拟对象的位姿信息,生成包含所述虚拟对象的AR图像;显示所述AR图像;和/或根据所述现实对象的位姿信息和所述虚拟对象的位姿信息,生成所述虚拟对象的3D音频数据;播放所述3D音频数据。
在一种实施方式中,所述电子设备可以通过以下步骤,生成所述虚拟对象的3D音频数据:
所述电子设备根据所述现实对象的位姿信息和所述虚拟对象的位姿信息,分别确定所述虚拟对象与所述现实对象的双耳之间的距离;
所述电子设备根据所述虚拟对象与所述现实对象的双耳之间的距离,计算所述现实对象的双耳的音量差和时间差;
所述电子设备根据所述现实对象的音量差和时间差,以及所述虚拟对象的原始声音数据,生成3D音频数据。
可选的,在本实施方式中,所述原始声音数据为设定的,或者为根据以下至少一项确定的:所述现实对象的位置以及姿态信息,所述虚拟对象的位姿信息,通过所述现实对象的位姿信息识别的所述现实对象的动作,所述现实声源发出的声音指令,所述虚拟对象对应的虚拟对象模型中的形象模型、动作特征、动作响应包,以及用户通过显示屏对所述虚拟对象的触控操作。
可选的,所述电子设备在播放所述3D音频数据之前,还可以根据所述现实对象的位姿信息,对所述3D音频数据进行滤波,从而可以模拟所述3D音频数据在传输过程中的反射和折射,从而使现实对象听到的3D声音更加真实。
本申请实施例提供了一种AR交互方法,在该方法中,电子设备可以根据现实对象的位姿信息或现实声源的位置,确定虚拟对象的位姿信息,从而可以生成并显示AR图像,另外还可以根据现实对象的位姿信息和虚拟对象的位姿信息,生成并播放虚拟对象的3D声音数据。显然,通过该方案,虚拟对象可以感知到现实对象的位姿,以及现实声源的位置,并基于以上信息做出相应的动作响应或声音响应,从而使虚拟对象可以像现实对象一样。因此,该方法实现虚拟对象与现实对象的视觉听觉相结合的虚实交互,可以提升虚拟对象的智能水平,进而提高虚拟交互的体验。
基于相同的技术构思,本申请还提供了一种电子设备,所述电子设备用于实现以上实施例提供的方法。参阅图13所示,所述电子设备1300可以包括:确定单元1301,AR合成单元1302、显示单元1303,音频单元1304。
所述显示单元1303,用于呈现用户界面(例如AR图像),实现人机交互。与图2所示的电子设备中的显示单元204相同,所述显示单元1303包括显示面板,该显示面板又称为显示屏。
所述音频单元1304,用于采集声音信号和播放音频数据。与图2所示的电子设备中的音频电路206类似的,所述音频单元1304中可以包括扬声器和麦克风。
下面对在实现以上实施例提供的方法时各个单元的功能进行描述。
确定单元1301,用于确定现实场景中现实对象的位姿信息和/或现实声源的位置,其中,所述现实对象的位姿信息用于表征所述现实对象的位置和姿态;以及根据所述现实对象的位姿信息和/或现实声源的位置,确定虚拟对象的位姿信息,其中,所述虚拟对象的位姿信息用于表征所述虚拟对象的位置和姿态;
AR合成单元1302,用于根据所述虚拟对象的位姿信息,生成包含所述虚拟对象的AR图像;和/或,根据所述现实对象的位姿信息和所述虚拟对象的位姿信息,生成所述虚拟对象的3维3D音频数据;
显示单元1303,用于显示所述AR图像;
音频单元1304,用于播放所述3D音频数据。
在一种实施方式中,所述确定单元1301,还用于:
建立以所述电子设备为坐标原点的世界坐标系;
所述现实对象的位姿信息具体用于表征所述现实对象在所述世界坐标系中的位置和姿态;
所述现实声源的位置为所述现实声源在所述世界坐标系中的位置;
所述虚拟对象的位姿信息具体用于表征所述虚拟对象在所述世界坐标系中的位置和姿态。
在一种实施方式中,在所述电子设备1300还包含摄像头和惯性测量单元的情况下,所述确定单元1301,在建立以所述电子设备为坐标原点的世界坐标系时,具体用于:
获取摄像头采集的第一现实场景图像,以及获取惯性测量单元测量的所述电子设备的姿态信息;
根据所述第一现实场景图像,以及所述电子设备的姿态信息,建立以所述世界坐标系。
在一种实施方式中,在所述电子设备1300还包含摄像头的情况下,所述确定单元1301,在确定现实场景中的现实对象的位姿信息时,具体用于:
获取摄像头采集的第二现实场景图像,识别所述第二现实场景图像中的现实对象的关键部位;
根据同步定位与地图构建SLAM点云碰撞技术,确定所述第二现实场景图像中所述现实对象的关键部位的位姿信息;
将所述现实对象的关键部位的位姿信息作为所述现实对象的位姿信息。
在一种实施方式中,在所述音频单元1304中还包含麦克风的情况下,所述确定单元1301,在确定现实场景中现实声源的位置时,具体用于:
获取麦克风采集的现实场景声音信息,通过声源定位确定所述现实声源与所述麦克风之间的位置关系;
根据所述现实声源与所述麦克风之间的位置关系,确定所述现实场景中现实声源的位置。
In an implementation, the AR synthesis unit 1302, when determining the pose information of the virtual object based on the pose information of the real object and/or the position of the real sound source, is specifically configured to:
determine the pose information of the virtual object based on the pose information of the real object and/or the position of the real sound source and at least one of the following: an action of the real object identified from the pose information of the real object, a sound instruction issued by the real sound source, an image model, action features, or an action response package in the virtual object model corresponding to the virtual object, and a touch operation performed by a user on the virtual object through a display screen.
In an implementation, the AR synthesis unit 1302, when generating the 3D audio data of the virtual object based on the pose information of the real object and the pose information of the virtual object, is specifically configured to:
determine, based on the pose information of the real object and the pose information of the virtual object, the distances between the virtual object and each of the real object's two ears;
calculate, based on the distances between the virtual object and the real object's two ears, the volume difference and time difference between the real object's two ears; and
generate the 3D audio data based on the volume difference and time difference of the real object and the original sound data of the virtual object.
In an implementation, the original sound data is preset, or is determined based on at least one of the following:
the position and posture information of the real object, the pose information of the virtual object, an action of the real object identified from the pose information of the real object, a sound instruction issued by the real sound source, an image model, action features, or an action response package in the virtual object model corresponding to the virtual object, and a touch operation performed by a user on the virtual object through a display screen.
In an implementation, the AR synthesis unit 1302 is further configured to:
filter the 3D audio data based on the pose information of the real object before the audio unit 1304 plays the 3D audio data.
It should be noted that the division into modules in the embodiments of this application is schematic and is merely a division by logical function; other division manners may be used in actual implementation. In addition, the functional units in the embodiments of this application may be integrated into one processing unit, may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
When the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Based on the same technical concept, an embodiment of this application further provides an electronic device configured to implement the AR interaction methods provided in the foregoing embodiments and having the functions of the electronic device shown in FIG. 13. As shown in FIG. 14, the electronic device 1400 may include a processor 1401, a memory 1402, a camera 1403, a display screen 1404, and an audio circuit 1405. Certainly, the electronic device 1400 may also have the systems and internal hardware shown in FIG. 2.
The processor 1401 is interconnected with the other components. Optionally, the processor 1401 and the other components may be interconnected through a bus. The bus may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus may be classified into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 14, but this does not mean that there is only one bus or one type of bus.
The processor 1401 is configured to implement the AR interaction methods provided in the foregoing embodiments, including:
determining pose information of a real object in a real scene and/or a position of a real sound source, where the pose information of the real object represents the position and posture of the real object;
determining pose information of a virtual object based on the pose information of the real object and/or the position of the real sound source, where the pose information of the virtual object represents the position and posture of the virtual object; and
generating, based on the pose information of the virtual object, an AR image that includes the virtual object, and displaying the AR image through the display screen 1404; and/or generating three-dimensional (3D) audio data of the virtual object based on the pose information of the real object and the pose information of the virtual object, and playing the 3D audio data through the audio circuit 1405.
For the specific processes of the foregoing steps, refer to the descriptions in the foregoing embodiments; details are not repeated here.
The memory 1402 is configured to store program instructions, data, and the like. Specifically, the program instructions may include program code, and the program code includes computer operation instructions. The memory 1402 may include a random access memory (RAM), and may further include a non-volatile memory, for example, at least one magnetic disk memory. The processor 1401 executes the program stored in the memory 1402 and implements the foregoing functions through the foregoing components, thereby finally implementing the methods provided in the foregoing embodiments.
Based on the foregoing embodiments, an embodiment of this application further provides a computer program that, when run on a computer, causes the computer to perform the methods provided in the foregoing embodiments.
Based on the foregoing embodiments, an embodiment of this application further provides a computer storage medium that stores a computer program; when the computer program is executed by a computer, the computer is caused to perform the methods provided in the foregoing embodiments.
Based on the foregoing embodiments, an embodiment of this application further provides a chip configured to read a computer program stored in a memory to implement the methods provided in the foregoing embodiments.
Based on the foregoing embodiments, an embodiment of this application provides a chip system. The chip system includes a processor configured to support a computer apparatus in implementing the functions of the electronic device in the methods provided in the foregoing embodiments. In a possible design, the chip system further includes a memory configured to store the programs and data necessary for the computer apparatus. The chip system may consist of a chip, or may include a chip and other discrete components.
In summary, this application provides an AR interaction method and an electronic device. In this solution, the electronic device can determine the pose information of a virtual object based on the pose information of a real object or the position of a real sound source, and can thus generate and display an AR image; in addition, it can generate and play 3D sound data of the virtual object based on the pose information of the real object and the pose information of the virtual object. Clearly, with this solution the virtual object can perceive the pose of the real object and the position of the real sound source and make corresponding action or sound responses based on that information, so that the virtual object behaves like a real object. The method therefore realizes virtual-real interaction in which the virtual object and the real object are combined visually and aurally, which can raise the intelligence level of the virtual object and thereby improve the virtual-interaction experience.
A person skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of a hardware-only embodiment, a software-only embodiment, or an embodiment combining software and hardware. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk memory, CD-ROM, and optical memory) that contain computer-usable program code.
This application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to this application. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, a person skilled in the art can make various modifications and variations to this application without departing from the spirit and scope of this application. This application is intended to cover these modifications and variations provided that they fall within the scope of the claims of this application and their equivalent technologies.

Claims (21)

  1. An augmented reality (AR) interaction method, applied to an electronic device, comprising:
    determining pose information of a real object in a real scene and/or a position of a real sound source, wherein the pose information of the real object represents a position and a posture of the real object;
    determining pose information of a virtual object based on the pose information of the real object and/or the position of the real sound source, wherein the pose information of the virtual object represents a position and a posture of the virtual object;
    generating, based on the pose information of the virtual object, an AR image comprising the virtual object, and displaying the AR image; and/or
    generating three-dimensional (3D) audio data of the virtual object based on the pose information of the real object and the pose information of the virtual object, and playing the 3D audio data.
  2. The method according to claim 1, wherein the method further comprises:
    establishing a world coordinate system with the electronic device as a coordinate origin;
    wherein the pose information of the real object specifically represents the position and posture of the real object in the world coordinate system;
    the position of the real sound source is the position of the real sound source in the world coordinate system; and
    the pose information of the virtual object specifically represents the position and posture of the virtual object in the world coordinate system.
  3. The method according to claim 2, wherein establishing the world coordinate system with the electronic device as the coordinate origin comprises:
    obtaining a first real-scene image captured by a camera, and obtaining posture information of the electronic device measured by an inertial measurement unit; and
    establishing the world coordinate system based on the first real-scene image and the posture information of the electronic device.
  4. The method according to any one of claims 1 to 3, wherein determining the pose information of the real object in the real scene comprises:
    obtaining a second real-scene image captured by a camera, and identifying a key part of the real object in the second real-scene image;
    determining pose information of the key part of the real object in the second real-scene image by using a simultaneous localization and mapping (SLAM) point-cloud collision technique; and
    using the pose information of the key part of the real object as the pose information of the real object.
  5. The method according to any one of claims 1 to 4, wherein determining the position of the real sound source in the real scene comprises:
    obtaining real-scene sound information collected by a microphone, and determining, through sound source localization, a positional relationship between the real sound source and the microphone; and
    determining the position of the real sound source in the real scene based on the positional relationship between the real sound source and the microphone.
  6. The method according to any one of claims 1 to 5, wherein determining the pose information of the virtual object based on the pose information of the real object and/or the position of the real sound source comprises:
    determining the pose information of the virtual object based on the pose information of the real object and/or the position of the real sound source and at least one of the following: an action of the real object identified from the pose information of the real object, a sound instruction issued by the real sound source, an image model, action features, or an action response package in a virtual object model corresponding to the virtual object, and a touch operation performed by a user on the virtual object through a display screen.
  7. The method according to any one of claims 1 to 6, wherein generating the 3D audio data of the virtual object based on the pose information of the real object and the pose information of the virtual object comprises:
    determining, based on the pose information of the real object and the pose information of the virtual object, distances between the virtual object and each of the real object's two ears;
    calculating, based on the distances between the virtual object and the real object's two ears, a volume difference and a time difference between the real object's two ears; and
    generating the 3D audio data based on the volume difference and time difference of the real object and original sound data of the virtual object.
  8. The method according to claim 7, wherein the original sound data is preset, or is determined based on at least one of the following:
    the position and posture information of the real object, the pose information of the virtual object, an action of the real object identified from the pose information of the real object, a sound instruction issued by the real sound source, an image model, action features, or an action response package in a virtual object model corresponding to the virtual object, and a touch operation performed by a user on the virtual object through a display screen.
  9. The method according to any one of claims 1 to 8, wherein before playing the 3D audio data, the method further comprises:
    filtering the 3D audio data based on the pose information of the real object.
  10. An electronic device, comprising:
    a determining unit, configured to determine pose information of a real object in a real scene and/or a position of a real sound source, wherein the pose information of the real object represents a position and a posture of the real object; and determine pose information of a virtual object based on the pose information of the real object and/or the position of the real sound source, wherein the pose information of the virtual object represents a position and a posture of the virtual object;
    an AR synthesis unit, configured to generate, based on the pose information of the virtual object, an AR image comprising the virtual object; and/or generate three-dimensional (3D) audio data of the virtual object based on the pose information of the real object and the pose information of the virtual object;
    a display unit, configured to display the AR image; and
    an audio unit, configured to play the 3D audio data.
  11. The electronic device according to claim 10, wherein the determining unit is further configured to:
    establish a world coordinate system with the electronic device as a coordinate origin;
    wherein the pose information of the real object specifically represents the position and posture of the real object in the world coordinate system;
    the position of the real sound source is the position of the real sound source in the world coordinate system; and
    the pose information of the virtual object specifically represents the position and posture of the virtual object in the world coordinate system.
  12. The electronic device according to claim 11, wherein the determining unit, when establishing the world coordinate system with the electronic device as the coordinate origin, is specifically configured to:
    obtain a first real-scene image captured by a camera, and obtain posture information of the electronic device measured by an inertial measurement unit; and
    establish the world coordinate system based on the first real-scene image and the posture information of the electronic device.
  13. The electronic device according to any one of claims 10 to 12, wherein the determining unit, when determining the pose information of the real object in the real scene, is specifically configured to:
    obtain a second real-scene image captured by a camera, and identify a key part of the real object in the second real-scene image;
    determine pose information of the key part of the real object in the second real-scene image by using a simultaneous localization and mapping (SLAM) point-cloud collision technique; and
    use the pose information of the key part of the real object as the pose information of the real object.
  14. The electronic device according to any one of claims 10 to 13, wherein the determining unit, when determining the position of the real sound source in the real scene, is specifically configured to:
    obtain real-scene sound information collected by a microphone, and determine, through sound source localization, a positional relationship between the real sound source and the microphone; and
    determine the position of the real sound source in the real scene based on the positional relationship between the real sound source and the microphone.
  15. The electronic device according to any one of claims 10 to 14, wherein the AR synthesis unit, when determining the pose information of the virtual object based on the pose information of the real object and/or the position of the real sound source, is specifically configured to:
    determine the pose information of the virtual object based on the pose information of the real object and/or the position of the real sound source and at least one of the following: an action of the real object identified from the pose information of the real object, a sound instruction issued by the real sound source, an image model, action features, or an action response package in a virtual object model corresponding to the virtual object, and a touch operation performed by a user on the virtual object through a display screen.
  16. The electronic device according to any one of claims 10 to 15, wherein the AR synthesis unit, when generating the 3D audio data of the virtual object based on the pose information of the real object and the pose information of the virtual object, is specifically configured to:
    determine, based on the pose information of the real object and the pose information of the virtual object, distances between the virtual object and each of the real object's two ears;
    calculate, based on the distances between the virtual object and the real object's two ears, a volume difference and a time difference between the real object's two ears; and
    generate the 3D audio data based on the volume difference and time difference of the real object and original sound data of the virtual object.
  17. The electronic device according to claim 16, wherein the original sound data is preset, or is determined based on at least one of the following:
    the position and posture information of the real object, the pose information of the virtual object, an action of the real object identified from the pose information of the real object, a sound instruction issued by the real sound source, an image model, action features, or an action response package in a virtual object model corresponding to the virtual object, and a touch operation performed by a user on the virtual object through a display screen.
  18. The electronic device according to any one of claims 10 to 17, wherein the AR synthesis unit is further configured to:
    filter the 3D audio data based on the pose information of the real object before the audio unit plays the 3D audio data.
  19. An electronic device, comprising a display screen, a processor, and a memory, wherein the memory stores a computer program, the computer program comprises instructions, and when the instructions are executed by the processor, the electronic device is caused to perform the method according to any one of claims 1 to 9.
  20. A computer storage medium, wherein the computer storage medium stores a computer program, and when the computer program is run on an electronic device, the electronic device is caused to perform the method according to any one of claims 1 to 9.
  21. A chip, wherein the chip is configured to read a computer program stored in a memory and perform the method according to any one of claims 1 to 9.
PCT/CN2021/140320 2020-12-31 2021-12-22 一种增强现实交互方法及电子设备 WO2022143322A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21914061.3A EP4254353A4 (en) 2020-12-31 2021-12-22 INTERACTION METHOD FOR AUGMENTED REALITY AND ELECTRONIC DEVICE
US18/344,299 US20230345196A1 (en) 2020-12-31 2023-06-29 Augmented reality interaction method and electronic device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011627072.0 2020-12-31
CN202011627072.0A CN114693890A (zh) 2020-12-31 2020-12-31 一种增强现实交互方法及电子设备

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/344,299 Continuation US20230345196A1 (en) 2020-12-31 2023-06-29 Augmented reality interaction method and electronic device

Publications (1)

Publication Number Publication Date
WO2022143322A1 true WO2022143322A1 (zh) 2022-07-07

Family

ID=82135246

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/140320 WO2022143322A1 (zh) 2020-12-31 2021-12-22 一种增强现实交互方法及电子设备

Country Status (4)

Country Link
US (1) US20230345196A1 (zh)
EP (1) EP4254353A4 (zh)
CN (1) CN114693890A (zh)
WO (1) WO2022143322A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112860072A (zh) * 2021-03-16 2021-05-28 河南工业职业技术学院 一种虚拟现实多人交互协作方法与***
CN116610282A (zh) * 2023-07-18 2023-08-18 北京万物镜像数据服务有限公司 一种数据处理方法、装置及电子设备

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117579804B (zh) * 2023-11-17 2024-05-14 广东筠诚建筑科技有限公司 一种基于ar的装配式建筑构件预布局体验方法及装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030032484A1 (en) * 1999-06-11 2003-02-13 Toshikazu Ohshima Game apparatus for mixed reality space, image processing method thereof, and program storage medium
CN108681402A (zh) * 2018-05-16 2018-10-19 Oppo广东移动通信有限公司 识别交互方法、装置、存储介质及终端设备
CN111880659A (zh) * 2020-07-31 2020-11-03 北京市商汤科技开发有限公司 虚拟人物控制方法及装置、设备、计算机可读存储介质
CN111897431A (zh) * 2020-07-31 2020-11-06 北京市商汤科技开发有限公司 展示方法及装置、显示设备、计算机可读存储介质
CN111949112A (zh) * 2019-05-14 2020-11-17 Oppo广东移动通信有限公司 对象交互方法及装置、***、计算机可读介质和电子设备

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9037468B2 (en) * 2008-10-27 2015-05-19 Sony Computer Entertainment Inc. Sound localization for user in motion
US9901828B2 (en) * 2010-03-30 2018-02-27 Sony Interactive Entertainment America Llc Method for an augmented reality character to maintain and exhibit awareness of an observer
EP3116616B1 (en) * 2014-03-14 2019-01-30 Sony Interactive Entertainment Inc. Gaming device with volumetric sensing
US9998847B2 (en) * 2016-11-17 2018-06-12 Glen A. Norris Localizing binaural sound to objects
JP7252965B2 (ja) * 2018-02-15 2023-04-05 マジック リープ, インコーポレイテッド 複合現実のための二重聴取者位置
US20200367008A1 (en) * 2019-05-15 2020-11-19 Dts, Inc. System and method for rendering virtual sound sources

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030032484A1 (en) * 1999-06-11 2003-02-13 Toshikazu Ohshima Game apparatus for mixed reality space, image processing method thereof, and program storage medium
CN108681402A (zh) * 2018-05-16 2018-10-19 Oppo广东移动通信有限公司 识别交互方法、装置、存储介质及终端设备
CN111949112A (zh) * 2019-05-14 2020-11-17 Oppo广东移动通信有限公司 对象交互方法及装置、***、计算机可读介质和电子设备
CN111880659A (zh) * 2020-07-31 2020-11-03 北京市商汤科技开发有限公司 虚拟人物控制方法及装置、设备、计算机可读存储介质
CN111897431A (zh) * 2020-07-31 2020-11-06 北京市商汤科技开发有限公司 展示方法及装置、显示设备、计算机可读存储介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4254353A4 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112860072A (zh) * 2021-03-16 2021-05-28 河南工业职业技术学院 一种虚拟现实多人交互协作方法与***
CN116610282A (zh) * 2023-07-18 2023-08-18 北京万物镜像数据服务有限公司 一种数据处理方法、装置及电子设备
CN116610282B (zh) * 2023-07-18 2023-11-03 北京万物镜像数据服务有限公司 一种数据处理方法、装置及电子设备

Also Published As

Publication number Publication date
EP4254353A4 (en) 2024-05-08
US20230345196A1 (en) 2023-10-26
EP4254353A1 (en) 2023-10-04
CN114693890A (zh) 2022-07-01

Similar Documents

Publication Publication Date Title
JP7366196B2 (ja) 広範囲同時遠隔ディジタル提示世界
WO2022143322A1 (zh) 一种增强现实交互方法及电子设备
CN112379812B (zh) 仿真3d数字人交互方法、装置、电子设备及存储介质
CN105378801B (zh) 全息图快照网格
JP2020537849A (ja) 複合現実空間オーディオ
WO2020186988A1 (zh) 资讯的展示方法、装置、终端及存储介质
US10841534B2 (en) Real-world awareness for virtual reality
CN108694073B (zh) 虚拟场景的控制方法、装置、设备及存储介质
KR20160113666A (ko) 오디오 탐색 지원
CN109725956A (zh) 一种场景渲染的方法以及相关装置
JP6656382B2 (ja) マルチメディア情報を処理する方法及び装置
WO2023207174A1 (zh) 显示方法、装置、显示设备、头戴式设备及存储介质
WO2021254113A1 (zh) 一种三维界面的控制方法和终端
CN114630135A (zh) 一种直播互动方法及装置
US20130215010A1 (en) Portable electronic equipment and method of visualizing sound
US20230199420A1 (en) Real-world room acoustics, and rendering virtual objects into a room that produce virtual acoustics based on real world objects in the room
US20230154445A1 (en) Spatial music creation interface
KR102177734B1 (ko) 가상 현실에서의 홀드된 객체 안정화
US10325408B2 (en) Method and device for presenting multimedia information
CN112912822A (zh) 在混合现实环境中控制支持音频的连接设备的***
KR102405385B1 (ko) 3d 컨텐츠를 위한 여러 오브젝트를 생성하는 방법 및 시스템
CN116437284A (zh) 空间音频合成方法、电子设备及计算机可读存储介质
US20240179291A1 (en) Generating 3d video using 2d images and audio with background keyed to 2d image-derived metadata
CN109447896B (zh) 一种图像处理方法及终端设备
US20240236608A1 (en) Transforming computer game audio using impulse response of a virtual 3d space generated by nerf input to a convolutional reverberation engine

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21914061

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021914061

Country of ref document: EP

Effective date: 20230626

NENP Non-entry into the national phase

Ref country code: DE