CN115309301A - Android mobile phone end-side AR interaction system based on deep learning - Google Patents

Android mobile phone end-side AR interaction system based on deep learning Download PDF

Info

Publication number
CN115309301A
Authority
CN
China
Prior art keywords
depth
model
mobile phone
image
phone end
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210541388.0A
Other languages
Chinese (zh)
Inventor
戴玉超
朱睿杰
项末初
卢馨悦
徐智鸿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202210541388.0A priority Critical patent/CN115309301A/en
Publication of CN115309301A publication Critical patent/CN115309301A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048: Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0481: Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F 3/04815: Interaction with a metaphor-based environment or interaction object displayed as three-dimensional, e.g. changing the user viewpoint with respect to the environment or object
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deep-learning-based Android mobile phone end-side AR interaction system, which comprises a mobile phone with a camera. The mobile phone camera collects original color image data and processes the image stream in real time by calling the camera API; an efficient and robust lightweight depth estimation neural network model is trained with the PyTorch Mobile deep learning framework, neural network inference is run on the mobile phone end side using the limited computing power of the mobile phone, and a predicted depth map corresponding to the original image data is generated. The original image and the predicted depth map are combined, and the AR interaction functions of ARCore Depth Lab together with a Unity development example are used to realize an Android mobile phone end-side AR interaction system that does not depend on the Depth API.

Description

Android mobile phone end-side AR interaction system based on deep learning
Technical Field
The invention relates to the field of three-dimensional scene perception, in particular to an Android mobile phone end-side AR interaction system based on deep learning.
Background
In recent years, with the rapid development of deep learning and neural network technology, applications in the field of computer vision have advanced dramatically. Meanwhile, the demand for entertainment in vision-related mobile phone applications keeps growing. People are no longer satisfied with interacting with scenes in simple two-dimensional images and have begun to expect deeper interaction with stereoscopic three-dimensional scenes. In realizing interaction with a three-dimensional scene, depth estimation, as a key link of three-dimensional perception, plays a vital role. When traditional camera equipment shoots images and videos, only limited 2D image information can be obtained and the depth information of the real three-dimensional world is missing, while ranging equipment such as radar and RGB-D cameras suffers from high cost, large size and other drawbacks. In addition, current monocular depth estimation algorithms with higher accuracy generally depend on a high-performance computing environment, can hardly obtain a good depth estimation effect in a non-ideal experimental environment, and cannot be well deployed on mobile terminals, which limits their popularization and application. Therefore, an interactive system that does not depend on a high-performance computing environment or ranging equipment and can be directly deployed on a mobile terminal to realize real-time 3D scene interaction has great application prospects.
Existing two-dimensional video special effect technology, such as the effects found in short video editors like TikTok, has certain limitations for secondary video creation. For example, when a user wants to add a scene-specific special effect to a video (such as snowfall), conventional two-dimensional video technology can only stitch a static two-dimensional picture onto the subject, which looks stiff and harms the effect of the video. The invention can directly construct a 3D scene from the depth estimation result and add a simulated special effect, thereby better reflecting the depth-level changes of the environment in the video, making the video more realistic and vivid and improving the viewing experience.
The invention aims to use a lightweight monocular depth estimation network to compute scene depth in real time in an AR scene on the mobile phone end side under the limited computing power of the mobile phone, restoring the real scene to the greatest extent. On this basis, special effects are produced with the Unity rendering engine, and by placing virtual objects in the real environment the invention can realize interaction effects between people and the environment.
Disclosure of Invention
The invention aims to obtain accurate depth information from simple 2D video input by training a model with a mature algorithm, solve the depth estimation problem under a monocular camera system, overcome the accuracy and efficiency shortcomings of monocular depth estimation under traditional methods, provide a lightweight monocular depth estimation network with good robustness, high accuracy and high efficiency, break the dependence of current high-accuracy monocular depth estimation algorithms on high-performance computing environments, focus on practical application, and explore the possibility of applying the method to AR and VR scenes on the mobile phone end. Besides meeting entertainment requirements, the invention has wide application prospects in future autonomous driving, intelligent medical care and military operations.
In order to achieve the purpose, the invention provides the following technical scheme: finally, with the assistance of the depth information, Unity software is used to produce three-dimensional special effects, so that virtual objects are generated at accurate positions and human-computer interaction is realized, oriented toward practical AR/VR application scenarios. The method and system deploy the algorithm to the mobile phone side through Android development combined with the PyTorch Mobile framework, realizing real-time interaction on the mobile phone side.
Specifically, the method comprises the following steps:
a) Acquisition of training/test data: large-scale network training is performed with open-source data sets such as NYU Depth V2; videos are shot indoors with a Kinect DK camera, automatically generating depth maps as supervision information, and videos shot with a monocular camera are used as input test samples;
b) Design of the monocular depth estimation algorithm: the application is built on the ARCore framework, the parameters returned by ARCore are used as initial values of the camera parameters, and the corresponding parameters are adjusted in combination with the network to obtain the camera pose as the basis of the geometric constraint for inter-frame depth estimation. The loss function of the network is designed on top of a backbone network for depth prediction built from the pre-trained lightweight network EfficientNet, and the network is trained on the data set;
c) Evaluation of the monocular depth estimation algorithm: the ground-truth depth values of the training data set are used as the supervision signal of the model, the prediction results of the model are compared, the loss function of the model is minimized while retaining the ability to provide reasonable regularization in weakly constrained regions, and accurate depth information is obtained to achieve the interaction effect (a minimal training-loop sketch is given after this list);
d) End-side deployment of the algorithm: Unity is used as an auxiliary development tool; after the neural network has inferred the depth information, the information is imported into the Unity module, the scene is reconstructed by the algorithm, special effects are added with Unity software, and the module is deployed on the mobile terminal of the mobile phone using PyTorch Mobile.
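To make steps a) to c) concrete, a minimal training-loop sketch is given below. It only illustrates the supervised training flow; the dataset wrapper, model class, optimizer settings and batch size are assumptions for illustration, not values specified by the invention (the loss and alignment details are sketched further down).

```python
import torch
from torch.utils.data import DataLoader

# Assumed components for illustration: DepthDataset yields (rgb, gt_depth, valid_mask)
# tensors from a depth data set such as NYU Depth V2, and DepthNet is a depth
# prediction model with an EfficientNet encoder and multi-scale fusion decoder.
from my_depth_package import DepthDataset, DepthNet, multiscale_gradient_loss  # hypothetical module

def train(num_epochs: int = 20):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = DepthNet().to(device)
    loader = DataLoader(DepthDataset(split="train"), batch_size=8, shuffle=True, num_workers=4)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    for epoch in range(num_epochs):
        for rgb, gt_depth, mask in loader:
            rgb, gt_depth, mask = rgb.to(device), gt_depth.to(device), mask.to(device)
            pred = model(rgb)                                      # predicted depth map
            loss = multiscale_gradient_loss(pred, gt_depth, mask)  # step c): supervised loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        print(f"epoch {epoch}: loss {loss.item():.4f}")
    torch.save(model.state_dict(), "depth_model_weights.pth")     # weights for later conversion
```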
Preferably, the mobile phone system is an Android system with version Android 8 or above.
Preferably, the mobile end-side chip is a Qualcomm Snapdragon 865 or above, and the CPU or GPU can be used to complete neural network inference, thereby realizing high-frame-rate operation.
Preferably, the lightweight depth estimation model deployed on the mobile phone end is converted and optimized after server-side training by creating a serializable and optimizable model from the PyTorch code via TorchScript; the converted model is in the .ptl format and comprises the model weights and a model interpreter. Through the model optimization of the PyTorch Mobile module, the average inference speed of the optimized model is improved by 60% compared with the model before optimization.
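The conversion and optimization described here follow the standard PyTorch Mobile workflow. As a sketch only (the model class, weight file name and example input size are assumptions), the trained model is traced with TorchScript, passed through optimize_for_mobile, and saved as a .ptl file containing the weights and the lite interpreter format:

```python
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

# Assumed names for illustration: DepthNet and the weight file produced during training.
from my_depth_package import DepthNet  # hypothetical module

model = DepthNet()
model.load_state_dict(torch.load("depth_model_weights.pth", map_location="cpu"))
model.eval()

example = torch.randn(1, 3, 256, 256)                    # assumed network input size
scripted = torch.jit.trace(model, example)               # serializable TorchScript model
optimized = optimize_for_mobile(scripted)                # PyTorch Mobile graph optimizations
optimized._save_for_lite_interpreter("depth_model.ptl")  # .ptl: weights + lite interpreter format
```

The resulting depth_model.ptl is the file imported into the Android project and loaded by the PyTorch Mobile runtime on the device.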
Preferably, the deployment of the lightweight depth estimation model on the mobile phone end side comprises the following implementation steps:
S1.1: Train the model on a server, training the model weights on a depth data set;
S1.2: Convert the model with PyTorch Mobile and store the resulting model reasoner;
S1.3: Import the model reasoner into the ARCore module through Java programming in Android Studio;
S1.4: Call the mobile phone camera API to acquire an image stream I = {I_1, I_2, ..., I_n} and extract the current frame I_n as the RGB image input I_RGB;
S1.5: Run the model reasoner on the mobile phone end side and output the predicted depth map I_Depth;
S1.6: Add the predicted depth map I_Depth to the data stream to complete the encapsulation of the module.
Preferably, the lightweight depth estimation neural network model algorithm specifically comprises the following steps:
s2.1: the method comprises the steps that a lightweight depth estimation model of a depth map is predicted at a mobile phone end side, color RGB images (the image format is YUV 420) shot by a camera and pose parameters of the camera (the camera pose parameters returned in an ARCore frame of Google are required to be used as initial values of the camera parameters) are input into the lightweight depth estimation model, and the lightweight depth estimation model is output into a predicted depth image in a RAW format and a predicted confidence image;
s2.2: the depth estimation neural network model is a monocular depth estimation model, single inference completed by the model does not depend on information of front and back image frames or multiple images, and single depth estimation can be completed by inputting a single image;
s2.3: the depth estimation neural network model is a lightweight network model, a model inference device deployed at a mobile phone end is smaller than 150M, and the depth map prediction with FPS of 30 frames per second is realized on a mobile phone platform with high pass Snapdagon 865 and above.
S2.4: inputting an image I by taking EfficientNet as a backbone network of a depth prediction algorithm encoder RGB Extracting features at different resolutions (one half, one fourth, one eighth and one sixteenth) through EfficientNet to construct an image feature pyramid { S } 1/2 ,S 1/4 ,S 1/8 ,S 1/16 In the present invention, the model backbone network can be constructed by a similar lightweight model (e.g., mobileNet)Replacement;
s2.5: the multi-scale fusion structure is adopted as a decoder of the depth prediction algorithm, as shown in fig. 3, a decoder module receives a feature branch under the current resolution and a feature branch under the upper resolution, and the features of the upper resolution are spliced and fused with the features of the current resolution through a residual convolution module. The residual convolution module is formed by combining two Relu activation layers and two convolution modules with convolution kernels of 3x3 in a cross-serial mode. Inputting the fused features into a residual convolution module with the same structure, and outputting the features of the current branch through a resampling module and a convolution module with a convolution kernel size of 1x 1;
S2.6: The multi-scale loss is used as the loss function of the neural network model. The loss computes the gradient differences between the predicted depth and the ground-truth depth of the data set along the x and y axes respectively, and sums and fuses them across the different scale resolutions:
L_grad = (1/M) * Σ_{k=1..K} Σ_{i=1..M} ( |∇_x R_i^k| + |∇_y R_i^k| ),
where R^k denotes the difference between the predicted depth and the ground-truth depth at scale level k, M is the number of valid pixels, and K is the number of scale levels.
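A sketch of such a multi-scale gradient-matching loss in PyTorch is shown below, under the assumption (common in affine-invariant depth training) that R is the per-pixel difference between predicted and ground-truth depth and that coarser scales are obtained by average pooling; the number of scales and the pooling choice are assumptions, not values from the patent.

```python
import torch
import torch.nn.functional as F

def multiscale_gradient_loss(pred: torch.Tensor, gt: torch.Tensor,
                             mask: torch.Tensor, num_scales: int = 4) -> torch.Tensor:
    """Sum of x/y gradient differences between predicted and ground-truth depth
    over several scale resolutions (pred, gt, mask: B x 1 x H x W)."""
    total = 0.0
    for _ in range(num_scales):
        r = (pred - gt) * mask                          # residual R^k at this scale
        grad_x = torch.abs(r[:, :, :, 1:] - r[:, :, :, :-1])   # gradient difference along x
        grad_y = torch.abs(r[:, :, 1:, :] - r[:, :, :-1, :])   # gradient difference along y
        valid = mask.sum().clamp(min=1.0)
        total = total + (grad_x.sum() + grad_y.sum()) / valid
        # move to the next coarser scale
        pred = F.avg_pool2d(pred, kernel_size=2)
        gt = F.avg_pool2d(gt, kernel_size=2)
        mask = F.avg_pool2d(mask, kernel_size=2)
    return total
```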
S2.7: for better robustness and generalization capability of the model on different data sets, the model uses affine-invariant depth prediction, i.e. d * = ds + μ. And s and mu are scales and offsets in affine transformation, and affine transformation parameters between the predicted depth and the real depth are obtained through a global least square method.
S2.8: the model is trained on a plurality of public depth data sets such as NyuDepthv2, KITTI, scanNet, ETH3D and the like, so that the model learns enough data prior, and the generalization capability of the model is improved.
Preferably, the steps for implementing the AR interaction function with the AR interaction functions of ARCore Depth Lab and Unity are as follows:
S3.1: After the depth information prediction of the neural network is completed, the depth image that ARCore would return from the Depth API is replaced with the generated depth prediction map, and ARCore is called in Unity;
S3.2: The rendering engine provided by Unity is used to generate the mesh information of the scene from the depth map and to render a pseudo-color map representing the depth information (a small sketch of this pseudo-color step follows this list);
S3.3: Using some of the functions of ARCore Depth Lab and the special effect components of the Unity scene, the corresponding special effect is added to the depth scene.
Compared with the prior art, the invention has the beneficial effects that:
1. The invention directly deploys the algorithm to the mobile phone end side and uses the mobile phone's computing power for neural network inference, thereby avoiding the heavy dependence of existing monocular depth estimation methods on the computing resources of large servers. Existing depth estimation networks struggle to balance accuracy and efficiency: higher-accuracy methods usually require a long model inference process and a more complex model structure. Unlike existing large-scale deep learning networks, the method provided by the invention achieves an effective balance between accuracy and efficiency. The invention adopts a lightweight network structure to realize frame-by-frame monocular depth estimation inference; the model structure of the network is simpler, the computing power consumed during training is reduced, and at the same time the network inference is convenient to run and to deploy on the end side;
2. The invention realizes end-side depth estimation application development on the Android platform. It differs from the existing way of running neural network inference on a mobile phone platform, in which a PyTorch model is trained on the server side, the model is obtained after parameter convergence, converted to the ONNX format, further converted to the TensorFlow framework, and model inference on the mobile device side is completed with the TensorFlow Lite module. The present method does not depend on the TensorFlow framework: it converts the model directly with PyTorch Mobile and runs model inference on the mobile device directly with the PyTorch framework, which is more convenient and avoids switching the running model between different deep learning frameworks;
3. The depth estimation method provided by the invention avoids dependence on the Depth API, an interface provided by the Android mobile phone system that is only supported by some high-end mobile phone models. Unlike existing software such as Depth Lab, the depth information in this method is obtained by deep learning model inference from RGB images, and no additional hardware (such as depth sensors like LiDAR or millimeter-wave radar) is needed to acquire the depth information; Unity is used as a three-dimensional special effect development tool to realize the AR/VR interaction function, which has strong practical application value;
4. The invention runs depth estimation network inference on the PyTorch Mobile framework, which provides an end-to-end workflow and simplifies the process from research to a production environment on the mobile device side. The invention adopts a clear structural framework, facilitating subsequent modification and upgrading of each part of the content.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a system model flow diagram of the present invention;
FIG. 2 is a diagram illustrating a depth estimation result of the system according to the present invention;
FIG. 3 is a diagram of an algorithm model of the system of the present invention;
FIG. 4 is a diagram illustrating AR interaction of the system of the present invention.
Detailed Description
For further understanding of the present invention, its objects, technical solutions and advantages will become more apparent from the following detailed description taken in conjunction with the accompanying drawings and embodiments. It is to be understood that the description is illustrative only and not limiting. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Examples
Referring to fig. 1, the present invention provides the following technical solution: a deep-learning-based Android mobile phone end-side AR interaction system comprises a mobile phone with a camera. The mobile phone collects original color image data by calling the camera API, obtains camera parameters and pose, extracts camera frames, and processes the image stream in real time. The server side trains an efficient and robust lightweight depth estimation neural network model with the PyTorch Mobile deep learning framework; after training, a serializable and optimizable model is created from the PyTorch code via TorchScript, model conversion and optimization are performed, and the processed model is stored in the .ptl format, including the model weights and a model interpreter. The model file converted via TorchScript is imported into the ARCore module through the Java language and Android Studio software, inference is run on the mobile phone end side, and the Depth API interface is replaced with the depth map obtained by inference to realize the input and output of the data stream. Neural network inference is run on the mobile phone side with the limited computing power of the mobile phone to generate a predicted depth map corresponding to the original image data. After the depth information prediction of the neural network is completed, the depth image returned by the ARCore Depth API call is replaced with the generated depth prediction map, and ARCore is called in Unity. First, the rendering engine provided by Unity generates the mesh information of the scene from the depth map and renders a pseudo-color map representing the depth information; then some of the functions of ARCore Depth Lab and the special effect components of the Unity scene are used to add the corresponding special effect to the depth scene. The original image and the predicted depth map are combined, and the AR interaction functions of ARCore Depth Lab and a Unity development example are used to realize an Android mobile phone end-side AR interaction system that does not depend on the Depth API.
Please refer to fig. 2, which shows the depth map effect tested with the network structure model. Fig. 2 shows the depth maps constructed for indoor scenes by the depth estimation framework and lightweight depth estimation network model introduced by the present invention; the first and third rows are input RGB images, and the second and fourth rows are the corresponding depth maps predicted by the network of the present invention. With the multi-scale fusion decoding framework, the estimation of detailed regions in the predicted maps is more accurate, and most of the three-dimensional information of the scene is recovered under limited computing power.
Please refer to fig. 3, which is a schematic diagram of the depth estimation network model structure of the present invention. The network model adopts EfficientNet as the backbone network of the encoder to extract image features, constructs an image pyramid at different resolutions, fuses the image features with a multi-scale fusion decoder, and finally decodes the depth map corresponding to the predicted image through a residual convolution module. The residual convolution module is formed by sequentially connecting in series a ReLU activation layer, a convolution module with a 3x3 kernel, a ReLU activation layer, and a convolution module with a 3x3 kernel. The multi-scale fusion module receives the feature maps of the current feature branch and the previous feature branch, fuses the features of the previous branch, after passing them through the residual convolution module, with the features of the current branch, and then sequentially connects a residual convolution module of the same structure, a resampling module and a convolution module with a 1x1 kernel to output the decoded features.
Please refer to fig. 4, which shows the AR interaction effect measured on an Android mobile phone with the technical solution of the present invention. After the depth map rendering is run, the model can finish rendering the scene in a short time and generate the corresponding pseudo-color map. According to the depth estimation result, the mobile phone end can align the corresponding objects and place virtual objects. When the mobile phone is moved, the virtual objects move correspondingly with the scene, realizing interaction with the three-dimensional information.
The above-disclosed preferred embodiments of the present invention are merely illustrative of the technical solutions of the present invention and are not restrictive. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to best utilize the invention. The invention is limited only by the claims, their full scope and equivalents; modifications and equivalents may be made without departing from its spirit and scope.

Claims (7)

1. An Android mobile phone end-side AR interaction system based on deep learning, characterized in that:
firstly, original color image data is acquired by the mobile phone by calling the camera API; using an efficient and robust lightweight depth estimation neural network model trained with the PyTorch Mobile deep learning framework, neural network inference is run on the mobile phone end side with the limited computing power of the mobile phone, the image stream is processed in real time, and a predicted depth map corresponding to the original image data is generated; finally, the original image and the predicted depth map are combined, and the AR interaction functions of ARCore Depth Lab and Unity are used to realize the AR interaction function.
2. The deep learning-based Android mobile phone end-side AR interaction system of claim 1, characterized in that: the mobile phone system is an Android system with version Android 8 or above.
3. The deep learning-based Android mobile phone end-side AR interaction system of claim 1, characterized in that: the mobile phone can use the CPU or GPU to complete neural network inference, and a high-performance chip (such as a Qualcomm Snapdragon 865) is recommended to realize high-frame-rate operation.
4. The deep learning-based Android mobile phone end-side AR interaction system of claim 1, characterized in that: the lightweight depth estimation model deployed on the mobile phone end is converted and optimized after server-side training by creating a serializable and optimizable model from the PyTorch code via TorchScript; the stored model suffix is the .ptl format, and the model file information comprises the model weights and the interpreter of the model;
5. The deep learning-based Android mobile phone end-side AR interaction system of claim 1, characterized in that: the lightweight depth estimation method deployed on the mobile phone end side comprises the following implementation steps:
S1.1: Train the model on a server, training the model weights on a depth data set;
S1.2: Convert the model with PyTorch Mobile and store the resulting model reasoner;
S1.3: Import the model reasoner into the ARCore module through Java programming in Android Studio;
S1.4: Call the mobile phone camera API to acquire an image stream I = {I_1, I_2, ..., I_n} and extract the current frame I_n as the RGB image input I_RGB;
S1.5: Run the model reasoner on the mobile phone end side and output the predicted depth map I_Depth;
S1.6: Add the predicted depth map I_Depth to the data stream to complete the encapsulation of the module;
6. The deep learning-based Android mobile phone end-side AR interaction system of claim 1, characterized in that: the lightweight depth estimation neural network model algorithm specifically comprises the following steps:
S2.1: The lightweight depth estimation model predicts the depth map on the mobile phone end side; its inputs are the color RGB image captured by the camera (in YUV420 image format) and the camera pose parameters (the camera pose parameters returned by Google's ARCore framework are used as initial values of the camera parameters), and its outputs are a predicted depth image in RAW format and a predicted confidence image;
S2.2: The depth estimation neural network model is a monocular depth estimation model; a single inference by the model does not depend on information from preceding or following image frames or from multiple images, and a single depth estimate can be produced from a single input image;
S2.3: The depth estimation neural network model is a lightweight network model; the model reasoner deployed on the mobile phone end is smaller than 150 MB, and depth map prediction at 30 FPS is realized on mobile phone platforms with a Qualcomm Snapdragon 865 or above;
S2.4: EfficientNet is used as the backbone network of the depth prediction encoder; the input image I_RGB is passed through EfficientNet to extract features at different resolutions (one half, one quarter, one eighth and one sixteenth), constructing the image feature pyramid {S_1/2, S_1/4, S_1/8, S_1/16}; in the invention, the model backbone network can be replaced by a similar lightweight model (such as MobileNet);
S2.5: A multi-scale fusion structure is adopted as the decoder of the depth prediction algorithm; as shown in fig. 3, a decoder module receives the feature branch at the current resolution and the feature branch from the previous resolution level, and the previous-level features are spliced and fused with the current-resolution features through a residual convolution module; the residual convolution module is formed by alternately connecting two ReLU activation layers and two convolution modules with 3x3 convolution kernels in series; the fused features are fed into a residual convolution module of the same structure, and the features of the current branch are output through a resampling module and a convolution module with a 1x1 convolution kernel;
S2.6: The multi-scale loss is used as the loss function of the neural network model, computed as L_grad = (1/M) * Σ_{k=1..K} Σ_{i=1..M} ( |∇_x R_i^k| + |∇_y R_i^k| ), where R^k is the difference between the predicted depth and the ground-truth depth of the data set at scale level k; the formula computes the gradient differences along the x and y axes respectively, and sums and fuses them across the different scale resolutions;
S2.7: For better robustness and generalization of the model across different data sets, the model uses affine-invariant depth prediction, i.e. d* = s·d + μ, where s and μ are the scale and shift of the affine transformation; the affine transformation parameters between the predicted depth and the ground-truth depth are obtained by a global least-squares fit;
S2.8: The model is trained on multiple public depth data sets such as NYU Depth V2, KITTI, ScanNet and ETH3D, so that the model learns a sufficient data prior and its generalization capability is improved;
7. The deep learning-based Android mobile phone end-side AR interaction system of claim 1, characterized in that: the method for realizing the AR interaction function with the AR interaction functions of ARCore Depth Lab and Unity comprises the following steps:
S3.1: After the depth information prediction of the neural network is completed, the depth image that ARCore would return from the Depth API is replaced with the generated depth prediction map, and ARCore is called in Unity;
S3.2: The rendering engine provided by Unity is used to generate the mesh information of the scene from the depth map and to render a pseudo-color map representing the depth information;
S3.3: Using some of the functions of ARCore Depth Lab and the special effect components of the Unity scene, the corresponding special effect is added to the depth scene.
CN202210541388.0A 2022-05-17 2022-05-17 Android mobile phone end-side AR interaction system based on deep learning Pending CN115309301A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210541388.0A CN115309301A (en) 2022-05-17 2022-05-17 Android mobile phone end-side AR interaction system based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210541388.0A CN115309301A (en) 2022-05-17 2022-05-17 Android mobile phone end-side AR interaction system based on deep learning

Publications (1)

Publication Number Publication Date
CN115309301A true CN115309301A (en) 2022-11-08

Family

ID=83854804

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210541388.0A Pending CN115309301A (en) 2022-05-17 2022-05-17 Android mobile phone end-side AR interaction system based on deep learning

Country Status (1)

Country Link
CN (1) CN115309301A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116152323A (en) * 2023-04-18 2023-05-23 荣耀终端有限公司 Depth estimation method, monocular depth estimation model generation method and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110221689A (en) * 2019-05-10 2019-09-10 杭州趣维科技有限公司 A kind of space drawing method based on augmented reality
CN110716641A (en) * 2019-08-28 2020-01-21 北京市商汤科技开发有限公司 Interaction method, device, equipment and storage medium
CN111465962A (en) * 2018-10-04 2020-07-28 谷歌有限责任公司 Depth of motion for augmented reality of handheld user devices
CN114332666A (en) * 2022-03-11 2022-04-12 齐鲁工业大学 Image target detection method and system based on lightweight neural network model

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111465962A (en) * 2018-10-04 2020-07-28 谷歌有限责任公司 Depth of motion for augmented reality of handheld user devices
CN110221689A (en) * 2019-05-10 2019-09-10 杭州趣维科技有限公司 A kind of space drawing method based on augmented reality
CN110716641A (en) * 2019-08-28 2020-01-21 北京市商汤科技开发有限公司 Interaction method, device, equipment and storage medium
CN114332666A (en) * 2022-03-11 2022-04-12 齐鲁工业大学 Image target detection method and system based on lightweight neural network model

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Yu Fangjie: "Research on a Mobile-end Point Cloud Segmentation Method Based on Depth Maps", China Masters' Theses Full-text Database, Information Science and Technology, no. 08, 15 August 2021 (2021-08-15), pages 138-273 *
Liu Qiang: "Building Enterprise-level Recommendation Systems: Algorithms, Engineering Implementation and Case Analysis", 13 July 2021, China Machine Press, page 169 *
Ma Rong et al.: "Low-power Visual Odometry Based on Monocular Depth Estimation", Journal of System Simulation, vol. 33, no. 12, 18 December 2021 (2021-12-18), pages 3001-3011 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116152323A (en) * 2023-04-18 2023-05-23 荣耀终端有限公司 Depth estimation method, monocular depth estimation model generation method and electronic equipment
CN116152323B (en) * 2023-04-18 2023-09-08 荣耀终端有限公司 Depth estimation method, monocular depth estimation model generation method and electronic equipment

Similar Documents

Publication Publication Date Title
Lee et al. From big to small: Multi-scale local planar guidance for monocular depth estimation
CN113572962B (en) Outdoor natural scene illumination estimation method and device
CN113706699B (en) Data processing method and device, electronic equipment and computer readable storage medium
CN113837938B (en) Super-resolution method for reconstructing potential image based on dynamic vision sensor
CN115690382B (en) Training method of deep learning model, and method and device for generating panorama
KR20200128378A (en) Image generation network training and image processing methods, devices, electronic devices, and media
CN113077505B (en) Monocular depth estimation network optimization method based on contrast learning
CN113284173B (en) End-to-end scene flow and pose joint learning method based on false laser radar
CN113034413B (en) Low-illumination image enhancement method based on multi-scale fusion residual error coder-decoder
CN111612878B (en) Method and device for making static photo into three-dimensional effect video
CN111652933B (en) Repositioning method and device based on monocular camera, storage medium and electronic equipment
CN116721207A (en) Three-dimensional reconstruction method, device, equipment and storage medium based on transducer model
CN113382275A (en) Live broadcast data generation method and device, storage medium and electronic equipment
CN115309301A (en) Android mobile phone end-side AR interaction system based on deep learning
CN112200817A (en) Sky region segmentation and special effect processing method, device and equipment based on image
CN109788270A (en) 3D-360 degree panorama image generation method and device
CN116524121A (en) Monocular video three-dimensional human body reconstruction method, system, equipment and medium
CN112750092A (en) Training data acquisition method, image quality enhancement model and method and electronic equipment
CN117218246A (en) Training method and device for image generation model, electronic equipment and storage medium
Ren et al. Efficient human pose estimation by maximizing fusion and high-level spatial attention
CN113793420A (en) Depth information processing method and device, electronic equipment and storage medium
CN113191301A (en) Video dense crowd counting method and system integrating time sequence and spatial information
CN116977547A (en) Three-dimensional face reconstruction method and device, electronic equipment and storage medium
CN116258756A (en) Self-supervision monocular depth estimation method and system
CN114926594A (en) Single-view-angle shielding human body motion reconstruction method based on self-supervision space-time motion prior

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination