CN115482280A - Visual positioning method based on adaptive histogram equalization - Google Patents

Visual positioning method based on adaptive histogram equalization

Info

Publication number
CN115482280A
Authority
CN
China
Prior art keywords
image
depth
network
pose
histogram equalization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211106319.3A
Other languages
Chinese (zh)
Inventor
张会清
杨永建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN202211106319.3A
Publication of CN115482280A
Legal status: Pending

Classifications

    • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • G06T 5/40: Image enhancement or restoration using histogram techniques
    • G06T 5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T 7/13: Edge detection
    • G06T 7/44: Analysis of texture based on statistical description of texture using image operators, e.g. filters, edge density metrics or local histograms
    • G06T 2207/10016: Video; Image sequence
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G06T 2207/20192: Edge enhancement; Edge preservation
    • G06T 2207/20221: Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a visual positioning method based on adaptive histogram equalization. The input images are first enhanced with contrast-limited adaptive histogram equalization to restore their texture and edge detail. A binary mask is then designed from the image depth information to eliminate the interference of dynamic targets and occlusions in the scene. A two-branch encoder-decoder network consisting of a pose estimation network and a depth estimation network then extracts the depth information of each image and the pose information between adjacent image frames, and the pose estimates of the network are supervised during training by a photometric consistency loss, an edge smoothness loss and a depth consistency loss. The invention extracts the spatio-temporal features of adjacent image frames, uses the image depth information to keep the network's pose estimates scale-consistent, and accurately estimates position in real time.

Description

Visual positioning method based on adaptive histogram equalization
Technical Field
The invention belongs to the field of visual positioning and relates to a visual positioning algorithm that estimates the motion trajectory of a monocular camera with a two-branch network consisting of a pose estimation network and a depth estimation network. For indoor scenes in which the images are overexposed or underexposed, the algorithm introduces a contrast enhancement step based on contrast-limited adaptive histogram equalization (CLAHE) to mitigate the exposure problem and effectively improve the visual positioning accuracy.
Background
With the continuous development of robot technology, accurate positioning of the robot has a crucial influence on downstream tasks such as motion planning and navigation. The service scenarios of intelligent robots can currently be divided into indoor and outdoor scenes. In outdoor environments, a Global Navigation Satellite System (GNSS) can provide accurate position services through satellite signals; in indoor environments, interference from buildings makes the wireless satellite signals unstable and GNSS-based positioning accuracy unsatisfactory. Visual odometry, by contrast, determines the relative position of the platform purely by estimating the motion between adjacent image frames, without any external signal.
Visual odometers can currently be classified into monocular and binocular systems according to the type of sensor. A monocular visual odometer offers high real-time performance and a simple system structure but lacks absolute scale and therefore suffers from scale ambiguity, whereas a binocular visual odometer can recover the absolute scale by triangulation over its fixed baseline. In recent years, deep learning has gradually become mainstream in computer vision: data-driven visual odometry methods replace the hand-crafted feature extraction module of traditional methods with convolutional neural networks, obtain higher-level visual features, and rely on the strong learning capacity of neural networks to produce the final pose estimate.
Although deep learning has brought a large performance improvement to visual odometry, several problems in the field still need to be addressed. Image frames are the input of a visual odometer, and the texture and edge features they contain are strongly affected by lighting: in outdoor scenes an image may appear too bright because of overexposure, while in indoor scenes it may appear too dark because of insufficient illumination. Over-bright or over-dark images lack texture and edge information, which hampers the subsequent neural network's learning of the visual features and reduces the inference accuracy of the model.
To solve these problems, the invention restores and enhances the texture and edge details of overexposed or underexposed images with a contrast-limited adaptive histogram equalization (CLAHE) algorithm and provides a visual positioning algorithm based on this histogram equalization theory.
Disclosure of Invention
To cope with the influence of scene illumination conditions and weather, the method first recovers the texture details of overexposed or underexposed images with a histogram-equalization-based image enhancement algorithm, and then computes the camera pose of the input video frames with a two-branch prediction network built on deep residual neural networks.
In order to achieve the purpose, the invention adopts the following technical scheme:
First, in the off-line stage, for the input time-ordered sequence of image frames, the texture and edge structure information of each input image is restored and enhanced by the image enhancement module; the enhanced images then pass through the mask module to remove dynamic targets and occlusions; the preprocessed enhanced images are fed into a ResNet encoder to extract image motion and depth information; finally, a decoder restores the image size and produces the pose prediction, and the network weights are updated by back-propagating the gradients of the loss function, yielding a robust pose estimation network model. In the on-line stage, the input sequence of image frames is fed into the trained network model to obtain the corresponding pose estimates. A schematic diagram of the positioning method of the invention is shown in Fig. 1.
A visual positioning method based on adaptive histogram equalization, whose algorithm flow chart is shown in Fig. 2, mainly comprises two stages: off-line model construction and on-line real-time positioning.
the off-line model training stage mainly comprises the following modules:
(1) An image enhancement module: for the input image frame sequence, the image is first partitioned into padded blocks; a mapping function is then computed for each block based on the histogram equalization strategy and the contrast is limited on the basis of this mapping; finally the enhanced image is obtained by bilinear interpolation between blocks. The contrast-limited histogram equalization enhances the edge and texture information of the image and improves the visibility of target objects in over-bright or over-dark regions.
(2) A ResNet-based encoder: the temporal and spatial features of the enhanced images are extracted with the classical feature extraction network ResNet, as follows:
The positioning model consists of a two-branch network: a depth prediction network and a pose estimation network. The depth prediction network generates the depth information of the image, which alleviates the scale drift of long-sequence pose estimation. For the depth prediction network, the encoder uses ResNet-50 to extract image spatio-temporal features; for the pose estimation network, ResNet-18 is used as the encoder, and the first-layer input of the ResNet is modified to 6 channels so that two images can be fed simultaneously.
The number of hidden layers, the number of neurons per layer and the learning rate are determined, and network parameters such as the number of pre-training and fine-tuning iterations per layer are set; the activation function of the output layer is sigmoid and the remaining nonlinear activation layers use ELU.
The two networks are jointly trained with a shared loss function composed of three parts: photometric consistency loss, edge smoothness loss and depth consistency loss.
(3) A decoder: for the depth prediction network, DispNet is used as the decoder to recover the spatio-temporal features of the image by layer-by-layer upsampling, as shown in Fig. 3. For the pose estimation network, as shown in Fig. 4, four convolutional layers are designed following PoseResNet to obtain the predicted 6-degree-of-freedom pose.
(4) An occlusion-based binary mask: dynamic objects and occlusions in the scene violate the static-scene assumption; because of them, the predicted image depth and the true image depth show an obvious inconsistency in dynamic target regions, which disturbs the learning of the depth-network and pose-network parameters. As shown in Fig. 5, based on the predicted single-view depth information, the target view is warped to the source view through the view synthesis principle to obtain a synthesized target depth map; the target-view depth map and the synthesized depth map are then subtracted to obtain a depth difference map, from which a binary mask is computed. The difference is large in dynamic target regions and nearly zero in static regions, and the binary mask is used to eliminate the interference of dynamic targets and occlusions before the loss is computed.
Drawings
Fig. 1 is a flow chart of the present visual positioning method.
Figure 2 is a flow chart of the CLAHE combined Resnet positioning algorithm of the present invention at the off-line stage.
Fig. 3 is a basic configuration diagram of a depth estimation network.
Fig. 4 is a basic configuration diagram of a pose estimation network.
Fig. 5 is a basic configuration diagram of a mask generation network.
Detailed Description
The method of the invention is illustrated in flow-diagram form in Fig. 1. In the off-line training stage, the final pose estimate is obtained from the collected video data through the image enhancement module, the ResNet-based image feature encoding module, the PoseResNet decoder module, the DispNet decoder module and the mask generation module. During training, a photometric consistency loss, an edge smoothness loss and a depth consistency loss supervise the network's learning of the camera pose, yielding scale-consistent poses. In the on-line stage, the trained network model is loaded, and the camera's pose and position relative to the scene are estimated in real time from the image frames it captures.
The specific implementation steps are as follows:
(1) The image enhancement module: changes in illumination caused by weather and scene strongly affect the image frames captured by the camera. To reduce the negative effect of insufficient illumination or overexposure on image texture, a histogram-equalization-based image enhancement algorithm is used to enhance the edge and texture information of the original input images.
For an input image frame, exploiting the locality of human vision, the image is first partitioned into blocks, and the local image blocks are used as the basic units of histogram equalization; this prevents the equalization from amplifying noise in over-dark regions.
After the image is partitioned, an over-dark region concentrates the probability density of its block's histogram on a few pixel values; once the mapping function is applied, these large probability densities are amplified and introduce noise into the image, so the contrast is limited. The image contrast is calculated as
C = (I_sc_max − I_sc_min) / (I_sc_max + I_sc_min)
where I_sc_max is the maximum pixel value of the image and I_sc_min is the minimum pixel value.
After histogram equalization has been applied to each block, splicing the blocks together directly would produce blocking artifacts in the synthesized image. To avoid this, the blocks are blended into the enhanced image by bilinear interpolation: for a pixel point P(i, j), the value of the target pixel is obtained by bilinear interpolation as
p(x, y) = f(p_11)·w_11 + f(p_21)·w_21 + f(p_12)·w_12 + f(p_22)·w_22
where p_11, p_12, p_21 and p_22 are the four pixel coordinates neighbouring P(i, j), and w_11, w_21, w_12 and w_22 are the corresponding interpolation weights.
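As an illustration, the block-wise equalization, contrast clipping and bilinear blending described above can be sketched with OpenCV's CLAHE implementation; the clip limit of 2.0, the 8x8 tile grid and the file name in the example are illustrative assumptions rather than values fixed by this description.

```python
import cv2

def enhance_frame(bgr_image, clip_limit=2.0, grid=(8, 8)):
    # Equalize only the luminance channel so colours are not distorted.
    lab = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=grid)
    l_eq = clahe.apply(l)  # per-tile histograms, clipped, blended by bilinear interpolation
    return cv2.cvtColor(cv2.merge((l_eq, a, b)), cv2.COLOR_LAB2BGR)

# Example: enhanced = enhance_frame(cv2.imread("frame_000.png"))
```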
(2) The ResNet-based encoder: the model is a two-branch network in which the depth estimation network mainly provides scale information with long-sequence consistency for the pose network. For the depth estimation network, ResNet-50 is adopted as the encoder to extract image spatio-temporal features; for the pose estimation network, ResNet-18 is used as the encoder, and the first-layer input of the ResNet is modified to 6 channels so that two images can be fed simultaneously.
After the structure of the ResNet is determined, the number of hidden layers, the number of neurons per layer and the learning rate are chosen according to the network structure, and network parameters such as the number of pre-training and fine-tuning iterations per layer are set; the activation function of the output layer is sigmoid and the remaining nonlinear activation layers use ELU.
During training, the two networks are jointly trained with a shared loss function composed of three parts: photometric consistency loss, edge smoothness loss and depth consistency loss.
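A minimal sketch of the 6-channel modification of the pose-branch encoder, using a torchvision ResNet-18; duplicating the 3-channel first-layer kernels across both frames and the 256x320 input resolution are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

def make_pose_encoder():
    net = models.resnet18()                    # ResNet-18 backbone for the pose branch
    old_weight = net.conv1.weight.data         # original first layer: Conv2d(3, 64, 7, 2, 3)
    net.conv1 = nn.Conv2d(6, 64, kernel_size=7, stride=2, padding=3, bias=False)
    with torch.no_grad():
        # Reuse the 3-channel kernels for both stacked frames, halved to keep the
        # expected activation magnitude comparable.
        net.conv1.weight.copy_(old_weight.repeat(1, 2, 1, 1) / 2.0)
    # Drop the average-pooling and classification layers to keep the feature map.
    return nn.Sequential(*list(net.children())[:-2])

pair = torch.randn(1, 6, 256, 320)             # target and source frames stacked channel-wise
features = make_pose_encoder()(pair)           # shape: (1, 512, 8, 10)
```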
(3) The decoder: for the depth prediction network, DispNet is used as the decoder, recovering the scale information of the image by layer-by-layer upsampling with transposed convolutions based on 3x3 kernels. For the pose estimation network, as shown in Fig. 4, four convolutional layers are designed following PoseResNet; in the feature decoding stage, the image features are decoded with 1x1 and 3x3 two-dimensional convolution kernels to obtain the final 6-degree-of-freedom pose.
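A minimal sketch of a pose decoder in the spirit of the description above, with four convolutional layers using 1x1 and 3x3 kernels and ELU activations; the channel widths and the global average at the end are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PoseDecoder(nn.Module):
    """Four convolutional layers (1x1 and 3x3 kernels) ending in a 6-DoF pose."""
    def __init__(self, in_channels=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 256, kernel_size=1), nn.ELU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ELU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ELU(inplace=True),
            nn.Conv2d(256, 6, kernel_size=1),   # 3 rotation + 3 translation channels
        )

    def forward(self, feat):
        # Average the 6-channel map over the spatial dimensions -> (B, 6) pose vector.
        return self.net(feat).mean(dim=(2, 3))

feat = torch.randn(1, 512, 8, 10)               # encoder feature map (see the sketch above)
pose_6dof = PoseDecoder()(feat)                 # shape: (1, 6)
```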
(4) The occlusion-based binary mask: dynamic objects and occlusions in the scene violate the static-scene assumption; because of them, the predicted image depth and the true image depth show an obvious inconsistency in dynamic target regions, which disturbs the learning of the depth-network and pose-network parameters. Based on the predicted single-view depth information, the target view is warped to the source view through the view synthesis principle to obtain a synthesized target depth map. The view synthesis principle can be characterized by the following formula:
υ_ij(x) = | I_i(x) − Î_j(x) |
where υ_ij denotes the pixel-value difference between the target image I_i(x) and the warped source image Î_j(x). The warped source image is obtained from the pose transformation and depth information between the target image and the source image; the pose transformation maps pixels according to
x_j ~ K · T_i→j · D_i(x_i) · K⁻¹ · x_i
where K is the camera intrinsic matrix, T_i→j is the pose transformation matrix from the target image to the source image, and D_i is the depth of the target image obtained from the depth prediction network. From the target-image depth map D_i and the synthesized target depth map D'_i, the scale-consistency mask is calculated as
M(x) = 1 if D_diff,i→j(x) < thre, otherwise M(x) = 0,
where thre is empirically set to 0.25 and D_diff,i→j(x) denotes the depth-information difference between the two images, calculated as
D_diff,i→j(x) = | D_i(x) − D'_i(x) | / ( D_i(x) + D'_i(x) ).
the binary mask based on the depth consistency is used for eliminating the interference of the dynamic target and the shielding before calculating the loss, and the difference value of the dynamic target area is large, and the difference of the static area is nearly zero.
(5) The loss function: to improve the prediction accuracy of the depth estimation network and the pose estimation network while taking the spatio-temporal characteristics of the images into account, the loss function consists of three parts, and the two-branch network is trained jointly in the training stage. The overall loss function is
L_c = α·L_photo + β·L_smooth + γ·L_depth
the first part is the luminosity consistency loss, which is used to constrain the luminosity loss between adjacent image frames:
Figure BDA0003841781360000067
wherein I i (x) And
Figure BDA0003841781360000068
representing gray values of the reference image and the inverted source image.
The second part is the edge smoothness loss, which compensates the prediction accuracy in low-texture or single-plane regions of the scene:
L_smooth = Σ_x |∇D_i(x)| · e^(−|∇I_i(x)|)
where ∇ denotes the first derivative along the spatial directions.
The third part is the scale consistency loss, which uses the image depth information to constrain the network so that the pose keeps a consistent scale in long-sequence pose estimation:
[equation shown as an image in the original document]
where SSIM(I_i, I_j) denotes the structural-similarity difference between the two images.
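A minimal sketch of the three-part loss; the exact photometric, edge-smoothness and depth-consistency terms (and the weights alpha, beta, gamma) are common self-supervised formulations used here as assumptions rather than the precise forms fixed by this description, and the structural-similarity term of the third part is omitted for brevity.

```python
import torch

def total_loss(tgt, src_warped, depth_tgt, depth_synth, mask,
               alpha=1.0, beta=0.1, gamma=0.5):
    """tgt, src_warped: (B, 3, H, W) images; depth_tgt, depth_synth, mask: (B, 1, H, W)."""
    # Photometric consistency: masked mean absolute difference between the
    # reference frame and the warped source frame.
    photo = (tgt - src_warped).abs().mean(dim=1, keepdim=True)
    l_photo = (photo * mask).sum() / (mask.sum() + 1e-7)

    # Edge-aware smoothness: penalize depth gradients except across image edges.
    d_dx = (depth_tgt[..., :, 1:] - depth_tgt[..., :, :-1]).abs()
    d_dy = (depth_tgt[..., 1:, :] - depth_tgt[..., :-1, :]).abs()
    i_dx = (tgt[..., :, 1:] - tgt[..., :, :-1]).abs().mean(dim=1, keepdim=True)
    i_dy = (tgt[..., 1:, :] - tgt[..., :-1, :]).abs().mean(dim=1, keepdim=True)
    l_smooth = (d_dx * torch.exp(-i_dx)).mean() + (d_dy * torch.exp(-i_dy)).mean()

    # Depth (scale) consistency: relative difference between the predicted target
    # depth and the synthesized target depth.
    l_depth = ((depth_tgt - depth_synth).abs()
               / (depth_tgt + depth_synth + 1e-7)).mean()

    return alpha * l_photo + beta * l_smooth + gamma * l_depth
```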
(6) The on-line stage: video data of the test scene is collected and the trained model is loaded; the inference model encodes the edge and texture features of the test scene, and the 6-degree-of-freedom pose is finally obtained through the four transposed-convolution decoding layers.
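A minimal sketch of the on-line stage, assuming the trained network has been exported as a TorchScript module; the file names are hypothetical, and enhance_frame refers to the CLAHE helper sketched earlier.

```python
import cv2
import torch

model = torch.jit.load("pose_model.pt").eval()         # assumed TorchScript export of the trained model
cap = cv2.VideoCapture("test_scene.mp4")
prev = None
with torch.no_grad():
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = enhance_frame(frame)                    # restore texture/edge detail with CLAHE
        cur = torch.from_numpy(frame).permute(2, 0, 1).float().unsqueeze(0) / 255.0
        if prev is not None:
            pose_6dof = model(torch.cat([cur, prev], dim=1))   # (1, 6) relative pose
            print(pose_6dof.squeeze().tolist())
        prev = cur
cap.release()
```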
The method of the invention makes full use of the feature-extraction capability and fusion mechanism of the deep residual network to achieve high-precision positioning.

Claims (6)

1. A visual positioning method based on adaptive histogram equalization is characterized by comprising the following steps:
in the off-line positioning stage, feeding the collected scene video into the positioning model to be trained, and performing contrast-limited histogram equalization on the original input images to enhance their edge and texture information;
extracting motion information and depth information of consecutive video frames through a ResNet-based encoder network;
obtaining the depth-information difference of adjacent image frames from their depth information based on the view synthesis principle, and generating a binary mask from this depth-information difference;
based on the visual features obtained by the encoder, obtaining the 6-degree-of-freedom pose and the depth map through layer-by-layer upsampling with four transposed-convolution layers, and updating the model weights under the supervision of the photometric consistency loss, the edge smoothness loss and the depth consistency loss;
and in the on-line positioning stage, acquiring video data of a test scene, loading the trained model, encoding the scene information to extract edge and texture features, and obtaining real-time 6-degree-of-freedom pose estimation and depth-map estimation through the respective decoders.
2. The visual positioning method based on adaptive histogram equalization according to claim 1, wherein the image enhancement method based on histogram equalization comprises:
partitioning the image into padded blocks, calculating a mapping relation for each block based on the histogram equalization strategy, limiting the contrast on the basis of the obtained mapping relation, and finally obtaining the enhanced image through bilinear interpolation; the contrast-limited histogram equalization enhances the edge and texture information of the image and improves the display of target objects in over-bright or over-dark regions of the image.
3. The visual positioning method based on adaptive histogram equalization according to claim 1, wherein the visual feature extraction method comprises:
for an input image, extracting high-level visual features through an encoder-decoder network; in the encoder part, the classical ResNet based on 3x3 convolution kernels extracts the visual features through four down-sampling convolution stages; the pose estimation network uses ResNet-18 as its encoder and a PoseResNet-style decoder of four convolutional layers to obtain the 6-degree-of-freedom predicted pose; the depth estimation network uses ResNet-50 as its encoder and DispNet as its decoder, recovering the image spatio-temporal features into a depth map by layer-by-layer upsampling.
4. The visual positioning method based on adaptive histogram equalization as claimed in claim 1, wherein said occlusion mask based binary mask comprises:
warping the target view to the source view based on the view synthesis principle to obtain a synthesized target depth map, the view synthesis principle being characterized by the following formula:
υ_ij(x) = | I_i(x) − Î_j(x) |
where υ_ij denotes the pixel-value difference between the target image I_i(x) and the warped source image Î_j(x); the warped source image is obtained from the pose transformation information and depth information between the target image and the source image, the pose transformation being applied according to:
x_j ~ K · T_i→j · D_i(x_i) · K⁻¹ · x_i
where K is the camera intrinsic matrix, T_i→j is the pose transformation matrix from the target image to the source image, and D_i is the depth of the target image obtained from the depth prediction network; based on the target-image depth map D_i and the synthesized target depth map D'_i, the scale-consistency mask is calculated by:
M(x) = 1 if D_diff,i→j(x) < thre, otherwise M(x) = 0,
where thre is empirically set to 0.25 and D_diff,i→j(x) denotes the depth-information difference between the two images, calculated by:
D_diff,i→j(x) = | D_i(x) − D'_i(x) | / ( D_i(x) + D'_i(x) ).
5. the visual positioning method based on adaptive histogram equalization as claimed in claim 1, wherein said deep network loss function design comprises:
the loss function is composed of three parts, the two-branch network is trained jointly in the training stage, and the overall loss function is:
L_c = α·L_photo + β·L_smooth + γ·L_depth
the first part is the photometric consistency loss, which constrains the photometric difference between adjacent image frames:
L_photo = (1/N) Σ_x | I_i(x) − Î_j(x) |
where the sum runs over the N valid pixels, and I_i(x) and Î_j(x) denote the gray values of the reference image and the warped source image;
the second part is the edge smoothness loss, which compensates the prediction accuracy in low-texture or single-plane regions:
L_smooth = Σ_x |∇D_i(x)| · e^(−|∇I_i(x)|)
where ∇ denotes the first derivative along the spatial directions;
the third part is the scale consistency loss, which uses the image depth information to constrain the network so that the pose keeps a consistent scale in long-sequence pose estimation:
[equation shown as an image in the original document]
where SSIM(I_i, I_j) denotes the structural-similarity difference between the two images.
6. The visual positioning method based on adaptive histogram equalization as claimed in claim 1, wherein the on-line inference stage comprises:
performing edge enhancement on the image based on histogram equalization, extracting the image visual features with the ResNet encoder, computing the image depth map with the DispNet decoder, and obtaining the 6-degree-of-freedom pose from the image pose and depth information with the PoseNet decoder.
CN202211106319.3A 2022-09-11 2022-09-11 Visual positioning method based on adaptive histogram equalization Pending CN115482280A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211106319.3A CN115482280A (en) 2022-09-11 2022-09-11 Visual positioning method based on adaptive histogram equalization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211106319.3A CN115482280A (en) 2022-09-11 2022-09-11 Visual positioning method based on adaptive histogram equalization

Publications (1)

Publication Number Publication Date
CN115482280A true CN115482280A (en) 2022-12-16

Family

ID=84424099

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211106319.3A Pending CN115482280A (en) 2022-09-11 2022-09-11 Visual positioning method based on adaptive histogram equalization

Country Status (1)

Country Link
CN (1) CN115482280A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117974721A * 2024-04-01 2024-05-03 Hefei University of Technology Vehicle motion estimation method and system based on monocular continuous frame images

Similar Documents

Publication Publication Date Title
CN110335290B (en) Twin candidate region generation network target tracking method based on attention mechanism
WO2018000752A1 (en) Monocular image depth estimation method based on multi-scale cnn and continuous crf
CN110782490A (en) Video depth map estimation method and device with space-time consistency
CN113012172A (en) AS-UNet-based medical image segmentation method and system
CN113572962B (en) Outdoor natural scene illumination estimation method and device
CN114066831B (en) Remote sensing image mosaic quality non-reference evaluation method based on two-stage training
CN116402942A (en) Large-scale building three-dimensional reconstruction method integrating multi-scale image features
US20230281913A1 (en) Radiance Fields for Three-Dimensional Reconstruction and Novel View Synthesis in Large-Scale Environments
CN115187638B (en) Unsupervised monocular depth estimation method based on optical flow mask
CN113963117B (en) Multi-view three-dimensional reconstruction method and device based on variable convolution depth network
CN115018888A (en) Optical flow unsupervised estimation method based on Transformer
CN112270691A (en) Monocular video structure and motion prediction method based on dynamic filter network
CN112561996A (en) Target detection method in autonomous underwater robot recovery docking
CN115035171A (en) Self-supervision monocular depth estimation method based on self-attention-guidance feature fusion
CN115511759A (en) Point cloud image depth completion method based on cascade feature interaction
CN114972748A (en) Infrared semantic segmentation method capable of explaining edge attention and gray level quantization network
CN115482280A (en) Visual positioning method based on adaptive histogram equalization
CN116402851A (en) Infrared dim target tracking method under complex background
CN115147709A (en) Underwater target three-dimensional reconstruction method based on deep learning
CN113538527A (en) Efficient lightweight optical flow estimation method
CN113989612A (en) Remote sensing image target detection method based on attention and generation countermeasure network
CN116152442B (en) Three-dimensional point cloud model generation method and device
CN117372764A (en) Non-cooperative target detection method in low-light environment
CN112785629A (en) Aurora motion characterization method based on unsupervised deep optical flow network
CN110544216A (en) Video defogging system based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination