CN112861733B - Night traffic video saliency detection method based on space-time dual coding - Google Patents

Night traffic video saliency detection method based on space-time dual coding

Info

Publication number
CN112861733B
CN112861733B (application CN202110183195.8A)
Authority
CN
China
Prior art keywords
convolution
space
time
coding structure
blocks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110183195.8A
Other languages
Chinese (zh)
Other versions
CN112861733A (en)
Inventor
颜红梅
蒋莲芳
田晗
高港耀
吴江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority claimed from CN202110183195.8A
Publication of CN112861733A
Application granted
Publication of CN112861733B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 Image coding
    • G06T 9/002 Image coding using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/54 Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a night traffic video saliency detection method based on space-time dual coding, applied in the technical field of computer vision and aimed at the shortcomings of the prior art in saliency detection for night traffic scenes. The network model of the invention comprises three parts: a space-time dual encoder, an Attention fusion module and a decoder. The temporal coding module uses a convolutional LSTM to learn the temporal information across consecutive frames of the night traffic video and highlights the motion features in the traffic video; the spatial coding module uses pyramid dilated convolution (PDC) to extract spatial features under receptive fields of different sizes; the Attention module enhances the features that contribute most to the driving task while fusing the temporal and spatial features; finally, the decoder accurately predicts the important salient regions for the night driving task.

Description

Night traffic video saliency detection method based on space-time dual coding
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a video saliency detection technology in a night traffic scene.
Background
Over the past decades, advanced driver assistance systems have come to play an increasingly important role in automobile driving, both easing the driving task and helping to ensure safe driving. However, the traffic environment is a complex, constantly changing dynamic scene saturated with information. A road scene contains not only driving-related information such as traffic lights, signs and pedestrians, but also driving-irrelevant distractions such as billboards and neon lights. Because the brain's capacity to process information is limited, a driver must stay highly focused; distracted driving greatly increases the chance of an accident. Under the dim and mixed lighting and low visibility of night traffic scenes, prolonged driving easily leads to visual fatigue and distraction, and important targets are overlooked or noticed too late. Traffic accident and fatality rates for night driving are high, so reminding the driver of important information in real time is particularly valuable. Visual saliency detection for traffic scenes computes the regions and objects that the driver should attend to while driving, i.e., those important to the driving task. Learning the attention distribution of experienced, alert drivers during night driving can support saliency detection in night traffic scenes and thereby improve driving safety.
Night driving scenes are more complex than daytime ones. They are complicated by: 1. insufficient illumination and low contrast; 2. cluttered light sources and increased visual interference from uneven brightness; 3. heavy noise and severely blurred detail; 4. color distortion, among other factors. These greatly increase the difficulty of processing night-time images, so saliency detection in night traffic scenes remains a challenge to be addressed.
To address these problems, the invention designs a dual-encoding neural network model that predicts the salient regions of the driver's visual search in night traffic video scenes, with the aim of reminding the driver in real time of important, driving-relevant information. The salient regions predicted by the model closely match the real attention distribution of drivers.
Disclosure of Invention
To solve the above technical problem, the invention provides a night traffic video saliency detection method based on space-time dual coding.
The technical solution adopted by the invention is as follows: a night traffic video saliency detection method based on space-time dual coding, comprising the following steps:
S1, acquiring a standard fixation-point saliency map;
S2, establishing a network model for performing saliency detection on the input standard fixation-point saliency map;
the network model includes: a space-time coding structure, an Attention fusion module and a decoding module, wherein the space-time coding structure extracts the spatial and temporal features of the input standard fixation-point saliency map, the Attention fusion module fuses the extracted spatial and temporal features, and the decoding module computes the saliency map from the fusion result;
and S3, training the network model, and using the trained network model to detect image saliency.
The space-time coding structure comprises a spatial coding structure and a temporal coding structure; the spatial coding structure extracts the spatial features of the input standard fixation-point saliency map, and the temporal coding structure extracts its temporal features.
During training of the network model, the current frame is used to extract spatial features, while the current frame and the previous 5 frames form a continuous sequence used to extract temporal features.
The spatial coding structure comprises: 4 groups of convolution blocks and a pyramid dilated convolution block;
each group of convolution blocks specifically includes: 2 convolution layers, each comprising a 3×3 convolution, a batch normalization unit and a rectified linear unit; each group of convolution blocks also includes a 2×2 max pooling layer with stride 2;
the pyramid dilated convolution block acquires spatial features through parallel dilated convolutions with different dilation rates.
The temporal coding structure comprises: 4 groups of convolution blocks and a convolutional long short-term memory network;
each group of convolution blocks specifically includes: 2 convolution layers, each comprising a 3×3 convolution, a batch normalization unit and a rectified linear unit; each group of convolution blocks also includes a 2×2 max pooling layer with stride 2.
The temporal coding structure extracts the features of a continuous frame sequence through the 4 groups of convolution blocks, and the extracted features are then input to the convolutional long short-term memory network to learn information across preceding and following frames.
The decoding module sequentially comprises: 3 upsampling layers, 3 groups of convolution blocks, a 1×1 convolution layer and a Sigmoid layer, with a ×2 upsampling layer placed before each group of convolution blocks;
each group of convolution blocks specifically includes: 2 convolution layers, each comprising a 3×3 convolution, a batch normalization unit and a rectified linear unit; each group of convolution blocks includes a 2×2 max pooling layer with stride 2.
The beneficial effects of the invention are as follows: the invention proposes, for the first time, top-down saliency detection for night traffic scenes; the model extracts temporal and spatial information and selectively enhances the spatio-temporal features through an integrated attention mechanism, so the resulting saliency maps are better and the predicted regions are more accurate.
Drawings
FIG. 1 is a flow chart of an eye movement experiment provided by the present invention;
FIG. 2 is a schematic diagram of a network architecture employed by the present invention;
fig. 3 is a diagram illustrating an example of a night traffic video image saliency prediction according to an embodiment of the present invention;
fig. 3(a) shows an input original image, fig. 3(b) shows a standard eye movement saliency map, and fig. 3(c) shows a model prediction map according to the present invention.
Detailed Description
In order to facilitate understanding of the technical contents of the present invention by those skilled in the art, the present invention will be further explained with reference to the accompanying drawings.
The method comprises two main stages: computation of the standard eye-movement saliency map, and training of the network model:
A. Calculation of the standard eye-movement saliency map:
step A1: eye movement data (including fixation point information of each frame) of 30 drivers with driving ages of two years and more are recorded by an eye tracker, and the experimental process is shown in fig. 1. And eliminating abnormal data and integrating all tested eye movement data to each frame.
Step A2: Generate a blank matrix of the same size as the input image and set the positions corresponding to the fixation points of each frame to 1, yielding a binary image, i.e., the standard fixation-point binary map. Then apply 2-dimensional Gaussian smoothing (standard deviation δ = 30) to the binary map to obtain the standard fixation-point saliency map, which serves as the label for network training (one standard fixation-point saliency map per input picture).
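By way of illustration, a minimal sketch of step A2 in Python follows, assuming the fixation points of a frame are available as (row, column) pixel coordinates; the function name, the final normalization of the smoothed map to [0, 1], and the use of scipy's gaussian_filter are assumptions of the example rather than the exact implementation used by the inventors.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixation_saliency_map(gaze_points, height, width, sigma=30.0):
    """Build the standard fixation-point saliency map for one frame."""
    binary = np.zeros((height, width), dtype=np.float32)
    for r, c in gaze_points:                         # one entry per recorded fixation
        if 0 <= r < height and 0 <= c < width:
            binary[r, c] = 1.0                       # standard fixation-point binary map
    saliency = gaussian_filter(binary, sigma=sigma)  # 2-D Gaussian smoothing
    if saliency.max() > 0:
        saliency /= saliency.max()                   # assumption: scale to [0, 1] for use as a label
    return saliency
```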
B. Training a network model:
b1, the model designed by the invention mainly comprises three parts: a space-time coding structure (divided into a space coding structure and a time coding structure), an Attention fusion module and a decoding module.
The spatial coding structure is used to extract the spatial features of the image, specifically as follows:
Spatial features are highly important features of a traffic scene, and convolution operations can effectively extract the spatial characteristics of an image. The spatial coding structure consists of 4 groups of convolution blocks and a pyramid dilated convolution block (PDC). Each group of convolution blocks consists of two convolution layers, each comprising a 3×3 convolution, a batch normalization unit (BN) and a rectified linear unit (ReLU). A 2×2 max pooling layer with stride 2 is placed between convolution blocks.
The PDC module addresses the detection of regions of different sizes by acquiring spatial features through parallel dilated convolutions with different dilation rates. In this embodiment, 4 dilated convolutions with dilation rates of 1, 2, 4 and 8 are used to obtain local information, and global features are obtained with global average pooling (GAP). A final convolution layer follows.
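For illustration, a sketch of such a PDC block in PyTorch is given below; the 3×3 kernels of the dilated branches, the channel widths and the bilinear upsampling of the global branch are assumptions made for this example, not details fixed by the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PDC(nn.Module):
    """Pyramid dilated convolution: parallel dilated convs (rates 1, 2, 4, 8) + GAP branch."""
    def __init__(self, in_ch=512, out_ch=512):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in (1, 2, 4, 8)
        ])
        self.gap_conv = nn.Conv2d(in_ch, out_ch, 1)           # global context branch
        self.fuse = nn.Sequential(                            # final convolution layer
            nn.Conv2d(out_ch * 5, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        h, w = x.shape[2:]
        local_feats = [b(x) for b in self.branches]           # multi-receptive-field local features
        g = F.adaptive_avg_pool2d(x, 1)                       # global average pooling
        g = F.interpolate(self.gap_conv(g), size=(h, w), mode='bilinear', align_corners=False)
        return self.fuse(torch.cat(local_feats + [g], dim=1))
```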
The temporal coding structure is used to extract the temporal features of the image, specifically as follows:
The temporal coding structure consists of 4 groups of convolution blocks and a ConvLSTM (convolutional long short-term memory network). The convolution blocks are the same as those of the spatial coding structure, with a 2×2 max pooling layer of stride 2 between blocks. Compared with FC-LSTM, ConvLSTM effectively preserves the spatial structure of the image while learning temporal features, and is therefore better suited to processing video sequences.
Unlike the spatial encoder, the input to the temporal coding structure is a video sequence of T frames (1 < T < 10). The features Z_t to Z_{t-T} of the T frames are extracted by the 4 groups of convolution blocks, and Z_t to Z_{t-T} are then input to the convolutional LSTM to learn information across preceding and following frames. To obtain the maximum dynamic information of the T consecutive frames, the feature H_{t-T} of the last time step is finally retained.
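The following sketch shows a generic ConvLSTM cell and how the per-frame features Z_t to Z_{t-T} could be passed through it so that only the hidden state of the last time step is kept; this is a standard ConvLSTM formulation given for illustration, with the hidden channel size and zero initial states as assumptions.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        # a single convolution produces the input, forget, output and candidate gates
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o, g = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o), torch.tanh(g)
        c = f * c + i * g
        h = o * torch.tanh(c)
        return h, c

def temporal_encode(frame_feats, cell):
    """frame_feats: list of (B, C, H, W) maps Z_t ... Z_{t-T} from the 4 convolution blocks."""
    b, _, hgt, wdt = frame_feats[0].shape
    h = frame_feats[0].new_zeros(b, cell.hid_ch, hgt, wdt)   # zero initial hidden state
    c = torch.zeros_like(h)
    for z in frame_feats:                 # learn information across preceding/following frames
        h, c = cell(z, (h, c))
    return h                              # hidden state of the last time step is retained
```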
The Attention fusion module is used to fuse the spatio-temporal features, specifically as follows:
The fusion module applies an Attention mechanism to fuse the temporal and spatial features along the channel dimension. It computes channel weights from the correlations between feature-map channels and then weights the image features with them, so that channels more important to the detection result stand out. The outputs of the temporal encoder and the spatial encoder are concatenated to obtain the feature F; F is reshaped by a shape function to obtain F_1, which is then transposed to obtain F_2. F_1 and F_2 are matrix-multiplied and passed through softmax to obtain F_3. F_3 is applied to F as the channel weights to obtain the final fusion result.
The decoding module is configured to calculate a final saliency map, specifically:
the decoding structure consists of 3 sets of convolutional blocks, 3 upsampled layers, one layer of 1 x 1 convolutional layers, and one Sigmoid layer. Wherein the volume block is identical to that of the spatial coding structure. Each set of convolutional blocks is preceded by a x 2 upsampling. The last layer of Sigmoid function controls the output value to be in the range of [0,1 ]. The predicted driver saliency map is a grayscale map of the same size as the input image.
B2: The data set is divided into training, validation and test sets at a ratio of approximately 8:2:3. To shorten training time, the input pictures are resized to 320 × 192 × 3 (height H × width W × number of channels C).
B3: The parameters of the network model are first randomly initialized (the network model is shown in fig. 2; the network features are all denoted as H × W × C). A training set picture F_t (320 × 192) is input to the spatial encoder, while the current frame and its previous 5 frames, i.e., F_t to F_{t-5}, form a continuous time sequence that is input to the temporal encoder. The BCE function is used to compute the loss between the predicted saliency map and the corresponding label (the standard fixation-point saliency map). An Adam optimizer with a learning rate of 10^-3, a momentum value of 0.9 and a decay rate of 10^-4 is used to update the parameters, and the model parameters of each epoch are saved.
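A minimal training-loop sketch under the settings stated above (BCE loss, Adam with learning rate 10^-3, momentum 0.9 taken here as the first Adam beta, decay 10^-4, parameters saved each epoch) might look as follows; `model`, `train_loader` and the checkpoint file names are assumptions of the example, not part of the patent.

```python
import torch
import torch.nn as nn

def train(model, train_loader, epochs, device="cuda"):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                                 betas=(0.9, 0.999), weight_decay=1e-4)
    criterion = nn.BCELoss()
    for epoch in range(epochs):
        model.train()
        for clip, label in train_loader:      # clip: frames F_t ... F_{t-5}; label: saliency map
            clip, label = clip.to(device), label.to(device)
            pred = model(clip)                # spatial branch uses F_t, temporal branch the clip
            loss = criterion(pred, label)     # BCE between prediction and standard saliency map
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        torch.save(model.state_dict(), f"epoch_{epoch}.pth")  # save parameters each epoch
```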
B4: After each epoch of training, the model is validated on the validation set. Step B2 is repeated for iterative training until the fluctuation of the computed loss value is essentially stable, i.e., the parameters of the network are essentially stable, yielding the optimal model parameters.
The network model of the present invention is verified with specific data as follows:
step 1: and B4, importing the optimal model parameters in the step B4 into the model, and randomly inputting test set data to obtain a prediction result.
Step 2: To verify the performance of the model, the results are analyzed qualitatively and evaluated quantitatively. For the qualitative analysis, to allow a more intuitive comparison, the predicted grayscale map is colorized and superimposed on the original image, and likewise for the standard eye-movement saliency map. The qualitative results are shown in fig. 3: fig. 3(a) shows the input original image, and the distribution of the model prediction map in fig. 3(c) is similar to that of the standard eye-movement saliency map in fig. 3(b), indicating good prediction performance. For the quantitative analysis, the main evaluation indexes are: AUC_Borji, AUC_Judd, NSS (normalized scanpath saliency), CC (linear correlation coefficient), KLD (Kullback-Leibler divergence, i.e., relative entropy), EMD (earth mover's distance) and SIM (similarity). The quantitative results are listed in Table 1. Lower KLD and EMD values indicate a better model; higher AUC_Borji, AUC_Judd, NSS, CC and SIM values indicate a better model, i.e., more accurately predicted regions.
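For reference, common formulations of several of the listed metrics are sketched below; these are the standard definitions used in saliency benchmarks and are not necessarily the exact implementations behind Table 1. `pred` and `gt` denote the predicted and standard saliency maps, and `fix` the binary fixation map.

```python
import numpy as np

def cc(pred, gt):
    # linear correlation coefficient between the two maps
    p, g = pred - pred.mean(), gt - gt.mean()
    return float((p * g).sum() / (np.sqrt((p * p).sum() * (g * g).sum()) + 1e-8))

def kld(pred, gt, eps=1e-8):
    # Kullback-Leibler divergence between the maps treated as distributions
    p = pred / (pred.sum() + eps)
    g = gt / (gt.sum() + eps)
    return float((g * np.log(eps + g / (p + eps))).sum())

def sim(pred, gt, eps=1e-8):
    # histogram intersection (similarity) of the normalized maps
    p = pred / (pred.sum() + eps)
    g = gt / (gt.sum() + eps)
    return float(np.minimum(p, g).sum())

def nss(pred, fix):
    # normalized scanpath saliency: mean normalized prediction at fixated pixels
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    return float(p[fix > 0].mean())
```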
Table 1: Evaluation index results of the method of the invention on night traffic video images
[Table 1 is provided as an image in the original patent document.]
Those skilled in the art should note that, in Table 1, ↑ indicates that a higher value is better and ↓ indicates that a lower value is better.
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to help the reader understand the principles of the invention, and that the invention is not limited to the specifically described embodiments and examples. Various modifications and alterations to this invention will be apparent to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included within the scope of the claims of the present invention.

Claims (6)

1. A night traffic video saliency detection method based on space-time dual coding, characterized by comprising the following steps:
S1, acquiring a standard fixation-point saliency map; step S1 specifically includes:
step A1: recording, with an eye tracker, the eye movement data of 30 drivers with two or more years of driving experience, the eye movement data including the fixation-point information of each frame; eliminating abnormal data, and aggregating the eye movement data of all subjects onto each frame;
step A2: generating a blank matrix of the same size as the input image, and setting the positions corresponding to the fixation points of each frame in the blank matrix to 1 to obtain a standard fixation-point binary map; then performing 2-dimensional Gaussian smoothing on the binary map to obtain the standard fixation-point saliency map;
S2, establishing a network model for performing saliency detection on the input standard fixation-point saliency map;
the network model including: a space-time coding structure, an Attention fusion module and a decoding module, wherein the space-time coding structure extracts the spatial and temporal features of the input standard fixation-point saliency map, the Attention fusion module fuses the extracted spatial and temporal features, and the decoding module computes the saliency map from the fusion result;
S3, training the network model, and using the trained network model to detect image saliency;
the space-time coding structure comprising a spatial coding structure and a temporal coding structure, the spatial coding structure extracting the spatial features of the input standard fixation-point saliency map, and the temporal coding structure extracting the temporal features of the input standard fixation-point saliency map.
2. The method for detecting the saliency of night traffic video based on space-time dual coding according to claim 1, wherein in the training process of the network model: the current frame is used for extracting the spatial features, and the current frame and the previous 5 frames form a continuous sequence for extracting the temporal features.
3. The method according to claim 2, wherein the spatial coding structure comprises: 4 groups of convolution blocks and a pyramid dilated convolution block;
each group of convolution blocks specifically includes: 2 convolution layers, each comprising a 3×3 convolution, a batch normalization unit and a rectified linear unit; each group of convolution blocks includes a 2×2 max pooling layer with stride 2;
the pyramid dilated convolution block acquires spatial features through parallel dilated convolutions with different dilation rates.
4. The method according to claim 3, wherein the temporal coding structure comprises: 4 groups of convolution blocks and a convolutional long short-term memory network;
each group of convolution blocks specifically includes: 2 convolution layers, each comprising a 3×3 convolution, a batch normalization unit and a rectified linear unit; each group of convolution blocks includes a 2×2 max pooling layer with stride 2.
5. The night traffic video saliency detection method based on space-time dual coding as claimed in claim 4, wherein the temporal coding structure extracts the features of a continuous frame sequence through the 4 groups of convolution blocks, and then inputs the extracted features to the convolutional long short-term memory network to learn information across preceding and following frames.
6. The method as claimed in claim 5, wherein the decoding module sequentially comprises: 3 upsampling layers, 3 groups of convolution blocks, a 1×1 convolution layer and a Sigmoid layer, with a ×2 upsampling layer placed before each group of convolution blocks;
each group of convolution blocks specifically includes: 2 convolution layers, each comprising a 3×3 convolution, a batch normalization unit and a rectified linear unit; each group of convolution blocks includes a 2×2 max pooling layer with stride 2.
CN202110183195.8A 2021-02-08 2021-02-08 Night traffic video significance detection method based on space-time double coding Active CN112861733B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110183195.8A CN112861733B (en) 2021-02-08 2021-02-08 Night traffic video significance detection method based on space-time double coding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110183195.8A CN112861733B (en) 2021-02-08 2021-02-08 Night traffic video significance detection method based on space-time double coding

Publications (2)

Publication Number Publication Date
CN112861733A CN112861733A (en) 2021-05-28
CN112861733B true CN112861733B (en) 2022-09-02

Family

ID=75988373

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110183195.8A Active CN112861733B (en) 2021-02-08 2021-02-08 Night traffic video significance detection method based on space-time double coding

Country Status (1)

Country Link
CN (1) CN112861733B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376611A (en) * 2018-09-27 2019-02-22 方玉明 A kind of saliency detection method based on 3D convolutional neural networks
CN110705566A (en) * 2019-09-11 2020-01-17 浙江科技学院 Multi-mode fusion significance detection method based on spatial pyramid pool
CN111563418A (en) * 2020-04-14 2020-08-21 浙江科技学院 Asymmetric multi-mode fusion significance detection method based on attention mechanism

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101305735B1 (en) * 2012-06-15 2013-09-06 성균관대학교산학협력단 Method and apparatus for providing of tactile effect
CN103793925B (en) * 2014-02-24 2016-05-18 北京工业大学 Merge the video image vision significance degree detection method of space-time characteristic
CN108664981B (en) * 2017-03-30 2021-10-26 北京航空航天大学 Salient image extraction method and device
CN110909594A (en) * 2019-10-12 2020-03-24 杭州电子科技大学 Video significance detection method based on depth fusion
CN112308005A (en) * 2019-11-15 2021-02-02 电子科技大学 Traffic video significance prediction method based on GAN
CN111461043B (en) * 2020-04-07 2023-04-18 河北工业大学 Video significance detection method based on deep network
CN112040222B (en) * 2020-08-07 2022-08-19 深圳大学 Visual saliency prediction method and equipment
CN112016476B (en) * 2020-08-31 2022-11-01 山东大学 Method and system for predicting visual saliency of complex traffic guided by target detection

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376611A (en) * 2018-09-27 2019-02-22 方玉明 A kind of saliency detection method based on 3D convolutional neural networks
CN110705566A (en) * 2019-09-11 2020-01-17 浙江科技学院 Multi-mode fusion significance detection method based on spatial pyramid pool
CN111563418A (en) * 2020-04-14 2020-08-21 浙江科技学院 Asymmetric multi-mode fusion significance detection method based on attention mechanism

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Visual tracking of objects for unmanned surface vehicle navigation; Kyung Hwa Chae et al.; 2016 16th International Conference on Control, Automation and Systems (ICCAS); 2017-01-26; full text *
Research on eye-movement distribution patterns under global scene perception and a gaze-region prediction model; Yang Tian; China Master's Theses Full-text Database; 2021-01-15; full text *

Also Published As

Publication number Publication date
CN112861733A (en) 2021-05-28

Similar Documents

Publication Publication Date Title
Fang et al. DADA: Driver attention prediction in driving accident scenarios
Zheng et al. A novel background subtraction algorithm based on parallel vision and Bayesian GANs
Zhang et al. CDNet: A real-time and robust crosswalk detection network on Jetson nano based on YOLOv5
CN109753913B (en) Multi-mode video semantic segmentation method with high calculation efficiency
Fang et al. Traffic accident detection via self-supervised consistency learning in driving scenarios
CN111639524B (en) Automatic driving image semantic segmentation optimization method
Zhang et al. Lightweight and efficient asymmetric network design for real-time semantic segmentation
CN110363770B (en) Training method and device for edge-guided infrared semantic segmentation model
JP2012529110A (en) Semantic scene segmentation using random multinomial logit
CN112308005A (en) Traffic video significance prediction method based on GAN
CN116343144B (en) Real-time target detection method integrating visual perception and self-adaptive defogging
CN113269133A (en) Unmanned aerial vehicle visual angle video semantic segmentation method based on deep learning
Cheng et al. A highway traffic image enhancement algorithm based on improved GAN in complex weather conditions
CN115690750A (en) Driver distraction detection method and device
CN113673527B (en) License plate recognition method and system
CN109543519B (en) Depth segmentation guide network for object detection
Deng et al. Driving visual saliency prediction of dynamic night scenes via a spatio-temporal dual-encoder network
CN116740362B (en) Attention-based lightweight asymmetric scene semantic segmentation method and system
CN112861733B (en) Night traffic video significance detection method based on space-time double coding
CN111612803A (en) Vehicle image semantic segmentation method based on image definition
CN113343903B (en) License plate recognition method and system in natural scene
CN114283288B (en) Method, system, equipment and storage medium for enhancing night vehicle image
CN115171001A (en) Method and system for detecting vehicle on enhanced thermal infrared image based on improved SSD
Yuan et al. RM-IQA: A new no-reference image quality assessment framework based on range mapping method
CN112818858A (en) Rainy day traffic video saliency detection method based on double-channel visual mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant