CN111814755A - Multi-frame image pedestrian detection method and device for night motion scene


Info

Publication number
CN111814755A
Authority
CN
China
Prior art keywords
network
frame
night
detection
pedestrian detection
Prior art date
Legal status
Pending
Application number
CN202010832374.5A
Other languages
Chinese (zh)
Inventor
陈海波
罗志鹏
徐振宇
姚粤汉
Current Assignee
Shenyan Technology Beijing Co ltd
Original Assignee
Shenyan Technology Beijing Co ltd
Priority date
Filing date
Publication date
Application filed by Shenyan Technology Beijing Co ltd filed Critical Shenyan Technology Beijing Co ltd
Priority to CN202010832374.5A
Publication of CN111814755A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/23Recognition of whole body movements, e.g. for sport training
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a multi-frame image pedestrian detection method and device for night motion scenes. The method comprises the following steps: acquiring a data set containing multiple night multi-frame image sequences, and performing enhancement processing on the night multi-frame images in the data set; constructing a neural network comprising a feature extraction network and a prediction network, wherein the feature extraction network fuses multiple backbone networks and includes a feature pyramid network, a deformable convolution network is fused into each backbone network, and the prediction network comprises a double-branch structure; training the neural network on the enhanced data set, and judging pedestrian targets according to the inter-frame IoU values of the multi-frame images during training, to obtain a pedestrian detection model; and performing pedestrian detection on the night multi-frame images to be detected with the pedestrian detection model. The method can detect pedestrians in multi-frame images of night scenes with high accuracy and robustness.

Description

Multi-frame image pedestrian detection method and device for night motion scene
Technical Field
The invention relates to the technical field of target detection, and in particular to a multi-frame image pedestrian detection method for night motion scenes, a corresponding multi-frame image pedestrian detection device, a computer device, a non-transitory computer-readable storage medium, and a computer program product.
Background
With the great improvement in computer storage and computing capacity, video has become an increasingly common information medium in daily life, making video processing and analysis highly important. As a fundamental problem in video analysis, video object detection has long been a research hotspot in both academia and industry. Automatic pedestrian detection in video is widely applied in intelligent transportation, autonomous driving, intelligent video surveillance, and related fields, yet it faces great challenges from large deformation, varying postures, and shadow occlusion during pedestrian movement. Night video sequences in particular suffer from weak illumination and high image noise, making outstanding results even harder to obtain.
Disclosure of Invention
To solve these technical problems, the invention provides a multi-frame image pedestrian detection method and device for night motion scenes, which achieve pedestrian detection in multi-frame images of night scenes with high accuracy and robustness.
The technical scheme adopted by the invention is as follows:
a multi-frame image pedestrian detection method facing a night motion scene comprises the following steps: acquiring a data set containing a plurality of night multi-frame images, and performing enhancement processing on the night multi-frame images in the data set; constructing a neural network, wherein the neural network comprises a feature extraction network and a prediction network, the feature extraction network fuses a plurality of backbone networks and comprises a feature pyramid network, a deformable convolution network is fused in each backbone network, and the prediction network comprises a double-branch structure; training the neural network through the enhanced data set, and judging a pedestrian target according to an inter-frame IOU (Intersection Over Unit) value of a plurality of frames of images in the training process to obtain a pedestrian detection model; and carrying out pedestrian detection on the night multi-frame image to be detected through the pedestrian detection model.
Spatial-level image enhancement is performed on the night multi-frame images in the data set in the form of batch data.
The backbone network is ResNeXt, and the two branches of the double-branch structure are an FC-head and a Conv-head, with the FC-head serving as a classification network and the Conv-head as a regression network.
Judging pedestrian targets according to the inter-frame IoU values of the multi-frame images during training comprises: filtering the detection boxes obtained in training, keeping those whose category score exceeds a first threshold θ, and denoting them Boxes1; for the current frame, first computing the IoU values between the detection boxes Boxes1 of the current frame and the tracking boxes of the previous frame's tracking queue, and judging the maximum IoU value of each detection box; if the maximum IoU value exceeds a second threshold σ, considering the detection box correct; otherwise, if the maximum IoU value is below σ, judging whether the maximum detection score of the matched tracking box over previous video frames exceeds a third threshold and whether the number of times the tracking box has appeared in previous frames exceeds a minimum-occurrence threshold T, and if both exceed their corresponding thresholds, judging the detection box of the current frame to be erroneous.
The regression loss L_loc in training the network uses the smoothed L1 loss, where x is an ROI, b is the predicted coordinates for the ROI, g is the label coordinate values, and f denotes the regressor:

smooth_L1(x) = 0.5x², if |x| < 1; |x| − 0.5, otherwise

b = (b_x, b_y, b_w, b_h)

To ensure the invariance of the regression operation to scale and location, L_loc operates on the offset vector Δ = (δ_x, δ_y, δ_w, δ_h):

δ_x = (g_x − b_x)/b_w, δ_y = (g_y − b_y)/b_h, δ_w = log(g_w/b_w), δ_h = log(g_h/b_h)

A regularization operation is performed on Δ:

δ_x′ = (δ_x − u_x)/σ_x

The total loss of each Head_i (i = 1, 2, 3) in the detection network is:

L(x^t, g) = L_cls(h_t(x^t), y^t) + λ[y^t ≥ 1] L_loc(f_t(x^t, b^t), g)

f(x, b) = f_T ∘ f_{T−1} ∘ … ∘ f_1(x, b)

b^t = f_{t−1}(x^{t−1}, b^{t−1})

where T represents the total number of branches superposed in Cascade RCNN and t the current branch; each branch f_t in Cascade RCNN is optimized with the training data b^t of its own branch, b^t being derived from the outputs of all preceding branches starting from b^1; λ = 1 is a weighting coefficient; [y^t ≥ 1] means that the regression loss is computed only on positive samples; and y^t is the label of x^t computed according to the corresponding IoU threshold u_t.
A multi-frame image pedestrian detection device for a night motion scene comprises: an enhancement module, used for acquiring a data set containing multiple night multi-frame images and performing enhancement processing on the night multi-frame images in the data set; a construction module, used for constructing a neural network, wherein the neural network comprises a feature extraction network and a prediction network, the feature extraction network fuses multiple backbone networks and comprises a feature pyramid network, a deformable convolution network is fused into each backbone network, and the prediction network comprises a double-branch structure; a training module, used for training the neural network with the enhanced data set and judging pedestrian targets according to the inter-frame IoU values of the multi-frame images during training, to obtain a pedestrian detection model; and a detection module, used for performing pedestrian detection on the night multi-frame images to be detected through the pedestrian detection model.
A computer device comprises a memory, a processor, and a computer program stored on the memory and executable on the processor; when the processor executes the program, the above multi-frame image pedestrian detection method for a night motion scene is implemented.

A non-transitory computer-readable storage medium stores a computer program which, when executed by a processor, implements the above multi-frame image pedestrian detection method for a night motion scene.

A computer program product contains instructions which, when executed by a processor, perform the above multi-frame image pedestrian detection method for a night motion scene.
The invention has the beneficial effects that:
the method inputs the enhanced multi-frame images into the neural network for training, fuses a plurality of trunk networks in the characteristic extraction network of the neural network, fuses a deformable convolution network in each trunk network, sets a double-branch structure in the prediction network, and judges the pedestrian target according to the inter-frame IOU value of the multi-frame images in the training process, so that the obtained pedestrian detection model can realize pedestrian detection aiming at the multi-frame images in night scenes, and has high accuracy and robustness.
Drawings
FIG. 1 is a flowchart of a method for detecting pedestrians through multiple frames of images facing a night motion scene according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a feature extraction network according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the RPN according to one embodiment of the present invention;
FIG. 4 is a schematic diagram of the structure of Cascade RCNN according to one embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a Double Head according to an embodiment of the present invention;
Fig. 6 is a block diagram of a multi-frame image pedestrian detection device facing a night motion scene according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in Fig. 1, the multi-frame image pedestrian detection method for night motion scenes according to the embodiment of the present invention includes the following steps:
and S1, acquiring a data set containing a plurality of night multi-frame images, and performing enhancement processing on the night multi-frame images in the data set.
The data set may include a large number of multi-frame image sequences captured in night motion scenes, such as videos captured at night by cameras arranged along the corresponding roads, or images in GIF format; some of the sequences contain moving pedestrians and some contain none. The data set serves as the training set, and the more multi-frame images it contains, the higher the accuracy of the subsequently trained detection model.
In one embodiment of the invention, spatial-level image enhancement can be performed on night multi-frame images in a data set in the form of batch data so as to remove image noise without destroying structural information of original images.
Specifically, the multi-frame images in the data set can be randomly sampled. For each sampled image I_i, its width I_i_w and height I_i_h are compared: the long side max(I_i_w, I_i_h) is scaled to L, and the short side min(I_i_w, I_i_h) is scaled to S, where S is randomly selected from the range S1~S2. The sampled images I_i (i = 1, 2, 3, …, n) are fed into the feature extraction network in the form of a batch, in which the long side of every multi-frame image is L and the short sides are unified in size: taking the maximum value max(S_i) of the short sides S_i (i = 1, 2, 3, …, n) over the whole batch as the reference S_base, each remaining short side S_i is padded up to S_base:

S_base = S_i + padding
In one embodiment of the present invention, L may be 2048 and the short-side range S1~S2 may be 1024~1536.
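The scale-jitter batching described above can be sketched in a few lines of PyTorch. This is a minimal illustration rather than the patented implementation; the function name, tensor layout, and the assumption that all images in a batch share one orientation are ours:

```python
import random

import torch
import torch.nn.functional as F

def resize_batch(images, long_side=2048, short_range=(1024, 1536)):
    """Scale each image's long side to L and its short side to a random S in
    [S1, S2], then pad every short side up to the batch maximum S_base."""
    resized, shorts = [], []
    for im in images:                       # im: C x H x W float tensor
        _, h, w = im.shape
        s = random.randint(*short_range)    # S drawn from S1~S2
        size = (s, long_side) if w >= h else (long_side, s)
        resized.append(F.interpolate(im[None], size=size, mode='bilinear',
                                     align_corners=False)[0])
        shorts.append(s)
    s_base = max(shorts)                    # S_base = max(S_i)
    padded = []
    for im in resized:
        _, h, w = im.shape
        pad_w = s_base - w if w < h else 0  # pad only the short side
        pad_h = s_base - h if h < w else 0
        padded.append(F.pad(im, (0, pad_w, 0, pad_h)))  # pads W, then H
    return torch.stack(padded)              # assumes one orientation per batch
```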
S2, constructing a neural network, wherein the neural network comprises a feature extraction network and a prediction network, the feature extraction network fuses multiple backbone networks and comprises a feature pyramid network, a deformable convolution network is fused into each backbone network, and the prediction network comprises a double-branch structure.
In an embodiment of the invention, the backbone network can be ResNeXt, with a deformable convolution network added into ResNeXt to improve the network's capability of modeling spatial information; learning the deformation of targets through these additional parameters also improves, to a certain extent, the robustness of the subsequently trained detection model to object size. A composite backbone network is used to fuse several ResNeXt networks, combining high-level and low-level semantic information to extract more effective feature information. A feature pyramid network is then attached, which fuses multi-scale features by combining shallow and deep feature information and thereby benefits the detection of multi-scale objects.
The two branches of the double-branch structure are an FC-Head and a Conv-Head: addressing their different requirements, the FC-Head serves as the classification network and the Conv-Head as the regression network. The two branches carry different biases, and compared with a single-head structure, the double-head structure achieves higher precision in classification and coordinate regression.
S3, training the neural network with the enhanced data set, and judging pedestrian targets according to the inter-frame IoU values of the multi-frame images during training, to obtain a pedestrian detection model.
Specifically, a 7×7 convolution operation can first be applied to each multi-frame image I in the enhanced data set; its purpose is to directly downsample the input image while retaining as much information of the original image as possible, without increasing the number of channels. Then, as shown in Fig. 2, the image passes sequentially through four stages (Stage_1, Stage_2, Stage_3, Stage_4), each composed of several Residual Blocks arranged horizontally. Each Residual Block extracts finer features on top of the broader features obtained in the previous stage and consists of two branches: one is a residual branch, and the other is composed of three layers in sequence, namely a 1×1 convolution layer, a deformable convolution layer, and a 1×1 convolution layer. The deformable convolution layer involves two steps: first, the positional offset of each pixel required by the deformable convolution is computed through a 3×3 convolution operation, and the offsets are then applied to the convolution kernel to obtain the deformable convolution. The residual branch consists of a 1×1 convolution layer and is mainly used to extract the residual feature information of the image. After the feature maps pass through the two branches of the Residual Block respectively, the resulting feature maps are added together as the input features of the next Stage, as in the sketch below.
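A minimal PyTorch sketch of such a Residual Block follows, using torchvision's DeformConv2d; the class name and channel widths are illustrative assumptions rather than the patented configuration:

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableBlock(nn.Module):
    """Residual Block sketch: a 1x1 -> deformable 3x3 -> 1x1 main branch plus
    a 1x1 residual branch, with the two branches summed at the end."""
    def __init__(self, in_ch, mid_ch, out_ch):
        super().__init__()
        self.reduce = nn.Conv2d(in_ch, mid_ch, 1, bias=False)
        # step 1: a 3x3 conv predicts per-pixel sampling offsets
        # (2 values, x and y, for each of the 9 kernel taps)
        self.offset = nn.Conv2d(mid_ch, 2 * 3 * 3, 3, padding=1)
        # step 2: the offsets deform the 3x3 convolution's sampling grid
        self.deform = DeformConv2d(mid_ch, mid_ch, 3, padding=1, bias=False)
        self.expand = nn.Conv2d(mid_ch, out_ch, 1, bias=False)
        self.residual = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.reduce(x))
        out = self.relu(self.deform(out, self.offset(out)))
        out = self.expand(out)
        # the feature maps of the two branches are added, as described above
        return self.relu(out + self.residual(x))
```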
Before its output enters the next Stage, each Stage also passes its output feature map, as an input feature, to the Stage arranged laterally alongside it. Specifically, the input image passes through Stage_1 to produce feature map F_1; F_1 serves as the input feature of Stage_1_1, the stage arranged laterally alongside Stage_1, and passes through Stage_1_1 to produce feature map F_2. F_1 passes through Stage_2 to produce feature map F_3; F_3 and F_2 are added to obtain the input features of Stage_2_2, arranged laterally alongside Stage_2, which produces feature map F_4. F_3 passes through Stage_3 to produce feature map F_5; F_5 and F_4 are added to obtain the input features of Stage_3_3, arranged laterally alongside Stage_3, which produces feature map F_6. F_5 passes through Stage_4 to produce feature map F_7; F_7 and F_6 are added to obtain the input features of Stage_4_4, arranged laterally alongside Stage_4, which produces feature map F_8.

The F_2, F_4, F_6, F_8 produced by the above process are extracted and first passed through a 1×1 convolution so that their channel counts become equal. Then F_8 is interpolated into a feature map of the same size and channels as F_6 and added to it, fusing the features of the Stage_4_4 and Stage_3_3 stages (denoted M_2); M_2 is interpolated to the size of F_4 and added to it, fusing the features of Stage_3_3 and Stage_2_2 (denoted M_1); M_1 is interpolated to the size of F_2 and added to it, fusing the features of Stage_2_2 and Stage_1_1 (denoted M_0); and F_8 is output directly as M_3.
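This top-down fusion can be sketched as follows in PyTorch; the module name and channel counts are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidFusion(nn.Module):
    """Fuse F2/F4/F6/F8 into M0..M3 as described above: 1x1 laterals equalize
    channels, top-down interpolation plus addition fuses adjacent levels, and
    a 3x3 conv smooths each level before it enters the RPN / Cascade RCNN."""
    def __init__(self, in_chs=(256, 512, 1024, 2048), out_ch=256):
        super().__init__()
        self.laterals = nn.ModuleList(nn.Conv2d(c, out_ch, 1) for c in in_chs)
        self.smooth = nn.ModuleList(nn.Conv2d(out_ch, out_ch, 3, padding=1)
                                    for _ in in_chs)

    def forward(self, f2, f4, f6, f8):
        l2, l4, l6, l8 = (lat(f) for lat, f in
                          zip(self.laterals, (f2, f4, f6, f8)))
        m3 = l8                                  # F8 is used directly as M3
        m2 = l6 + F.interpolate(m3, size=l6.shape[-2:], mode='nearest')
        m1 = l4 + F.interpolate(m2, size=l4.shape[-2:], mode='nearest')
        m0 = l2 + F.interpolate(m1, size=l2.shape[-2:], mode='nearest')
        return [s(m) for s, m in zip(self.smooth, (m0, m1, m2, m3))]
```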
Next, a 3×3 convolution can be applied to M_3, M_2, M_1, M_0 before they are fed into the two-stage network, i.e., the RPN (Region Proposal Network) and Cascade RCNN respectively. The structure of the first-stage network, the RPN, is shown in Fig. 3: several anchors of fixed size and fixed aspect ratio are set manually as reference boxes for prediction, and proposals with higher confidence are then screened from these anchors by a classification network and a regression network to serve as the reference boxes of the second-stage network. This classification network is a binary network that only predicts the probability that a target exists in an anchor, while the regression network predicts the offset, i.e., how far an anchor that may contain a target deviates from the target's real bounding box. Similarly, the second-stage network uses the proposals as reference boxes for prediction, and then screens the final detection boxes from these proposals through a classification network and a regression network. That classification network is a multi-class network whose number of classes depends on the number of classes to be detected in the data set, and the regression network predicts the offset between every proposal and the real bounding box. A sketch of the anchor layout follows.
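The fixed-size, fixed-ratio anchor grid the RPN starts from can be sketched as below; the scales, ratios, and (x1, y1, x2, y2) output form are illustrative assumptions, not values given by the patent:

```python
import torch

def make_anchors(feat_h, feat_w, stride,
                 scales=(32, 64, 128), ratios=(0.5, 1.0, 2.0)):
    """Tile fixed-size, fixed-ratio anchors over a feature map."""
    ys = (torch.arange(feat_h, dtype=torch.float32) + 0.5) * stride
    xs = (torch.arange(feat_w, dtype=torch.float32) + 0.5) * stride
    cy, cx = torch.meshgrid(ys, xs, indexing='ij')
    centers = torch.stack([cx, cy], dim=-1).reshape(-1, 2)     # (HW, 2)
    # anchor widths/heights: w = s / sqrt(r), h = s * sqrt(r)
    shapes = torch.tensor([[s / r ** 0.5, s * r ** 0.5]
                           for s in scales for r in ratios])   # (A, 2)
    boxes = torch.cat([centers[:, None] - shapes[None] / 2,
                       centers[:, None] + shapes[None] / 2], dim=-1)
    return boxes.reshape(-1, 4)                                # (HW*A, 4)
```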
The structure of the second-stage network, Cascade RCNN, is shown in Fig. 4. It comprises a three-stage cascade: the Proposals1 output by the first-stage network Head_1 serve, after screening, as the input of the second-stage network Head_2; the Proposals2 output by Head_2 serve as the input of the third-stage network Head_3; and the Proposals3 output by Head_3 are the final prediction result. The output boxes (proposals) of the Head at each stage are obtained by feeding the pooled features and the incoming proposals into that stage and predicting each proposal's class score and regression offset. That is, each stage consists of a classification network and a regression network; in the embodiment of the invention, the FC-Head serves as the classification network and the Conv-Head as the regression network. This double-branch structure, the Double Head structure shown in Fig. 5, consists of an ROI Align layer followed by two parallel branches, namely a classification prediction branch and a regression prediction branch. The classification task often needs more image semantic information, while the regression task needs more spatial information; the adopted Double Head structure therefore accounts for these differing requirements, with a more pronounced effect. A minimal sketch is given below.
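A hedged sketch of such a Double Head operating on ROI-aligned features; the layer widths are our assumptions, not the patented configuration:

```python
import torch
import torch.nn as nn

class DoubleHead(nn.Module):
    """FC-head for classification, Conv-head for box regression."""
    def __init__(self, in_ch=256, roi_size=7, num_classes=1):
        super().__init__()
        flat = in_ch * roi_size * roi_size
        self.fc_head = nn.Sequential(               # classification branch
            nn.Flatten(),
            nn.Linear(flat, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, 1024), nn.ReLU(inplace=True))
        self.cls_score = nn.Linear(1024, num_classes + 1)   # (M + 1)-way
        self.conv_head = nn.Sequential(              # regression branch
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.bbox_pred = nn.Linear(in_ch, 4)

    def forward(self, roi_feats):                    # roi_feats: N x C x 7 x 7
        return (self.cls_score(self.fc_head(roi_feats)),
                self.bbox_pred(self.conv_head(roi_feats)))
```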
In one embodiment of the invention, the classification loss L_cls in training the network uses the cross-entropy loss. Each ROI (Region of Interest) passes through the Head structure Head_i to obtain a classification result C_i (i = 1, 2, 3):

L_cls(h(x), y) = −(1/N) Σ_j log h_{y_j}(x_j), j = 1, …, N

where h(x) denotes the classification branch in Head_i, which outputs a vector of dimension M+1, the ROI being predicted as one of the M+1 categories; N is the number of ROIs in the current Head_i stage; and y is the corresponding category label, determined by the IoU between the ROI and its matching ground-truth label:

y = g_y, if IoU(x, g) ≥ u; y = 0, otherwise

where the IoU threshold u of Head_1 is set to u_1, and those of Head_2 and Head_3 to u_2 and u_3 respectively; x is the ROI and g_y is the class label of the object x. The IoU threshold u defines the quality of the detector, and using different IoU thresholds effectively alleviates the noise-interference problem in detection. In one embodiment of the invention, u_1, u_2, u_3 may be set to 0.5, 0.6, 0.7, respectively. A sketch of this label assignment follows.
The regression loss L_loc in training the network uses the smoothed L1 loss, where x is an ROI, b is the predicted coordinates for the ROI, g is the label coordinate values, and f denotes the regressor:

smooth_L1(x) = 0.5x², if |x| < 1; |x| − 0.5, otherwise

b = (b_x, b_y, b_w, b_h)

To ensure the invariance of the regression operation to scale and location, L_loc operates on the offset vector Δ = (δ_x, δ_y, δ_w, δ_h):

δ_x = (g_x − b_x)/b_w, δ_y = (g_y − b_y)/b_h, δ_w = log(g_w/b_w), δ_h = log(g_h/b_h)

The values in the formulae above are all small, so to improve the efficiency of the multi-task training, a regularization operation is performed on Δ:

δ_x′ = (δ_x − u_x)/σ_x

The total loss of each Head_i (i = 1, 2, 3) in the detection network is:

L(x^t, g) = L_cls(h_t(x^t), y^t) + λ[y^t ≥ 1] L_loc(f_t(x^t, b^t), g)

f(x, b) = f_T ∘ f_{T−1} ∘ … ∘ f_1(x, b)

b^t = f_{t−1}(x^{t−1}, b^{t−1})

where T represents the total number of branches superposed in Cascade RCNN and t the current branch. Each branch f_t in Cascade RCNN is optimized with the training data b^t of its own branch; b^t is derived from the outputs of all preceding branches starting from b^1, rather than directly using the initial RPN distribution b^1 to train f_t. λ is a weighting coefficient, [y^t ≥ 1] means that the regression loss is computed only on positive samples, and y^t is the label of x^t computed according to the above rule with the corresponding threshold u_t. In one embodiment of the invention, T is 3 and λ is 1.
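In code, the offset encoding, regularization, and smoothed L1 loss above look roughly as follows; the normalization statistics u and sigma are assumed to be precomputed:

```python
import torch

def box_deltas(b, g):
    """Offset vector Delta = (dx, dy, dw, dh) from predicted boxes b to
    targets g, both given as (x, y, w, h) rows, matching the encoding above."""
    dx = (g[:, 0] - b[:, 0]) / b[:, 2]
    dy = (g[:, 1] - b[:, 1]) / b[:, 3]
    dw = torch.log(g[:, 2] / b[:, 2])
    dh = torch.log(g[:, 3] / b[:, 3])
    return torch.stack([dx, dy, dw, dh], dim=1)

def smooth_l1(x):
    ax = x.abs()
    return torch.where(ax < 1, 0.5 * x ** 2, ax - 0.5)

def loc_loss(b, g, u, sigma):
    """u and sigma are per-component offset mean/std for the regularization
    step (assumed precomputed over the training data)."""
    delta = (box_deltas(b, g) - u) / sigma
    return smooth_l1(delta).sum(dim=1).mean()
```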
Further, a filtering operation may first be applied to the detection boxes obtained through the training process, keeping those whose category score exceeds the first threshold θ and denoting them Boxes1. For the current frame, the IoU values between the detection boxes Boxes1 of the current frame and the tracking boxes of the previous frame's tracking queue are computed first, and the maximum IoU value of each detection box is judged. If the maximum IoU value exceeds the second threshold σ, the detection box is considered correct; otherwise, if the maximum IoU value is below σ, it is judged whether the maximum detection score of the matched tracking box over previous video frames exceeds a third threshold and whether the number of times the tracking box has appeared in previous frames exceeds the minimum-occurrence threshold T, and if both exceed their corresponding thresholds, the detection box of the current frame is judged erroneous. For objects tracked with IoU information, if a detection box in the current frame cannot be matched to any box of the previous frame, it corresponds to a newly appeared object, which needs to be added to the tracking queue.
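A sketch of this inter-frame IoU filtering follows, with the tracking queue held as plain tensors; every threshold value here is an illustrative placeholder, since the patent does not fix them:

```python
import torch

def box_iou(a, b):
    """Pairwise IoU between two box sets in (x1, y1, x2, y2) form."""
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    lt = torch.max(a[:, None, :2], b[None, :, :2])
    rb = torch.min(a[:, None, 2:], b[None, :, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=2)
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def filter_detections(boxes, scores, track_boxes, track_max_score, track_hits,
                      theta=0.5, sigma=0.5, score_thr=0.5, min_hits=3):
    """Keep Boxes1 entries that either match a previous-frame track (IoU >
    sigma) or conflict only with an unreliable track; unmatched survivors are
    new objects to be added to the tracking queue."""
    keep = scores > theta                     # Boxes1: category score > theta
    boxes, scores = boxes[keep], scores[keep]
    if track_boxes.numel() == 0:              # empty queue: everything is new
        return boxes, scores
    best_iou, best_idx = box_iou(boxes, track_boxes).max(dim=1)
    matched = best_iou > sigma
    reliable = (track_max_score[best_idx] > score_thr) & \
               (track_hits[best_idx] > min_hits)
    keep2 = matched | ~reliable               # drop only reliable-track misses
    return boxes[keep2], scores[keep2]
```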
S4, performing pedestrian detection on the night multi-frame images to be detected through the pedestrian detection model.
According to the multi-frame image pedestrian detection method for night motion scenes, the enhanced multi-frame images are input into the neural network for training; multiple backbone networks are fused in the feature extraction network of the neural network, a deformable convolution network is fused into each backbone network, a double-branch structure is set in the prediction network, and pedestrian targets are judged according to the inter-frame IoU values of the multi-frame images during training, so that the obtained pedestrian detection model can detect pedestrians in multi-frame images of night scenes with high accuracy and robustness.
The invention further provides a multi-frame image pedestrian detection device for night motion scenes, corresponding to the multi-frame image pedestrian detection method of the above embodiment.
As shown in Fig. 6, the multi-frame image pedestrian detection device for night motion scenes according to the embodiment of the present invention includes an enhancement module 10, a construction module 20, a training module 30, and a detection module 40. The enhancement module 10 is configured to acquire a data set containing multiple night multi-frame images and perform enhancement processing on the night multi-frame images in the data set; the construction module 20 is configured to construct a neural network, where the neural network comprises a feature extraction network and a prediction network, the feature extraction network fuses multiple backbone networks and includes a feature pyramid network, each backbone network fuses a deformable convolution network, and the prediction network comprises a double-branch structure; the training module 30 is configured to train the neural network with the enhanced data set and judge pedestrian targets according to the inter-frame IoU values of the multi-frame images during training, to obtain a pedestrian detection model; and the detection module 40 is configured to perform pedestrian detection on night multi-frame images to be detected through the pedestrian detection model.
The data set may include a large number of multi-frame image sequences captured in night motion scenes, such as videos captured at night by cameras arranged along the corresponding roads, or images in GIF format; some of the sequences contain moving pedestrians and some contain none. The data set serves as the training set, and the more multi-frame images it contains, the higher the accuracy of the subsequently trained detection model.
In an embodiment of the present invention, the enhancement module 10 may perform spatial-level image enhancement on night multi-frame images in a data set in the form of batch data to remove image noise without destroying structural information of an original image.
Specifically, the multi-frame images in the data set can be randomly sampled. For each sampled image I_i, its width I_i_w and height I_i_h are compared: the long side max(I_i_w, I_i_h) is scaled to L, and the short side min(I_i_w, I_i_h) is scaled to S, where S is randomly selected from the range S1~S2. The sampled images I_i (i = 1, 2, 3, …, n) are fed into the feature extraction network in the form of a batch, in which the long side of every multi-frame image is L and the short sides are unified in size: taking the maximum value max(S_i) of the short sides S_i (i = 1, 2, 3, …, n) over the whole batch as the reference S_base, each remaining short side S_i is padded up to S_base:

S_base = S_i + padding

In one embodiment of the present invention, L may be 2048 and the short-side range S1~S2 may be 1024~1536.
In an embodiment of the invention, the backbone network can be ResNeXt, with a deformable convolution network added into ResNeXt to improve the network's capability of modeling spatial information; learning the deformation of targets through these additional parameters also improves, to a certain extent, the robustness of the subsequently trained detection model to object size. A composite backbone network is used to fuse several ResNeXt networks, combining high-level and low-level semantic information to extract more effective feature information. A feature pyramid network is then attached, which fuses multi-scale features by combining shallow and deep feature information and thereby benefits the detection of multi-scale objects.

The two branches of the double-branch structure are an FC-Head and a Conv-Head: addressing their different requirements, the FC-Head serves as the classification network and the Conv-Head as the regression network. The two branches carry different biases, and compared with a single-head structure, the double-head structure achieves higher precision in classification and coordinate regression.
The training module 30 may first apply a 7×7 convolution operation to each multi-frame image I in the enhanced data set; its purpose is to directly downsample the input image while retaining as much information of the original image as possible, without increasing the number of channels. Then, as shown in Fig. 2, the image passes sequentially through four stages (Stage_1, Stage_2, Stage_3, Stage_4), each composed of several Residual Blocks arranged horizontally. Each Residual Block extracts finer features on top of the broader features obtained in the previous stage and consists of two branches: one is a residual branch, and the other is composed of three layers in sequence, namely a 1×1 convolution layer, a deformable convolution layer, and a 1×1 convolution layer. The deformable convolution layer involves two steps: first, the positional offset of each pixel required by the deformable convolution is computed through a 3×3 convolution operation, and the offsets are then applied to the convolution kernel to obtain the deformable convolution. The residual branch consists of a 1×1 convolution layer and is mainly used to extract the residual feature information of the image. After the feature maps pass through the two branches of the Residual Block respectively, the resulting feature maps are added together as the input features of the next Stage.
Before its output enters the next Stage, each Stage also passes its output feature map, as an input feature, to the Stage arranged laterally alongside it. Specifically, the input image passes through Stage_1 to produce feature map F_1; F_1 serves as the input feature of Stage_1_1, the stage arranged laterally alongside Stage_1, and passes through Stage_1_1 to produce feature map F_2. F_1 passes through Stage_2 to produce feature map F_3; F_3 and F_2 are added to obtain the input features of Stage_2_2, arranged laterally alongside Stage_2, which produces feature map F_4. F_3 passes through Stage_3 to produce feature map F_5; F_5 and F_4 are added to obtain the input features of Stage_3_3, arranged laterally alongside Stage_3, which produces feature map F_6. F_5 passes through Stage_4 to produce feature map F_7; F_7 and F_6 are added to obtain the input features of Stage_4_4, arranged laterally alongside Stage_4, which produces feature map F_8.

The F_2, F_4, F_6, F_8 produced by the above process are extracted and first passed through a 1×1 convolution so that their channel counts become equal. Then F_8 is interpolated into a feature map of the same size and channels as F_6 and added to it, fusing the features of the Stage_4_4 and Stage_3_3 stages (denoted M_2); M_2 is interpolated to the size of F_4 and added to it, fusing the features of Stage_3_3 and Stage_2_2 (denoted M_1); M_1 is interpolated to the size of F_2 and added to it, fusing the features of Stage_2_2 and Stage_1_1 (denoted M_0); and F_8 is output directly as M_3.
Next, a 3×3 convolution can be applied to M_3, M_2, M_1, M_0 before they are fed into the two-stage network, i.e., the RPN and Cascade RCNN respectively. The structure of the first-stage network, the RPN, is shown in Fig. 3: several anchors of fixed size and fixed aspect ratio are set manually as reference boxes for prediction, and proposals with higher confidence are then screened from these anchors by a classification network and a regression network to serve as the reference boxes of the second-stage network. This classification network is a binary network that only predicts the probability that a target exists in an anchor, while the regression network predicts the offset, i.e., how far an anchor that may contain a target deviates from the target's real bounding box. Similarly, the second-stage network uses the proposals as reference boxes for prediction, and then screens the final detection boxes from these proposals through a classification network and a regression network. That classification network is a multi-class network whose number of classes depends on the number of classes to be detected in the data set, and the regression network predicts the offset between every proposal and the real bounding box.
The structure of the second-stage network, Cascade RCNN, is shown in Fig. 4. It comprises a three-stage cascade: the Proposals1 output by the first-stage network Head_1 serve, after screening, as the input of the second-stage network Head_2; the Proposals2 output by Head_2 serve as the input of the third-stage network Head_3; and the Proposals3 output by Head_3 are the final prediction result. The output boxes (proposals) of the Head at each stage are obtained by feeding the pooled features and the incoming proposals into that stage and predicting each proposal's class score and regression offset. That is, each stage consists of a classification network and a regression network; in the embodiment of the invention, the FC-Head serves as the classification network and the Conv-Head as the regression network. This double-branch structure, the Double Head structure shown in Fig. 5, consists of an ROI Align layer followed by two parallel branches, namely a classification prediction branch and a regression prediction branch. The classification task often needs more image semantic information, while the regression task needs more spatial information; the adopted Double Head structure therefore accounts for these differing requirements, with a more pronounced effect.
In one embodiment of the invention, when the training module 30 trains the network, the classification loss L_cls uses the cross-entropy loss. Each ROI passes through the Head structure Head_i to obtain a classification result C_i (i = 1, 2, 3):

L_cls(h(x), y) = −(1/N) Σ_j log h_{y_j}(x_j), j = 1, …, N

where h(x) denotes the classification branch in Head_i, which outputs a vector of dimension M+1, the ROI being predicted as one of the M+1 categories; N is the number of ROIs in the current Head_i stage; and y is the corresponding category label, determined by the IoU between the ROI and its matching ground-truth label:

y = g_y, if IoU(x, g) ≥ u; y = 0, otherwise

where the IoU threshold u of Head_1 is set to u_1, and those of Head_2 and Head_3 to u_2 and u_3 respectively; x is the ROI and g_y is the class label of the object x. The IoU threshold u defines the quality of the detector, and using different IoU thresholds effectively alleviates the noise-interference problem in detection. In one embodiment of the invention, u_1, u_2, u_3 may be set to 0.5, 0.6, 0.7, respectively.
When the training module 30 trains the network, the regression loss L_loc uses the smoothed L1 loss, where x is an ROI, b is the predicted coordinates for the ROI, g is the label coordinate values, and f denotes the regressor:

smooth_L1(x) = 0.5x², if |x| < 1; |x| − 0.5, otherwise

b = (b_x, b_y, b_w, b_h)

To ensure the invariance of the regression operation to scale and location, L_loc operates on the offset vector Δ = (δ_x, δ_y, δ_w, δ_h):

δ_x = (g_x − b_x)/b_w, δ_y = (g_y − b_y)/b_h, δ_w = log(g_w/b_w), δ_h = log(g_h/b_h)

The values in the formulae above are all small, so to improve the efficiency of the multi-task training, a regularization operation is performed on Δ:

δ_x′ = (δ_x − u_x)/σ_x

The total loss of each Head_i (i = 1, 2, 3) in the detection network is:

L(x^t, g) = L_cls(h_t(x^t), y^t) + λ[y^t ≥ 1] L_loc(f_t(x^t, b^t), g)

f(x, b) = f_T ∘ f_{T−1} ∘ … ∘ f_1(x, b)

b^t = f_{t−1}(x^{t−1}, b^{t−1})

where T represents the total number of branches superposed in Cascade RCNN and t the current branch. Each branch f_t in Cascade RCNN is optimized with the training data b^t of its own branch; b^t is derived from the outputs of all preceding branches starting from b^1, rather than directly using the initial RPN distribution b^1 to train f_t. λ is a weighting coefficient, [y^t ≥ 1] means that the regression loss is computed only on positive samples, and y^t is the label of x^t computed according to the above rule with the corresponding threshold u_t. In one embodiment of the invention, T is 3 and λ is 1.
Further, a filtering operation may first be applied to the detection boxes obtained through the training process, keeping those whose category score exceeds the first threshold θ and denoting them Boxes1. For the current frame, the IoU values between the detection boxes Boxes1 of the current frame and the tracking boxes of the previous frame's tracking queue are computed first, and the maximum IoU value of each detection box is judged. If the maximum IoU value exceeds the second threshold σ, the detection box is considered correct; otherwise, if the maximum IoU value is below σ, it is judged whether the maximum detection score of the matched tracking box over previous video frames exceeds a third threshold and whether the number of times the tracking box has appeared in previous frames exceeds the minimum-occurrence threshold T, and if both exceed their corresponding thresholds, the detection box of the current frame is judged erroneous. For objects tracked with IoU information, if a detection box in the current frame cannot be matched to any box of the previous frame, it corresponds to a newly appeared object, which needs to be added to the tracking queue.
According to the multi-frame image pedestrian detection device for night motion scenes, the enhanced multi-frame images are input into the neural network for training; multiple backbone networks are fused in the feature extraction network of the neural network, a deformable convolution network is fused into each backbone network, a double-branch structure is set in the prediction network, and pedestrian targets are judged according to the inter-frame IoU values of the multi-frame images during training, so that the obtained pedestrian detection model can detect pedestrians in multi-frame images of night scenes with high accuracy and robustness.
The invention further provides a computer device corresponding to the embodiment.
The computer device of the embodiment of the invention comprises a memory, a processor, and a computer program stored on the memory and executable on the processor; when the processor executes the computer program, the multi-frame image pedestrian detection method for night motion scenes of the above embodiment of the invention is implemented.

When the processor of the computer device executes the computer program stored on the memory, the enhanced multi-frame images are input into the neural network for training; multiple backbone networks are fused in the feature extraction network of the neural network, a deformable convolution network is fused into each backbone network, a double-branch structure is set in the prediction network, and pedestrian targets are judged according to the inter-frame IoU values of the multi-frame images during training, so that the obtained pedestrian detection model can detect pedestrians in multi-frame images of night scenes with high accuracy and robustness.
The invention also provides a non-transitory computer readable storage medium corresponding to the above embodiment.
The non-transitory computer-readable storage medium of the embodiment of the present invention stores a computer program which, when executed by a processor, implements the multi-frame image pedestrian detection method for night motion scenes of the above embodiment of the invention.

When a processor executes the computer program stored on the non-transitory computer-readable storage medium, the enhanced multi-frame images are input into the neural network for training; multiple backbone networks are fused in the feature extraction network of the neural network, a deformable convolution network is fused into each backbone network, a double-branch structure is set in the prediction network, and pedestrian targets are judged according to the inter-frame IoU values of the multi-frame images during training, so that the obtained pedestrian detection model can detect pedestrians in multi-frame images of night scenes with high accuracy and robustness.
The present invention also provides a computer program product corresponding to the above embodiments.
When the instructions in the computer program product of the embodiment of the invention are executed by a processor, the multi-frame image pedestrian detection method for night motion scenes of the above embodiment of the invention can be performed.

When the processor executes the instructions, the enhanced multi-frame images are input into the neural network for training; multiple backbone networks are fused in the feature extraction network of the neural network, a deformable convolution network is fused into each backbone network, a double-branch structure is set in the prediction network, and pedestrian targets are judged according to the inter-frame IoU values of the multi-frame images during training, so that the obtained pedestrian detection model can detect pedestrians in multi-frame images of night scenes with high accuracy and robustness.
In the description of the present invention, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. The meaning of "plurality" is two or more unless specifically limited otherwise.
In the present invention, unless otherwise expressly stated or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly and can, for example, be fixedly connected, detachably connected, or integrally formed; can be mechanically or electrically connected; either directly or indirectly through intervening media, either internally or in any other relationship. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
In the present invention, unless otherwise expressly stated or limited, the first feature "on" or "under" the second feature may be directly contacting the first and second features or indirectly contacting the first and second features through an intermediate. Also, a first feature "on," "over," and "above" a second feature may be directly or diagonally above the second feature, or may simply indicate that the first feature is at a higher level than the second feature. A first feature being "under," "below," and "beneath" a second feature may be directly under or obliquely under the first feature, or may simply mean that the first feature is at a lesser elevation than the second feature.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.

Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (9)

1. A multi-frame image pedestrian detection method for a night motion scene, characterized by comprising the following steps:
acquiring a data set containing a plurality of night multi-frame images, and performing enhancement processing on the night multi-frame images in the data set;
constructing a neural network, wherein the neural network comprises a feature extraction network and a prediction network, the feature extraction network fuses a plurality of backbone networks and comprises a feature pyramid network, a deformable convolution network is fused in each backbone network, and the prediction network comprises a double-branch structure;
training the neural network through the enhanced data set, and judging pedestrian targets according to the inter-frame IoU values of the multi-frame images during training to obtain a pedestrian detection model;
and carrying out pedestrian detection on the night multi-frame image to be detected through the pedestrian detection model.
2. The multi-frame image pedestrian detection method for a night motion scene according to claim 1, characterized in that spatial-level image enhancement is performed on the night multi-frame images in the data set in the form of batch data.
3. The multi-frame image pedestrian detection method for a night motion scene according to claim 1 or 2, characterized in that the backbone network is ResNeXt, the two branches of the double-branch structure are an FC-head and a Conv-head, the FC-head is used as a classification network, and the Conv-head is used as a regression network.
4. The multi-frame image pedestrian detection method for a night motion scene according to claim 3, characterized in that judging pedestrian targets according to the inter-frame IoU values of the multi-frame images during training comprises:

filtering the detection boxes obtained in training, keeping those whose category score exceeds a first threshold θ, and denoting them Boxes1; for the current frame, first computing the IoU values between the detection boxes Boxes1 of the current frame and the tracking boxes of the previous frame's tracking queue, and judging the maximum IoU value of each detection box; if the maximum IoU value exceeds a second threshold σ, considering the detection box correct; otherwise, if the maximum IoU value is below σ, judging whether the maximum detection score of the matched tracking box over previous video frames exceeds a third threshold and whether the number of times the tracking box has appeared in previous frames exceeds a minimum-occurrence threshold T, and if both exceed their corresponding thresholds, judging the detection box of the current frame to be erroneous.
5. The multi-frame image pedestrian detection method oriented to a night motion scene according to claim 4, wherein the regression loss $L_{loc}$ during network training uses the smoothed $L_1$ loss, where $x$ is the ROI, $b$ is the coordinates predicted for the ROI, $g$ is the label coordinate value, and $f$ denotes the regressor:

$$L_{loc}(f(x,b),g)=\sum_{i\in\{x,y,w,h\}}\mathrm{smooth}_{L_1}\big(f_i(x,b)-g_i\big)$$

$$b=(b_x,b_y,b_w,b_h)$$

To ensure the invariance of the regression operation to scale and location, $L_{loc}$ operates on the offset vector $\Delta=(\delta_x,\delta_y,\delta_w,\delta_h)$:

$$\delta_x=(g_x-b_x)/b_w,\quad \delta_y=(g_y-b_y)/b_h,\quad \delta_w=\log(g_w/b_w),\quad \delta_h=\log(g_h/b_h)$$

A regularization operation is applied to $\Delta$ component-wise, e.g.:

$$\delta_x'=(\delta_x-u_x)/\sigma_x$$

The total loss of each $\mathrm{Head}_i$ ($i=1,2,3$) in the detection network is:

$$L(x^t,g)=L_{cls}(h_t(x^t),y^t)+\lambda\,[y^t\ge 1]\,L_{loc}(f_t(x^t,b^t),g)$$

$$y^t=\begin{cases}g_y, & \mathrm{IoU}(x^t,g)\ge u^t\\ 0, & \text{otherwise}\end{cases}$$

$$b^t=f_{t-1}(x^{t-1},b^{t-1})$$

wherein $T$ denotes the total number of branches superposed in the Cascade RCNN and $t$ the current branch; each branch $f_t$ in the Cascade RCNN is optimized with the training data $b^t$ of its own branch, $b^t$ being derived from $b^1$ through the outputs of all preceding branches; $\lambda$ is a weighting coefficient, with $\lambda=1$; $[y^t\ge 1]$ indicates that the regression loss is calculated only on positive samples; and $y^t$ is the label of $x^t$ calculated according to the above formula with the IOU threshold $u^t$.
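A minimal sketch of the regression-target construction and smoothed $L_1$ loss reconstructed above; the normalization statistics u and σ are illustrative assumptions (in practice they would be estimated from the training data).

```python
import numpy as np

def encode_deltas(b, g):
    """Delta = (dx, dy, dw, dh) for boxes b, g given as (x, y, w, h)."""
    return np.array([(g[0] - b[0]) / b[2],
                     (g[1] - b[1]) / b[3],
                     np.log(g[2] / b[2]),
                     np.log(g[3] / b[3])])

def normalize(delta, u, sigma):
    """Per-component regularization, e.g. delta_x' = (delta_x - u_x) / sigma_x."""
    return (delta - u) / sigma

def smooth_l1(x):
    """Elementwise smoothed L1: 0.5 * x^2 where |x| < 1, else |x| - 0.5."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x * x, ax - 0.5)

b = np.array([100.0, 100.0, 50.0, 120.0])   # predicted ROI box (x, y, w, h)
g = np.array([104.0, 98.0, 55.0, 110.0])    # label (ground-truth) box
u, sigma = np.zeros(4), np.array([0.1, 0.1, 0.2, 0.2])
delta = normalize(encode_deltas(b, g), u, sigma)
print(smooth_l1(delta).sum())               # L_loc contribution of this ROI
```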
6. A multi-frame image pedestrian detection device oriented to a night motion scene, characterized by comprising:
an enhancement module, configured to acquire a data set containing a plurality of night multi-frame images and to perform enhancement processing on the night multi-frame images in the data set;
a construction module, configured to construct a neural network, wherein the neural network comprises a feature extraction network and a prediction network, the feature extraction network fuses a plurality of backbone networks and comprises a feature pyramid network, a deformable convolution network is fused into each backbone network, and the prediction network comprises a dual-branch structure;
a training module, configured to train the neural network on the enhanced data set and to judge pedestrian targets according to the inter-frame IOU values of the multi-frame images during training, so as to obtain a pedestrian detection model; and
a detection module, configured to perform pedestrian detection on the night multi-frame images to be detected through the pedestrian detection model.
7. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the multi-frame image pedestrian detection method oriented to a night motion scene according to any one of claims 1 to 5.
8. A non-transitory computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the multi-frame image pedestrian detection method oriented to a night motion scene according to any one of claims 1 to 5.
9. A computer program product, characterized in that the instructions in the computer program product, when executed by a processor, perform the multi-frame image pedestrian detection method oriented to a night motion scene according to any one of claims 1 to 5.
CN202010832374.5A 2020-08-18 2020-08-18 Multi-frame image pedestrian detection method and device for night motion scene Pending CN111814755A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010832374.5A CN111814755A (en) 2020-08-18 2020-08-18 Multi-frame image pedestrian detection method and device for night motion scene


Publications (1)

Publication Number Publication Date
CN111814755A true CN111814755A (en) 2020-10-23

Family

ID=72859207


Country Status (1)

Country Link
CN (1) CN111814755A (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104091171A (en) * 2014-07-04 2014-10-08 华南理工大学 Vehicle-mounted far infrared pedestrian detection system and method based on local features
US20200082165A1 (en) * 2016-12-16 2020-03-12 Peking University Shenzhen Graduate School Collaborative deep network model method for pedestrian detection
CN110837769A (en) * 2019-08-13 2020-02-25 广州三木智能科技有限公司 Embedded far infrared pedestrian detection method based on image processing and deep learning
CN111460926A (en) * 2020-03-16 2020-07-28 华中科技大学 Video pedestrian detection method fusing multi-target tracking clues

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
HE010103: "100kfps multi-object tracker (iou-tracker)", https://zhuanlan.zhihu.com/p/35291325 *
HIROSHI FUKUI et al.: "Pedestrian detection based on deep convolutional neural network with ensemble inference network", 2015 IEEE Intelligent Vehicles Symposium (IV) *
XUAN Xiaogang et al.: "An unsupervised video pedestrian detection and estimation algorithm", Journal of Hangzhou Dianzi University *
LUO Zhipeng: "Two titles and one runner-up at the CVPR 2020 night pedestrian detection challenges: an interpretation of the DeepBlueAI team's winning solutions", https://picture.iczhiku.com/weixin/MESSAGE1592815205387.HTML *
GE Junfeng et al.: "An improved night pedestrian detection algorithm", Computer Engineering *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528782B (en) * 2020-11-30 2024-02-23 北京农业信息技术研究中心 Underwater fish target detection method and device
CN112528782A (en) * 2020-11-30 2021-03-19 北京农业信息技术研究中心 Underwater fish target detection method and device
CN112365497A (en) * 2020-12-02 2021-02-12 上海卓繁信息技术股份有限公司 High-speed target detection method and system based on Trident Net and Cascade-RCNN structures
CN112819858A (en) * 2021-01-29 2021-05-18 北京博雅慧视智能技术研究院有限公司 Target tracking method, device and equipment based on video enhancement and storage medium
CN112819858B (en) * 2021-01-29 2024-03-22 北京博雅慧视智能技术研究院有限公司 Target tracking method, device, equipment and storage medium based on video enhancement
CN112686344A (en) * 2021-03-22 2021-04-20 浙江啄云智能科技有限公司 Detection model for rapidly filtering background picture and training method thereof
CN113378857A (en) * 2021-06-28 2021-09-10 北京百度网讯科技有限公司 Target detection method and device, electronic equipment and storage medium
CN113313078A (en) * 2021-07-02 2021-08-27 昆明理工大学 Lightweight night infrared image pedestrian detection method and system based on model optimization
CN113657467A (en) * 2021-07-29 2021-11-16 北京百度网讯科技有限公司 Model pre-training method and device, electronic equipment and storage medium
CN113657467B (en) * 2021-07-29 2023-04-07 北京百度网讯科技有限公司 Model pre-training method and device, electronic equipment and storage medium
CN113869361A (en) * 2021-08-20 2021-12-31 深延科技(北京)有限公司 Model training method, target detection method and related device
CN113780193A (en) * 2021-09-15 2021-12-10 易采天成(郑州)信息技术有限公司 RCNN-based cattle group target detection method and equipment
CN114972490A (en) * 2022-07-29 2022-08-30 苏州魔视智能科技有限公司 Automatic data labeling method, device, equipment and storage medium
CN114972490B (en) * 2022-07-29 2022-12-20 苏州魔视智能科技有限公司 Automatic data labeling method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111814755A (en) Multi-frame image pedestrian detection method and device for night motion scene
CN111160379B (en) Training method and device of image detection model, and target detection method and device
US11062123B2 (en) Method, terminal, and storage medium for tracking facial critical area
Bautista et al. Convolutional neural network for vehicle detection in low resolution traffic videos
CN108960266B (en) Image target detection method and device
CN111104903B (en) Depth perception traffic scene multi-target detection method and system
Kalsotra et al. Background subtraction for moving object detection: explorations of recent developments and challenges
CN107944403B (en) Method and device for detecting pedestrian attribute in image
CN107273832B (en) License plate recognition method and system based on integral channel characteristics and convolutional neural network
CN102087790B (en) Method and system for low-altitude ground vehicle detection and motion analysis
Ippalapally et al. Object detection using thermal imaging
Luo et al. Traffic analytics with low-frame-rate videos
Wu et al. UAV imagery based potential safety hazard evaluation for high-speed railroad using Real-time instance segmentation
Thomas et al. Moving vehicle candidate recognition and classification using inception-resnet-v2
CN111814754A (en) Single-frame image pedestrian detection method and device for night scene
Toprak et al. Conditional weighted ensemble of transferred models for camera based onboard pedestrian detection in railway driver support systems
Oğuz et al. A deep learning based fast lane detection approach
Babaei Vehicles tracking and classification using traffic zones in a hybrid scheme for intersection traffic management by smart cameras
Ghasemi et al. A real-time multiple vehicle classification and tracking system with occlusion handling
CN111027482B (en) Behavior analysis method and device based on motion vector segmentation analysis
Anees et al. Deep learning framework for density estimation of crowd videos
Yang et al. High-speed rail pole number recognition through deep representation and temporal redundancy
EP4332910A1 (en) Behavior detection method, electronic device, and computer readable storage medium
CN116189286A (en) Video image violence behavior detection model and detection method
CN112949634B (en) Railway contact net nest detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20201023)