CN114863370A - Complex scene high altitude parabolic identification method and system - Google Patents


Info

Publication number
CN114863370A
CN114863370A
Authority
CN
China
Prior art keywords
picture
sequence
training
parabola
altitude
Prior art date
Legal status
Granted
Application number
CN202210796750.9A
Other languages
Chinese (zh)
Other versions
CN114863370B
Inventor
孙俊
康凯
刘海峰
艾坤
王子磊
Current Assignee
Hefei Zhongke Leinao Intelligent Technology Co ltd
Original Assignee
Hefei Zhongke Leinao Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hefei Zhongke Leinao Intelligent Technology Co ltd filed Critical Hefei Zhongke Leinao Intelligent Technology Co ltd
Priority to CN202210796750.9A
Publication of CN114863370A
Application granted
Publication of CN114863370B
Legal status: Active


Classifications

    • G06V20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods
    • G06V10/26: Segmentation of patterns in the image field; clustering-based techniques; detection of occlusion
    • G06V10/764: Image or video recognition using classification, e.g. of video objects
    • G06V10/82: Image or video recognition using neural networks


Abstract

The invention belongs to the technical field of intelligent identification and provides a complex-scene high-altitude parabolic identification method and system. An image sequence set SI and an object picture set SO are synthesized into a training picture sequence set DI containing parabolas; a time-sequence block segmentation network is trained with DI to obtain TPSNet; and the trained TPSNet performs parabola recognition prediction on test video. By synthesizing data according to the characteristics of high-altitude parabolas, the method removes the need for the large amounts of multi-dimensional labeled high-altitude parabola data that deep learning algorithms in this field otherwise require. Through a higher down-sampling rate, the network ignores specific shallow features when extracting features and focuses on sequence features with relative semantics, solving the problem that seamless migration to each scene cannot otherwise be achieved at low cost.

Description

Complex scene high altitude parabolic identification method and system
Technical Field
The invention belongs to the technical field of intelligent identification, and particularly relates to a complex scene high-altitude parabolic identification method and system.
Background
With the continuous development of society, people's range of activity has expanded from the ground alone to the ground and above, and one major problem this brings is objects being thrown or dropped from height. When residents of high-rise buildings, high-rise cleaning crews, or electric power workers climbing poles and towers neglect standard practice and simply discard tools or materials during work at height, casualty accidents are likely to follow. High-altitude parabolic recognition therefore helps both to normalize the behavior of the people involved and to trace problems back to their source.
Unlike traditional feature-extraction methods, deep learning algorithms offer good robustness and strong transferability and can extract features that better represent images. Typically a Convolutional Neural Network (CNN) is used to extract the information in a picture, but a CNN cannot directly handle tasks involving a time sequence; a Recurrent Neural Network (RNN), which processes input data cyclically during modeling, compensates well for this deficiency. Deep learning algorithms need enough labeled data for continual iterative training to improve performance, so data is a key consideration when applying them. In addition, deep learning models usually carry huge numbers of parameters, and model performance is related to parameter count, so for tasks with strict real-time requirements the trade-off between performance and efficiency must be weighed comprehensively.
Chinese patent application CN112308000A discloses a high-altitude parabolic detection method based on spatio-temporal information. Based on the idea of pixel segmentation, it collects a large number of high-altitude parabola sequence pictures (various background interference scenes, lighting conditions, and shooting angles), marks the parabola on each picture with a rectangular frame, and trains a convolutional neural network. At inference time, N consecutive video frames are fed into the network to predict a map on which the parabola sequence appears, and the presence of a high-altitude parabola is decided by checking whether the sequence contains a vertically connected region of a certain length. The problems with this method are: 1) the demand for data is too high; enough data must not only be collected but each piece must also be labeled; 2) being segmentation-based, its overall computation is too heavy to meet real-time requirements; 3) it cannot suppress false alarms from objects thrown upward.
Chinese patent application CN113223081A discloses a high-altitude parabolic detection method and system based on background modeling and deep learning. A background model is built with a Gaussian mixture model; once it is established, a difference image is obtained by subtracting the background picture from each newly input video frame, image and position information of foreground targets is obtained through image preprocessing, a CNN filters noise out of the foreground and extracts features, the foreground- and feature-extraction steps are repeated, the CNN features are matched to form a foreground set, and an LSTM then performs time-sequence analysis to decide whether the object is a high-altitude parabola. The problems with this method are: 1) background modeling fails in complex scenes; 2) for the CNN to distinguish background from foreground across different application scenarios, its features must be discriminative (when matching two detections, non-discriminative features cannot tell whether they belong to the same object), which requires collecting enough training data; yet background and foreground are dynamic concepts, not fixed ones.
Chinese patent application CN111931719A discloses a high-altitude parabolic detection method and device in which multiple video frames are fed into a trained detection model consisting of a convolutional neural network and a recurrent neural network in series: the CNN identifies objects at height, and the RNN decides whether an identified object is a high-altitude parabola. The problems with this method are: 1) because it reasons about object classes, the model must be retrained when transferred to other objects; 2) training the CNN and RNN requires large amounts of real data and labeling, at very high cost.
In summary, existing high-altitude parabola detection methods have the following problems: 1) traditional algorithms lack robustness against complex backgrounds, while deep learning algorithms need large amounts of data whose collection and labeling are costly; 2) deep-learning-based algorithms are constrained by their modeling approach and are difficult to migrate seamlessly, i.e., at low cost, to each scene; 3) it is difficult to maintain both high speed and good results.
Disclosure of Invention
In order to solve the above problems, in one aspect, the present invention discloses a complex scene high altitude parabola identification method, including:
synthesizing the image sequence set SI and the object picture set SO to obtain a training picture sequence set DI with a parabola;
training a time-sequence block segmentation network with the training picture sequence set DI to obtain TPSNet;
performing parabola recognition prediction on the test video with the trained TPSNet, and judging whether a high-altitude parabola exists in the test video.
Further, the method for acquiring the image sequence set SI includes:
collecting N1 video segments, and extracting each frame of image in the video;
dividing all images in each video into a plurality of image sequence segments S with the length T according to time sequence;
all the image sequence segments obtained by the division constitute an image sequence set SI.
Further, the image sequence set SI specifically includes:
SI = {(0, 1, …, T−1), (sep, 1+sep, …, T+sep−1), …}, where (sep, 1+sep, …, T+sep−1) represents the numbered set of images within one image sequence segment S, sep represents the sampling interval, and T represents the length of a single image sequence segment S;
the image sequence set SI includes 1 + Floor((VL − T)/sep) image sequence segments, where Floor represents rounding down and VL represents the number of pictures extracted per video.
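As a hypothetical sketch (function and variable names are my own), the sliding-window split described above can be written as:

```python
def split_into_segments(frame_ids, T, sep):
    """Split VL extracted frames into length-T segments sampled every `sep` frames."""
    VL = len(frame_ids)
    return [frame_ids[start:start + T] for start in range(0, VL - T + 1, sep)]

# For VL = 10, T = 4, sep = 2 this yields 1 + floor((10 - 4) / 2) = 4 segments,
# matching the count given in the text.
segments = split_into_segments(list(range(10)), T=4, sep=2)
print(len(segments))  # 4
```

Each segment overlaps its neighbor by T − sep frames, so no parabola spanning a segment boundary is entirely missed.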
Further, the method for acquiring the object picture set SO includes:
collecting N2 object pictures, and segmenting and labeling the object pictures;
and obtaining a segmented and labeled object picture set SO.
Further, the segmenting and labeling the object picture specifically includes:
marking the required objects in the object picture to obtain a segmentation label label_i for each required object;
generating a label set corresponding to each object picture as {(im_1, label_1), (im_2, label_2), …, (im_n, label_n)}, where im_1, im_2, …, im_n are picture index numbers, label_1, label_2, …, label_n are the labels of the n required objects, and (im_n, label_n) represents the n-th group of labels in the label set;
after all the object pictures are segmented and labeled, integrating the label sets of all object pictures to obtain the picture and segmentation label set SO, expressed as {(im_i, label_i) | i ∈ [1, m]}, where (im_i, label_i) represents the i-th group of labels in the set and m indicates that there are m required objects across all the object pictures.
Further, the method for synthesizing the training picture sequence set DI comprises:
randomly selecting an object picture from the object picture set SO as a foreground picture, and selecting an image sequence segment from the image sequence set SI as a background picture;
synthesizing the selected foreground picture and the selected background picture according to the characteristic of the high-altitude parabola to obtain a training picture sequence D with the parabola;
and repeatedly synthesizing to obtain a plurality of training picture sequences D with parabolas, wherein the plurality of training picture sequences D form a training picture sequence set DI.
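A minimal sketch of the repeated synthesis loop, assuming SO is a list of (picture, label) pairs and SI a list of image sequence segments; the `synthesize` callable stands in for the parabola-characteristic synthesis detailed in the following paragraphs:

```python
import random

def build_training_set(SO, SI, synthesize, num_sequences, rng=random):
    """Repeatedly pair a random foreground object with a random background
    segment and synthesize one training picture sequence D per iteration."""
    DI = []
    for _ in range(num_sequences):
        foreground = rng.choice(SO)   # random object picture
        background = rng.choice(SI)   # random image sequence segment
        DI.append(synthesize(foreground, background))
    return DI

DI = build_training_set(["obj_a", "obj_b"], [["f0", "f1"], ["f2", "f3"]],
                        lambda fg, bg: (fg, bg), num_sequences=5)
print(len(DI))  # 5
```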
Further, the characteristics of the high altitude parabola specifically include:
presetting a longitudinal interval sequence Fall_base conforming to the law of free fall and comprising k longitudinal interval values, Fall_base = (fb_0, fb_1, fb_2, …, fb_{k−1}), where fb_{k−1} represents the k-th longitudinal interval value in Fall_base;
presetting a group of transverse disturbances XOffset = (xf_0, xf_1, xf_2, …, xf_n), where xf_n represents the (n+1)-th lateral disturbance value in XOffset;
presetting a set of area ratios Ratios = (r_0, r_1, r_2, …, r_n), where r_n represents the (n+1)-th area ratio in Ratios;
presetting a group of ordinate scaling coefficients YScale = (ys_0, ys_1, ys_2, …, ys_n), where ys_n represents the (n+1)-th ordinate scaling coefficient in YScale.
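As an illustration of the free-fall law the longitudinal interval sequence follows, one hypothetical way to generate Fall_base (g, the frame interval dt, and the pixels-per-metre factor ppm are assumed values, not from the text):

```python
def make_fall_base(k, g=9.8, dt=0.04, ppm=50.0):
    """Longitudinal drop during the i-th frame interval under free fall:
    g/2 * ((i+1)^2 - i^2) * dt^2 = g/2 * (2i + 1) * dt^2, converted to pixels.
    Successive intervals therefore grow linearly (constant second difference)."""
    return [0.5 * g * (2 * i + 1) * dt * dt * ppm for i in range(k)]

fb = make_fall_base(5)
# The difference between consecutive intervals is constant (= g * dt^2 * ppm),
# the signature of uniform acceleration.
print([round(v, 3) for v in fb])
```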
Further, the synthesizing the selected foreground picture and the selected background picture to obtain the training picture sequence D with a parabola specifically includes:
in the selected background picture, determining a position information set of the foreground picture through characteristic transformation of a high-altitude parabola;
determining the size information of the foreground picture according to the areas of the foreground picture and the background picture;
and according to the position information set and the size information of the foreground picture, sequentially combining the foreground picture and a group of background pictures into a group of training pictures with parabolas to form a training picture sequence D.
Further, the determining the position information set of the foreground picture in the selected background picture through the characteristic transformation of the high altitude parabola specifically includes:
randomly selecting a coordinate within the coverage range of the selected background picture as the initial coordinate loc_base;
randomly selecting a position index ind from the interval [0, T−3) as the index at which the parabola first appears, and randomly selecting from the longitudinal interval sequence Fall_base a consecutive sub-sequence of length T−ind−1, (fb_j, …, fb_{j+T−ind−2}), where fb_j represents the (j+1)-th longitudinal interval value of Fall_base and fb_{j+T−ind−2} represents the (j+T−ind−1)-th longitudinal interval value of Fall_base;
randomly selecting a scaling coefficient ys from the ordinate scaling coefficients YScale to obtain a new longitudinal interval sequence (ys·fb_j, …, ys·fb_{j+T−ind−2});
randomly selecting an abscissa perturbation sequence (xf_{k_0}, xf_{k_1}, …, xf_{k_{T−ind−1}}) from the transverse disturbances XOffset, where k_i ∈ N (i = 0, 1, 2, …, T−ind−1) and xf_{k_{T−ind−1}} denotes the k_{T−ind−1}-th abscissa perturbation value in XOffset;
adding the new interval sequence and the abscissa perturbation sequence to the corresponding coordinate values of the initial coordinate loc_base to obtain the position sequence of the foreground picture in the background pictures of one image sequence segment: loc = (loc_base + xf_{k_0}, loc_base + xf_{k_1} + ys·fb_j, …, loc_base + xf_{k_{T−ind−2}} + ys·fb_{j+T−ind−3}, loc_base + xf_{k_{T−ind−1}} + ys·fb_{j+T−ind−2}).
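The position sequence above can be sketched as follows (names follow the text; loc_base is treated as an (x, y) pixel coordinate, and one longitudinal interval term is applied per step, as written):

```python
def make_position_sequence(loc_base, xoff, ys, fall_base, j, n):
    """Return n = T - ind positions: the first carries only its abscissa
    perturbation, each later one also adds the scaled longitudinal interval."""
    x0, y0 = loc_base
    locs = [(x0 + xoff[0], y0)]
    for step in range(1, n):
        locs.append((x0 + xoff[step], y0 + ys * fall_base[j + step - 1]))
    return locs

locs = make_position_sequence((100, 50), [0, 1, -1], 1.0, [5, 15, 25, 35], j=0, n=3)
print(locs)  # [(100, 50), (101, 55.0), (99, 65.0)]
```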
Further, the determining the size information of the foreground picture according to the areas of the foreground picture and the background picture specifically includes:
calculating the object area A_fi in the foreground picture and the picture area A_bi of the background picture;
randomly selecting a ratio r from the area ratios to obtain the new object area n_A_fi = A_bi · r;
further obtaining the scaling factor of the object, fr = sqrt(n_A_fi / A_fi);
generating, from the segmentation label label_i in the foreground picture, the area mask_i covered by each required object;
scaling the object picture im_i and the covered area mask_i according to the scaling factor fr to obtain a new object picture n_im_i with a parabola and its mask n_mask_i.
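A sketch of the size computation, assuming the linear scale factor is the square root of the area ratio (areas scale with the square of linear size); the garbled formula in the source text is reconstructed under that assumption:

```python
import math

def object_scale(A_fi, A_bi, r):
    """Scale factor fr that makes the object's area equal A_bi * r."""
    n_A_fi = A_bi * r          # new object area as a ratio r of the background area
    return math.sqrt(n_A_fi / A_fi)

print(object_scale(A_fi=100, A_bi=40000, r=0.01))  # 2.0
```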
Further, the sequentially synthesizing the foreground picture and the set of background pictures into the set of training pictures with the parabola specifically includes:
establishing index sequence numbers (0, 1, 2, …, T−1) for the background pictures of an image sequence segment;
sequentially selecting the background picture and the position coordinates in the position sequence loc;
judging whether the index sequence number of the background picture is smaller than the position index ind;
skipping the background picture when the background picture index sequence number is smaller than the position index ind;
when the index number of the background picture is not less than the position index ind, taking the new object picture n_im_i with a parabola and the covered area n_mask_i and performing data transformation on them; then superposing the new object picture onto the corresponding background picture at the position coordinate from the position sequence loc to synthesize one training picture.
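The per-frame compositing logic above can be sketched as follows, with `paste` standing in for the overlay-plus-data-transformation step:

```python
def synthesize_sequence(background_frames, ind, loc, paste):
    """Frames with index below ind are left unchanged; from ind onward the
    object is pasted at the matching coordinate from the position sequence."""
    out = []
    for t, frame in enumerate(background_frames):   # index sequence 0..T-1
        if t < ind:
            out.append(frame)                       # parabola not yet visible
        else:
            out.append(paste(frame, loc[t - ind]))
    return out

frames = synthesize_sequence(["f0", "f1", "f2", "f3"], ind=2,
                             loc=[(1, 1), (2, 2)],
                             paste=lambda f, p: f + str(p))
print(frames)  # ['f0', 'f1', 'f2(1, 1)', 'f3(2, 2)']
```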
Further, the time sequence block segmentation network comprises a feature extraction layer, a feature transformation layer, a time sequence feature extraction layer and a classification layer;
the feature extraction layer is based on a CNN and is used for extracting picture features: the input picture sequence of length T with 3 channels is spliced in order into data of length 1 with 3T channels, and the number of input channels of the first convolution layer of the backbone network used is changed to 3T;
the feature conversion layer is used for converting the picture features extracted by the feature extraction layer into the acceptable input of the time sequence feature extraction layer according to the number of the divided blocks;
the time sequence feature extraction layer is used for inputting the transformed picture features into a recurrent neural network to obtain the features on the time sequence dimension;
and the classification layer is used for classifying the extracted time sequence characteristics and judging whether a parabola exists.
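A pure-Python shape walkthrough of the four layers (all sizes are assumed for illustration; no deep-learning framework is needed to see the channel splicing and block split):

```python
def tpsnet_shapes(T=8, H=512, W=512, down=32, C_feat=256, P=16):
    """Trace tensor shapes through the time-sequence block segmentation network."""
    stacked = (3 * T, H, W)                # feature extraction: T RGB frames spliced on channels
    fmap = (C_feat, H // down, W // down)  # backbone output after heavy down-sampling
    # feature transform: regroup the map into P blocks acceptable to the RNN
    block_feat = (fmap[1] * fmap[2] // P) * C_feat
    rnn_input = (P, block_feat)            # P steps for the time-sequence feature layer
    scores = (P,)                          # classification: one parabola score per block
    return stacked, fmap, rnn_input, scores

print(tpsnet_shapes())  # ((24, 512, 512), (256, 16, 16), (16, 4096), (16,))
```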
Further, the method for obtaining the TPSNet comprises the following steps:
performing data enhancement transformation on the training picture sequence set DI, inputting it into the time-sequence block segmentation network, and optimizing with a loss function to obtain the converged network TPSNet.
Further, the performing parabola identification prediction on the test video and judging whether a high altitude parabola exists in the test video specifically includes:
acquiring a test video, dividing the test video into a plurality of sequences with the length of T, and identifying and predicting by using TPSNet;
obtaining the score of each sequence of high altitude parabolas, and recording the score;
and judging whether a high-altitude parabolic object exists in the test video or not according to the score, and if the high-altitude parabolic object exists in the test video, alarming and recording a corresponding video image segment.
Further, obtaining a score of each sequence of high altitude parabolas, and recording the score specifically includes:
deriving the scores predicting whether each block of each sequence contains a parabola, scores = (score_0, score_1, …, score_{P−1}), where P is the number of blocks into which the input picture is divided;
screening the score of each block: if a score is smaller than a set first threshold, it is filtered out and the score value is not recorded; if a score is not less than the first threshold, the score value is recorded; finally a qualified effective score sequence set is obtained.
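The screening step above can be sketched as:

```python
def screen_scores(block_scores, threshold1):
    """Keep only block scores not less than the first threshold; the result
    forms the effective score sequence set for one sequence."""
    return [s for s in block_scores if s >= threshold1]

print(screen_scores([0.1, 0.6, 0.9, 0.3], threshold1=0.5))  # [0.6, 0.9]
```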
Further, the determining whether a high altitude parabola exists in the test video specifically includes:
judging whether the number of entries in the effective score sequence set is greater than a preset sequence length, and whether the number of entries exceeding a second threshold is greater than a preset count;
when the number of entries in the effective score sequence set is greater than the preset sequence length and the number of entries exceeding the second threshold is greater than the preset count, alarming and recording the corresponding video image segment.
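A minimal sketch of the two-condition alarm decision (the threshold and count values in the example call are assumptions):

```python
def should_alarm(valid_scores, min_len, threshold2, min_count):
    """Alarm only when the effective score set is long enough AND enough of
    its entries exceed the second threshold."""
    if len(valid_scores) <= min_len:
        return False
    strong = sum(1 for s in valid_scores if s > threshold2)
    return strong > min_count

print(should_alarm([0.7, 0.8, 0.9, 0.95], min_len=3, threshold2=0.85, min_count=1))  # True
```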
In another aspect, the invention further discloses a complex scene high altitude parabolic recognition system, which comprises:
the training picture generation module is used for synthesizing the image sequence set SI and the object picture set SO to obtain a training picture sequence set DI with a parabola;
the time sequence patch segmentation network training module is used for training a time sequence block segmentation network by using a training picture sequence set DI to obtain TPSNet;
and the detection and identification module is used for carrying out parabolic identification prediction on the test video by using the TPSNet obtained after training and judging whether high-altitude parabolic exists in the test video.
Further, the training picture generation module specifically includes:
the video frame extracting unit is used for collecting N1 video segments and extracting each frame image in the video;
the background image dividing unit is used for dividing all images in each video into a plurality of image sequence segments S with the length T according to time sequence; all the image sequence segments obtained by division form an image sequence set SI;
the object picture labeling unit is used for collecting N2 object pictures and carrying out segmentation and labeling on the object pictures; obtaining a segmented and labeled object picture set SO;
the image selecting unit is used for randomly selecting an object image from the object image set SO as a foreground image and selecting an image sequence segment from the image sequence set SI as a background image;
the picture synthesis unit is used for synthesizing the selected foreground picture and the selected background picture according to the characteristic of the high-altitude parabola to obtain a training picture sequence D with the parabola; and repeatedly synthesizing to obtain a plurality of training picture sequences D with parabolas, wherein the plurality of training picture sequences D form a training picture sequence set DI.
Further, the time-sequence patch segmentation network training module is specifically configured to perform data enhancement transformation on the training picture sequence set DI, input the result into the time-sequence block segmentation network, and optimize with a loss function to obtain the converged network TPSNet;
the time sequence block segmentation network comprises a feature extraction layer, a feature transformation layer, a time sequence feature extraction layer and a classification layer;
the feature extraction layer is based on a CNN and is used for extracting picture features: the input picture sequence of length T with 3 channels is spliced in order into data of length 1 with 3T channels, and the number of input channels of the first convolution layer of the backbone network used is changed to 3T;
the feature conversion layer is used for converting the picture features extracted by the feature extraction layer into the acceptable input of the time sequence feature extraction layer according to the number of the divided blocks;
the time sequence feature extraction layer is used for inputting the transformed picture features into a recurrent neural network to obtain the features on the time sequence dimension;
and the classification layer is used for classifying the extracted time sequence characteristics and judging whether a parabola exists.
Further, the detection and identification module specifically includes:
the test video processing unit is used for acquiring a test video, dividing the test video into a plurality of sequences with the length of T, and performing identification prediction by using TPSNet; obtaining the score of each sequence of high altitude parabolas, and recording the score;
and the identification and judgment unit is used for judging whether the high-altitude parabolic object exists in the test video according to the score, and alarming and recording the corresponding video image segment if the high-altitude parabolic object exists in the test video.
Compared with the prior art, the invention has the following beneficial effects:
the invention provides a complex scene high-altitude parabolic recognition method, which provides a data synthesis method and a new modeling mode in the execution process of the recognition method, avoids the requirement on large-scale collection of data, has good generalization in different scenes and is high in speed:
1. By synthesizing data according to the characteristics of high-altitude parabolas, the method removes the need for the large amounts of multi-dimensional labeled high-altitude parabola data that deep learning algorithms in this field otherwise require.
2. Through a higher down-sampling rate, the network ignores specific shallow features during feature extraction and focuses on sequence features with relative semantics, solving the problem that seamless, low-cost migration to each scene cannot otherwise be achieved.
3. Unlike the commonly adopted strong localization, the method adopts weak localization, which lowers the localization requirement on the model while still facilitating subsequent manual checking, so an efficient model can be constructed. Sequence information is coupled into the feature dimension at input and features are then extracted, ensuring efficient image feature extraction; the feature dimension is afterwards converted back into the sequence dimension, so a sequence model can integrate the extracted semantic information over time, ensuring the final performance. This solves the problem that high speed and good results are difficult to maintain simultaneously.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 illustrates a data synthesis and model training flow diagram of an embodiment of the present invention;
fig. 2 shows a flow chart of the parabolic identification prediction according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The high-altitude parabolic identification method based on data synthesis and time sequence block segmentation mainly comprises the following steps:
the data synthesis and model training process shown in fig. 1 includes the following steps 1-4; and a parabolic identification prediction flow shown in fig. 2, as follows step 5.
Step 1: collecting N1 video segments, extracting each frame in the video, and dividing an image sequence in each video according to the length T to obtain an image sequence set SI;
step 2: collecting N2 object pictures, and carrying out segmentation and labeling to obtain a segmented and labeled object picture set SO;
and step 3: randomly selecting a picture from the SO, randomly selecting a sequence S from the image sequence set SI, synthesizing a picture sequence D with a parabola according to the characteristics of the high altitude parabola (the longitudinal coordinate basically accords with the law of free fall, the rotation occurs in the falling process, and a certain transverse offset exists in the falling process), and repeatedly synthesizing to finally obtain a picture sequence set DI with the parabola;
and 4, step 4: training a time sequence block segmentation network with the synthesized image sequence set DI, positioning in a parabolic weak positioning mode, namely locating the vertical region in which a parabola appears in a sequence; the input sequence dimension is integrated into the feature dimension while the features of the picture are extracted, a feature conversion layer splits the sequence information out of the feature dimension, and a sequence model performs sequence modeling on the features rich in semantic information, finally obtaining TPSNet (Time-sequence Patch Segmentation Network); the sequence model refers to a model that can extract the correlation of consecutive sequences in the time dimension.
And 5: inputting a test video, predicting with the trained TPSNet to obtain the high-altitude parabola prediction scores for each sequence, recording the scores with a recorder, judging according to the sensitivity whether a high-altitude thrown object is present, and if so, alarming and recording the corresponding video image.
Specifically, step 1 comprises the following steps:
step 1.1, collecting N1 video segments, performing frame extraction on each video, and numbering the frames of each video (assumed to consist of VL pictures) from 0, obtaining 1 + Floor((VL-T)/sep) groups of image sequence segments {(0, 1, ..., T-1), (sep, 1+sep, ..., T+sep-1), ......}, wherein sep represents the sampling interval, and (sep, 1+sep, ..., T+sep-1) represents the number set of images in one group of image sequence segments S; T represents the length of a single sequence, and Floor represents rounding down; finally the set SI of the image sequences of all videos is obtained;
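The grouping rule of step 1.1 can be sketched in code as follows; this is an illustrative fragment, not part of the patent, and the names (extract_sequences, frames) are assumptions:

```python
# Split VL extracted frames into overlapping index groups of length T with
# sampling interval sep: {(0,...,T-1), (sep,...,T+sep-1), ...},
# giving 1 + floor((VL-T)/sep) groups, as described in step 1.1.
import math

def extract_sequences(frames, T, sep):
    """Return the list of frame-number groups for one video."""
    VL = len(frames)
    n_groups = 1 + math.floor((VL - T) / sep)
    return [list(range(start, start + T))
            for start in range(0, n_groups * sep, sep)]

groups = extract_sequences(list(range(10)), T=5, sep=1)
# 1 + floor((10-5)/1) = 6 groups; the last group is (5, 6, 7, 8, 9)
```

With T = 5 and sep = 1 (the values of the embodiment), consecutive groups overlap by T-1 frames, so every frame participates in several training sequences.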
for step 2, the following steps are included:
step 2.1: collecting N2 object pictures and labeling the required objects in the pictures to obtain the segmentation label label_i of each required object. Assuming that there are n required objects in a picture im1 (im1 here is a numerical value used to index a specific picture), the annotation set generated for that picture is {(im_1, label_1), (im_2, label_2), ......, (im_n, label_n)}, wherein im_1, im_2, ..., im_n all denote picture im1, and label_1, label_2, ..., label_n refer to the labels of the n required objects. Assuming that there are m required objects in all the N2 object pictures, the set SO of all pictures and segmentation labels is finally obtained as {(im_i, label_i) | i ∈ [1, m]}, wherein (im_i, label_i) represents the i-th group of labels in the segmentation label set; a segmentation label is a series of point coordinate sets, im_i represents the picture index number in the i-th group of labels, and label_i represents the segmentation label of the i-th required object; the points in a point coordinate set are connected in sequence to frame the region where the required object is located, or the points in a point coordinate set are connected in sequence to form the outer contour curve of the required object.
For step 3, the following steps are included:
step 3.1: randomly selecting an object label (im_i, label_i) from SO as foreground picture FI_i, where FI is the set of all foreground pictures {(im_0, label_0), (im_1, label_1), ......, (im_i, label_i)} and FI_i denotes the picture of the i-th object label. Randomly selecting a group of image sequences (b, b+1, ..., b+T-1) from SI, where b denotes the start number of the selected segment, as background picture BI_j; BI = {BI_0, BI_1, ..., BI_n} comprises n groups of sequences in total, and BI_j represents the j-th sequence. A longitudinal interval sequence Fall_base = (fb_0, fb_1, fb_2, ......, fb_{k-1}) conforming to the free-fall law and comprising k coordinates is designed in advance, where fb_{k-1} denotes the k-th longitudinal interval value in Fall_base; a group of transverse perturbations XOffset = (xf_0, xf_1, xf_2, ......, xf_n) is preset, where xf_n represents the (n+1)-th transverse perturbation value in XOffset; a group of area ratios ratios = (r_0, r_1, r_2, ......, r_n) is preset, where r_n represents the (n+1)-th area ratio in ratios; a group of ordinate scaling coefficients YScale = (ys_0, ys_1, ys_2, ......, ys_n) is preset, where ys_n represents the (n+1)-th ordinate scaling coefficient in YScale.
Step 3.2: calculating the picture coverage range of background picture BI_j, and randomly selecting a coordinate within this range as the initial coordinate loc_base; the initial coordinate loc_base is a point coordinate with an abscissa and an ordinate. A position index ind is randomly selected from the interval [0, T-3) as the index of the first parabolic appearance; the value range [0, T-3) ensures that at least three consecutive pictures can be selected. From Fall_base, a longitudinal interval sequence (fb_j, ..., fb_{j+T-ind-2}) of length T-ind-1 is randomly selected, where fb_j denotes the (j+1)-th longitudinal interval value in Fall_base and fb_{j+T-ind-2} denotes the (j+T-ind-1)-th longitudinal interval value; a scaling coefficient ys is then randomly selected from YScale to obtain a new interval sequence (ys·fb_j, ..., ys·fb_{j+T-ind-2}). Subsequently, an abscissa perturbation sequence (xf_{k_0}, xf_{k_1}, ..., xf_{k_{T-ind-1}}) is randomly selected from XOffset, where k_i ∈ N (i = {0, 1, 2, ..., T-ind-1}) and xf_{k_{T-ind-1}} denotes the k_{T-ind-1}-th abscissa perturbation value in XOffset. The new interval sequence and the abscissa perturbations are added correspondingly to loc_base, obtaining the position sequence of the foreground picture in the background pictures of one image sequence segment: loc = (loc_base + xf_{k_0}, loc_base + xf_{k_1} + ys·fb_j, ..., loc_base + xf_{k_{T-ind-2}} + ys·fb_{j+T-ind-3}, loc_base + xf_{k_{T-ind-1}} + ys·fb_{j+T-ind-2});
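The trajectory construction of step 3.2 can be sketched as follows; this is a minimal illustration, not the patent's code, and the helper name make_loc and the exact indexing are assumptions (the first position carries only a lateral perturbation, each later position adds a scaled free-fall displacement):

```python
# Compose the per-frame position sequence loc from an initial coordinate,
# a free-fall displacement sequence Fall_base (fb_i proportional to i^2),
# an ordinate scaling coefficient ys, and abscissa perturbations from XOffset.
def make_loc(loc_base, fb, ys, xoff):
    """loc[0] = loc_base + (xoff[0], 0); loc[i] adds ys*fb[i-1] vertically."""
    x0, y0 = loc_base
    loc = [(x0 + xoff[0], y0)]
    for i, f in enumerate(fb):
        loc.append((x0 + xoff[i + 1], y0 + ys * f))
    return loc

fall_base = [i * i for i in range(8)]  # [0, 1, 4, 9, ..., 49], as in the embodiment
loc = make_loc(loc_base=(100, 50), fb=fall_base[:4], ys=5,
               xoff=[0, -2, 1, 3, -1])
# ordinates grow quadratically: 50, 50, 55, 70, 95 (free fall)
```

Because the ordinate offsets follow i² while the abscissa only jitters within a small preset range, the synthesized object visually reproduces a falling trajectory.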
Step 3.3: calculating the area A_fi of the object in foreground picture FI_i and the area A_bi of the picture in background picture BI_j; a ratio r is randomly selected from ratios, the new object area A_new is calculated according to A_new = r·A_bi, and the object scaling factor is then calculated as fr = √(A_new / A_fi).
Step 3.4: generating a mask mask_i of the area covered by each object according to the foreground picture segmentation label label_i; mask_i represents the area covered by the i-th object. According to the object scaling factor fr found in step 3.3, the object picture im_i and the object coverage mask mask_i are zoomed, obtaining a new object picture n_im_i with a parabola and a new mask n_mask_i;
Step 3.5: establishing an index (0, 1, ..., T-1) for the background pictures, and selecting the current parabola coordinate and background picture in sequence; if the background picture index is less than ind it is skipped, otherwise the n_im_i and n_mask_i from step 3.4 are transformed by data transformation and the object picture is superposed into the background picture at the corresponding coordinate in loc, obtaining a picture containing a parabola; finally a group of training picture sequence sets DI containing parabolas is obtained. The data transformation includes random flipping, Gaussian noise, random rotation, and the like.
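Steps 3.3 to 3.5 can be sketched together in a few lines; this is a hedged illustration under assumed names (object_scale, paste), and real pixel scaling would use an image library rather than the shape-only arrays shown here:

```python
# Derive the object scaling factor from a target area ratio (step 3.3),
# then overlay the scaled object onto a background frame wherever its
# mask is set (direct-cover superposition, as in the embodiment).
import numpy as np

def object_scale(area_fg, area_bg, r):
    """fr = sqrt(new_area / old_area), with new_area = r * background area."""
    new_area = r * area_bg
    return (new_area / area_fg) ** 0.5

def paste(background, obj, mask, top, left):
    """Copy obj pixels into background at (top, left) where mask > 0."""
    out = background.copy()
    h, w = mask.shape
    region = out[top:top + h, left:left + w]
    region[mask > 0] = obj[mask > 0]  # region is a view, so out is modified
    return out

fr = object_scale(area_fg=100.0, area_bg=40000.0, r=0.0025)  # ~1.0
bg = np.zeros((6, 6), dtype=np.uint8)
obj = np.full((2, 2), 255, dtype=np.uint8)
mask = np.array([[1, 0], [1, 1]], dtype=np.uint8)
frame = paste(bg, obj, mask, top=2, left=3)
```

The square root appears because the area scales with the square of the linear zoom factor; the direct-cover paste corresponds to the embodiment's default superposition mode, with Poisson fusion named as an alternative.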
For step 4, the following steps are included:
step 4.1: designing a time sequence block segmentation network, wherein the time sequence block segmentation network consists of four parts, namely a feature extraction layer, a feature transformation layer, a time sequence feature extraction layer and a classification layer.
Step 4.2: the feature extraction layer is a CNN-based network and mainly aims to extract features of pictures. In order to ensure the overall efficiency in the last run, we first splice the picture sequence with channel 3 and input length T into data with channel 3 and length 1 in sequence, then the commonly used backbone network (a neural network model for extracting the characteristics of the high, middle and low layers of the image) is changed, the number of input channels of the first layer convolution is 3T, the rest is maintained unchanged, so that only the calculated amount is increased on the first layer (almost negligible), the subsequent calculated amount is consistent with the calculated amount when one image is input, moreover, a plurality of down-sampling steps are added on the premise of keeping the number of the network layers unchanged (because a weak positioning mode is adopted, down-sampling can be greatly adopted without worrying about the influence effect too much), the operation efficiency of the network is further ensured, and the size of the finally obtained feature image is C x H W;
step 4.3: the feature conversion layer converts the features extracted by the feature extraction layer into an input acceptable to the time-sequence feature extraction layer, according to the number of blocks into which the input picture is divided. By way of example, for the feature C × H × W extracted in step 4.2, a pooling layer with kernel size (k1, k2) and step size (s1, s2) is used to down-sample the W direction, and the feature in the height direction is globally pooled (since weak positioning is used, the height spatial information can be globally pooled here without much worry about the effect), obtaining a new feature C × 1 × P; this is then converted into a feature of dimension P × T × (C/T), where C is an integer multiple of T and P is the number of blocks into which the input picture is divided;
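The feature-conversion reshaping of step 4.3 can be sketched in numpy; this is a shape-level illustration under assumed parameters (non-overlapping width pooling with stride s2, mean pooling), not the patent's implementation:

```python
# Convert a C x H x W feature map into P blocks, each carrying a length-T
# sequence of (C/T)-dimensional features: global mean pooling over height,
# strided mean pooling over width, then a channel split into (T, C/T).
import numpy as np

def convert(feat, T, s2):
    """feat: (C, H, W) -> (P, T, C//T), with P = W // s2."""
    C, H, W = feat.shape
    P = W // s2
    pooled = feat.mean(axis=1)  # global pool over height -> (C, W)
    pooled = pooled[:, :P * s2].reshape(C, P, s2).mean(axis=2)  # -> (C, P)
    # split the channel axis back into (T, C//T) and move blocks first
    return pooled.reshape(T, C // T, P).transpose(2, 0, 1)  # (P, T, C//T)

feat = np.random.rand(40, 8, 16).astype(np.float32)
out = convert(feat, T=5, s2=2)  # shape (8, 5, 8): 8 blocks, T=5, C/T=8
```

Each of the P blocks then behaves as an independent length-T sequence for the recurrent layer, which is what makes the per-block (patch) parabola scores of step 5 possible.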
step 4.4: the time sequence feature extraction layer inputs the converted features into a recurrent neural network to obtain features on time sequence dimensions, and then a classification layer is used for classifying the extracted time sequence features to judge whether parabolas exist or not;
step 4.5: performing data enhancement transformations on the parabola-containing picture sequence set DI acquired in step 3, inputting it into the time sequence block segmentation network, and optimizing with BCE (binary cross entropy) as the loss function to finally obtain the converged network TPSNet. The data enhancement transformations comprise Gaussian blur, random cropping, size change, random flipping, and the like.
For step 5, the following steps are included:
step 5.1: for the input video, sequences of length T are input and prediction is performed using TPSNet, yielding scores = (score_0, score_1, ..., score_p) predicting whether each block of each sequence contains a parabola, where P is the number of blocks (patches) into which the input picture is divided. The score of each patch is screened: if it is smaller than a set first threshold thr1 it is filtered out, otherwise it is recorded. Supposing that the scores of a certain block i continuously meet the condition, the recorded effective scores are record_score_i = (score_i^0, score_i^1, ..., score_i^K), where K denotes the recorded sequence length. If the number of scores in record_score_i greater than a second threshold thr2 is greater than a preset number HighNum, and the sequence length K of record_score_i is greater than a preset number SeqNum, the object is regarded as a high-altitude thrown object, an alarm is output and the corresponding video clip is recorded. It should be noted that HighNum, SeqNum, thr1 and thr2 control the sensitivity of the overall method; their values can be changed if different model sensitivities are desired.
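The per-block decision rule of step 5.1 can be sketched as follows; the function names (update, is_parabola) are illustrative assumptions, and the numeric thresholds mirror those used later in the embodiment:

```python
# Keep a per-block run of scores that stay above thr1; raise an alarm when
# the run is longer than SeqNum frames and contains more than HighNum
# scores above thr2.
def update(record, score, thr1):
    """Extend the recorded run while score >= thr1, otherwise reset it."""
    return record + [score] if score >= thr1 else []

def is_parabola(record, thr2, high_num, seq_num):
    high = sum(1 for s in record if s > thr2)
    return len(record) > seq_num and high > high_num

record = []
for s in [0.7, 0.95, 0.96, 0.98, 0.94]:
    record = update(record, s, thr1=0.6)
alarm = is_parabola(record, thr2=0.93, high_num=3, seq_num=3)  # True
```

Raising thr1/thr2 or the count thresholds makes the detector stricter (fewer false alarms), while lowering them makes it more sensitive, which matches the sensitivity control described above.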
An embodiment of the high-altitude parabolic identification method based on data synthesis and time sequence block segmentation is as follows:
step 1: collecting N1 video segments, performing frame extraction on each video, and numbering the frames extracted from each video (assumed to consist of VL pictures) from 0, obtaining 1 + Floor((VL-T)/sep) groups of image sequence segments {(0, 1, ..., T-1), (sep, 1+sep, ..., T+sep-1), ......}, wherein sep represents the sampling interval and (sep, 1+sep, ..., T+sep-1) represents the number set of images in one group of image sequence segments S; T represents the length of a single sequence, Floor represents rounding down, and finally the set SI of the image sequences of all videos is obtained; in this embodiment, N1 is 2, sep is 1, and T is 5;
step 2: collecting N2 object pictures and labeling the required objects in the pictures to obtain the segmentation label label_i of each required object. Assuming that there are n required objects in a picture im1, the annotation set generated for that picture is {(im_1, label_1), (im_2, label_2), ......, (im_n, label_n)}, wherein im_1, im_2, ..., im_n all denote picture im1, and label_1, label_2, ..., label_n refer to the labels of the n required objects. Assuming that there are m required objects in all the N2 object pictures, the set SO of all pictures and segmentation labels is finally obtained as {(im_i, label_i) | i ∈ [1, m]}, wherein (im_i, label_i) represents the i-th group of labels in the segmentation label set; in this embodiment, N2 is 6;
and step 3: randomly selecting a picture from the SO, randomly selecting a sequence S from the image sequence set SI, synthesizing a picture sequence D with a parabola, and repeatedly synthesizing to finally obtain an image sequence set DI with the parabola;
specifically, step 3.1: randomly selecting an object label (im_i, label_i) from SO as foreground picture FI_i, and randomly selecting a group of image sequences (b, b+1, ..., b+T-1) from SI, where b denotes the start number of the selected segment, as background picture BI_j. The longitudinal interval sequence Fall_base = (fb_0, fb_1, fb_2, ......, fb_{k-1}) conforming to the free-fall law and comprising k coordinates is designed in advance; a group of transverse perturbations XOffset = (xf_0, xf_1, xf_2, ......) is preset; a group of area ratios ratios = (r_0, r_1, r_2, ......) is preset; a group of ordinate scaling coefficients YScale = (ys_0, ys_1, ys_2, ......) is preset. In this embodiment Fall_base is [0, 1, 4, 9, 16, 25, 36, 49], XOffset is [-20, -19, ..., 19, 20], ratios are uniformly distributed in the interval (0.0001, 0.005), and YScale is [5, 6, 7, ..., 50].
Step 3.2: calculating the picture coverage range of background picture BI_j, and randomly selecting a coordinate within this range as the initial coordinate loc_base; the initial coordinate loc_base is a point coordinate with an abscissa and an ordinate. A position index ind is randomly selected from the interval [0, T-3) as the index of the first parabolic appearance; from Fall_base, a longitudinal interval sequence (fb_j, ..., fb_{j+T-ind-2}) of length T-ind-1 is randomly selected, and a scaling coefficient ys is then randomly selected from YScale to obtain a new interval sequence (ys·fb_j, ..., ys·fb_{j+T-ind-2}); an abscissa perturbation sequence (xf_{k_0}, xf_{k_1}, ..., xf_{k_{T-ind-1}}) is then randomly selected from XOffset, where k_i ∈ N (i = {0, 1, 2, ..., T-ind-1}); the new interval sequence and the abscissa perturbations are added correspondingly to loc_base, obtaining the position sequence of the foreground picture in the background pictures of one image sequence segment: loc = (loc_base + xf_{k_0}, loc_base + xf_{k_1} + ys·fb_j, ..., loc_base + xf_{k_{T-ind-2}} + ys·fb_{j+T-ind-3}, loc_base + xf_{k_{T-ind-1}} + ys·fb_{j+T-ind-2});
Step 3.3: calculating foreground picture FI i Area A of the middle object fi Background picture BI j Area A of picture in bi Randomly selecting a ratio r from the ratios according to
Figure 288988DEST_PATH_IMAGE016
=A bi Calculating to obtain the area of new object
Figure 150764DEST_PATH_IMAGE016
Calculating the scaling fr =ofthe object
Figure 785008DEST_PATH_IMAGE017
Step 3.4: generating a mask mask_i of the area covered by each object according to the foreground picture segmentation label label_i; according to the object scaling factor fr obtained in step 3.3, the object picture im_i and mask_i are zoomed, obtaining a new object picture n_im_i with a parabola and a new mask n_mask_i;
Step 3.5: establishing an index (0, 1, ..., T-1) for the background pictures, and selecting the current parabola coordinate and background picture in sequence; if the background picture index is less than ind it is skipped, otherwise the n_im_i and n_mask_i from step 3.4 are transformed by data transformation and superposed into the background picture at the corresponding coordinate in loc, obtaining a picture containing a parabola; finally a group of sequence pictures DI containing parabolas is obtained. In this embodiment, the transformation includes random flipping with probability 0.5, Gaussian noise with probability 0.5 and random rotation with probability 1, and the superposition mode is that the parabola content directly covers the background picture content; of course, the transformation and superposition mode may also adopt other methods, for example the superposition may adopt Poisson fusion;
and 4, step 4: the time sequence block segmentation network is trained with the synthesized parabola sequence set DI to obtain TPSNet (Time-sequence Patch Segmentation Network).
For step 4, specifically:
step 4.1: designing a time sequence block segmentation network, wherein the time sequence block segmentation network consists of four parts, namely a feature extraction layer, a feature transformation layer, a time sequence feature extraction layer and a classification layer.
And 4.2: the feature extraction layer is a CNN-based network whose main purpose is to extract features of the pictures. To ensure overall efficiency at run time, the picture sequence with channel 3 and input length T is first spliced, in order, into data with channel 3T and length 1; the number of input channels of the first convolution layer of the commonly used backbone network is then changed to 3T, with the rest kept unchanged, so that the computation increases (almost negligibly) only in the first layer and the subsequent computation is consistent with that of a single input picture; moreover, several down-sampling steps are added while keeping the number of network layers unchanged, further ensuring the running efficiency of the network. In this embodiment, T is 5 and the backbone is MobileNetV2 with down-sampling of 1/128 and with the final pooling layer and fully connected layer removed; the down-sampling can be modified according to the effect and speed actually required, and the backbone can be replaced by other networks such as ResNet/DenseNet/LiteHRNet as the base model;
step 4.3: the feature conversion layer converts the features extracted by the feature extraction layer into an input acceptable to the time-sequence feature extraction layer, according to the number of blocks into which the input picture is divided. By way of example, for the feature C × H × W extracted in step 4.2, an average pooling layer (other pooling, such as max pooling, may also be used) with kernel size (1, 2) and step size (1, 2) is first used to down-sample the W direction (the kernel size and step size control the size of the final output patches, and can be increased accordingly if each patch is desired to cover a larger area), and the feature in the height direction is globally average pooled (other pooling may also be used), obtaining a new feature C × 1 × (W/2); this is then converted into a feature of dimension (W/2) × T × (C/T), where C is an integer multiple of T and W/2 is the number of blocks into which the input picture is divided. In this embodiment T is 5, and W/2 is the width of the input picture divided by 256 (obtained from the 1/128 down-sampling in step 4.2 together with the down-sampling in the W direction, for an overall down-sampling factor of 256);
step 4.4: the time sequence feature extraction layer inputs the converted features into a recurrent neural network to obtain features in the time sequence dimension, and a classification layer is then used to classify the extracted time sequence features and judge whether a parabola exists; in this embodiment, the recurrent neural network used is LSTM (long short-term memory), but other networks such as GRU (gated recurrent unit) may also be used.
Step 4.5: performing data enhancement transformations on the parabola-containing sequence picture set DI obtained in step 3, inputting it into the time sequence block segmentation network, and optimizing with BCE as the loss function to finally obtain the converged network TPSNet. In this embodiment, the data enhancement transformations are Gaussian blur with probability 0.5, random cropping with probability 0.5, area-preserving aspect-ratio change with probability 1, size change with probability 1 (the length of the picture is adjusted to 512 and the width is changed according to the original aspect ratio; the value may be adjusted according to actual requirements), padding of length and width to a multiple of 256, and random sequence reversal with probability 0.2 (falling objects become rising objects); these data enhancement transformations may use different probabilities, different parameters, different combination orders, and other transformations such as color transformation;
and 5: for an input video, 5 consecutive frames are input, the size is changed (the length is adjusted to 512 and the width is changed according to the original aspect ratio) and the length and width are padded to a multiple of 256, and prediction is performed using TPSNet, obtaining scores = (score_0, score_1, ..., score_p) predicting whether each block of each sequence contains a parabola, where p is the number of blocks into which the input picture is divided (the calculation method is described above). The score of each block is screened: if it is smaller than the set threshold 0.6 it is filtered out, otherwise it is recorded. Supposing that the scores of a certain block i continuously meet the condition, the recorded effective scores are record_score_i = (score_i^0, score_i^1, ..., score_i^K); if the number of scores in record_score_i greater than the preset threshold 0.93 is greater than the preset number 3, and the sequence length K of record_score_i is greater than the preset number 3, the object is regarded as a high-altitude thrown object, an alarm is output and the corresponding video image is recorded.
The invention provides a complex scene high-altitude parabola recognition method, which introduces a data synthesis method and a new modeling mode in the execution of the recognition method; it avoids the need for large-scale data collection, generalizes well across different scenes, and runs at high speed:
1. synthesizing data according to the characteristics of the high-altitude parabolas (the longitudinal coordinate basically accords with the free falling body law, the rotation occurs in the falling process, and a certain transverse offset exists in the falling process) to solve the problem that a deep learning algorithm for the high-altitude parabolas field needs a large amount of multi-dimensional high-altitude parabolas labeling data.
2. Through higher down sampling, specific shallow features are ignored when the network extracts features, sequence features with relative semantics are concerned, and the problem that seamless migration to each scene cannot be achieved at low cost is solved.
3. Different from the commonly adopted strong positioning (positioning to the position of a parabola in an image), a weak positioning mode (positioning to the position of the parabola in a certain vertical area of the image) is adopted, so that the subsequent manual checking is facilitated, and meanwhile, the positioning requirement on the model is reduced, and the efficient model can be constructed; the sequence information is coupled to the feature dimension in the input process, and then the features are extracted, so that the efficiency of extracting the image features is ensured; the characteristic dimension is decoded into the sequence dimension, and the extracted semantic information can be subjected to sequence integration by using a sequence model, so that the final performance is ensured; the problem that the high speed and the good effect are difficult to maintain simultaneously is solved.
In order to support the smooth execution of the method, a complex scene high-altitude parabolic recognition system is correspondingly arranged, and the system comprises:
the training picture generation module is used for synthesizing the image sequence set SI and the object picture set SO to obtain a training picture sequence set DI with a parabola;
the time sequence patch segmentation network training module is used for training a time sequence block segmentation network by using a training picture sequence set DI to obtain TPSNet;
and the detection and identification module is used for carrying out parabolic identification and prediction on the test video by using the TPSNet obtained after training.
Specifically, the training picture generation module specifically includes the following units:
and the video frame extracting unit is used for collecting N1 video segments and extracting each frame of image in the video.
The background image dividing unit is used for dividing all images in each video into a plurality of image sequence segments S with the length T according to time sequence; all the image sequence segments obtained by division form an image sequence set SI.
The object picture labeling unit is used for collecting N2 object pictures and carrying out segmentation and labeling on the object pictures; and obtaining a segmented and labeled object picture set SO.
And the picture selecting unit is used for randomly selecting an object picture from the object picture set SO as a foreground picture and selecting an image sequence segment from the image sequence set SI as a background picture.
The picture synthesis unit is used for synthesizing the selected foreground picture and the selected background picture according to the characteristic of the high-altitude parabola to obtain a training picture sequence D with the parabola; and repeatedly synthesizing to obtain a plurality of training picture sequences D with parabolas, wherein the plurality of training picture sequences D form a training picture sequence set DI.
Specifically, the time series patch segmentation network training module is specifically configured to perform data enhancement transformation on a training picture sequence set DI, input the data enhancement transformation into a time series block segmentation network, and perform optimization by using a loss function to obtain a converged network TPSNet;
the time sequence block segmentation network comprises a feature extraction layer, a feature transformation layer, a time sequence feature extraction layer and a classification layer;
the feature extraction layer is based on a CNN network and is used for extracting the features of pictures: it splices picture sequences with input length T and channel 3 into data with length 1 and channel 3T in sequence, and the number of input channels of the first convolution layer of the backbone network used is then changed to 3T;
the feature conversion layer is used for converting the picture features extracted by the feature extraction layer into the acceptable input of the time sequence feature extraction layer according to the number of the divided blocks;
the time sequence feature extraction layer is used for inputting the transformed picture features into a recurrent neural network to obtain the features on the time sequence dimension;
and the classification layer is used for classifying the extracted time sequence characteristics and judging whether a parabola exists.
Specifically, the detection and identification module comprises the following units:
the system comprises a test video processing unit, a data processing unit and a data processing unit, wherein the test video processing unit is used for acquiring a test video, dividing the test video into a plurality of sequences with the length of T, and performing identification prediction by using TPSNet; a score is obtained for each sequence of high altitude parabolas and recorded.
And the identification and judgment unit is used for judging whether the high-altitude parabolic object exists in the test video according to the score, and alarming and recording the corresponding video image segment if the high-altitude parabolic object exists in the test video.
Although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (20)

1. A complex scene high altitude parabola identification method is characterized by comprising the following steps:
synthesizing the image sequence set SI and the object picture set SO to obtain a training picture sequence set DI with a parabola;
training a time sequence block segmentation network by using a training picture sequence set DI to obtain TPSNet;
and (4) carrying out parabolic recognition prediction on the test video by using the TPSNet obtained after training, and judging whether a high-altitude parabolic exists in the test video.
2. The complex scene high altitude parabola identification method according to claim 1, wherein the method for acquiring the image sequence set SI comprises:
collecting N1 video segments, and extracting each frame of image in the video;
dividing all images in each video into a plurality of image sequence segments S with the length T according to time sequence;
all the image sequence segments obtained by the division constitute an image sequence set SI.
3. The complex scene high-altitude parabolic recognition method according to claim 2, wherein the image sequence set SI is specifically:
SI = {(0, 1, ..., T-1), (sep, 1+sep, ..., T+sep-1), ...}, wherein (sep, 1+sep, ..., T+sep-1) represents the set of image numbers within one image sequence segment S; sep represents the sampling interval; T represents the length of a single image sequence segment S;
the image sequence set SI comprises 1 + Floor((VL - T)/sep) image sequence segments, where Floor represents rounding down and VL represents the number of pictures extracted per video.
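The segment enumeration of claims 2-3 can be sketched as follows (a minimal illustration; the function name and the example values are ours, not the patent's):

```python
def segment_indices(VL, T, sep):
    """Enumerate the image-number sets of the image sequence set SI:
    one segment of length T starts every `sep` frames, giving
    1 + Floor((VL - T) / sep) segments in total."""
    n_segments = 1 + (VL - T) // sep
    return [tuple(range(i * sep, i * sep + T)) for i in range(n_segments)]

# A video of VL=10 extracted frames, segment length T=4, sampling interval sep=2:
segments = segment_indices(VL=10, T=4, sep=2)
# segments -> [(0, 1, 2, 3), (2, 3, 4, 5), (4, 5, 6, 7), (6, 7, 8, 9)]
```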
4. The complex scene high-altitude parabolic recognition method according to claim 2, wherein the method for acquiring the object picture set SO comprises:
collecting N2 object pictures, and segmenting and labeling the object pictures;
and obtaining a segmented and labeled object picture set SO.
5. The complex scene high-altitude parabolic recognition method according to claim 4, wherein the segmenting and labeling of the object picture specifically comprises:
marking the required objects in the object picture to obtain a segmentation annotation label_i for each required object;
each object picture correspondingly generates a label set {(im_1, label_1), (im_2, label_2), ..., (im_n, label_n)}, wherein im_1, im_2, ..., im_n refer to picture index numbers; label_1, label_2, ..., label_n refer to the labels of the n required objects, and (im_n, label_n) represents the nth group of labels in the label set;
after all the object pictures are segmented and labeled, the label sets of the obtained object pictures are integrated to obtain a set of pictures and segmentation annotations SO, denoted {(im_i, label_i) | i ∈ [1, m]}, wherein (im_i, label_i) represents the ith group of labels in the set, im_i represents the picture index number in the ith group of labels, label_i represents the segmentation label of the ith required object, and m represents that there are m required objects in all the object pictures.
6. The complex scene high altitude parabola identification method according to claim 4, wherein the method for synthesizing the training picture sequence set DI comprises:
randomly selecting an object picture from the object picture set SO as a foreground picture, and selecting an image sequence segment from the image sequence set SI as a background picture;
synthesizing the selected foreground picture and the selected background picture according to the characteristic of the high-altitude parabola to obtain a training picture sequence D with the parabola;
and repeatedly synthesizing to obtain a plurality of training picture sequences D with parabolas, wherein the plurality of training picture sequences D form a training picture sequence set DI.
7. The complex scene high-altitude parabola identification method according to claim 6, wherein the characteristics of the high-altitude parabola specifically include:
presetting a longitudinal interval sequence Fall_base = (fb_0, fb_1, fb_2, ..., fb_{k-1}) comprising k values and conforming to the law of free fall, wherein fb_{k-1} represents the kth longitudinal interval value in Fall_base;
presetting a group of transverse disturbances XOffset = (xf_0, xf_1, xf_2, ..., xf_n), wherein xf_n represents the (n+1)th lateral disturbance value in XOffset;
presetting a group of area ratios = (r_0, r_1, r_2, ..., r_n), wherein r_n represents the (n+1)th area ratio in the group of area ratios;
presetting a group of ordinate scaling coefficients YScale = (ys_0, ys_1, ys_2, ..., ys_n), wherein ys_n represents the (n+1)th ordinate scaling coefficient in YScale.
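As an illustration of how a Fall_base conforming to the free-fall law might be preset (the frame rate, gravity constant, and pixel scale below are assumptions, not values from the patent): under free fall the drop between consecutive frames grows linearly with the frame index, since the displacement over the ith frame interval is g·dt²·(i + 1/2).

```python
def free_fall_intervals(k, dt=0.04, g=9.8, px_per_m=50.0):
    """Generate k per-frame vertical drop values (in pixels) for an object
    in free fall sampled every dt seconds: the displacement over frame
    interval i is g * dt**2 * (i + 0.5) meters, converted to pixels."""
    return [g * dt * dt * (i + 0.5) * px_per_m for i in range(k)]

fall_base = free_fall_intervals(k=8)
# The intervals increase linearly, as expected under constant acceleration.
```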
8. The complex scene high-altitude parabolic recognition method according to claim 7, wherein the synthesizing the selected foreground picture and the selected background picture to obtain the training picture sequence D with the parabola specifically comprises:
in the selected background picture, determining a position information set of the foreground picture through characteristic transformation of a high-altitude parabola;
determining the size information of the foreground picture according to the areas of the foreground picture and the background picture;
and according to the position information set and the size information of the foreground picture, sequentially combining the foreground picture and a group of background pictures into a group of training pictures with parabolas to form a training picture sequence D.
9. The method for identifying the high-altitude parabola in the complex scene according to claim 8, wherein the determining the position information set of the foreground picture in the selected background picture through the characteristic transformation of the high-altitude parabola specifically comprises:
randomly selecting a coordinate within the coverage of the selected background picture as an initial coordinate loc_base;
randomly selecting a position index ind from the interval [0, T-3) as the index at which the parabola first appears, and randomly selecting from the longitudinal interval sequence Fall_base a consecutive subsequence of length T-ind-1, (fb_j, ..., fb_{j+T-ind-2}), wherein fb_j represents the (j+1)th longitudinal interval value in Fall_base and fb_{j+T-ind-2} represents the (j+T-ind-1)th longitudinal interval value in Fall_base;
randomly selecting a scaling coefficient ys from the ordinate scaling coefficients YScale to obtain a new longitudinal interval sequence (ys*fb_j, ..., ys*fb_{j+T-ind-2});
randomly selecting an abscissa disturbance sequence (xf_{k_0}, xf_{k_1}, ..., xf_{k_{T-ind-1}}) from the transverse disturbances XOffset, wherein k_i ∈ N (i = 0, 1, 2, ..., T-ind-1) and xf_{k_{T-ind-1}} denotes the (k_{T-ind-1})th abscissa perturbation value in XOffset;
adding the new longitudinal interval sequence and the abscissa perturbation sequence to the corresponding coordinate values of the initial coordinate loc_base to obtain the position sequence of the foreground picture in the background pictures of one image sequence segment: loc = (loc_base + xf_{k_0}, loc_base + xf_{k_1} + ys*fb_j, ..., loc_base + xf_{k_{T-ind-2}} + ys*fb_{j+T-ind-3}, loc_base + xf_{k_{T-ind-1}} + ys*fb_{j+T-ind-2}).
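A sketch of the position-sequence construction in claim 9 (the variable names, and the accumulation of the scaled vertical intervals from frame to frame, are our reading of the claim rather than its literal wording):

```python
import random

def parabola_positions(loc_base, fall_base, xoffset, yscale, T):
    """Pick the first-appearance index ind, a length-(T-ind-1) slice of the
    vertical interval sequence, a vertical scale ys, and random horizontal
    perturbations, then add them onto the initial coordinate loc_base."""
    ind = random.randrange(0, T - 3)          # index of the initial parabola appearance
    n = T - ind - 1                           # number of vertical intervals used
    j = random.randrange(0, len(fall_base) - n + 1)
    ys = random.choice(yscale)
    intervals = [ys * fb for fb in fall_base[j:j + n]]
    xs = [random.choice(xoffset) for _ in range(n + 1)]
    x0, y0 = loc_base
    locs = [(x0 + xs[0], y0)]
    y = y0
    for dx, dy in zip(xs[1:], intervals):
        y += dy                               # vertical drops accumulate frame to frame
        locs.append((x0 + dx, y))
    return ind, locs
```

With positive interval values the ordinate is non-decreasing, matching a falling object.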
10. The complex scene high-altitude parabolic recognition method according to claim 9, wherein the determining the size information of the foreground picture according to the areas of the foreground picture and the background picture specifically comprises:
calculating the object area A_fi in the foreground picture and the picture area A_bi of the background picture;
randomly selecting a ratio r from the area ratios to obtain a new object area A_new = A_bi * r;
further obtaining the scaling factor of the object fr = sqrt(A_new / A_fi);
generating, according to the segmentation label label_i in the foreground picture, the area mask_i covered by each required object;
scaling the object picture im_i and the area mask_i covered by the object according to the scaling factor fr to obtain a new object picture n_im_i with a parabola and a new mask n_mask_i.
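The area-driven scaling of claim 10 can be sketched as follows; the square-root formula for fr is our reconstruction of the garbled expression, on the assumption that scaling both sides of the object by fr multiplies its area by fr²:

```python
import math

def object_scale(area_object, area_background, r):
    """Pick a target object area as the fraction r of the background picture
    area, then derive the linear scaling factor fr applied to both the
    object picture and its mask."""
    a_new = area_background * r
    fr = math.sqrt(a_new / area_object)
    return a_new, fr

# A 400-pixel object pasted into a 1920x1080 background at ratio r = 0.001:
a_new, fr = object_scale(area_object=400.0, area_background=1920 * 1080, r=0.001)
# fr**2 * 400 == a_new, i.e. the scaled object occupies the chosen area.
```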
11. The method for identifying high-altitude parabolas in complex scenes according to claim 10, wherein the step of sequentially synthesizing a foreground picture and a group of background pictures into a group of training pictures with parabolas specifically comprises:
establishing index sequence numbers (0, 1, 2, ..., T-1) for the background pictures of an image sequence segment;
sequentially selecting the background picture and the position coordinates in the position sequence loc;
judging whether the index sequence number of the background picture is smaller than the position index ind;
skipping the background picture when the background picture index sequence number is smaller than the position index ind;
when the index sequence number of the background picture is not less than the position index ind, taking the new object picture n_im_i with a parabola and the area n_mask_i covered by the object and simultaneously carrying out data transformation; and superposing the new object picture onto the corresponding background picture according to the position coordinate in the position sequence loc to synthesize a training picture.
12. The complex scene high altitude parabola identification method as claimed in any one of claims 1-11, wherein said time sequence block segmentation network comprises a feature extraction layer, a feature transformation layer, a time sequence feature extraction layer and a classification layer;
the feature extraction layer is based on a CNN and is used for extracting picture features; picture sequences with input length T and 3 channels are spliced in order into data with length 1 and 3T channels, and the number of input channels of the first convolution layer of the backbone network used is changed to 3T;
the feature conversion layer is used for converting the picture features extracted by the feature extraction layer into the acceptable input of the time sequence feature extraction layer according to the number of the divided blocks;
the time sequence feature extraction layer is used for inputting the transformed picture features into a recurrent neural network to obtain the features on the time sequence dimension;
and the classification layer is used for classifying the extracted time sequence characteristics and judging whether a parabola exists.
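The channel splicing described for the feature extraction layer can be illustrated with plain nested lists (a shape-only sketch; a real implementation would use a tensor library and then set the backbone's first convolution to 3T input channels):

```python
def splice_channels(frames):
    """Concatenate a sequence of T frames, each holding 3 channel planes,
    into a single input with 3*T channel planes (length 1, channels 3T)."""
    spliced = []
    for frame in frames:        # frame: a list of 3 H x W channel planes
        spliced.extend(frame)
    return spliced

T, H, W = 4, 2, 2
frames = [[[[0.0] * W for _ in range(H)] for _ in range(3)] for _ in range(T)]
x = splice_channels(frames)
# x now holds 3*T = 12 channel planes, each of shape H x W.
```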
13. The complex scene high altitude parabola identification method according to claim 12, wherein the method for obtaining the TPSNet comprises:
and performing data enhancement transformation on the training picture sequence set DI, inputting the training picture sequence set DI into a time sequence block segmentation network, and optimizing by using a loss function to obtain a converged network TPSNet.
14. The complex scene high-altitude parabolic recognition method according to any one of claims 1 to 11, wherein the performing parabolic recognition prediction on the test video and judging whether a high-altitude parabolic exists in the test video specifically comprises:
acquiring a test video, dividing the test video into a plurality of sequences with the length of T, and identifying and predicting by using TPSNet;
obtaining the score of each sequence of high altitude parabolas, and recording the score;
and judging whether a high-altitude parabola exists in the test video or not according to the score, and if the high-altitude parabola exists in the test video, alarming and recording a corresponding video image segment.
15. The method for identifying high altitude parabolas in complex scenes as claimed in claim 14, wherein obtaining a score of each sequence of high altitude parabolas, and recording the score specifically comprises:
deriving scores = (score_0, score_1, ..., score_P) predicting whether each block of each sequence contains a parabola, wherein P is the number of blocks into which the input picture is divided;
screening the score of each patch, and if the score is smaller than a set first threshold, filtering, and not recording the score value; and if the score is not less than the set first threshold, recording the score value, and finally obtaining a qualified effective score sequence set.
16. The complex scene high-altitude parabola identification method according to claim 15, wherein the judging whether the high-altitude parabola exists in the test video specifically comprises:
judging whether the number of data in the effective score sequence set is greater than a preset sequence length and whether the number of data in the effective score sequence set greater than a second threshold exceeds a preset number;
and when the number of data in the effective score sequence set is greater than the preset sequence length and the number of data greater than the second threshold exceeds the preset number, alarming and recording the corresponding video image segment.
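The two-stage decision of claims 15-16 can be sketched as follows (the threshold values, minimum sequence length, and minimum count below are illustrative assumptions, not the patent's parameters):

```python
def detect_parabola(score_sequences, first_thr=0.3, second_thr=0.6,
                    min_len=3, min_count=2):
    """Filter each sequence's block scores by the first threshold to build
    the effective score set, then alarm only when that set is longer than
    min_len AND more than min_count of its scores exceed the second threshold."""
    effective = [s for scores in score_sequences for s in scores if s >= first_thr]
    strong = sum(1 for s in effective if s > second_thr)
    return len(effective) > min_len and strong > min_count

alarm = detect_parabola([[0.1, 0.7, 0.8], [0.4, 0.9, 0.05], [0.65, 0.2]])
# effective = [0.7, 0.8, 0.4, 0.9, 0.65]; four scores exceed 0.6 -> alarm is True
```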
17. A complex scene high altitude parabola recognition system, characterized in that the system comprises:
the training picture generation module is used for synthesizing the image sequence set SI and the object picture set SO to obtain a training picture sequence set DI with a parabola;
the time sequence patch segmentation network training module is used for training a time sequence block segmentation network by using a training picture sequence set DI to obtain TPSNet;
and the detection and identification module is used for carrying out parabolic identification and prediction on the test video by using the TPSNet obtained after training.
18. The complex scene high altitude parabola recognition system of claim 17, wherein the training picture generation module specifically comprises:
the video frame extracting unit is used for collecting N1 video segments and extracting each frame image in the video;
the background image dividing unit is used for dividing all images in each video into a plurality of image sequence segments S with the length T according to time sequence; all the image sequence segments obtained by division form an image sequence set SI;
the object picture labeling unit is used for collecting N2 object pictures and carrying out segmentation and labeling on the object pictures; obtaining a segmented and labeled object picture set SO;
the image selecting unit is used for randomly selecting an object image from the object image set SO as a foreground image and selecting an image sequence segment from the image sequence set SI as a background image;
the picture synthesis unit is used for synthesizing the selected foreground picture and the selected background picture according to the characteristic of the high-altitude parabola to obtain a training picture sequence D with the parabola; and repeatedly synthesizing to obtain a plurality of training picture sequences D with parabolas, wherein the plurality of training picture sequences D form a training picture sequence set DI.
19. The complex scene high altitude parabola identification system of claim 18, wherein said timing patch segmentation network training module is specifically configured to perform data enhancement transformation on a training picture sequence set DI, input the transformed training picture sequence set DI into a timing block segmentation network, and perform optimization using a loss function to obtain a converged network TPSNet;
the time sequence block segmentation network comprises a feature extraction layer, a feature transformation layer, a time sequence feature extraction layer and a classification layer;
the feature extraction layer is based on a CNN and is used for extracting picture features; picture sequences with input length T and 3 channels are spliced in order into data with length 1 and 3T channels, and the number of input channels of the first convolution layer of the backbone network used is changed to 3T;
the feature conversion layer is used for converting the picture features extracted by the feature extraction layer into the acceptable input of the time sequence feature extraction layer according to the number of the divided blocks;
the time sequence feature extraction layer is used for inputting the transformed picture features into a recurrent neural network to obtain the features on the time sequence dimension;
and the classification layer is used for classifying the extracted time sequence characteristics and judging whether a parabola exists.
20. The complex scene high altitude parabola identification system of claim 17, wherein the detection identification module specifically comprises:
the test video processing unit is used for acquiring a test video, dividing the test video into a plurality of sequences with the length of T, and performing identification prediction by using TPSNet; obtaining the score of each sequence of high altitude parabolas, and recording the score;
and the identification and judgment unit is used for judging whether the high-altitude parabolic object exists in the test video according to the score, and alarming and recording the corresponding video image segment if the high-altitude parabolic object exists in the test video.
CN202210796750.9A 2022-07-08 2022-07-08 Complex scene high altitude parabolic identification method and system Active CN114863370B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210796750.9A CN114863370B (en) 2022-07-08 2022-07-08 Complex scene high altitude parabolic identification method and system


Publications (2)

Publication Number Publication Date
CN114863370A true CN114863370A (en) 2022-08-05
CN114863370B CN114863370B (en) 2022-10-25

Family

ID=82626249


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107862376A (en) * 2017-10-30 2018-03-30 中山大学 A human body image action recognition method based on a two-stream neural network
CN108615011A (en) * 2018-04-24 2018-10-02 东南大学 Non- trimming video behavior identification prediction method based on multi-scale sliding window mouth
CN110020596A (en) * 2019-02-21 2019-07-16 北京大学 A kind of video content localization method based on Fusion Features and cascade study
CN110796087A (en) * 2019-10-30 2020-02-14 江西赣鄱云新型智慧城市技术研究有限公司 Method and system for quickly generating high-altitude parabolic training sample
CN111723654A (en) * 2020-05-12 2020-09-29 中国电子***技术有限公司 High-altitude parabolic detection method and device based on background modeling, YOLOv3 and self-optimization
CN113076809A (en) * 2021-03-10 2021-07-06 青岛海纳云科技控股有限公司 High-altitude falling object detection method based on visual Transformer
CN113506315A (en) * 2021-07-26 2021-10-15 上海智眸智能科技有限责任公司 Method and device for detecting moving object and storage medium
WO2022105609A1 (en) * 2020-11-19 2022-05-27 中科智云科技有限公司 High-altitude parabolic object detection method and apparatus, computer device, and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WEI WANG: "TPSNet: Thin-Plate-Spline Representation for Arbitrary Shape Scene Text", 《ARXIV:2110.12826V1》 *

Also Published As

Publication number Publication date
CN114863370B (en) 2022-10-25


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant