CN112613516A - Semantic segmentation method for aerial video data - Google Patents
Semantic segmentation method for aerial video data
- Publication number
- CN112613516A (application CN202011459565.8A)
- Authority
- CN
- China
- Prior art keywords
- semantic segmentation
- histogram
- video data
- aerial video
- data set
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/50—Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/56—Extraction of image or video features relating to colour
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
- G06V20/13—Satellite images
Abstract
The application discloses a semantic segmentation method for aerial video data. An aerial video data set is trained and identified through a shot boundary detection algorithm to obtain the key frames in the data set, which form a key frame data set; the key frame data set is then semantically segmented through a semantic segmentation algorithm based on a fully convolutional network. The method reduces the amount of computation by preprocessing the data and extracting key frames, so that no large data set is needed to drive model learning; it addresses the model's sensitivity to optical-flow changes caused by shadows by combining color and texture features; and it uses a convolutional neural network to learn local and global features in an end-to-end manner to optimize the segmentation result, thereby improving the accuracy and reliability of subsequent extended analysis.
Description
Technical Field
The application relates to a semantic segmentation method for aerial video data.
Background
Analyzing video captured by drones has a wide range of applications, such as vehicle tracking, object detection, anomaly detection, and the like. For most applications, spatial and contextual information needs to be inferred from the image frames of the video. For example, tracking vehicles is easier with knowledge about roads. Semantic segmentation is one of the tools used to divide an image into different semantic regions and classify these regions into predefined classes. Semantic segmentation helps to understand the layout of a scene, and it is therefore becoming an increasingly important factor for anomaly detection, autonomous vehicles, object detection, and the like. Semantic segmentation remains challenging due to intra-class variation of objects, loss of perspective, scene context, the presence of noise, and changes in lighting. Current semantic segmentation can be achieved by using traditional machine learning methods such as Conditional Random Fields (CRF) and by deep Convolutional Neural Networks (CNN).
CRF-based algorithms are widely used for their ability to capture contextual information, and the framework is usually composed of unary and pairwise potentials. The unary potential captures local features that depend on the pixel itself, and the pairwise potential captures spatial information. Capturing the different potentials of various features (e.g., texture, color, location) requires manual encoding into the model. However, these hand-crafted functions may not capture all of the variation in the data.
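As an illustration of the unary/pairwise decomposition described above (a minimal NumPy sketch using a Potts pairwise term over a 4-neighborhood; this is a generic textbook CRF energy, not the patent's own formulation):

```python
import numpy as np

def crf_energy(unary, labels, lam=1.0):
    """Energy of a labeling under a simple grid CRF.
    unary:  (H, W, C) per-pixel class costs (the unary potential)
    labels: (H, W) integer labeling
    Pairwise term: Potts penalty `lam` for each pair of unequal
    4-connected neighbors (the pairwise potential)."""
    h, w = labels.shape
    rows, cols = np.indices((h, w))
    e_unary = unary[rows, cols, labels].sum()
    # Count disagreeing horizontal and vertical neighbor pairs
    e_pair = lam * ((labels[:, 1:] != labels[:, :-1]).sum()
                    + (labels[1:, :] != labels[:-1, :]).sum())
    return float(e_unary + e_pair)
```

A smoother labeling (fewer disagreeing neighbor pairs) yields a lower energy, which is what CRF inference minimizes.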
The success of automated systems for anomaly detection, event detection, and the like in aerial video relies heavily on scene understanding for greater accuracy. In addition, owing to the lack of available data sets, research on semantic segmentation of drone video is limited.
Therefore, how to more effectively perform semantic segmentation on drone aerial video, and then use the results in analysis, is a technical problem that urgently needs to be solved.
Disclosure of Invention
It is an object of the present application to overcome the above problems or to at least partially solve or mitigate the above problems.
According to one aspect of the application, a semantic segmentation method for aerial video data is provided: an aerial video data set is trained and identified through a shot boundary detection algorithm to obtain the key frames in the data set, which form a key frame data set; the key frame data set is then semantically segmented through a semantic segmentation algorithm based on a fully convolutional network.
Optionally, the shot boundary detection algorithm identifies shot boundaries between consecutive frames in the aerial video data set by calculating the histogram difference of the consecutive frames and comparing it with a set threshold.
Optionally, the shot boundary detection algorithm divides each frame into non-overlapping grids and combines per-grid histogram difference calculation to identify the shot boundary of each frame.
Optionally, when the shot boundary detection algorithm identifies shot boundaries using non-overlapping grids combined with histogram difference calculation, each frame is divided into non-overlapping grids of size 16 × 16; the histogram difference of corresponding grids in two adjacent frames is then calculated using the chi-square distance; the average histogram difference between the two consecutive frames is computed; and finally the average histogram difference is compared with a set threshold T_shot to identify shot boundaries.
Optionally, the equation for calculating the histogram difference of corresponding grids in two adjacent frames using the chi-square distance is:

D_k(H_i, H_{i+1}) = Σ_I [H_i(I) − H_{i+1}(I)]² / [H_i(I) + H_{i+1}(I)]

where H_i represents the histogram of the i-th frame, H_{i+1} represents the histogram of the (i+1)-th frame, and I indicates the image block at the same position in the two frames.
Optionally, the calculation formula for the average histogram difference between two consecutive frames is:

D = (1/N) Σ_{k=1}^{N} D_k

where D is the average histogram difference of the two consecutive frames, D_k is the chi-square difference between the k-th image blocks, and N represents the total number of image blocks in the image.
Optionally, the formula for comparing the average histogram difference with the set threshold T_shot is:

B(i, i+1) = 1, if D(i, i+1) > T_shot; 0, otherwise

where i and i+1 represent two consecutive frames, D(i, i+1) is the average histogram difference between them, and B(i, i+1) = 1 indicates a shot boundary.
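For illustration (not part of the original disclosure), the three quantities above can be sketched in NumPy; the small epsilon guarding empty histogram bins is an added implementation assumption:

```python
import numpy as np

def chi_square_diff(h1, h2, eps=1e-10):
    """Chi-square distance between two block histograms (D_k)."""
    h1 = h1.astype(np.float64)
    h2 = h2.astype(np.float64)
    return float(np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))

def mean_histogram_diff(frame_a, frame_b, grid=16, bins=256):
    """Average chi-square difference D over corresponding
    non-overlapping grid-by-grid blocks of two grayscale frames."""
    h, w = frame_a.shape
    diffs = []
    for y in range(0, h - grid + 1, grid):
        for x in range(0, w - grid + 1, grid):
            ha, _ = np.histogram(frame_a[y:y+grid, x:x+grid],
                                 bins=bins, range=(0, 256))
            hb, _ = np.histogram(frame_b[y:y+grid, x:x+grid],
                                 bins=bins, range=(0, 256))
            diffs.append(chi_square_diff(ha, hb))
    return float(np.mean(diffs))

def is_shot_boundary(d, t_shot):
    """B(i, i+1) = 1 when the average difference exceeds T_shot."""
    return 1 if d > t_shot else 0
```

Identical frames give D = 0, while frames from different shots give a large D, which is what the threshold comparison exploits.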
Optionally, a U-Net model is adopted for the semantic segmentation of the key frame data set by the fully-convolutional-network-based algorithm. The U-Net model comprises a contracting path and a symmetric expanding path; features in the key frame are convolved along the contracting path and extracted through a ReLU activation function, max pooling is applied to the extracted features to identify the relevant features, and a Softmax activation is applied to the last layer of the U-Net model to obtain the pixel probability of each class.
Optionally, the key frames processed by the U-Net model are 256 × 256 color images, and padding is applied at every layer of the U-Net model so that the features most relevant to the key frame are preserved.
In particular, the present invention also provides a computing device comprising a memory, a processor and a computer program stored in the memory and executable by the processor, wherein the processor implements the method as described above when executing the computer program.
The invention also provides a computer-readable storage medium, preferably a non-volatile readable storage medium, having stored therein a computer program which, when executed by a processor, implements a method as described above.
The invention also provides a computer program product comprising computer readable code which, when executed by a computer device, causes the computer device to perform the method as described above.
According to the semantic segmentation method for aerial video data, key frames are extracted through data preprocessing, which reduces the amount of computation, so that no large data set is needed to drive model learning; the model's sensitivity to optical-flow changes caused by shadows is addressed by combining color and texture features; and a convolutional neural network is used to learn local and global features in an end-to-end manner to optimize the semantic segmentation result, improving the accuracy and reliability of subsequent extended analysis.
The above and other objects, advantages and features of the present application will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof, taken in conjunction with the accompanying drawings.
Drawings
Some specific embodiments of the present application will be described in detail hereinafter by way of illustration and not limitation with reference to the accompanying drawings. The same reference numbers in the drawings identify the same or similar elements or components. Those skilled in the art will appreciate that the drawings are not necessarily drawn to scale. In the drawings:
FIG. 1 is a flow diagram of a method for semantic segmentation of aerial video data according to one embodiment of the present application;
FIG. 2 is a block diagram of a computing device according to another embodiment of the present application;
FIG. 3 is a structural diagram of a computer-readable storage medium according to another embodiment of the present application.
Detailed Description
According to the scheme, as shown in FIG. 1, an aerial video data set is trained and identified through a shot boundary detection algorithm to obtain the key frames in the data set, which form a key frame data set; the key frame data set is then semantically segmented through a semantic segmentation algorithm based on a fully convolutional network.
The shot boundary detection algorithm identifies shot boundaries between consecutive frames in the aerial video data set by calculating their histogram difference and comparing it with a set threshold. Further, the shot boundary of each frame is identified by dividing non-overlapping grids and combining histogram difference calculation.
Specifically, when the shot boundary detection algorithm identifies shot boundaries using non-overlapping grids combined with histogram difference calculation, each frame is first divided into non-overlapping grids of size 16 × 16, and the histogram difference of corresponding grids in two adjacent frames is calculated using the chi-square distance:

D_k(H_i, H_{i+1}) = Σ_I [H_i(I) − H_{i+1}(I)]² / [H_i(I) + H_{i+1}(I)]

where H_i represents the histogram of the i-th frame, H_{i+1} represents the histogram of the (i+1)-th frame, and I indicates the image block at the same position in the two frames.
The average histogram difference between the two consecutive frames is then calculated:

D = (1/N) Σ_{k=1}^{N} D_k

where D is the average histogram difference of the two consecutive frames, D_k is the chi-square difference between the k-th image blocks, and N represents the total number of image blocks in the image.
Finally, the average histogram difference is compared with the set threshold T_shot to identify shot boundaries:

B(i, i+1) = 1, if D(i, i+1) > T_shot; 0, otherwise

where i and i+1 represent two consecutive frames. The threshold T_shot may be determined according to the specific working conditions. In this embodiment, T_shot is determined from the peaks and valleys of the histogram-difference curve; preferably, T_shot corresponds to the minimum value between two peaks of the selected histogram, and may be tuned according to experimental performance. When determining shot boundaries, e.g., when D_{i+1} − D_i > T_shot, the indicator takes the value 1 and a shot boundary is declared; otherwise the frame pair is a non-boundary.
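The shot-boundary stage just described can be sketched end to end in NumPy (an illustrative sketch; choosing the middle frame of each shot as its key frame is an assumption made here for illustration, since the text does not fix a key-frame selection rule):

```python
import numpy as np

def _block_mean_diff(fa, fb, grid=16, bins=256, eps=1e-10):
    # Average chi-square histogram difference over non-overlapping blocks.
    h, w = fa.shape
    diffs = []
    for y in range(0, h - grid + 1, grid):
        for x in range(0, w - grid + 1, grid):
            ha, _ = np.histogram(fa[y:y+grid, x:x+grid],
                                 bins=bins, range=(0, 256))
            hb, _ = np.histogram(fb[y:y+grid, x:x+grid],
                                 bins=bins, range=(0, 256))
            ha = ha.astype(np.float64)
            hb = hb.astype(np.float64)
            diffs.append(np.sum((ha - hb) ** 2 / (ha + hb + eps)))
    return float(np.mean(diffs))

def detect_shot_boundaries(frames, t_shot):
    """Indices i such that a boundary falls between frame i and i+1."""
    return [i for i in range(len(frames) - 1)
            if _block_mean_diff(frames[i], frames[i + 1]) > t_shot]

def extract_key_frames(frames, t_shot):
    """One key frame per detected shot (middle frame; rule assumed)."""
    bounds = detect_shot_boundaries(frames, t_shot)
    starts = [0] + [b + 1 for b in bounds]
    ends = bounds + [len(frames) - 1]
    return [(s + e) // 2 for s, e in zip(starts, ends)]
```

On a sequence of five dark frames followed by five bright frames, the only boundary falls between frames 4 and 5, and one key frame is picked from each resulting shot.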
Optionally, a U-Net model is adopted for the semantic segmentation of the key frame data set by the fully-convolutional-network-based algorithm. The U-Net model comprises a contracting path and a symmetric expanding path; features in the key frame are convolved along the contracting path and extracted through a ReLU activation function, max pooling is applied to the extracted features to identify the relevant features, and a Softmax activation is applied to the last layer of the U-Net model to obtain the pixel probability of each class. Generally, a picture includes several semantic classes, such as "road", "lawn", and "house"; once the per-class pixel probabilities are obtained in this embodiment, the semantic class of each pixel can be determined, that is, the semantics of the picture can be analyzed.
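As a hypothetical illustration of this last step (the class names are invented for the example), the Softmax over the final layer's per-pixel scores and the resulting label map can be computed as:

```python
import numpy as np

def softmax_per_pixel(logits):
    """logits: (num_classes, H, W) -> per-pixel class probabilities."""
    z = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)

def label_map(logits):
    """Index of the most probable class at each pixel."""
    return softmax_per_pixel(logits).argmax(axis=0)

# Hypothetical 3-class example ("road", "lawn", "house")
classes = ["road", "lawn", "house"]
logits = np.zeros((3, 2, 2))
logits[1, 0, 0] = 5.0   # top-left pixel scores strongly for "lawn"
labels = label_map(logits)
```

Each pixel's probabilities sum to 1, and the argmax turns the probability volume into a semantic label per pixel.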
In this embodiment, the U-Net model is modified correspondingly to process aerial images. The key frames processed by the U-Net model are 256 × 256 color images; padding is applied at every layer, so that the input to each layer, convolved from the layer above, is enriched and the features most relevant to the key frame are preserved.
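The dimension bookkeeping behind this modification can be checked with a small helper (illustrative only; four pooling levels with "same"-padded convolutions is the standard U-Net configuration, assumed here since the text does not state the depth):

```python
def unet_spatial_sizes(input_size=256, depth=4):
    """Spatial size at each level of a U-Net with 'same'-padded 3x3
    convolutions and 2x2 max pooling: sizes halve down the contracting
    path and double back up the symmetric expanding path, so the
    output resolution matches the input."""
    down = [input_size // (2 ** d) for d in range(depth + 1)]  # 256..16
    up = list(reversed(down[:-1]))                             # 32..256
    return down, up
```

For a 256 × 256 key frame this gives 256 → 128 → 64 → 32 → 16 on the contracting path and 32 → 64 → 128 → 256 on the expanding path, so the per-class probability map is produced at full key-frame resolution.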
Embodiments also provide a computing device, referring to FIG. 2, comprising a memory 1120, a processor 1110, and a computer program stored in the memory 1120 and executable by the processor 1110. The computer program is stored in a space 1130 for program code in the memory 1120 and, when executed by the processor 1110, implements the method steps 1131 of any of the methods according to the invention.
The embodiment of the application also provides a computer readable storage medium. Referring to fig. 3, the computer readable storage medium comprises a storage unit for program code provided with a program 1131' for performing the steps of the method according to the invention, which program is executed by a processor.
The embodiment of the application also provides a computer program product containing instructions which, when run on a computer, cause the computer to carry out the steps of the method according to the invention.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed by a computer, cause the computer to perform, in whole or in part, the procedures or functions described in accordance with the embodiments of the application. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be understood by those skilled in the art that all or part of the steps in the method for implementing the above embodiments may be implemented by a program, and the program may be stored in a computer-readable storage medium, where the storage medium is a non-transitory medium, such as a random access memory, a read only memory, a flash memory, a hard disk, a solid state disk, a magnetic tape (magnetic tape), a floppy disk (floppy disk), an optical disk (optical disk), and any combination thereof.
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. The semantic segmentation method for the aerial video data is characterized in that an aerial video data set is trained and identified through a shot boundary detection algorithm to obtain key frames in the aerial video data set and form a key frame data set, and then the key frame data set is subjected to semantic segmentation through a semantic segmentation algorithm based on a full convolution network.
2. The method of claim 1, wherein the shot boundary detection algorithm identifies shot boundaries for successive frames in the set of aerial video data by calculating histogram differences for successive frames and comparing the histogram differences to a set threshold.
3. The method for semantic segmentation of aerial video data according to claim 2, wherein the shot boundary detection algorithm identifies shot boundaries for successive frames in the aerial video data set by partitioning non-overlapping meshes in combination with histogram difference calculation.
4. The method of claim 3, wherein when the shot boundary detection algorithm identifies the shot boundary of each frame using non-overlapping grids combined with histogram difference calculation, each frame is divided into non-overlapping grids of size 16 × 16, the chi-square distance is then used to calculate the histogram difference of corresponding grids in two adjacent frames, the histogram average difference between the two consecutive frames is then calculated, and finally the histogram average difference is compared with a set threshold T_shot to identify shot boundaries.
5. The semantic segmentation method for aerial video data according to claim 4, wherein the formula for calculating the histogram difference of corresponding grids in two adjacent frames using the chi-square distance is:

D_k(H_i, H_{i+1}) = Σ_I [H_i(I) − H_{i+1}(I)]² / [H_i(I) + H_{i+1}(I)]

wherein H_i represents the histogram of the i-th frame, H_{i+1} represents the histogram of the (i+1)-th frame, and I indicates the image block at the same position in the two frames.
6. The method of semantic segmentation for aerial video data of claim 5, wherein the histogram average difference between two consecutive frames is calculated as:

D = (1/N) Σ_{k=1}^{N} D_k

wherein D is the average histogram difference of the two consecutive frames, D_k is the chi-square difference between the k-th image blocks, and N represents the total number of image blocks in the image.
8. The semantic segmentation method for aerial video data according to claim 2, wherein a U-Net model is adopted in performing semantic segmentation on the key frame data set through the semantic segmentation algorithm based on a fully convolutional network; the U-Net model comprises a contracting path and a symmetric expanding path; features in a key frame are convolved along the contracting path and extracted through a ReLU activation function; max pooling is applied to the extracted features to identify relevant features; and a Softmax activation is applied to the last layer of the U-Net model to obtain the pixel probability of each class.
9. The method of claim 8, wherein the key frames processed by the U-Net model are 256 × 256 color images, and padding is applied at each layer of the U-Net model, preserving the features most relevant to the key frame.
10. A computer program product comprising computer readable code which, when executed by a computer device, causes the computer device to perform the method of any of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011459565.8A CN112613516A (en) | 2020-12-11 | 2020-12-11 | Semantic segmentation method for aerial video data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011459565.8A CN112613516A (en) | 2020-12-11 | 2020-12-11 | Semantic segmentation method for aerial video data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112613516A true CN112613516A (en) | 2021-04-06 |
Family
ID=75233598
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011459565.8A Pending CN112613516A (en) | 2020-12-11 | 2020-12-11 | Semantic segmentation method for aerial video data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112613516A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023000159A1 (en) * | 2021-07-20 | 2023-01-26 | 海南长光卫星信息技术有限公司 | Semi-supervised classification method, apparatus and device for high-resolution remote sensing image, and medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107590442A (en) * | 2017-08-22 | 2018-01-16 | 华中科技大学 | A kind of video semanteme Scene Segmentation based on convolutional neural networks |
CN108182421A (en) * | 2018-01-24 | 2018-06-19 | 北京影谱科技股份有限公司 | Methods of video segmentation and device |
CN109753913A (en) * | 2018-12-28 | 2019-05-14 | 东南大学 | Calculate efficient multi-mode video semantic segmentation method |
CN109919044A (en) * | 2019-02-18 | 2019-06-21 | 清华大学 | The video semanteme dividing method and device of feature propagation are carried out based on prediction |
CN110782469A (en) * | 2019-10-25 | 2020-02-11 | 北京达佳互联信息技术有限公司 | Video frame image segmentation method and device, electronic equipment and storage medium |
CN110852961A (en) * | 2019-10-28 | 2020-02-28 | 北京影谱科技股份有限公司 | Real-time video denoising method and system based on convolutional neural network |
Non-Patent Citations (1)
Title |
---|
GIRISHA, S., et al.: "Semantic segmentation of UAV aerial videos using convolutional neural networks", IEEE Second International Conference on Artificial Intelligence and Knowledge Engineering, pages 21-27 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111010590B (en) | Video clipping method and device | |
CN107274433B (en) | Target tracking method and device based on deep learning and storage medium | |
CN109035304B (en) | Target tracking method, medium, computing device and apparatus | |
JP6474854B2 (en) | Method and apparatus for updating a background model | |
AU2009243442B2 (en) | Detection of abnormal behaviour in video objects | |
US10068137B2 (en) | Method and device for automatic detection and tracking of one or multiple objects of interest in a video | |
CN111311475A (en) | Detection model training method and device, storage medium and computer equipment | |
Girisha et al. | Semantic segmentation of UAV aerial videos using convolutional neural networks | |
CN110287877B (en) | Video object processing method and device | |
CN113191180B (en) | Target tracking method, device, electronic equipment and storage medium | |
CN109859250B (en) | Aviation infrared video multi-target detection and tracking method and device | |
CN113205138B (en) | Face and human body matching method, equipment and storage medium | |
CN111753590A (en) | Behavior identification method and device and electronic equipment | |
CN115511920A (en) | Detection tracking method and system based on deep sort and deep EMD | |
CN110795599B (en) | Video emergency monitoring method and system based on multi-scale graph | |
CN115761655A (en) | Target tracking method and device | |
Mishra | Video shot boundary detection using hybrid dual tree complex wavelet transform with Walsh Hadamard transform | |
CN110969645A (en) | Unsupervised abnormal track detection method and unsupervised abnormal track detection device for crowded scenes | |
CN112613516A (en) | Semantic segmentation method for aerial video data | |
JP2014110020A (en) | Image processor, image processing method and image processing program | |
CN115187884A (en) | High-altitude parabolic identification method and device, electronic equipment and storage medium | |
KR20170095599A (en) | System and method for video searching | |
CN110956649A (en) | Method and device for tracking multi-target three-dimensional object | |
CN113762027B (en) | Abnormal behavior identification method, device, equipment and storage medium | |
CN112686828B (en) | Video denoising method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||