CN110097026B - Paragraph association rule evaluation method based on multi-dimensional element video segmentation


Info

Publication number
CN110097026B
Authority
CN
China
Prior art keywords
video
segmentation
audio
paragraph
association rule
Prior art date
Legal status
Active
Application number
CN201910395119.6A
Other languages
Chinese (zh)
Other versions
CN110097026A (en)
Inventor
胡燕祝
田雯嘉
Current Assignee
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN201910395119.6A
Publication of CN110097026A
Application granted
Publication of CN110097026B
Legal status: Active


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V20/49 - Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a paragraph association rule evaluation method based on multi-dimensional element video segmentation, comprising the following steps: step one, video parsing; step two, key frame extraction in scene segmentation; step three, scene segmentation based on key frames; step four, audio segmentation of the video; step five, semantic segmentation of the video; step six, a GNN-based paragraph association rule evaluation method for the segmented video; step seven, construction of the association network. After the same video is segmented along multiple dimensions, the corresponding multi-dimensional elements are matched by constructing paragraph association rules. Compared with other paragraph association rule evaluation methods for video segmentation, the method combines the temporal change of pixels in the image sequence with the correlation between adjacent frames to achieve good segmentation of the video in the image dimension while retaining the key information of the video, and thus provides an effective paragraph association rule evaluation method based on multi-dimensional element video segmentation.

Description

Paragraph association rule evaluation method based on multi-dimensional element video segmentation
Technical Field
The invention relates to paragraph association rule evaluation methods, and in particular to a paragraph association rule evaluation method based on multi-dimensional element video segmentation.
Background
At present, most work on video structuring segments the video along the single dimension of the image, and video structuring methods based on multi-dimensional segmentation have received little study. In practice, however, the audio information, text information, and the like contained in a video play an important role in video monitoring. In addition, when moving objects in a video are segmented and key frames are extracted, computational efficiency concerns often lead to taking only a single frame of the video as the key frame, which discards important information, or to selecting key frames by sequentially comparing the visual features of video frames against a preset threshold. Moreover, after the same video is segmented along the three dimensions of scene, sound, and text, segments covering different time periods are obtained; the segments in these three dimensions are not perfectly aligned and therefore intersect. There is consequently a need for a paragraph association rule evaluation method that can completely match the three-dimensional elements of scene, sound, and text.
Video structuring is now very widely applied, for example in fire-fighting facility monitoring systems in public places, in public safety, and in safe-city deployments. With the large-scale deployment of urban video monitoring systems, video monitoring has penetrated every corner of the city, and industries such as intelligent transportation, government supervision, and enterprise operation generate large amounts of monitoring video data. As edge computing, cloud computing, and big data technologies continue to mature, the problems of huge video data volume, difficult storage, and inconvenient retrieval become increasingly prominent. For large-scale real-time monitoring video, the video stream must undergo image processing such as real-time spatio-temporal information labeling, character extraction, feature extraction, target classification, and structured labeling, and be transmitted quickly to central computing for processing. A paragraph association rule evaluation method for multi-dimensional element video segmentation therefore needs to be constructed so that scenes, sounds, and texts can be matched quickly and accurately, providing a real-time and efficient monitoring means for the operation of governments and enterprises in China.
Disclosure of Invention
To solve the problems in the prior art, the present invention provides a method for evaluating paragraph association rules based on multi-dimensional element video segmentation; the flow of the method is shown in FIG. 1.
The technical scheme comprises the following implementation steps:
Step one: video parsing.
The first step of video parsing is data reception; the video then needs to be demultiplexed into an image track, an audio track, and a subtitle track.
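For illustration, a minimal demultiplexing sketch in Python follows. It assumes the ffmpeg command-line tool is installed and that the source carries an AAC audio stream and a text-based subtitle stream; the file names are purely illustrative, and this is a sketch of step one rather than part of the claimed method.

```python
import subprocess

def demux(video_path: str, stem: str) -> None:
    """Demultiplex a video into separate image, audio, and subtitle tracks.

    Uses the ffmpeg CLI: -map 0:v / 0:a / 0:s select the video, audio,
    and subtitle streams of the first input; -c copy avoids re-encoding.
    """
    subprocess.run(["ffmpeg", "-y", "-i", video_path, "-map", "0:v",
                    "-c", "copy", f"{stem}_image_track.mp4"], check=True)
    subprocess.run(["ffmpeg", "-y", "-i", video_path, "-map", "0:a",
                    "-c", "copy", f"{stem}_audio_track.m4a"], check=True)
    # Subtitles are converted to SRT (assumes a text subtitle stream).
    subprocess.run(["ffmpeg", "-y", "-i", video_path, "-map", "0:s:0",
                    f"{stem}_subtitle_track.srt"], check=True)

demux("traffic_monitoring.mp4", "traffic_monitoring")
```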
Step two: key frame extraction in scene segmentation.
Key frame extraction methods are mainly divided into five categories; the specific methods are shown in FIG. 2.
(1) Key frame extraction based on boundaries. This method directly selects the first and last frames, or the middle frame, of each shot as key frames. It requires little computation and is suitable for shots with little activity or unchanging content.
(2) Key frame extraction based on visual features. This method first selects the first frame as the current key frame; subsequent frames are then compared in turn with the nearest key frame on visual features such as color, motion, edges, shape, and spatial relationships. If the difference between the current frame and the nearest key frame exceeds a predetermined threshold, the current frame is selected as a new key frame (a minimal sketch of this scheme follows the list below).
(3) Key frame extraction based on clustering. Methods of this kind cluster all frames of a shot, select key categories from the resulting clusters according to some criterion such as the number of frames in a category, and then select the frame with the smallest clustering parameter in each key category as the key frame.
(4) Key frame extraction based on multiple modalities. This method simulates human perception to perform simplified video content analysis and generally integrates video, audio, text, and so on. For example, at scene changes in movies or sports videos, the video and audio content often change simultaneously, so a multi-modal extraction method is needed: when the audio and video features at a shot boundary both change greatly at the same time, that shot boundary is a new scene boundary.
(5) Key frame extraction based on the compressed domain. Compressed-domain methods need no decompression of the video stream, or only partial decompression, and extract key frames directly from the MPEG compressed video stream, reducing computational complexity.
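As promised in item (2), the following Python sketch illustrates visual-feature-based extraction using a colour-histogram comparison against the most recent key frame. It assumes OpenCV (cv2) is available; the histogram parameters and the 0.3 threshold are illustrative choices, not values prescribed by the invention.

```python
import cv2

def extract_keyframes(path: str, threshold: float = 0.3) -> list[int]:
    """Keep a frame whenever its colour histogram differs from the
    nearest key frame by more than a preset threshold."""
    cap = cv2.VideoCapture(path)
    keyframes, ref_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        # Bhattacharyya distance: 0 = identical histograms, 1 = maximally different.
        if ref_hist is None or cv2.compareHist(
                ref_hist, hist, cv2.HISTCMP_BHATTACHARYYA) > threshold:
            keyframes.append(idx)   # current frame becomes the nearest key frame
            ref_hist = hist
        idx += 1
    cap.release()
    return keyframes
```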
Step three: scene segmentation based on keyframes.
Scene segmentation mainly comprises the following three detection approaches:
(1) Detection based on inter-frame difference. The inter-frame difference method obtains the contour of a moving target by taking the difference of two adjacent frames in a video image sequence; it copes well with scenes containing multiple moving targets and with camera motion (a minimal sketch appears after this list).
(2) Detection based on background difference. The background difference method is a general method for motion segmentation of static scenes: it takes the difference between the currently acquired image frame and a background image to obtain a gray-level image of the target motion area, thresholds that gray-level image to extract the motion area, and, to avoid the influence of changes in ambient illumination, updates the background image according to the currently acquired frame. Details are shown in FIG. 3.
(3) Detection based on optical flow. The optical flow method uses the temporal change of pixels in the image sequence and the correlation between adjacent frames to calculate the motion information of objects between adjacent frames from the correspondence between the previous frame and the current frame.
(4) The segmented video can be represented as x1, …, xi, where x denotes a time period of the segmented video and i denotes the number of video segments.
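The sketch referenced in item (1) follows: a rough inter-frame difference segmenter in Python that declares a scene boundary whenever the mean absolute difference between adjacent grey-level frames exceeds a threshold, returning the segments x1, …, xi as (start, end) frame ranges. OpenCV is assumed, and the cut threshold is an illustrative value.

```python
import cv2

def segment_by_frame_difference(path: str,
                                cut_threshold: float = 30.0) -> list[tuple[int, int]]:
    """Split a video into segments at frames where the mean absolute
    grey-level difference from the previous frame spikes above a threshold."""
    cap = cv2.VideoCapture(path)
    ok, prev = cap.read()
    if not ok:
        cap.release()
        return []
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    boundaries, idx = [0], 1
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # cv2.absdiff implements the difference operation of the two frames.
        if cv2.absdiff(gray, prev).mean() > cut_threshold:
            boundaries.append(idx)          # a new segment starts here
        prev, idx = gray, idx + 1
    cap.release()
    boundaries.append(idx)
    return list(zip(boundaries[:-1], boundaries[1:]))
```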
Step four: audio segmentation of video.
The audio segmentation method based on empirical mode decomposition (EMD) proceeds as follows (a one-step sifting sketch appears after the list):
(1) Determine all maximum points of the original audio data sequence X(t) and fit them with a cubic spline interpolation function to form the upper envelope of the original data.
(2) Find all minimum points and fit them with a cubic spline interpolation function to form the lower envelope of the data.
(3) Denote the mean of the upper and lower envelopes as m1; subtracting the mean envelope m1 from the original data sequence X(t) gives a new audio data sequence h1, as in the formula:
h1 = X(t) - m1
(4) Cluster and segment the audio data obtained from the EMD decomposition.
(5) The segmented audio can be represented as y1, …, yj, where y denotes a time period of the segmented audio and j denotes the number of audio segments.
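The one-step sifting sketch mentioned before the list shows steps (1)-(3) in Python with SciPy's cubic spline. Boundary handling is deliberately crude (the splines simply extrapolate beyond the outermost extrema), and the function assumes the signal has at least two maxima and two minima.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def emd_sift_once(x: np.ndarray) -> np.ndarray:
    """One EMD sifting step: fit cubic-spline upper and lower envelopes
    through the local extrema and subtract their mean, h1 = X(t) - m1."""
    t = np.arange(len(x))
    max_i = argrelextrema(x, np.greater)[0]   # indices of local maxima
    min_i = argrelextrema(x, np.less)[0]      # indices of local minima
    upper = CubicSpline(max_i, x[max_i])(t)   # upper envelope
    lower = CubicSpline(min_i, x[min_i])(t)   # lower envelope
    m1 = (upper + lower) / 2.0                # mean envelope m1
    return x - m1                             # h1 = X(t) - m1
```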
Step five: and (4) semantic segmentation of the video.
Semantic segmentation of paragraphs mainly involves the following aspects:
(1) Define semantic blocks. A semantic block divides a sentence into several relatively independent semantic units whose length lies above the word level and below the sentence level; it is a preprocessing device linking grammar, semantics, and pragmatics. Semantic blocks are non-recursive, non-nested, and non-overlapping.
(2) Sentence meaning segmentation. Natural language processing typically requires analysis at three levels: grammar, semantics, and pragmatics. Therefore, text word segmentation and part-of-speech tagging are performed statistically first; after word classification is finished, the tagging work is completed, the words are then semantically recombined, and finally sentence meaning segmentation is carried out according to the defined semantic blocks (a rough sketch follows this list).
(3) The segmented text can be represented as z1, …, zk, where z denotes a time period of the segmented text and k denotes the number of text segments.
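The rough sketch promised in item (2): word segmentation and part-of-speech tagging with the jieba library (a common choice for Chinese text, assumed here to be installed, not one mandated by the invention), followed by cutting the tagged stream into flat, non-overlapping blocks at punctuation as a crude stand-in for real semantic-block rules.

```python
import jieba.posseg as pseg  # Chinese word segmentation + POS tagging

def semantic_blocks(sentence: str) -> list[list[tuple[str, str]]]:
    """Segment words, tag parts of speech, and cut the tagged stream into
    non-recursive, non-nested, non-overlapping blocks at punctuation."""
    blocks, current = [], []
    for pair in pseg.cut(sentence):
        if pair.flag == "x":          # jieba commonly tags punctuation as 'x'
            if current:
                blocks.append(current)
            current = []
        else:
            current.append((pair.word, pair.flag))
    if current:
        blocks.append(current)
    return blocks
```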
Step six: a method for judging paragraph association rules of segmented videos of a GNN network.
Graph neural networks (GNNs) are mainly effective at modeling the relationships or interactions between objects in a system. For the same video, segmentation along the three dimensions of scene, sound, and paragraph yields segments covering different time periods, and the segments from the three dimensions cannot be completely aligned, so they intersect. Let t denote each second of video; GNN(t|x), GNN(t|y), and GNN(t|z) denote the feature vectors currently extracted from the segmented video segments in the respective dimensions.
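As a small alignment helper (an illustration of the crossing problem, not the GNN itself), the function below maps a second t to the index of the segment containing it in one dimension; applying it once per dimension shows which x-, y-, and z-segments overlap at t.

```python
def segment_at(t: float, segments: list[tuple[float, float]]) -> int | None:
    """Return the index i of the segment (start_i, end_i) that contains
    second t, or None if t falls inside no segment of this dimension."""
    for i, (start, end) in enumerate(segments):
        if start <= t < end:
            return i
    return None

# Segments in the three dimensions need not align: at t = 5 s the video
# may sit in scene segment 1, audio segment 2, and text segment 1.
```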
Step seven: and constructing the associated network.
The construction of the association network is divided into two steps.
(1) Starting from a single dimension, construct the network association rules within each video segment according to the Euclidean distance or the Hamming distance; the rules comprise the strength and direction of the links between nodes (see the sketch after this list).
(2) Combine the association networks of the three dimensions to form a new directed association network.
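The sketch referenced in step (1): one possible construction of a single dimension's association network with NetworkX, linking segments whose feature vectors lie within a Euclidean distance bound. The weighting scheme is an illustrative assumption; the three per-dimension graphs can then be merged as in step (2).

```python
import numpy as np
import networkx as nx

def build_association_network(features: dict[str, np.ndarray],
                              max_dist: float) -> nx.DiGraph:
    """Treat each segment's feature vector as a node and draw a directed
    edge (earlier -> later) whenever the Euclidean distance between two
    segments falls below max_dist; the edge weight encodes link strength."""
    g = nx.DiGraph()
    names = list(features)
    g.add_nodes_from(names)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = float(np.linalg.norm(features[a] - features[b]))
            if d < max_dist:
                g.add_edge(a, b, weight=1.0 / (1.0 + d))  # strength falls with distance
    return g

# Merging the scene, sound, and text networks into one directed graph:
# combined = nx.compose_all([g_scene, g_sound, g_text])
```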
Compared with the prior art, the invention has the advantages that:
(1) The invention combines the temporal change of pixels in the image sequence, the correlation between adjacent frames, and the correspondence between the previous frame and the current frame to achieve good segmentation of the video in the image dimension while retaining the key information of the video.
(2) After the same video is segmented along the three dimensions of scene, sound, and text, the corresponding scenes, sounds, and texts are matched by constructing paragraph association rules.
Drawings
For a better understanding of the present invention, reference is made to the following further description taken in conjunction with the accompanying drawings.
FIG. 1 is a flowchart of the steps of the paragraph association rule evaluation method based on multi-dimensional element video segmentation;
FIG. 2 is a schematic diagram of a key frame extraction method;
FIG. 3 is a schematic diagram of the background difference detection method.
Detailed Description of the Preferred Embodiments
The present invention will be described in further detail below with reference to examples.
The technical scheme comprises the following implementation steps:
Step one: video parsing.
The first step of video parsing is data reception; the video then needs to be demultiplexed into an image track, an audio track, and a subtitle track.
A traffic monitoring video from a location in Beijing is demultiplexed. The video lasts 1 minute 50 seconds and is decomposed into an image track, an audio track, and a subtitle track; the decomposed audio track and subtitle track also last 1 minute 50 seconds.
Step two: key frame extraction in scene segmentation.
Key frame extraction methods are mainly divided into five categories; the specific methods are shown in FIG. 2.
(1) Key frame extraction based on boundaries. This method directly selects the first and last frames, or the middle frame, of each shot as key frames. It requires little computation and is suitable for shots with little activity or unchanging content.
(2) Key frame extraction based on visual features. This method first selects the first frame as the current key frame; subsequent frames are then compared in turn with the nearest key frame on visual features such as color, motion, edges, shape, and spatial relationships. If the difference between the current frame and the nearest key frame exceeds a predetermined threshold, the current frame is selected as a new key frame.
(3) Key frame extraction based on clustering. Methods of this kind cluster all frames of a shot, select key categories from the resulting clusters according to some criterion such as the number of frames in a category, and then select the frame with the smallest clustering parameter in each key category as the key frame.
(4) Key frame extraction based on multiple modalities. This method simulates human perception to perform simplified video content analysis and generally integrates video, audio, text, and so on. For example, at scene changes in movies or sports videos, the video and audio content often change simultaneously, so a multi-modal extraction method is needed: when the audio and video features at a shot boundary both change greatly at the same time, that shot boundary is a new scene boundary.
(5) Key frame extraction based on the compressed domain. Compressed-domain methods need no decompression of the video stream, or only partial decompression, and extract key frames directly from the MPEG compressed video stream, reducing computational complexity.
In this example, the video is processed with clustering-based key frame extraction, and the frames are clustered into 5 categories.
Step three: scene segmentation based on keyframes.
Scene segmentation mainly comprises the following three detection approaches:
(1) Detection based on inter-frame difference. The inter-frame difference method obtains the contour of a moving target by taking the difference of two adjacent frames in a video image sequence; it copes well with scenes containing multiple moving targets and with camera motion.
(2) Detection based on background difference. The background difference method is a general method for motion segmentation of static scenes: it takes the difference between the currently acquired image frame and a background image to obtain a gray-level image of the target motion area, thresholds that gray-level image to extract the motion area, and, to avoid the influence of changes in ambient illumination, updates the background image according to the currently acquired frame. Details are shown in FIG. 3.
(3) Detection based on optical flow. The optical flow method uses the temporal change of pixels in the image sequence and the correlation between adjacent frames to calculate the motion information of objects between adjacent frames from the correspondence between the previous frame and the current frame.
(4) The segmented video can be represented as x1, …, xi, where x denotes a time period of the segmented video and i denotes the number of video segments.
After the key frames are extracted, the video is segmented using optical flow detection; the segmented video comprises 25 segments, x1, x2, …, x25.
Step four: audio segmentation of video.
The EMD-based audio segmentation method proceeds as follows:
(1) Determine all maximum points of the original audio data sequence X(t) and fit them with a cubic spline interpolation function to form the upper envelope of the original data.
(2) Find all minimum points and fit them with a cubic spline interpolation function to form the lower envelope of the data.
(3) Denote the mean of the upper and lower envelopes as m1; subtracting the mean envelope m1 from the original data sequence X(t) gives a new audio data sequence h1, as in the formula:
h1 = X(t) - m1
(4) Cluster and segment the audio data obtained from the EMD decomposition.
(5) The segmented audio can be represented as y1, …, yj, where y denotes a time period of the segmented audio and j denotes the number of audio segments.
The maximum points of the original audio data sequence X(t) are 2.3, 2.1, 2, 1.9, 1.8, 1.7, 0.9, and 0.8; the minimum points are -1.9, -2.1, -2.6, -3.0, 0, -1.0, and -0.5. The mean of the upper envelope is 1.6875 and the mean of the lower envelope is -1.586. The audio is divided into 25 segments, y1, y2, …, y25.
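A quick check of the stated envelope means, under the reading that they are simple averages of the listed extrema values:

```python
maxima = [2.3, 2.1, 2.0, 1.9, 1.8, 1.7, 0.9, 0.8]
minima = [-1.9, -2.1, -2.6, -3.0, 0.0, -1.0, -0.5]
print(sum(maxima) / len(maxima))            # 1.6875, the stated upper-envelope mean
print(round(sum(minima) / len(minima), 3))  # -1.586, the stated lower-envelope mean
```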
Step five: and (4) semantic segmentation of the video.
Semantic segmentation of paragraphs mainly involves the following aspects:
(1) Define semantic blocks. A semantic block divides a sentence into several relatively independent semantic units whose length lies above the word level and below the sentence level; it is a preprocessing device linking grammar, semantics, and pragmatics. Semantic blocks are non-recursive, non-nested, and non-overlapping.
(2) Sentence meaning segmentation. Natural language processing typically requires analysis at three levels: grammar, semantics, and pragmatics. Therefore, text word segmentation and part-of-speech tagging are performed statistically first; after word classification is finished, the tagging work is completed, the words are then semantically recombined, and finally sentence meaning segmentation is carried out according to the defined semantic blocks.
(3) The segmented text can be represented as z1, …, zk, where z denotes a time period of the segmented text and k denotes the number of text segments.
The text is divided into 25 segments, z1, z2, …, z25; the concrete contents include 'right turn at the crossroad', 'pedestrian stopping', 'serious vehicle congestion', and the like.
Step six: a method for judging paragraph association rules of segmented videos of a GNN network.
Graph neural networks (GNNs) are mainly effective at modeling the relationships or interactions between objects in a system. For the same video, segmentation along the three dimensions of scene, sound, and paragraph yields segments covering different time periods, and the segments from the three dimensions cannot be completely aligned, so they intersect. Let t denote each second of video; GNN(t|x), GNN(t|y), and GNN(t|z) denote the feature vectors currently extracted from the segmented video segments in the respective dimensions.
Extracting the feature vectors of each dimension of the segmented video at the 5 s mark gives the scene feature vector GNN(5 | x1, x2, …, x25), the sound feature vector GNN(5 | y1, y2, …, y25), and the paragraph feature vector GNN(5 | z1, z2, …, z25).
Step seven: and constructing the associated network.
The construction of the association network is divided into two steps.
(1) Starting from a single dimension, construct the network association rules within each video segment according to the Euclidean distance or the Hamming distance; the rules comprise the strength and direction of the links between nodes.
(2) Combine the association networks of the three dimensions to form a new directed association network.

Claims (1)

1. A paragraph association rule evaluation method based on multi-dimensional element video segmentation is characterized by comprising the following steps:
the method comprises the following steps: video analysis:
the first step of video analysis is data reception, and the video needs to be subjected to demultiplexing processing and is decomposed into an image track, an audio track and a subtitle track;
step two: extracting key frames in scene segmentation:
clustering all frames of a shot by using a clustering technology, then selecting key categories from the categories according to a frame number criterion in the categories, and then selecting the frame with the minimum clustering parameter from the key categories as a key frame;
step three: scene segmentation based on keyframes:
segmenting the scene by the optical flow method: the motion information of objects between adjacent frames is calculated from the correspondence between the previous frame and the current frame, using the temporal change of pixels in the image sequence and the correlation between adjacent frames; the segmented video can be expressed as x1, …, xi, where x represents a time period of the segmented video, and i represents the number of video segments;
step four: audio segmentation of video:
the EMD-based audio segmentation method comprises the following specific processes:
(1) determining all maximum value points of the original audio data sequence X (t), and fitting by using a cubic spline interpolation function to form an upper envelope line of the original data;
(2) finding out all minimum value points, and fitting all the minimum value points through a cubic spline interpolation function to form a lower envelope curve of the data;
(3) the mean value of the upper and lower envelopes is denoted as m1, and the mean envelope m1 is subtracted from the original data sequence X(t) to obtain a new audio data sequence h1, as shown in the formula:
h1 = X(t) - m1;
(4) clustering and dividing the audio data subjected to EMD decomposition;
(5) the segmented audio can be represented as y1, …, yj, where y represents a time period of the segmented audio, and j represents the number of audio segments;
step five: semantic segmentation of video:
for semantic segmentation of paragraphs, the following aspects are included:
(1) defining a semantic block: the semantic block is used for dividing a sentence into a plurality of relatively independent semantic units, is a preprocessing means for associating grammar, semantics and pragmatics, and is non-recursive, non-nested and non-overlapping;
(2) sentence meaning segmentation: natural language processing requires analysis at three levels: grammar, semantics and pragmatics; therefore, the statistical processing of text word segmentation and part-of-speech tagging is carried out first, the tagging work is completed after word classification is finished, the words are then semantically recombined, and finally sentence meaning segmentation is carried out according to the defined semantic blocks;
(3) the segmented text can be denoted as z1, …, zk, where z represents a time period of the segmented text, and k represents the number of text segments;
step six: the GNN-based paragraph association rule evaluation method for the segmented video:
the relationships or interactions among objects in a system are modeled with a graph neural network (GNN); for the same video, segmentation along the three dimensions of scene, sound and paragraph yields videos of different time periods, and the videos segmented in the three dimensions cannot be completely aligned and will intersect, so a GNN is adopted to evaluate the relevance of the segmented video paragraphs; t represents the video of each second, GNN(t|x) refers to the feature vector extracted from the currently segmented video segment in the scene dimension, GNN(t|y) refers to the feature vector extracted from the currently segmented video segment in the sound dimension, and GNN(t|z) refers to the feature vector extracted from the currently segmented video segment in the paragraph dimension; on this basis, an association network is constructed for the segmented three-dimensional video segments;
step seven: constructing a correlation network:
the construction of the associated network is divided into 2 steps:
(1) starting from a single dimension, constructing a network association rule in each video segment according to Euclidean distance or Hamming distance, wherein the network association rule comprises the strength and the direction between nodes;
(2) and combining the association networks of the three dimensions together to form a new directed association network.
CN201910395119.6A 2019-05-13 2019-05-13 Paragraph association rule evaluation method based on multi-dimensional element video segmentation Active CN110097026B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910395119.6A CN110097026B (en) 2019-05-13 2019-05-13 Paragraph association rule evaluation method based on multi-dimensional element video segmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910395119.6A CN110097026B (en) 2019-05-13 2019-05-13 Paragraph association rule evaluation method based on multi-dimensional element video segmentation

Publications (2)

Publication Number Publication Date
CN110097026A CN110097026A (en) 2019-08-06
CN110097026B 2021-04-27

Family

ID=67447957

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910395119.6A Active CN110097026B (en) 2019-05-13 2019-05-13 Paragraph association rule evaluation method based on multi-dimensional element video segmentation

Country Status (1)

Country Link
CN (1) CN110097026B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111126262B (en) * 2019-12-24 2023-04-28 中国科学院自动化研究所 Video highlight detection method and device based on graphic neural network
CN111586494B (en) * 2020-04-30 2022-03-11 腾讯科技(深圳)有限公司 Intelligent strip splitting method based on audio and video separation
CN113810782B (en) * 2020-06-12 2022-09-27 阿里巴巴集团控股有限公司 Video processing method and device, server and electronic device
CN111914118B (en) * 2020-07-22 2021-08-27 珠海大横琴科技发展有限公司 Video analysis method, device and equipment based on big data and storage medium
CN113470048B (en) * 2021-07-06 2023-04-25 北京深睿博联科技有限责任公司 Scene segmentation method, device, equipment and computer readable storage medium
CN115665359B (en) * 2022-10-09 2023-04-25 西华县环境监察大队 Intelligent compression method for environment monitoring data
CN115905584B (en) * 2023-01-09 2023-08-11 共道网络科技有限公司 Video splitting method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8229227B2 (en) * 2007-06-18 2012-07-24 Zeitera, Llc Methods and apparatus for providing a scalable identification of digital video sequences
CN106780503A (en) * 2016-12-30 2017-05-31 北京师范大学 Remote sensing images optimum segmentation yardstick based on posterior probability information entropy determines method
CN109344780A (en) * 2018-10-11 2019-02-15 上海极链网络科技有限公司 A kind of multi-modal video scene dividing method based on sound and vision
CN109711379B (en) * 2019-01-02 2022-03-15 电子科技大学 Complex environment traffic signal lamp candidate area extraction and identification method

Also Published As

Publication number Publication date
CN110097026A (en) 2019-08-06

Similar Documents

Publication Publication Date Title
CN110097026B (en) Paragraph association rule evaluation method based on multi-dimensional element video segmentation
CN110197135B (en) Video structuring method based on multi-dimensional segmentation
CN109389055B (en) Video classification method based on mixed convolution and attention mechanism
CN103347167A (en) Surveillance video content description method based on fragments
CN105469425A (en) Video condensation method
CN101971190A (en) Real-time body segmentation system
CN102880692A (en) Retrieval-oriented monitoring video semantic description and inspection modeling method
CN114186069B (en) Depth video understanding knowledge graph construction method based on multi-mode different-composition attention network
CN111738218B (en) Human body abnormal behavior recognition system and method
CN110705412A (en) Video target detection method based on motion history image
CN109583334B (en) Action recognition method and system based on space-time correlation neural network
CN114708555A (en) Forest fire prevention monitoring method based on data processing and electronic equipment
Qin et al. Application of video scene semantic recognition technology in smart video
Zin et al. A probability-based model for detecting abandoned objects in video surveillance systems
CN113936236A (en) Video entity relationship and interaction identification method based on multi-modal characteristics
CN111160099B (en) Intelligent segmentation method for video image target
Hwang et al. Object extraction and tracking using genetic algorithms
CN115188081B (en) Complex scene-oriented detection and tracking integrated method
Jeyabharathi Cut set-based dynamic key frame selection and adaptive layer-based background modeling for background subtraction
Skadins et al. Edge pre-processing of traffic surveillance video for bandwidth and privacy optimization in smart cities
CN112306985A (en) Digital retina multi-modal feature combined accurate retrieval method
CN111246176A (en) Video transmission method for realizing banding
CN116152696A (en) Intelligent security image identification method and system for industrial control system
Khan et al. Segmentation of crowd into multiple constituents using modified mask R-CNN based on mutual positioning of human
Wang et al. Video Smoke Detection Based on Multi-feature Fusion and Modified Random Forest.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant