CN116229330A - Method, system, electronic equipment and storage medium for determining video effective frames - Google Patents

Method, system, electronic equipment and storage medium for determining video effective frames

Info

Publication number
CN116229330A
Authority
CN
China
Prior art keywords
video
frames
video frame
clustered
screened
Prior art date
Legal status
Pending
Application number
CN202310293915.5A
Other languages
Chinese (zh)
Inventor
林燕丹 (Lin Yandan)
张雷 (Zhang Lei)
Current Assignee
Fudan University
Original Assignee
Fudan University
Priority date
Filing date
Publication date
Application filed by Fudan University
Priority to CN202310293915.5A priority Critical patent/CN116229330A/en
Publication of CN116229330A publication Critical patent/CN116229330A/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762: Arrangements using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G06V10/763: Non-hierarchical techniques, e.g. based on statistics of modelling distributions
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/82: Arrangements using pattern recognition or machine learning using neural networks
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The invention discloses a method, a system, electronic equipment and a storage medium for determining video effective frames, and relates to the technical field of video frame selection. The method comprises the following steps: acquiring all video frames of a target video, the target video being a video containing valid frames to be determined; sequentially carrying out feature extraction and feature dimension reduction on all video frames to obtain a dimension reduction feature matrix of the target video; clustering all video frames based on a Calinski-Harabasz index, a clustering algorithm, a preset cluster number range and the dimension reduction feature matrix to obtain a clustered video frame set to be screened, wherein the clustered video frame set to be screened comprises a plurality of groups of video frame sets to be screened; and determining all effective frames of the target video based on the clustered video frame set to be screened. The invention improves the sensitivity and the universality of video frame selection.

Description

Method, system, electronic equipment and storage medium for determining video effective frames
Technical Field
The present invention relates to the field of video frame selection technologies, and in particular, to a method, a system, an electronic device, and a storage medium for determining a video valid frame.
Background
The method proposed by Moccia et al. in 2018 uses criterion operators (criteria functions) to extract features of endoscope video frames. The effect of this method depends on whether the observer's reading of the data is accurate, and the criterion operators are chosen manually; each operator needs one pass over the whole data set, which is time-consuming. In addition, the chosen criterion operators also affect final indices such as valid-frame sensitivity. Taking the opposite direction from Moccia et al., Patrini et al. and Galdran et al. both proposed in 2019 to apply transfer learning from advanced deep learning techniques, letting a neural network learn image features broadly to realize end-to-end selection of valid frames. But back on clinical data, such an end-to-end model cannot be packaged and used as it is: on the one hand, the labels may not be adapted; on the other hand, if the types of video frames change, the model needs to be retrained and is difficult to reuse. In summary, existing video frame selection methods suffer from low sensitivity and poor universality.
Disclosure of Invention
The invention aims to provide a method, a system, electronic equipment and a storage medium for determining effective frames of video, which improve the sensitivity and universality of video frame selection.
In order to achieve the above object, the present invention provides the following solutions:
a method of determining a valid frame of a video, the method comprising:
acquiring all video frames of a target video; the target video is a video containing effective frames to be determined;
feature extraction and feature dimension reduction are sequentially carried out on all the video frames, so that a dimension reduction feature matrix of the target video is obtained;
clustering all the video frames based on a Calinski-Harabasz index, a clustering algorithm, a preset cluster number range and the dimension reduction feature matrix to obtain a clustered video frame set to be screened; the clustered video frame sets to be screened comprise a plurality of groups of video frame sets to be screened;
and determining all valid frames of the target video based on the clustered video frame set to be screened.
Optionally, feature extraction and feature dimension reduction are sequentially performed on all the video frames to obtain a dimension reduction feature matrix of the target video, which specifically includes:
extracting features of each video frame to obtain an initial feature matrix of the target video;
and performing feature dimension reduction on the initial feature matrix to obtain the dimension reduction feature matrix.
Optionally, clustering all the video frames based on a Calinski-Harabasz index, a clustering algorithm, a preset cluster number range and the dimension reduction feature matrix to obtain a clustered video frame set to be screened specifically comprises the following steps:
taking each preset cluster number in the preset cluster number range as a cluster number, and clustering all the video frames by using the clustering algorithm and the dimension reduction feature matrix to obtain a clustered video frame set corresponding to each preset cluster number;
respectively calculating the Calinski-Harabasz index of the clustered video frame set corresponding to each preset cluster number;
and determining the clustered video frame set with the maximum Calinski-Harabasz index as the clustered video frame set to be screened.
Optionally, determining all valid frames of the target video based on the clustered video frame set to be screened specifically includes:
judging whether a preset number of video frames in the current video frame set to be screened are valid frames or not;
if yes, all video frames in the current video frame set to be screened are determined to be effective frames.
Optionally, the clustering algorithm is an aggregation clustering algorithm, a K-means clustering algorithm, a spectral clustering algorithm or a density-based noisy application spatial clustering algorithm.
Optionally, when the target video includes N video frames and the preset cluster number is K, the calculation formula of the Calinski-Harabasz index is:

$$\mathrm{CH}=\frac{\mathrm{BGSS}/(K-1)}{\mathrm{WGSS}/(N-K)}$$

$$\mathrm{BGSS}=\sum_{k=1}^{K} n_k\,\lVert C_k-C\rVert^{2}$$

$$\mathrm{WGSS}=\sum_{k=1}^{K}\mathrm{WGSS}_k=\sum_{k=1}^{K}\sum_{i=1}^{n_k}\lVert X_{ik}-C_k\rVert^{2}$$

wherein CH is the Calinski-Harabasz index; BGSS is the inter-cluster (between-group) dispersion; WGSS is the intra-cluster (within-group) dispersion; k is the serial number of a clustered video frame set; n_k is the number of video frames in the k-th clustered video frame set; C_k is the centroid of the k-th clustered video frame set; C is the centroid of all the clustered video frame sets; WGSS_k is the sum of squared distances from all video frames in the k-th clustered video frame set to its centroid; i is the serial number of a video frame in the k-th clustered video frame set; X_ik is the i-th video frame in the k-th clustered video frame set.
A system for determining a valid frame of a video, the system comprising:
the target video acquisition module is used for acquiring all video frames of the target video; the target video is a video containing effective frames to be determined;
the feature processing module is used for sequentially carrying out feature extraction and feature dimension reduction on all the video frames to obtain a dimension reduction feature matrix of the target video;
the clustering module is used for clustering all the video frames based on a Calinski-Harabasz index, a clustering algorithm, a preset cluster number range and the dimension reduction feature matrix to obtain a clustered video frame set to be screened; the clustered video frame sets to be screened comprise a plurality of groups of video frame sets to be screened;
and the effective frame determining module is used for determining all effective frames of the target video based on the clustered video frame set to be screened.
An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of determining a video active frame as described above.
A storage medium having stored thereon a computer program which, when executed by a processor, implements a method of determining a video active frame as described above.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention discloses a method, a system, electronic equipment and a storage medium for determining effective frames of a video, wherein the method comprises the steps of firstly, sequentially carrying out feature extraction and feature dimension reduction on all video frames of a target video to obtain a dimension reduction feature matrix of the target video; clustering all video frames based on a kaline index, a clustering algorithm, a preset cluster number range and a dimension reduction feature matrix to obtain a clustered video frame set to be screened; the clustered video frame sets to be screened comprise a plurality of groups of video frame sets to be screened; and determining all effective frames of the target video based on the clustered video frame set to be screened. Compared with the method for extracting the characteristics of the endoscope video frames by using the criterion operator, the method reduces manpower and improves the sensitivity of video frame selection; compared with the method for realizing end-to-end selection of effective frames, when the initial video frames are changed, retraining is not needed, and the universality of video frame selection is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart illustrating a method for determining an effective frame of a video according to embodiment 1 of the present invention;
FIG. 2 is a flow chart of feature extraction for each video frame using a vanilla neural network;
fig. 3 is a schematic diagram of an automatic label distribution algorithm.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention aims to provide a method, a system, electronic equipment and a storage medium for determining video effective frames, aiming at improving the sensitivity and universality of video frame selection.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
Example 1
Fig. 1 is a flowchart illustrating a method for determining an effective frame of a video according to embodiment 1 of the present invention. As shown in fig. 1, the method for determining a video valid frame in this embodiment includes:
step 101: acquiring all video frames of a target video; the target video is a video containing valid frames to be determined.
Specifically, the target video includes, but is not limited to, endoscopic video.
Step 102: and carrying out feature extraction and feature dimension reduction on all the video frames in sequence to obtain a dimension reduction feature matrix of the target video.
Step 103: clustering all video frames based on a Calinski-Harabasz index, a clustering algorithm, a preset cluster number range and the dimension reduction feature matrix to obtain a clustered video frame set to be screened; the clustered video frame sets to be screened comprise a plurality of groups of video frame sets to be screened.
Step 104: and determining all effective frames of the target video based on the clustered video frame set to be screened.
As an alternative embodiment, step 102 specifically includes:
step 1021: and extracting the characteristics of each video frame to obtain an initial characteristic matrix of the target video.
Specifically, before step 1021, the method further includes: each video frame is preprocessed.
Preprocessing is scaling: each original video frame is scaled to an RGB image of size 224×224.
Step 1021 specifically includes:
and extracting the characteristics of each video frame by using the vanilla neural network to obtain the characteristic vector of each video frame, thereby obtaining the initial characteristic matrix of the target video.
As shown in fig. 2, the vanilla neural network is a deep convolutional neural network whose input is a 224×224 RGB image. The depth of the image is first brought to 64 channels by 2 3×3 convolution kernels (3×3 conv, 64) followed by a max pooling layer (Max Pooling); it is then stretched to 128 channels via 2 3×3 convolution kernels (3×3 conv, 128) and another max pooling layer, and further to 256 channels via 3 3×3 convolution kernels (3×3 conv, 256); the fourth max pooling layer reduces the width and height of the image without changing its depth; passing once more through 3 3×3 convolution kernels (3×3 conv, 512) and a max pooling layer yields a 3-dimensional tensor of size 14×14×512. The tensor is flattened (flatten) and some features are randomly discarded, so the feature dimension becomes 4096; the last fully-connected layer (Dense) does not change the feature size, which remains 4096. A feature vector of length 4096 is thus extracted for each video frame; when the target video includes N video frames, an (N, 4096) matrix is obtained, i.e. the initial feature matrix of the target video.
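For illustration only, the following is a minimal sketch of this per-frame feature extraction step, assuming PyTorch/torchvision and using pretrained VGG-16 as a stand-in for the vanilla network (the patent does not name a published architecture, and torchvision's VGG-16 flattens a 7×7×512 tensor rather than the 14×14×512 described above; every parameter below is an assumption):

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# Scale every frame to a 224x224 RGB image, as in the preprocessing step.
preprocess = T.Compose([T.ToPILImage(), T.Resize((224, 224)), T.ToTensor()])

vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
vgg.eval()
# Drop the final classification layer so each frame yields the
# 4096-dimensional feature vector described in the text.
feature_head = torch.nn.Sequential(*list(vgg.classifier.children())[:-1])

@torch.no_grad()
def extract_features(frames):
    """frames: list of HxWx3 uint8 arrays -> (N, 4096) initial feature matrix."""
    batch = torch.stack([preprocess(f) for f in frames])
    x = vgg.features(batch)        # convolution + max pooling stages
    x = vgg.avgpool(x)             # -> (N, 512, 7, 7)
    x = torch.flatten(x, 1)        # -> (N, 25088)
    return feature_head(x)         # -> (N, 4096)
```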
Step 1022: and performing feature dimension reduction on the initial feature matrix to obtain a dimension reduction feature matrix.
Step 1022 specifically includes: and performing feature dimension reduction on the initial feature matrix by using UMAP algorithm to obtain a dimension reduction feature matrix.
UMAP is a dimensionality-reduction algorithm based on manifold learning; it is fast to compute, and the mapping from the high-dimensional space to the low-dimensional manifold space preserves global structure well. The mapping from the high-dimensional space to the low-dimensional space is optimized through the following loss function C_UMAP. The similarity of two points x_i and x_j in the high-dimensional space (rows of the initial feature matrix) is approximated by a Gaussian distribution v_{j|i}; likewise, the similarity of two points y_i and y_j in the low-dimensional manifold space is approximated by a Student-t distribution w_ij. Finally, stochastic gradient descent is applied to the loss function so that the relation between x_i and x_j is mapped onto that between y_i and y_j:

$$v_{j\mid i}=\exp\!\left(-\frac{\lVert x_i-x_j\rVert^{2}}{2\sigma_i^{2}}\right)$$

$$w_{ij}=\left(1+\lVert y_i-y_j\rVert^{2}\right)^{-1}$$

$$C_{\mathrm{UMAP}}=\sum_{i\neq j}\left[v_{ij}\ln\frac{v_{ij}}{w_{ij}}+\left(1-v_{ij}\right)\ln\frac{1-v_{ij}}{1-w_{ij}}\right]$$

wherein σ_i is the variance of the Gaussian distribution v_{j|i}.
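As a minimal sketch, this dimension-reduction step could be carried out with the umap-learn package; the patent specifies only that UMAP is used, so the parameter values below (n_neighbors, and n_components=2 to match the (N, 2) matrix used later) are illustrative assumptions:

```python
import numpy as np
import umap  # pip install umap-learn

def reduce_features(feature_matrix: np.ndarray) -> np.ndarray:
    """(N, 4096) initial feature matrix -> (N, 2) dimension reduction feature matrix."""
    reducer = umap.UMAP(n_neighbors=15, n_components=2,
                        metric="euclidean", random_state=42)
    return reducer.fit_transform(feature_matrix)
```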
As an optional embodiment, step 103 specifically includes:
and clustering all video frames by using each preset cluster number in a preset cluster number range as a cluster number and using a clustering algorithm and a dimension reduction feature matrix to obtain a clustered video frame set corresponding to each preset cluster number.
And respectively calculating the Calinski-Harabasz index of the clustered video frame set corresponding to each preset cluster number.
And determining the clustered video frame set with the maximum Calinski-Harabasz index as the clustered video frame set to be screened.
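A minimal sketch of these three steps, assuming scikit-learn (whose calinski_harabasz_score computes the same index) and K-means as the clustering algorithm, one of the four listed options; the candidate cluster-number range is an illustrative assumption:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

def cluster_by_ch(reduced: np.ndarray, k_range=range(2, 11)):
    """Cluster once per candidate K, score with the CH index, keep the best."""
    best_k, best_score, best_labels = None, -np.inf, None
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(reduced)
        score = calinski_harabasz_score(reduced, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, best_labels
```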
As an optional implementation, step 104 specifically includes:
and judging whether the preset number of video frames in the current video frame set to be screened are valid frames or not.
If yes, all video frames in the current video frame set to be screened are determined to be effective frames.
Detecting valid frames under this determination condition improves the sensitivity of the clustering algorithm's classification result. Sensitivity reflects the detection rate, i.e. the ability to pick out target frames from all frames, and is also one of the evaluation indices.
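A minimal sketch of this screening rule follows; the per-frame check is_valid() is a hypothetical hook standing in for however a deployment judges the sampled frames (the patent does not specify it), and the preset-number threshold is an assumed value:

```python
import numpy as np

def select_valid_frames(labels, frame_ids, is_valid, threshold=3):
    """If the first `threshold` frames of a cluster are all valid,
    every frame in that cluster is determined to be a valid frame."""
    valid = []
    for k in np.unique(labels):
        cluster_ids = [fid for fid, lab in zip(frame_ids, labels) if lab == k]
        if all(is_valid(fid) for fid in cluster_ids[:threshold]):
            valid.extend(cluster_ids)  # promote the whole cluster
    return valid
```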
If 4 video frame sets are finally obtained after clustering, i.e. K=4 and k=0, 1, 2, 3, the evaluation includes:
1. Automatic label assignment algorithm (as shown in fig. 3)
Through steps 101-103, the target video has been reduced to an (N, 2) matrix, and the clustering algorithm yields a clustering result (where the number of clusters is determined to be K by the Calinski-Harabasz index). The clustering result can be represented as K sets of digit numbers ('0', '1', '2', '3'), where each set contains several video frames. The task of the automatic label assignment algorithm is to convert the K numeric cluster sets X_C, whose numbers carry no true meaning, into a set of tags X_G of practical significance ('I' indicates valid, 'B' indicates blur, 'S' indicates light reflection, 'U' indicates low exposure). The flow of the algorithm is described as follows.
(1) Input: clustering result set X_C and real tag set X_G. The real tags are annotated manually by experts, who judge from human experience which category (label) a video frame belongs to; the clustering result labels are output by the algorithm according to the characteristics of the data (generally 0, 1, 2, …), and these numbers have no practical meaning by themselves, although each may correspond to one of the manually assigned labels.
(2) Additional conditions: K=4; the cluster-set number o ∈ {0, 1, 2, 3}; the tag p ∈ {I, B, S, U}.
(3) Traverse the image ids under each number of the cluster set (e.g. starting from 0);
(4) on the basis of (3), traverse the image ids under each tag of the label set (e.g. starting from I);
(5) count the image ids common to the traversal results of (3) and (4):

$$n_{op} = \lvert X_C^{(o)} \cap X_G^{(p)} \rvert$$

(6) after one traversal pass is finished, the currently traversed cluster set (e.g. the set of image ids clustered as '0') has been intersected in turn with the four sets of the real labels I, B, S, U; find the tag whose image ids overlap the current cluster set the most:

$$f(o) = \arg\max_{p}\, n_{op}$$

(7) this completes one mapping relation f: o → p;
(8) repeat until all the numbers in step (3) have been traversed.
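A minimal sketch of the automatic label assignment algorithm, with cluster sets and ground-truth label sets represented as Python dicts of image-id sets (the data layout is an assumption; the mapping rule is the argmax-overlap rule described above):

```python
def assign_labels(cluster_sets, label_sets):
    """cluster_sets: {0: {ids}, ...}; label_sets: {'I': {ids}, ...} -> {o: p}."""
    mapping = {}
    for o, ids in cluster_sets.items():
        # f(o) = argmax_p |X_C^(o) ∩ X_G^(p)|
        mapping[o] = max(label_sets, key=lambda p: len(ids & label_sets[p]))
    return mapping

# Example: cluster 0 shares most image ids with blur label 'B', so f(0) = 'B'.
demo = assign_labels({0: {1, 2, 3}, 1: {4, 5}},
                     {'I': {4, 5}, 'B': {1, 2}, 'S': {3}, 'U': set()})
# demo == {0: 'B', 1: 'I'}
```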
2. Evaluation indices
The results of the clustering algorithm are scored with the following indices.
2.1 Precision

$$\mathrm{Precision}=\frac{TP}{TP+FP}$$

2.2 Sensitivity

$$\mathrm{Sensitivity}=\frac{TP}{TP+FN}$$

2.3 F1 score

$$F_1=\frac{2\cdot\mathrm{Precision}\cdot\mathrm{Sensitivity}}{\mathrm{Precision}+\mathrm{Sensitivity}}$$

TP stands for true positive, a sample that is positive and predicted positive; FP stands for false positive, a sample that is not positive but mistaken for positive; FN stands for false negative, a sample that is positive but predicted negative. Sensitivity measures the model's ability to detect positive samples; precision measures what proportion of the samples predicted positive are truly positive. The F1 score provides a single score that balances precision and sensitivity.
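For completeness, the three indices as plain functions of the confusion-matrix counts:

```python
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)

def sensitivity(tp: int, fn: int) -> float:
    return tp / (tp + fn)

def f1_score(tp: int, fp: int, fn: int) -> float:
    p, s = precision(tp, fp), sensitivity(tp, fn)
    return 2 * p * s / (p + s)

# Example: 40 valid frames found, 10 false alarms, 10 missed -> F1 = 0.8.
assert abs(f1_score(40, 10, 10) - 0.8) < 1e-9
```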
3. Loss function in the hyper-parameter search algorithm
One dimension-reduction algorithm (UMAP) and 4 different clustering algorithms are used. UMAP has configurable parameters; besides preset quantities such as the optimal cluster number, the 4 clustering algorithms also have configurable parameters, and the UMAP parameters and the clustering parameters are mutually independent. UMAP has 4 configurable parameters and the 4 clustering algorithms have {3, 3, 4, 1} respectively, so the searchable parameter spaces formed by these parameters have sizes {12, 12, 16, 4}. Taking K-means clustering as an example, with 12 parameter combinations in the set formed together with UMAP, the combination that makes the final result (an evaluation index such as average sensitivity) best needs to be searched out among the 12 possibilities.
The main purpose of the algorithm is to complement and balance the performance gap between the clustering algorithm and a supervised classification algorithm; the optimal classification performance is obtained by jointly tuning the hyper-parameters of the feature dimension reduction (UMAP) and of the clustering algorithm, and the search range is the parameter space formed by the two parts of the algorithm (feature dimension reduction and feature clustering). According to the process of the clustering algorithm, two cases are distinguished.
In this embodiment, K-means clustering (K-means), agglomerative clustering (Agglomerative Clustering) and spectral clustering (Spectral Clustering) fall into the same case, while the density-based algorithm (DBSCAN) falls into the other case.
Wherein, for the first case (K-means, Agglomerative Clustering and Spectral Clustering):
(1) Input: the sensitivity and accuracy of each algorithm result, and the parameters of the algorithms (UMAP plus one of the three clustering algorithms).
(2) Additional conditions: K=4; weights α = 100 and β = 0.01.
(3) A penalty p is calculated as 10% of the absolute difference between sensitivity and accuracy.
(4) Construct the loss function: the deviation of sensitivity from 1 is weighted and the penalty value p is added, constrained by the parameter space Θ:

$$L_1 = \alpha\,(1 - \mathrm{Sensitivity}) + p, \qquad \theta \in \Theta$$
For the second case (DBSCAN):
(5) Count the number of outliers (label '-1') in the algorithm's result, denoted N_out; count the number of non-outlier clusters, denoted N_in.
(6) Calculate the variance between the number of samples in each non-outlier cluster and the mean over all clusters, denoted σ.
(7) Weight the absolute difference between the number of non-outlier labels (labels other than '-1') in (5) and the value K, and weight the variance in (6).
(8) The loss function consists of the outlier count and the weighted terms in (7), constrained by the parameter space Θ:

$$L_2 = N_{\mathrm{out}} + \alpha\,\lvert N_{\mathrm{in}} - K \rvert + \beta\,\sigma, \qquad \theta \in \Theta$$
(9) The loss functions in (4) and (8) are optimized with Bayesian optimization until a specified number of search steps is reached.
(10) The parameter space returns the optimal parameters.
Algorithm advantage: the task of optimizing the loss function is to shrink the gap to the optimal target toward 0, and the search phase is very time- and resource-consuming, so Bayesian optimization search greatly reduces the cost of searching in an oversized parameter space. It has been reported that the sampling efficiency of Bayesian optimization search is 100 times that of random search. In this embodiment, in a preliminary trial (the task of finding the best average sensitivity), Bayesian optimization search (BO) took 262 s for a 200-step evaluation and 840 s for a 500-step evaluation, faster than random search (RS); meanwhile, the average sensitivity difference between BO and RS was less than 0.3%.
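A minimal sketch of the search loop for the first case, assuming scikit-optimize's gp_minimize as the Bayesian optimizer; the search space (one UMAP parameter plus one K-means parameter), the evaluate() hook returning sensitivity and accuracy against the assigned labels, and the exact weighting inside the loss are all illustrative assumptions rather than the patent's fixed choices:

```python
import umap
from sklearn.cluster import KMeans
from skopt import gp_minimize
from skopt.space import Integer

ALPHA = 100.0  # weight from the additional conditions above

def make_loss(features, evaluate):
    """evaluate(labels) -> (sensitivity, accuracy); an assumed hook."""
    def loss(params):
        n_neighbors, n_init = int(params[0]), int(params[1])
        reduced = umap.UMAP(n_neighbors=n_neighbors,
                            n_components=2).fit_transform(features)
        labels = KMeans(n_clusters=4, n_init=n_init,
                        random_state=0).fit_predict(reduced)
        sens, acc = evaluate(labels)
        p = 0.1 * abs(sens - acc)        # penalty, step (3)
        return ALPHA * (1.0 - sens) + p  # loss L1, step (4) (assumed form)
    return loss

space = [Integer(5, 50, name="n_neighbors"), Integer(1, 20, name="n_init")]
# result = gp_minimize(make_loss(X, evaluate), space, n_calls=200)
# result.x then holds the optimal parameters returned by the search.
```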
As an alternative embodiment, the clustering algorithm is an aggregate clustering algorithm, a K-means clustering algorithm, a spectral clustering algorithm, or a density-based noisy applied spatial clustering algorithm.
As an optional implementation manner, when the target video includes N video frames and the preset cluster number is K, the calculation formula of the Calinski-Harabasz index is:

$$\mathrm{CH}=\frac{\mathrm{BGSS}/(K-1)}{\mathrm{WGSS}/(N-K)}$$

$$\mathrm{BGSS}=\sum_{k=1}^{K} n_k\,\lVert C_k-C\rVert^{2}$$

$$\mathrm{WGSS}=\sum_{k=1}^{K}\mathrm{WGSS}_k=\sum_{k=1}^{K}\sum_{i=1}^{n_k}\lVert X_{ik}-C_k\rVert^{2}$$

wherein CH is the Calinski-Harabasz index; BGSS is the inter-cluster (between-group) dispersion; WGSS is the intra-cluster (within-group) dispersion; k is the serial number of a clustered video frame set; n_k is the number of video frames in the k-th clustered video frame set; C_k is the centroid of the k-th clustered video frame set; C is the centroid of all the clustered video frame sets; WGSS_k is the sum of squared distances from all video frames in the k-th clustered video frame set to its centroid; i is the serial number of a video frame in the k-th clustered video frame set; X_ik is the i-th video frame in the k-th clustered video frame set.
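A direct numpy implementation of these formulas, as a sketch for verification (scikit-learn's calinski_harabasz_score computes the same quantity):

```python
import numpy as np

def calinski_harabasz(X: np.ndarray, labels: np.ndarray) -> float:
    """X: (N, d) reduced features; labels: (N,) cluster assignments."""
    N, K = len(X), len(np.unique(labels))
    C = X.mean(axis=0)                           # centroid of all frames
    bgss = wgss = 0.0
    for k in np.unique(labels):
        Xk = X[labels == k]
        Ck = Xk.mean(axis=0)                     # centroid C_k of cluster k
        bgss += len(Xk) * np.sum((Ck - C) ** 2)  # between-cluster term
        wgss += np.sum((Xk - Ck) ** 2)           # within-cluster term WGSS_k
    return (bgss / (K - 1)) / (wgss / (N - K))
```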
Example 2
In order to implement the method for determining a video valid frame in embodiment 1, this embodiment provides a system for determining a video valid frame, including:
the target video acquisition module is used for acquiring all video frames of the target video; the target video is a video containing valid frames to be determined.
And the feature processing module is used for sequentially carrying out feature extraction and feature dimension reduction on all the video frames to obtain a dimension reduction feature matrix of the target video.
The clustering module is used for clustering all video frames based on a Calinski-Harabasz index, a clustering algorithm, a preset cluster number range and a dimension reduction feature matrix to obtain a clustered video frame set to be screened; the clustered video frame sets to be screened comprise a plurality of groups of video frame sets to be screened.
And the effective frame determining module is used for determining all effective frames of the target video based on the clustered video frame set to be screened.
Example 3
An electronic device, comprising:
one or more processors.
A storage device having one or more programs stored thereon.
The one or more programs, when executed by the one or more processors, cause the one or more processors to implement a method of determining a video active frame as in embodiment 1.
Example 4
A storage medium having stored thereon a computer program which, when executed by a processor, implements a method of determining a video valid frame as in embodiment 1.
In the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other. For the system disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
The principles and embodiments of the present invention have been described herein with reference to specific examples, the description of which is intended only to assist in understanding the methods of the present invention and the core ideas thereof; also, it is within the scope of the present invention to be modified by those of ordinary skill in the art in light of the present teachings. In view of the foregoing, this description should not be construed as limiting the invention.

Claims (9)

1. A method for determining a valid frame of a video, the method comprising:
acquiring all video frames of a target video; the target video is a video containing effective frames to be determined;
feature extraction and feature dimension reduction are sequentially carried out on all the video frames, so that a dimension reduction feature matrix of the target video is obtained;
clustering all the video frames based on a Calinski-Harabasz index, a clustering algorithm, a preset cluster number range and the dimension reduction feature matrix to obtain a clustered video frame set to be screened; the clustered video frame sets to be screened comprise a plurality of groups of video frame sets to be screened;
and determining all valid frames of the target video based on the clustered video frame set to be screened.
2. The method for determining a valid video frame according to claim 1, wherein feature extraction and feature dimension reduction are sequentially performed on all the video frames to obtain a dimension reduction feature matrix of the target video, and the method specifically comprises:
extracting features of each video frame to obtain an initial feature matrix of the target video;
and performing feature dimension reduction on the initial feature matrix to obtain the dimension reduction feature matrix.
3. The method for determining video valid frames according to claim 1, wherein clustering all the video frames based on a Calinski-Harabasz index, a clustering algorithm, a preset cluster number range and the dimension-reduction feature matrix to obtain a clustered video frame set to be screened comprises the following steps:
taking each preset cluster number in the preset cluster number range as a cluster number, and clustering all the video frames by using the clustering algorithm and the dimension reduction feature matrix to obtain a clustered video frame set corresponding to each preset cluster number;
respectively calculating the Calinski-Harabasz index of the clustered video frame set corresponding to each preset cluster number;
and determining the clustered video frame set with the maximum Calinski-Harabasz index as the clustered video frame set to be screened.
4. The method for determining video valid frames according to claim 1, wherein determining all valid frames of the target video based on the clustered video frame set to be filtered specifically comprises:
judging whether a preset number of video frames in the current video frame set to be screened are valid frames or not;
if yes, all video frames in the current video frame set to be screened are determined to be effective frames.
5. The method of claim 1, wherein the clustering algorithm is an aggregate clustering algorithm, a K-means clustering algorithm, a spectral clustering algorithm, or a density-based noisy applied spatial clustering algorithm.
6. The method for determining a valid frame of a video according to claim 1, wherein when the target video includes N video frames and the preset cluster number is K, the calculation formula of the Calinski-Harabasz index is:

$$\mathrm{CH}=\frac{\mathrm{BGSS}/(K-1)}{\mathrm{WGSS}/(N-K)}$$

$$\mathrm{BGSS}=\sum_{k=1}^{K} n_k\,\lVert C_k-C\rVert^{2}$$

$$\mathrm{WGSS}=\sum_{k=1}^{K}\mathrm{WGSS}_k=\sum_{k=1}^{K}\sum_{i=1}^{n_k}\lVert X_{ik}-C_k\rVert^{2}$$

wherein CH is the Calinski-Harabasz index; BGSS is the inter-cluster (between-group) dispersion; WGSS is the intra-cluster (within-group) dispersion; k is the serial number of a clustered video frame set; n_k is the number of video frames in the k-th clustered video frame set; C_k is the centroid of the k-th clustered video frame set; C is the centroid of all the clustered video frame sets; WGSS_k is the sum of squared distances from all video frames in the k-th clustered video frame set to its centroid; i is the serial number of a video frame in the k-th clustered video frame set; X_ik is the i-th video frame in the k-th clustered video frame set.
7. A system for determining a valid frame of a video, the system comprising:
the target video acquisition module is used for acquiring all video frames of the target video; the target video is a video containing effective frames to be determined;
the feature processing module is used for sequentially carrying out feature extraction and feature dimension reduction on all the video frames to obtain a dimension reduction feature matrix of the target video;
the clustering module is used for clustering all the video frames based on a Calinski-Harabasz index, a clustering algorithm, a preset cluster number range and the dimension reduction feature matrix to obtain a clustered video frame set to be screened; the clustered video frame sets to be screened comprise a plurality of groups of video frame sets to be screened;
and the effective frame determining module is used for determining all effective frames of the target video based on the clustered video frame set to be screened.
8. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of determining video active frames of any of claims 1 to 6.
9. A storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the method of determining a video active frame according to any one of claims 1 to 6.
CN202310293915.5A 2023-03-23 2023-03-23 Method, system, electronic equipment and storage medium for determining video effective frames Pending CN116229330A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310293915.5A CN116229330A (en) 2023-03-23 2023-03-23 Method, system, electronic equipment and storage medium for determining video effective frames

Publications (1)

Publication Number Publication Date
CN116229330A 2023-06-06

Family

ID=86589216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310293915.5A Pending CN116229330A (en) 2023-03-23 2023-03-23 Method, system, electronic equipment and storage medium for determining video effective frames

Country Status (1)

Country Link
CN (1) CN116229330A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117911956A (en) * 2024-03-19 2024-04-19 洋县阿拉丁生物工程有限责任公司 Dynamic monitoring method and system for processing environment of food processing equipment
CN117911956B (en) * 2024-03-19 2024-05-31 洋县阿拉丁生物工程有限责任公司 Dynamic monitoring method and system for processing environment of food processing equipment

Similar Documents

Publication Publication Date Title
CN112308158B (en) Multi-source field self-adaptive model and method based on partial feature alignment
CN106228185B (en) A kind of general image classifying and identifying system neural network based and method
CN106803247B (en) Microangioma image identification method based on multistage screening convolutional neural network
WO2021238455A1 (en) Data processing method and device, and computer-readable storage medium
CN109671102B (en) Comprehensive target tracking method based on depth feature fusion convolutional neural network
CN106295124B (en) The method of a variety of image detecting technique comprehensive analysis gene subgraph likelihood probability amounts
CN109299664B (en) Reordering method for pedestrian re-identification
CN110751027B (en) Pedestrian re-identification method based on deep multi-instance learning
CN110097060A (en) A kind of opener recognition methods towards trunk image
CN106682681A (en) Recognition algorithm automatic improvement method based on relevance feedback
CN111179216A (en) Crop disease identification method based on image processing and convolutional neural network
CN104699781B (en) SAR image search method based on double-deck anchor figure hash
CN114694178A (en) Method and system for monitoring safety helmet in power operation based on fast-RCNN algorithm
WO2015146113A1 (en) Identification dictionary learning system, identification dictionary learning method, and recording medium
CN110580510A (en) clustering result evaluation method and system
CN116229330A (en) Method, system, electronic equipment and storage medium for determining video effective frames
CN116310466A (en) Small sample image classification method based on local irrelevant area screening graph neural network
CN111444816A (en) Multi-scale dense pedestrian detection method based on fast RCNN
CN112818148B (en) Visual retrieval sequencing optimization method and device, electronic equipment and storage medium
CN116664585B (en) Scalp health condition detection method and related device based on deep learning
CN105844299B (en) A kind of image classification method based on bag of words
CN117132910A (en) Vehicle detection method and device for unmanned aerial vehicle and storage medium
CN109376619A (en) A kind of cell detection method
CN115937910A (en) Palm print image identification method based on small sample measurement network
CN111723737B (en) Target detection method based on multi-scale matching strategy deep feature learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination