CN110460840A - Shot boundary detection method based on three-dimensional dense network - Google Patents
Shot boundary detection method based on three-dimensional dense network
- Publication number
- CN110460840A CN110460840A CN201910900958.9A CN201910900958A CN110460840A CN 110460840 A CN110460840 A CN 110460840A CN 201910900958 A CN201910900958 A CN 201910900958A CN 110460840 A CN110460840 A CN 110460840A
- Authority
- CN
- China
- Prior art keywords
- layer
- output
- dimensional
- dense network
- convolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N17/00—Diagnosis, testing or measuring for television systems or their details
- H04N17/002—Diagnosis, testing or measuring for television systems or their details for television cameras
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Biomedical Technology (AREA)
- General Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Artificial Intelligence (AREA)
- Biophysics (AREA)
- Signal Processing (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a shot boundary detection method based on a three-dimensional dense network. The steps are as follows: the video is divided into frame segments that are randomly assigned labels, and the segments are then fed into the three-dimensional dense network for classification. The three-dimensional dense network comprises, connected in sequence, a three-dimensional convolution layer, a max-pooling layer, four shot boundary detection blocks, and a linear layer; the three-dimensional convolution layer is the input layer and the linear layer is the output layer. Each shot boundary detection block contains several repeating units connected end to end; a repeating unit consists of a bottleneck layer as its input and a dense block built from three-dimensional convolutions as its output, and the output of one repeating unit serves as the input of the next. A transition layer follows each shot boundary detection block and comprises Batch Normalization, ReLU, a convolution, and an average-pooling layer. The invention uses three-dimensional convolution to combine the spatio-temporal features of the video and uses the dense network for feature reuse, which not only improves detection accuracy but also reduces computational complexity.
Description
Technical field
The invention belongs to the field of video content analysis, and relates to a shot boundary detection technique usable in video analysis and retrieval, in particular to a shot boundary detection method based on a three-dimensional dense convolutional network (3D DenseNet).
Background art
The rapid development of computer and multimedia technology has produced massive amounts of video data, and how to find the desired information in this mass of video has become a hot research topic for video retrieval technology. The first step of video retrieval is feature extraction, and feature extraction first requires the video to be segmented into shots; shot boundary detection is an important means of video segmentation. Shot transitions generally fall into two kinds: gradual transitions and cuts. A gradual transition changes gradually between adjacent shots and lasts a dozen to several dozen frames; a cut is abrupt, with the next shot immediately following the previous one. Shot boundary detection is now widely applied in industries such as digital television, traffic monitoring, electronic policing, bank surveillance, business information management, and national security. Commercial applications can bring great economic benefit, and applications in national security help safeguard social stability and development.
Common shot boundary detection methods include histogram methods, threshold methods, mutual-information methods, support vector machines, and deep learning, and those skilled in the art have done much research on each. "Fast Video Shot Boundary Detection Based on SVD and Pattern Matching" (International Workshop on Systems, IEEE, 2007) extracts the HSV color histogram of each video frame as a feature and describes the histograms with singular value decomposition; its computational complexity is low and detection is fast, but detection accuracy is unsatisfactory. "Information theory-based shot cut/fade detection and video summarization" (IEEE Transactions on Circuits and Systems for Video Technology, 2006, 16(1): 82-91) uses mutual information and joint entropy to describe inter-frame continuity and finds shots by comparing the similarity of adjacent frames against a global threshold; because it ignores local content variation, its accuracy suffers. "Shot Boundary Detection by a Hierarchical Supervised Approach" (International Workshop on Systems, IEEE, 2007) uses a support vector machine as a classifier to separate shot boundaries from non-boundaries, with unsatisfactory results. "Learning Spatiotemporal Features with 3D Convolutional Networks" (International Conference on Computer Vision (ICCV), 2015, 4489-4497) shows that 3D convolutional networks are well suited to learning on large-scale video datasets and are easy to train and use. "Large-scale, Fast and Accurate Shot Boundary Detection through Spatio-temporal Convolutional Neural Networks" (arXiv preprint arXiv:1705.03281, 2017) feeds fixed-length segments into a C3D network and classifies them into three classes (gradual, cut, and no transition); this demonstrates the effectiveness of ConvNets on the task, but it cannot localize shot boundaries when handling gradual transitions of different scales. "Ridiculously Fast Shot Boundary Detection with Fully Convolutional Neural Networks" (arXiv preprint arXiv:1705.08214, 2017) uses a fully convolutional network that takes an entire video sequence as input and assigns positive labels to transition frames, thereby detecting shot boundaries, but it does not solve the localization problem across scales. "Fast Video Shot Transition Localization with Deep Structured Models" (arXiv preprint arXiv:1808.04234, 2018) builds a detection framework of three parts (an initial filter, cut detection, and gradual-transition detection) and cascades a C3D ConvNet with a ResNet-18 network, improving speed toward real time, but the redundancy that appears as the network deepens remains unsolved.
Deep convolutional neural networks understand the high-level semantic information of images better, and applying them to video shot boundary detection can yield good results. Current feature extraction networks mainly use 2D convolution, which is designed for images; when analyzing video it ignores temporal information and loses inter-frame cues. As network models deepen, using 3D convolution for feature extraction gives better detection, but deeper networks bring heavy computation and reduced efficiency.
Therefore, developing a shot boundary detection method with a small computational load, high efficiency, and good detection performance is of great practical significance.
Summary of the invention
The object of the invention is to overcome the defects of the prior art, namely poor detection performance, heavy computation, and low efficiency, and to provide a shot boundary detection method with a small computational load, high efficiency, and good detection performance.
To achieve the above object, the invention provides the following technical scheme:
A shot boundary detection method based on a three-dimensional dense network, with steps as follows:
(1) the video is divided into frame segments, which are randomly assigned labels;
(2) the labeled frame segments are fed into the three-dimensional dense network for training, and the classified frame segments are output.
The three-dimensional dense network (3D DenseNet) comprises, connected in sequence, a three-dimensional convolution layer (Conv3D), a max-pooling layer (MaxPooling), four shot boundary detection blocks (SBD Block), and a linear layer (Linear, which outputs 3 class features); the three-dimensional convolution layer is the input layer and the linear layer is the output layer. Each shot boundary detection block (SBD Block) contains several repeating units connected end to end; a repeating unit consists of a bottleneck layer (Bottleneck) as its input and a dense block (Dense Block) built from three-dimensional convolutions as its output, and the output of one repeating unit serves as the input of the next. Replacing the original 2D convolutions of the Dense Block with 3D convolutions lets the network combine the spatio-temporal features of the video and improves detection accuracy. A transition layer (Transition) follows each shot boundary detection block and comprises Batch Normalization (BN), ReLU (the activation function), a 1 × 1 convolution, and a 2 × 2 average-pooling layer (AvgPooling).
Traditional feature extraction networks use 2D convolution, commonly applied to images, which ignores temporal information when analyzing video and loses inter-frame cues. The invention adopts 3D convolution, an extension of 2D convolution, for feature extraction: a time dimension is added so that the temporal and spatial information of the video can be extracted directly and the motion information of the video captured. For single-channel 2D convolution, the input image has 1 channel, the input size is (1, height, width), and the kernel size is (1, k_h, k_w); the kernel slides over the spatial dimensions of the input image, and at each position the values in the (k_h, k_w) window are convolved with the kernel to produce one value of the output image. For the multi-channel case, assume the input image has 3 channels, the input size is (3, height, width), and the kernel size is (3, k_h, k_w); the kernel slides over the spatial dimensions of the input image, and at each position all values in the (k_h, k_w) windows of the 3 channels are convolved to produce one output value. 3D convolution likewise comes in single-channel and multi-channel forms. The single-channel case differs from 2D convolution in that the input size is (1, time, height, width), with added temporal information; the kernel also gains a k_t dimension, so it slides over both the spatial and temporal dimensions of the input video. Multi-channel 3D convolution similarly adds temporal information to the multi-channel 2D input: at each position all values in the (k_t, k_h, k_w) windows of the 3 channels are convolved to produce one output value. Because it considers both the temporal and spatial information of the video, 3D convolution is better suited than 2D convolution to video feature extraction; 3D convolution and 3D pooling model spatio-temporal features better and suit classification tasks. However, as a convolutional neural network deepens, vanishing gradients and model degradation become more serious.
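As an illustrative sketch only (not the patent's implementation), the single-channel 3D sliding-window operation described above can be written in NumPy; the "valid" output size and the element-wise multiply-accumulate (cross-correlation, as deep-learning frameworks compute it) are assumptions of this sketch:

```python
import numpy as np

def conv3d_single_channel(video, kernel):
    # video: (T, H, W); kernel: (k_t, k_h, k_w).
    # The kernel slides over the temporal AND both spatial dimensions,
    # producing one output value per window position ("valid" padding).
    T, H, W = video.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                window = video[t:t + kt, i:i + kh, j:j + kw]
                out[t, i, j] = np.sum(window * kernel)
    return out
```

A 2D convolution is the special case k_t = T = 1, which is exactly why it cannot see motion across frames.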
The DenseNet used in the invention exploits its distinctive dense connectivity: while gaining depth it guarantees gradient propagation, greatly alleviating these problems. In a traditional convolutional neural network, when a later layer needs the output features of an earlier layer, those features must be extracted again by convolution; with the dense connections in DenseNet no re-extraction is needed and the features can be passed directly to subsequent layers, which greatly reduces the number of parameters and the amount of computation.
The DenseNet of the invention consists mainly of two parts, the Dense Block and the Transition layer. Within a Dense Block the feature maps of every layer have the same size and can therefore be concatenated along the channel dimension. Unlike other networks, every layer in every Dense Block outputs l feature maps after its convolution, i.e. the resulting feature maps have l channels, meaning l convolution kernels are used; l is the growth rate of the DenseNet. The Transition layer connects two adjacent Dense Blocks and reduces the size of the feature maps, compressing the model and preventing over-fitting. It aggregates feature maps from different layers, and its particular design in the invention (including the average-pooling layer and the convolution parameters) realizes feature reuse and improves efficiency. In addition, the SBD Block can extract features from the video directly, reducing the loss of video features.
The invention uses three-dimensional convolution to combine the spatio-temporal features of the video and uses the dense network for feature reuse, which reduces the parameter count of the network; it not only detects shot boundaries accurately but also lowers computational complexity.
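The feature-reuse idea above can be sketched in NumPy as channel-wise concatenation. This is a toy illustration, not the patent's network: `toy_layer` is a hypothetical stand-in for the real BN → ReLU → 3D-convolution layer, and only the channel bookkeeping matches the description:

```python
import numpy as np

def dense_block(x, num_layers, growth_rate, layer_fn):
    # Every layer sees the concatenation (along the channel axis) of the
    # block input and all earlier layers' outputs -- features are reused,
    # not re-extracted -- and contributes `growth_rate` new feature maps.
    features = [x]
    for _ in range(num_layers):
        inp = np.concatenate(features, axis=0)   # channel axis
        new = layer_fn(inp, growth_rate)          # (growth_rate, H, W)
        features.append(new)
    return np.concatenate(features, axis=0)

def toy_layer(inp, growth_rate):
    # hypothetical stand-in for BN -> ReLU -> conv: average the input
    # channels and replicate the result into `growth_rate` maps
    return np.repeat(inp.mean(axis=0, keepdims=True), growth_rate, axis=0)
```

With a block input of c0 channels, the output has c0 + num_layers × growth_rate channels, which is why the Transition layer is needed to compress the model between blocks.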
As a preferred technical scheme:
In the shot boundary detection method based on a three-dimensional dense network described above, the bottleneck layer comprises Batch Normalization (BN), ReLU (the activation function), and a 1 × 1 × 1 convolution; the 1 × 1 × 1 convolution reduces the number of features and improves computational efficiency. As the number of layers grows, the input to a Dense Block can become very large, so the Bottleneck is used to reduce computation, and together with the Transition layer it compresses the model as far as possible. Although the dense connections of DenseNet make full use of all information from earlier layers, the complexity of the computing system would grow greatly if the kernel size were not controlled. The Bottleneck's 1 × 1 × 1 convolution therefore linearly combines the features of different channels to achieve dimensionality reduction, and the classified frame segments are output at the last layer.
In the method described above, the frame segment labels fall into three classes, namely gradual transition, cut, and no transition.
In the method described above, the classified frame segments must be further processed to obtain the final three classes of frame segments, as follows:
(i) merge the frame segments with the same label among the classified segments;
(ii) perform a secondary detection on the frame segments with a given label: compute the color histogram of each segment's first frame and last frame, and measure the Bhattacharyya distance between the histograms; if the distance is sufficiently small, the segment is deemed a no-transition segment (OpenCV can be used to compute the color histograms and the Bhattacharyya distance, and its processing is fast);
(iii) merge the frame segments processed in step (ii) and output the final three classes of frame segments.
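Step (i) and step (iii) are a run-length merge of consecutive same-label segments. A minimal sketch (segment boundaries and label names are illustrative assumptions, not the patent's data format):

```python
def merge_segments(segments):
    # segments: list of (start_frame, end_frame, label), sorted by start.
    # Consecutive or overlapping segments with the same label are fused
    # into one segment spanning their union.
    merged = []
    for start, end, label in segments:
        if merged and merged[-1][2] == label and start <= merged[-1][1] + 1:
            prev_start, prev_end, _ = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), label)
        else:
            merged.append((start, end, label))
    return merged
```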
In the method described above, "sufficiently small" means the distance is less than or equal to 0.5.
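The histogram comparison can be sketched in NumPy as follows. This is an illustrative assumption-laden sketch: the 16-bin-per-channel histogram is a hypothetical choice, and the distance uses the textbook Bhattacharyya formula for normalized histograms (OpenCV's `cv2.compareHist` with `HISTCMP_BHATTACHARYYA`, mentioned in the text, computes a related quantity):

```python
import numpy as np

def color_histogram(frame, bins=16):
    # frame: (H, W, 3) uint8 image; per-channel histograms are normalized
    # and concatenated into one feature vector summing to 1.
    hists = [np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def bhattacharyya_distance(h1, h2):
    # sqrt(1 - Bhattacharyya coefficient); 0 for identical histograms,
    # 1 for non-overlapping ones. Distance <= 0.5 -> "no transition".
    bc = np.sum(np.sqrt(h1 * h2))
    return np.sqrt(max(0.0, 1.0 - bc))
```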
In the method described above, the model parameters of the three-dimensional dense network are:
the input feature map of the three-dimensional convolution layer has size 8 × 3 × 16 × 128 × 128;
the output feature map of the three-dimensional convolution layer has size 8 × 64 × 16 × 64 × 64;
the output feature map of the max-pooling layer has size 8 × 64 × 8 × 32 × 32;
the output feature map of the first shot boundary detection block has size 8 × 32 × 8 × 32 × 32;
the output feature map of the second shot boundary detection block has size 8 × 32 × 4 × 16 × 16;
the output feature map of the third shot boundary detection block has size 8 × 32 × 2 × 8 × 8;
the output feature map of the fourth shot boundary detection block has size 8 × 32 × 1 × 4 × 4;
the output feature map of the linear layer has size 1 × 3.
The parameters are listed in Table 1 below, where BRA is the abbreviation of Batch Normalization, ReLU, and AvgPooling:
Table 1
Layer | Kernel | Feature map | Followed by |
---|---|---|---|
Input | - | 8×3×16×128×128 | - |
Conv3D | 7×7×7 | 8×64×16×64×64 | BN, ReLU |
MaxPooling | 3×3×3 | 8×64×8×32×32 | BN, ReLU |
SBD Block 1 | - | 8×32×8×32×32 | BRA |
SBD Block 2 | - | 8×32×4×16×16 | BRA |
SBD Block 3 | - | 8×32×2×8×8 | BRA |
SBD Block 4 | - | 8×32×1×4×4 | BRA |
Linear | - | 1×3 | - |
The parameters of the invention are not limited to the above; only one feasible set is enumerated here, and those skilled in the art can set the parameters according to actual needs.
In the method described above, dividing the video into frame segments means splitting the video into segments of 16 frames each, with adjacent segments overlapping by 8 frames. The scope of protection of the invention is not limited to this; those skilled in the art can choose a suitable segment size and degree of overlap according to actual needs.
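The segmentation scheme above (16-frame windows, 8 frames shared between neighbors, i.e. a stride of 8) can be sketched as:

```python
def split_into_segments(num_frames, seg_len=16, overlap=8):
    # Adjacent segments share `overlap` frames, so the window advances
    # by seg_len - overlap frames each step. Returns (start, end) pairs
    # with `end` exclusive; a trailing remainder shorter than seg_len
    # is dropped in this sketch.
    stride = seg_len - overlap
    segments = []
    start = 0
    while start + seg_len <= num_frames:
        segments.append((start, start + seg_len))
        start += stride
    return segments
```

The overlap ensures that a transition falling near a segment border is still fully contained in the neighboring window.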
In the method described above, the order of the transition layer is as follows: Batch Normalization → ReLU → convolution → average-pooling layer.
In the method described above, the order of the bottleneck layer is as follows: Batch Normalization → ReLU → convolution.
Beneficial effects:
The shot boundary detection method based on a three-dimensional dense network of the invention uses three-dimensional convolution to combine the spatio-temporal features of the video and uses the dense network for feature reuse, reducing the parameter count of the network. It not only detects shot boundaries accurately (high detection accuracy) but also lowers computational complexity (small computing-power requirements, low equipment requirements), and has great application prospects.
Brief description of the drawings
Fig. 1 is the network structure of the three-dimensional dense network of the invention;
Fig. 2 is the schematic diagram of the shot boundary detection block (SBD Block) of the invention;
Fig. 3 is the flow chart of shot boundary detection according to the invention;
Fig. 4 is the bar chart comparing the mean accuracy of Embodiment 1 and Comparative Examples 1-2;
Fig. 5 is the line chart comparing the network parameter counts of Embodiment 1 and Comparative Examples 1-2;
Fig. 6 is the comparison chart of the training speed of Embodiment 1 and Comparative Examples 1-2.
Detailed description of the embodiments
A specific embodiment of the invention is further elaborated below with reference to the accompanying drawings.
Embodiment 1
A shot boundary detection method based on a three-dimensional dense network, with steps as shown in Fig. 3:
(1) the video is divided into segments of 16 frames each, with adjacent segments overlapping by 8 frames, and labels are randomly assigned; the labels fall into three classes, namely gradual transition, cut, and no transition;
(2) the labeled frame segments are fed into the three-dimensional dense network, and the classified frame segments are output;
(3) the frame segments with the same label among the classified segments are merged;
(4) a secondary detection is performed on the frame segments with a given label: the color histogram of each segment's first frame and last frame is computed, the Bhattacharyya distance between the histograms is measured, and a segment whose distance is less than or equal to 0.5 is deemed a no-transition segment;
(5) the frame segments processed in step (4) are merged and the final three classes of frame segments are output.
The three-dimensional dense network, shown in Fig. 1, comprises, connected in sequence, a three-dimensional convolution layer, a max-pooling layer, four shot boundary detection blocks, and a linear layer; the three-dimensional convolution layer is the input layer and the linear layer is the output layer. The shot boundary detection block, shown in Fig. 2, contains several repeating units connected end to end; a repeating unit consists of a bottleneck layer as its input and a dense block built from three-dimensional convolutions as its output, and the output of one repeating unit serves as the input of the next. The bottleneck layer comprises, connected in sequence, Batch Normalization, ReLU, and a 1 × 1 × 1 convolution. A transition layer follows each shot boundary detection block and comprises, connected in sequence, Batch Normalization, ReLU, a 1 × 1 convolution, and a 2 × 2 average-pooling layer. This example uses the cross-entropy loss function as the loss, making the predicted distribution closer to the distribution of the real data. The compression coefficient θ is set to 0.5, the growth rate to 32, and the learning rate to 0.001.
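The interplay of the growth rate and the compression coefficient θ is simple channel arithmetic, sketched below; the layer counts in the test are hypothetical examples, not the patent's configuration:

```python
def dense_block_channels(c_in, num_layers, growth_rate=32):
    # inside a dense block, every layer appends `growth_rate` feature
    # maps to the running concatenation
    return c_in + num_layers * growth_rate

def transition_channels(c, theta=0.5):
    # the transition layer's 1x1 convolution compresses the channel
    # count by the compression coefficient theta
    return int(c * theta)
```

With θ = 0.5 each transition halves the channel count, which (together with its average pooling) keeps the model compact as blocks are stacked.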
The model parameters of the three-dimensional dense network are:
the input feature map of the three-dimensional convolution layer has size 8 × 3 × 16 × 128 × 128;
the output feature map of the three-dimensional convolution layer has size 8 × 64 × 16 × 64 × 64;
the output feature map of the max-pooling layer has size 8 × 64 × 8 × 32 × 32;
the output feature map of the first shot boundary detection block has size 8 × 32 × 8 × 32 × 32;
the output feature map of the second shot boundary detection block has size 8 × 32 × 4 × 16 × 16;
the output feature map of the third shot boundary detection block has size 8 × 32 × 2 × 8 × 8;
the output feature map of the fourth shot boundary detection block has size 8 × 32 × 1 × 4 × 4;
the output feature map of the linear layer has size 1 × 3.
The test environment of the invention is: graphics card GTX 1080Ti, 16 GB of memory, Linux operating system, programmed in Python. Three datasets (the UCF101_SBD dataset, the TRECVID dataset, and the ClipShots dataset) were selected for testing.
To measure the detection performance objectively, the detection precision P (Precision), the recall R (Recall), and the comprehensive evaluation index F1 were computed. Precision is the ratio of correctly detected shots to all detected shots, recall is the ratio of correctly detected shots to the total number of shots, and F1 is a joint evaluation of precision and recall: P = TP/(TP+FP), R = TP/(TP+FN), F1 = 2PR/(P+R), where TP denotes the correctly predicted samples, FP the incorrectly predicted samples, and FN the correct samples that were not predicted.
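The three metrics can be computed directly from the TP/FP/FN counts described above (the counts in the test are made-up illustrative values):

```python
def precision_recall_f1(tp, fp, fn):
    # precision: fraction of detected boundaries that are correct;
    # recall: fraction of true boundaries that were detected;
    # F1: harmonic mean of precision and recall.
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * p * r / (p + r)
    return p, r, f1
```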
Comparative Example 1
A shot boundary detection method using the C3D ConvNet method described in "Large-scale, Fast and Accurate Shot Boundary Detection through Spatio-temporal Convolutional Neural Networks", tested on the same three test sets as Embodiment 1.
Comparative Example 2
A shot boundary detection method using the C3D+ResNet method described in "Fast Video Shot Transition Localization with Deep Structured Models", tested on the same three test sets as Embodiment 1.
The test results of Embodiment 1 and Comparative Examples 1-2 are compared in Tables 2-4:
Table 2
Experimental results on the UCF101_SBD dataset
Table 3
Experimental results on the TRECVID dataset
Table 4
Experimental results on the ClipShots dataset
Tables 2 and 3 show that, in tests on the UCF101_SBD and TRECVID datasets, the detection accuracy of the shot boundary detection method based on the three-dimensional dense network is higher than that of the C3D ConvNet method (Comparative Example 1) and the C3D+ResNet method (Comparative Example 2). The experimental results in Table 4 show that even with much larger detection data the detection accuracy of the method of the invention remains higher than that of the other two methods, indicating that it also performs well on large datasets. Fig. 4 compares the mean accuracy of the three methods on the three datasets; in Figs. 4-6, C3D ConvNet is Comparative Example 1, C3D+ResNet is Comparative Example 2, and Ours is Embodiment 1. It is also clear that the mean accuracy of the method of the invention on both cuts and gradual transitions is better than that of the two prior-art methods.
The network parameter counts of the three methods of Embodiment 1 and Comparative Examples 1-2 are shown in Table 5, and Fig. 5 compares the parameter counts converted to units of M. The parameter count of the shot boundary detection method based on the three-dimensional dense network is far smaller than that of the other two methods, which greatly saves the video memory occupied during computation. As shown in Fig. 6, our method has the shortest training time over the same number of training epochs, indicating that the shot boundary detection method based on the three-dimensional dense network has lower computational complexity and outperforms the other algorithms; it can significantly reduce equipment requirements while shortening processing time and raising working efficiency, and has great application prospects.
Table 5
The number of network parameters

Methods | Parameters |
---|---|
Comparative Example 1 | 219687875 |
Comparative Example 2 | 33205443 |
Embodiment 1 | 4234771 |
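Taking the counts from Table 5, the gap can be quantified; the 4-bytes-per-parameter (float32) storage estimate is an assumption of this sketch, not a figure from the patent:

```python
# parameter counts as reported in Table 5
params = {
    "C3D ConvNet (Comparative Example 1)": 219687875,
    "C3D+ResNet (Comparative Example 2)": 33205443,
    "3D DenseNet (Embodiment 1)": 4234771,
}

def ratio_to_ours(name, ours="3D DenseNet (Embodiment 1)"):
    # how many times more parameters a baseline uses than our network
    return params[name] / params[ours]

def size_mb(name, bytes_per_param=4):
    # rough float32 storage footprint in megabytes (assumed precision)
    return params[name] * bytes_per_param / 1e6
```

By this arithmetic the C3D ConvNet baseline carries roughly 50 times the parameters of Embodiment 1, consistent with the memory savings claimed above.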
Although specific embodiments of the present invention have been described above, it should be appreciated by those skilled in the art that these are merely illustrative, and that a variety of changes or modifications can be made to these embodiments without departing from the principle and substance of the invention.
Claims (9)
1. A shot boundary detection method based on a three-dimensional dense network, characterized in that the steps are as follows:
(1) the video is divided into frame segments, which are randomly assigned labels;
(2) the labeled frame segments are fed into the three-dimensional dense network, and the classified frame segments are output;
the three-dimensional dense network comprises, connected in sequence, a three-dimensional convolution layer, a max-pooling layer, four shot boundary detection blocks, and a linear layer; the three-dimensional convolution layer is the input layer and the linear layer is the output layer; each shot boundary detection block contains several repeating units connected end to end; a repeating unit consists of a bottleneck layer as its input and a dense block built from three-dimensional convolutions as its output, and the output of one repeating unit serves as the input of the next; a transition layer follows each shot boundary detection block, and the transition layer comprises Batch Normalization, ReLU, a 1 × 1 convolution, and a 2 × 2 average-pooling layer.
2. The shot boundary detection method based on a three-dimensional dense network according to claim 1, characterized in that the bottleneck layer comprises Batch Normalization, ReLU, and a 1 × 1 × 1 convolution.
3. The shot boundary detection method based on a three-dimensional dense network according to claim 1, characterized in that the frame segment labels fall into three classes, namely gradual transition, cut, and no transition.
4. The shot boundary detection method based on a three-dimensional dense network according to claim 3, characterized in that the classified frame segments must be further processed to obtain the final three classes of frame segments, as follows:
(i) merge the frame segments with the same label among the classified segments;
(ii) perform a secondary detection on the frame segments with a given label: compute the color histogram of each segment's first frame and last frame, measure the Bhattacharyya distance between the histograms, and deem a segment a no-transition segment if the distance is sufficiently small;
(iii) merge the frame segments processed in step (ii) and output the final three classes of frame segments.
5. The shot boundary detection method based on a three-dimensional dense network according to claim 4, characterized in that "sufficiently small" means the distance is less than or equal to 0.5.
6. The shot boundary detection method based on a three-dimensional dense network according to claim 1, characterized in that the model parameters of the three-dimensional dense network are:
the input feature map of the three-dimensional convolution layer has size 8 × 3 × 16 × 128 × 128;
the output feature map of the three-dimensional convolution layer has size 8 × 64 × 16 × 64 × 64;
the output feature map of the max-pooling layer has size 8 × 64 × 8 × 32 × 32;
the output feature map of the first shot boundary detection block has size 8 × 32 × 8 × 32 × 32;
the output feature map of the second shot boundary detection block has size 8 × 32 × 4 × 16 × 16;
the output feature map of the third shot boundary detection block has size 8 × 32 × 2 × 8 × 8;
the output feature map of the fourth shot boundary detection block has size 8 × 32 × 1 × 4 × 4;
the output feature map of the linear layer has size 1 × 3.
7. The shot boundary detection method based on a three-dimensional dense network according to claim 1, characterized in that dividing the video into frame segments means splitting the video into segments of 16 frames each, with adjacent segments overlapping by 8 frames.
8. The shot boundary detection method based on a three-dimensional dense network according to claim 1, characterized in that the transition layer is arranged in the following order: Batch Normalization → ReLU → convolution → average pooling layer.
9. The shot boundary detection method based on a three-dimensional dense network according to claim 2, characterized in that the bottleneck layer is arranged in the following order: Batch Normalization → ReLU → convolution.
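The orderings in claims 8 and 9 follow the pre-activation pattern common in dense networks: normalize and activate before convolving, with the transition layer adding average pooling at the end. The toy 1-D functions below are stand-ins for the real 3-D operations, included only to make the two orderings concrete.

```python
def batch_norm(xs):
    # Stand-in normalization: center the values at zero mean.
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

def relu(xs):
    return [max(x, 0.0) for x in xs]

def conv(xs):
    # Stand-in 1-D "convolution": average of each adjacent pair.
    return [(a + b) / 2 for a, b in zip(xs, xs[1:])]

def avg_pool(xs):
    # Average pooling, window 2, stride 2: halves the length.
    return [(xs[i] + xs[i + 1]) / 2 for i in range(0, len(xs) - 1, 2)]

def transition_layer(xs):   # claim 8: BN -> ReLU -> conv -> average pool
    return avg_pool(conv(relu(batch_norm(xs))))

def bottleneck_layer(xs):   # claim 9: BN -> ReLU -> conv
    return conv(relu(batch_norm(xs)))

print(transition_layer([1, 2, 3, 4, 5, 6, 7, 8]))  # [0.0, 0.125, 1.5]
```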
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910900958.9A CN110460840B (en) | 2019-09-23 | 2019-09-23 | Shot boundary detection method based on three-dimensional dense network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110460840A true CN110460840A (en) | 2019-11-15 |
CN110460840B CN110460840B (en) | 2020-06-26 |
Family
ID=68492588
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910900958.9A Active CN110460840B (en) | 2019-09-23 | 2019-09-23 | Shot boundary detection method based on three-dimensional dense network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110460840B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2003077198A1 (en) * | 2002-03-13 | 2003-09-18 | Vision Optic Co., Ltd. | System and method for registering spectacles image |
JP2003271946A (en) * | 2002-03-13 | 2003-09-26 | Vision Megane:Kk | System and method for registering spectacle image |
CN102800095A (en) * | 2012-07-17 | 2012-11-28 | 南京特雷多信息科技有限公司 | Lens boundary detection method |
CN102982553A (en) * | 2012-12-21 | 2013-03-20 | 天津工业大学 | Shot boundary detecting method |
CN106327513A (en) * | 2016-08-15 | 2017-01-11 | 上海交通大学 | Lens boundary detection method based on convolution neural network |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110942105A (en) * | 2019-12-13 | 2020-03-31 | 东华大学 | Mixed pooling method based on maximum pooling and average pooling |
CN110942105B (en) * | 2019-12-13 | 2022-09-16 | 东华大学 | Mixed pooling method based on maximum pooling and average pooling |
CN114037874A (en) * | 2021-11-12 | 2022-02-11 | 中国科学院深圳先进技术研究院 | Three-dimensional image classification network and method and image processing equipment |
CN114037874B (en) * | 2021-11-12 | 2024-07-02 | 中国科学院深圳先进技术研究院 | Three-dimensional image classification network device, method and image processing equipment |
Also Published As
Publication number | Publication date |
---|---|
CN110460840B (en) | 2020-06-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103347167B (en) | A kind of monitor video content based on segmentation describes method | |
CN104866616B (en) | Monitor video Target Searching Method | |
CN104504377B (en) | A kind of passenger on public transport degree of crowding identifying system and method | |
CN104952073B (en) | Scene Incision method based on deep learning | |
CN101833664A (en) | Video image character detecting method based on sparse expression | |
CN103336957A (en) | Network coderivative video detection method based on spatial-temporal characteristics | |
CN105701466A (en) | Rapid all angle face tracking method | |
CN107688830B (en) | Generation method of vision information correlation layer for case serial-parallel | |
CN111027377A (en) | Double-flow neural network time sequence action positioning method | |
CN110460840A (en) | Lens boundary detection method based on three-dimensional dense network | |
CN103853794A (en) | Pedestrian retrieval method based on part association | |
CN108268875A (en) | A kind of image meaning automatic marking method and device based on data smoothing | |
CN112801037A (en) | Face tampering detection method based on continuous inter-frame difference | |
CN106022310B (en) | Human body behavior identification method based on HTG-HOG and STG characteristics | |
CN104537392A (en) | Object detection method based on distinguishing semantic component learning | |
CN110490170A (en) | A kind of face candidate frame extracting method | |
CN106127251A (en) | A kind of computer vision methods for describing face characteristic change | |
Wang et al. | A deep learning-based method for vehicle license plate recognition in natural scene | |
CN110363164A (en) | Unified method based on LSTM time consistency video analysis | |
CN114926764A (en) | Method and system for detecting remnants in industrial scene | |
CN110162654A (en) | It is a kind of that image retrieval algorithm is surveyed based on fusion feature and showing for search result optimization | |
Prabakaran et al. | Key frame extraction analysis based on optimized convolution neural network (ocnn) using intensity feature selection (ifs) | |
Li et al. | Pedestrian detection method based on multi-scale fusion inception-SSD model | |
CN107122714A (en) | A kind of real-time pedestrian detection method based on edge constraint | |
CN110689520A (en) | Magnetic core product defect detection system and method based on AI |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||