CN116071691A - Video quality evaluation method based on content perception fusion characteristics - Google Patents

Video quality evaluation method based on content perception fusion characteristics Download PDF

Info

Publication number
CN116071691A
CN116071691A CN202310343979.1A CN202310343979A
Authority
CN
China
Prior art keywords
convolution
quality
video
bottleneck
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310343979.1A
Other languages
Chinese (zh)
Other versions
CN116071691B (en)
Inventor
张诗涵
杨瀚
温序铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Sobey Digital Technology Co Ltd
Original Assignee
Chengdu Sobey Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Sobey Digital Technology Co Ltd filed Critical Chengdu Sobey Digital Technology Co Ltd
Priority to CN202310343979.1A priority Critical patent/CN116071691B/en
Publication of CN116071691A publication Critical patent/CN116071691A/en
Application granted granted Critical
Publication of CN116071691B publication Critical patent/CN116071691B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a video quality evaluation method based on content perception fusion characteristics, which comprises the following steps: step 1, constructing a multidirectional differential second-order differential Gaussian filter characteristic extraction module for extracting input image characteristics; step 2, building a residual feature extraction network model based on a multi-direction differential second-order differential Gaussian filter feature extraction module and a depth convolution neural network, and inputting video frame by frame into the residual feature extraction network model to obtain content perception features of each frame of image; step 3, reducing the dimension of the content perception characteristics, inputting the content perception characteristics into a gate control recurrent neural network GRU, and modeling long-term dependency relationship to obtain quality elements and weights of the video at different moments; and 4, determining the final quality score of the video based on the quality elements and weights at different moments. The video quality evaluation method provided by the invention can realize more accurate video quality evaluation effect by extracting the content perception characteristics of the video.

Description

Video quality evaluation method based on content perception fusion characteristics
Technical Field
The invention relates to the field of computer vision, in particular to a video quality evaluation method based on content perception fusion characteristics.
Background
In recent years, with the widespread use of intelligent devices in human production and life, a huge amount of video materials are generated every day, but due to the limitations of various real environments and hardware device performances, the quality of the video is inevitably lost to different degrees, so that the video cannot be used in an actual application scene, and therefore, quality evaluation of the video is necessary before the video is applied to the actual scene.
The video quality evaluation method commonly used at present is mainly divided into two categories, namely subjective quality evaluation and objective quality evaluation. The subjective quality evaluation is to subjectively score various videos with different quality by people, and the method is direct and simple, but is limited by resources such as limited manpower, time and the like, and the subjective deviation of different people on the same video segment causes no unified scoring standard, so that large-scale practical application cannot be realized.
Objective video quality assessment can be divided into three categories, namely full-reference, reduced-reference and no-reference, according to whether the original lossless video information is available. Since a lossless reference video is rarely available in real application scenarios, no-reference video quality assessment has become the focus of current research. With the continuous progress and development of deep learning technology, it has gradually been applied widely in practice. For no-reference video quality evaluation, although some no-reference quality evaluation methods already exist at the present stage, several obstacles remain: human visual characteristics are not fully considered, traditional methods require extracting a large number of handcrafted features, which is time-consuming and labor-intensive, and the various kinds of feature information in an image are not fully exploited.
Disclosure of Invention
Aiming at the problems existing in the prior art, the video quality evaluation method based on the content perception fusion characteristics is provided, the content perception characteristics of a video image are obtained through a multi-direction differential second-order differential Gaussian filter characteristic extraction module and a deep convolution neural network, then the quality score is obtained through modeling of a long-term dependency relationship by a gate control recurrent neural network GRU, and the video quality is determined by combining weights.
The technical scheme adopted by the invention is as follows: a video quality evaluation method based on content perception fusion characteristics comprises the following steps:
step 1, constructing a multidirectional differential second-order differential Gaussian filter characteristic extraction module for extracting input image characteristics;
step 2, building a residual feature extraction network model based on a multi-direction differential second-order differential Gaussian filter feature extraction module and a depth convolution neural network, and inputting video frame by frame into the residual feature extraction network model to obtain content perception features of each frame of image;
step 3, reducing the dimension of the content perception characteristics, inputting the content perception characteristics into a gate control recurrent neural network GRU, and modeling long-term dependency relationship to obtain quality elements and weights of the video at different moments;
and 4, determining the final quality score of the video based on the quality elements and weights at different moments.
Further, the substeps of the step 1 are as follows:
step 1.1, constructing a multidirectional differential second-order differential Gaussian kernel and a directional derivative thereof; in construction, the number of directions is preferably 8;
and 1.2, performing convolution operation on the input image and the multi-direction second-order differential Gaussian directional derivative to finish characteristic information extraction.
Further, the substep of the step 2 is as follows:
step 2.1, frame-by-frame splitting is carried out on an input video to obtain T RGB three-channel color images;
step 2.2, uniformly scaling the obtained image to 224 pixels by 224 pixels;
step 2.3, outputting the image obtained in the step 2.2 through a 2D convolution layer to obtain the image characteristics with the dimension of 112 × 112 × 64;
step 2.4, inputting the image obtained in the step 2.2 into the multi-directional differential second-order differential Gaussian filter characteristic extraction module for characteristic extraction, fusing the extracted characteristic with the characteristic output in the step 2.3, wherein the dimension of the fused characteristic is 112 × 112 × 72, and restoring the channel number to 64 dimensions by a convolution operation on the fused characteristic;
step 2.5, sending the 64-channel fused characteristic into a max pooling layer, the dimension of the output characteristic being 56 × 56 × 64;
step 2.6, establishing a Bottleneck convolution structure, and inputting the output characteristic of the step 2.5 into the Bottleneck convolution structure to obtain the output characteristic W_t, wherein the characteristic W_t comprises a plurality of characteristic maps and t ranges from 1 to T;
step 2.7, carrying out spatial global pooling on each characteristic map in the characteristic W_t, and obtaining the content perception characteristic through the joint operation of spatial global average pooling and spatial global standard deviation pooling.
Further, in the step 2.5, the kernel size of the max pooling layer is 3 × 3, the step size is 2, and the filling dimension is 1.
Further, the specific process of establishing the Bottleneck convolution structure in the step 2.6 is as follows:
step 2.6.1, setting a 2D convolution layer Conv_2D_2, wherein the number of convolution kernels is C_1, the convolution kernel size is 1×1, the step size is 1, and the filling dimension is 0;
step 2.6.2, setting a 2D convolution layer Conv_2D_3, wherein the number of convolution kernels is C_1, the convolution kernel size is 7×7, the step size is 1, and the filling dimension is 1;
step 2.6.3, setting a 2D convolution layer Conv_2D_4, wherein the number of convolution kernels is C_2, the convolution kernel size is 1×1, the step size is 1, and the filling dimension is 0;
step 2.6.4, sequentially connecting the 2D convolution layers Conv_2D_2, Conv_2D_3 and Conv_2D_4 to obtain a convolution module, named the Bottleneck-A structure;
step 2.6.5, setting the numbers of convolution kernels of the three 2D convolution layers in the Bottleneck-A structure to 2C_1, 2C_1, 2C_2 respectively to obtain the Bottleneck-B structure; similarly, setting the numbers of convolution kernels to 4C_1, 4C_1, 4C_2 and 8C_1, 8C_1, 8C_2 to obtain the Bottleneck-C and Bottleneck-D structures;
step 2.6.6, connecting 3 Bottleneck-A structures, 4 Bottleneck-B structures, 6 Bottleneck-C structures and 3 Bottleneck-D structures in sequence to obtain the Bottleneck convolution structure.
Further, the substep of the step 3 is as follows:
step 3.1, performing dimension reduction on the content perception feature through the full connection layer FC_1 to obtain a dimension reduction feature;
step 3.2, sending the dimension-reduced characteristics into a gate control recurrent neural network GRU, which can integrate and adjust the characteristics and learn long-term dependencies;
step 3.3, taking the hidden state of the GRU network as the integrated characteristic, and calculating the hidden state at time t to obtain the integrated characteristic;
step 3.4, inputting the integrated characteristic into a full connection layer FC_2 to obtain the quality score at time t;
step 3.5, taking the lowest quality score among the previous frames as the memory quality element at time t;
step 3.6, constructing the current quality element at the t-th frame, and weighting the quality scores of the next few frames so as to assign a larger weight to frames with low quality scores.
Further, in the step 3.5, the memory quality element is:
l_t = min_{k ∈ V_prev} q_k, where V_prev = {max(1, t − s), …, t − 1}
wherein l_t represents the memory quality element, V_prev represents the index set of the relevant previous moments, q_t and q_k respectively represent the quality scores at time t and time k, and s is a super parameter associated with time t.
Further, in the step 3.6, the current quality element is:
m_t = Σ_{k ∈ V_next} q_k · w_k^t
w_k^t = e^{−q_k} / Σ_{j ∈ V_next} e^{−q_j}
wherein m_t is the current quality element, w_k^t is the weight defined using the softmin function, V_next represents the index set of the relevant subsequent moments, and e represents the natural constant.
Further, the substep of the step 4 is as follows:
step 4.1, linearly combining the memory quality element with the current quality element to obtain the approximate quality score at the subjective frame moment;
step 4.2, carrying out temporal global average pooling on the approximate quality score to obtain the final video quality score.
Further, in the step 4.1, the approximate quality score is calculated as:
q'_t = r · l_t + (1 − r) · m_t
wherein q'_t represents the approximate quality score, l_t represents the memory quality element, m_t represents the current quality element, and r is a super parameter that balances the contributions of the memory quality element and the current quality element.
Compared with the prior art, the beneficial effects of adopting the technical scheme include:
1. the constructed multi-direction differential second-order differential Gaussian filter characteristic extraction module can extract rich edge characteristic information in the image.
2. The feature extraction network model obtained by combining the constructed feature extraction module with the deep convolutional neural network has the capability of identifying different content information.
3. The recurrent neural network GRU can effectively model long-term dependency relationship of quality elements at different moments in video.
Therefore, the video quality evaluation method provided by the invention can realize more accurate video quality evaluation effect.
Drawings
Fig. 1 is a flowchart of a video quality evaluation method according to the present invention.
Fig. 2 is a schematic diagram of extracting content-aware features according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of modeling long-term dependency and evaluating video quality according to an embodiment of the present invention.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar modules or modules having like or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application. On the contrary, the embodiments of the present application include all alternatives, modifications, and equivalents as may be included within the spirit and scope of the appended claims.
Example 1
Aiming at the defects that the prior art does not fully consider the human visual characteristics, the traditional method needs to extract a large amount of manual characteristics, time and labor are wasted, various characteristic information of images are not fully considered, and the like, referring to fig. 1, the embodiment provides a video quality evaluation method based on content perception fusion characteristics, which comprises the following steps:
step 1, constructing a multidirectional differential second-order differential Gaussian filter characteristic extraction module for extracting input image characteristics;
step 2, building a residual feature extraction network model based on a multi-direction differential second-order differential Gaussian filter feature extraction module and a depth convolution neural network, and inputting video frame by frame into the residual feature extraction network model to obtain content perception features of each frame of image;
step 3, reducing the dimension of the content perception characteristics, inputting the content perception characteristics into a gate control recurrent neural network GRU, and modeling long-term dependency relationship to obtain quality elements and weights of the video at different moments;
and 4, determining the final quality score of the video based on the quality elements and weights at different moments.
In step 1 of this embodiment, the number of directions is selected to be 8, and gradient information of different angles of the image is obtained through a multi-direction differential second-order differential gaussian filtering characteristic extraction module.
In the embodiment, gradient information is extracted through the multi-directional differential second-order differential Gaussian filter characteristic extraction module established in the step 1, and content perception characteristics are extracted through cooperation with the deep convolutional neural network, wherein the multi-directional differential second-order differential Gaussian filter characteristic extraction module and the deep convolutional neural network can form a residual characteristic extraction network model.
The gate control recurrent neural network GRU in the step 3 can integrate characteristics and learn long-term dependency.
Example 2
On the basis of embodiment 1, this embodiment further describes a multi-directional differential second order differential gaussian filter feature extraction module and a feature extraction method in step 1, which are specifically as follows:
constructionMultidirectional differential second-order differential Gaussian kernel
Figure SMS_15
And its directional derivative>
Figure SMS_16
The method is specifically as follows:
Figure SMS_17
Figure SMS_18
wherein ,
Figure SMS_21
and />
Figure SMS_23
Respectively representing the abscissa and the ordinate of pixels in the image; />
Figure SMS_25
Representing the differentiation factor; />
Figure SMS_20
;/>
Figure SMS_24
,/>
Figure SMS_26
The selected angle value is represented, and the calculation formula is as follows:
Figure SMS_27
the value range of m is +.>
Figure SMS_19
,/>
Figure SMS_22
The number of the selected directions is represented, and the value range of M is any positive integer.
This embodimentSelecting a direction number M=8 to obtain gradient information of different angles of the image; in the feature extraction, the input image I (x, y) is differentiated with a multidirectional second-order differential Gaussian directional derivative
Figure SMS_28
And carrying out convolution operation to achieve the purpose of extracting characteristic information, wherein the specific operation is as follows:
Figure SMS_29
wherein ,
Figure SMS_30
representing image features.
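As an illustrative sketch of the above construction, the directional kernels can be generated numerically and applied to an image by convolution. The following Python/NumPy example is not part of the embodiment; the kernel size, the value of σ, and the equally spaced angles θ_m = (m − 1)π/M are assumptions chosen only to make the sketch self-contained.

import numpy as np
from scipy.ndimage import convolve

def second_order_gaussian_kernels(size=7, sigma=1.0, num_directions=8):
    # Build num_directions second-order derivative-of-Gaussian kernels steered to
    # equally spaced angles (assumed spacing; the embodiment only fixes M = 8).
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    gxx = ((x**2 - sigma**2) / sigma**4) * g   # d2G/dx2
    gyy = ((y**2 - sigma**2) / sigma**4) * g   # d2G/dy2
    gxy = (x * y / sigma**4) * g               # d2G/dxdy
    kernels = []
    for m in range(num_directions):
        theta = m * np.pi / num_directions
        c, s = np.cos(theta), np.sin(theta)
        # second-order directional derivative of G along theta
        kernels.append(c**2 * gxx + 2 * s * c * gxy + s**2 * gyy)
    return kernels

def extract_directional_features(image, kernels):
    # Step 1.2: convolve the input image with each directional derivative kernel.
    return np.stack([convolve(image, k, mode='nearest') for k in kernels], axis=-1)

features = extract_directional_features(np.random.rand(224, 224), second_order_gaussian_kernels())
print(features.shape)  # (224, 224, 8): one feature map per direction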
Example 3
On the basis of embodiment 1 or 2, as shown in fig. 2, the specific process of extracting the content perception feature in step 2 is further described, and it should be noted that the feature extraction module in fig. 2 refers to a multi-direction differential second order differential gaussian filter feature extraction module:
step 2.1, for an input video material, carrying out frame-by-frame splitting on the video material to obtain T RGB three-channel color images;
step 2.2, uniformly scaling the obtained images I_t (where t ranges from 1 to T) to 224 pixels by 224 pixels through an image resizing operation;
step 2.3, setting a 2D convolution layer Conv_2D_1, wherein the number of convolution kernels is 64, the size of the convolution kernels is 7 × 7, the step length is 2 and the filling dimension is 3; after the image obtained in step 2.2 passes through Conv_2D_1, the output dimension is 112 × 112 × 64;
step 2.4, inputting the image obtained in the step 2.2 into the multi-directional differential second-order differential Gaussian filter characteristic extraction module to perform characteristic extraction, wherein the dimension of the output characteristic is 112 × 112 × 8; then performing a concat characteristic fusion operation with the characteristic output in the step 2.3, wherein the dimension of the fused characteristic is 112 × 112 × 72; and then sending the fused characteristic into a 1 × 1 convolution with 64 kernels to restore the channel number to 64 dimensions;
step 2.5, sending the fused characteristic with 64 channels into a max pooling layer with a kernel size of 3 × 3, a step length of 2 and a filling dimension of 1, the dimension of the output characteristic being 56 × 56 × 64;
step 2.6, establishing a Bottleneck convolution structure, and inputting the output characteristic of the step 2.5 into the Bottleneck convolution structure to obtain the output characteristic W_t, wherein the characteristic W_t comprises a plurality of characteristic maps and t ranges from 1 to T;
step 2.7, carrying out spatial global pooling (Spatial GP) on each characteristic map in the characteristic W_t, and obtaining the characteristic F_t through the joint operation of spatial global average pooling (GP_mean) and spatial global standard deviation pooling (GP_std):
F_t = GP_mean(W_t) ⊕ GP_std(W_t)
wherein ⊕ denotes the concatenation of the two pooled results.
The characteristic F_t, obtained by fusing the multi-directional differential second-order differential Gaussian filter feature extraction module with the deep convolutional neural network, has the ability to distinguish information of different content, and therefore the characteristic has content-aware properties.
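As a brief illustration of step 2.7, the joint pooling can be written in a few lines of PyTorch. This sketch assumes W_t is a (C, H, W) tensor and that the joint operation concatenates the mean-pooled and standard-deviation-pooled vectors; the 2048-channel example size is only an assumption for the demonstration.

import torch

def content_aware_feature(w_t: torch.Tensor) -> torch.Tensor:
    # Step 2.7 (sketch): spatial global pooling of the feature maps W_t with shape (C, H, W).
    gp_mean = w_t.mean(dim=(1, 2))                 # spatial global average pooling
    gp_std = w_t.std(dim=(1, 2), unbiased=False)   # spatial global standard deviation pooling
    return torch.cat([gp_mean, gp_std], dim=0)     # joint operation: concatenation -> F_t

f_t = content_aware_feature(torch.randn(2048, 7, 7))   # channel count is an illustrative assumption
print(f_t.shape)  # torch.Size([4096])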
Example 4
On the basis of embodiment 3, this embodiment proposes a specific Bottleneck convolution structure construction process, which is specifically as follows:
step 2.6.1, setting a 2D convolution layer Conv_2D_2, wherein the number of convolution kernels is C_1, the convolution kernel size is 1×1, the step size is 1, and the filling dimension is 0;
step 2.6.2, setting a 2D convolution layer Conv_2D_3, wherein the number of convolution kernels is C_1, the convolution kernel size is 7×7, the step size is 1, and the filling dimension is 1;
step 2.6.3, setting a 2D convolution layer Conv_2D_4, wherein the number of convolution kernels is C_2, the convolution kernel size is 1×1, the step size is 1, and the filling dimension is 0;
step 2.6.4, sequentially connecting the 2D convolution layers Conv_2D_2, Conv_2D_3 and Conv_2D_4 to obtain a convolution module, named the Bottleneck-A structure;
step 2.6.5, setting the numbers of convolution kernels of the three 2D convolution layers in the Bottleneck-A structure to 2C_1, 2C_1, 2C_2 respectively to obtain the Bottleneck-B structure; similarly, setting the numbers of convolution kernels to 4C_1, 4C_1, 4C_2 and 8C_1, 8C_1, 8C_2 to obtain the Bottleneck-C and Bottleneck-D structures;
step 2.6.6, connecting 3 Bottleneck-A structures, 4 Bottleneck-B structures, 6 Bottleneck-C structures and 3 Bottleneck-D structures in sequence to obtain the Bottleneck convolution structure.
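A minimal PyTorch sketch of this construction is given below. Only the kernel sizes, strides, paddings and the 3-4-6-3 stacking come from the embodiment; the base channel counts C_1 = 64 and C_2 = 256, the ReLU activations, and the absence of residual connections are assumptions made to keep the example self-contained. Note that a 7 × 7 convolution with padding 1 reduces the spatial size by 4 per block, so the demonstration runs a single Bottleneck-A block on the 56 × 56 × 64 feature from step 2.5.

import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    # One Bottleneck block: Conv_2D_2 (1x1) -> Conv_2D_3 (7x7) -> Conv_2D_4 (1x1), steps 2.6.1-2.6.4.
    def __init__(self, in_ch, c1, c2):
        super().__init__()
        self.conv_2d_2 = nn.Conv2d(in_ch, c1, kernel_size=1, stride=1, padding=0)
        self.conv_2d_3 = nn.Conv2d(c1, c1, kernel_size=7, stride=1, padding=1)
        self.conv_2d_4 = nn.Conv2d(c1, c2, kernel_size=1, stride=1, padding=0)
        self.relu = nn.ReLU(inplace=True)  # activation is an assumption, not specified in the embodiment

    def forward(self, x):
        x = self.relu(self.conv_2d_2(x))
        x = self.relu(self.conv_2d_3(x))
        return self.relu(self.conv_2d_4(x))

def make_bottleneck_structure(c1=64, c2=256, in_ch=64):
    # Steps 2.6.5-2.6.6: 3 x Bottleneck-A, 4 x Bottleneck-B, 6 x Bottleneck-C, 3 x Bottleneck-D,
    # with kernel counts C1/C2, 2C1/2C2, 4C1/4C2, 8C1/8C2 per stage.
    blocks = []
    for stage, num_blocks in enumerate([3, 4, 6, 3]):
        s1, s2 = c1 * 2**stage, c2 * 2**stage
        for _ in range(num_blocks):
            blocks.append(Bottleneck(in_ch, s1, s2))
            in_ch = s2
    return nn.Sequential(*blocks)

structure = make_bottleneck_structure()
single_block_out = structure[0](torch.randn(1, 64, 56, 56))  # one Bottleneck-A block on the step-2.5 feature
print(single_block_out.shape)  # torch.Size([1, 256, 52, 52])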
Example 5
On the basis of embodiment 3 or 4, as shown in fig. 3, this embodiment further describes the specific procedure of modeling long-term dependencies and acquiring quality elements using the gate control recurrent neural network GRU. Specifically:
step 3.1, performing dimension reduction on the content perception characteristic F_t through the full connection layer FC_1 to obtain the dimension-reduced characteristic X_t:
X_t = W_{f1} · F_t + b_{f1}
wherein W_{f1} and b_{f1} are the two parameters of the full connection layer FC_1, representing the scaling and bias terms respectively.
Step 3.2, the dimension reduction characteristics are sent into a gate control recurrent neural network GRU which can integrate and adjust and learn long-term dependency;
step 3.3, calculating the hidden layer state at the time t by taking the hidden layer state of the GRU network as the comprehensive characteristic to obtain the integrated characteristic;
in this embodiment, the hidden layer has an initial value ofh 0 Hidden layer integration feature at time th t From input features X at time t t And the hidden layer at the previous momenth t-1 And (3) calculating to obtain:
Figure SMS_36
step 3.4, inputting the integrated characteristic h_t into another full connection layer FC_2 to obtain the quality score q_t at time t (a brief code sketch of steps 3.1 to 3.4 is given at the end of this embodiment):
q_t = W_{f2} · h_t + b_{f2}
wherein W_{f2} and b_{f2} are the two parameters of the full connection layer FC_2, representing the scaling and bias terms respectively.
Step 3.5, taking the lowest quality fraction in the previous frames as a memory quality element at the time t
Figure SMS_41
Figure SMS_42
wherein ,
Figure SMS_43
representing memory quality element->
Figure SMS_44
Index set representing all moments +.>
Figure SMS_45
、/>
Figure SMS_46
The quality scores at time t and time k are indicated, s being a super parameter associated with time t.
Step 3.6, in order to simulate the phenomenon that human beings have deep memory for video quality degradation and have weak perceptibility for video quality enhancement, in this embodiment, the current quality element is constructed in the t-th frame
Figure SMS_47
And weighting the quality score in the next few frames (may be made of +.>
Figure SMS_48
Determined) a greater weight is assigned to frames with low quality scores by:
Figure SMS_49
Figure SMS_50
wherein ,
Figure SMS_51
for the current quality element->
Figure SMS_52
For the weights, a softmin function definition is used,
Figure SMS_53
an index set indicating the relevant time, e indicating a natural constant. />
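The feature integration and scoring of steps 3.1 to 3.4 can be summarized with the PyTorch sketch below. The content-aware feature dimension (4096), the reduced dimension (128) and the GRU hidden size (32) are illustrative assumptions; only the overall structure (fully connected dimension reduction, a gate control recurrent neural network GRU, and a fully connected scoring head) follows this embodiment.

import torch
import torch.nn as nn

class TemporalQualityHead(nn.Module):
    # FC_1 dimension reduction -> GRU long-term dependency modeling -> FC_2 per-frame quality score.
    def __init__(self, feat_dim=4096, reduced_dim=128, hidden_dim=32):
        super().__init__()
        self.fc_1 = nn.Linear(feat_dim, reduced_dim)                  # step 3.1: X_t = W_f1 * F_t + b_f1
        self.gru = nn.GRU(reduced_dim, hidden_dim, batch_first=True)  # steps 3.2-3.3: h_t = GRU(X_t, h_{t-1})
        self.fc_2 = nn.Linear(hidden_dim, 1)                          # step 3.4: q_t = W_f2 * h_t + b_f2

    def forward(self, f):
        # f: content-aware features of shape (batch, T, feat_dim)
        x = self.fc_1(f)
        h, _ = self.gru(x)                # h_0 defaults to zeros here (an assumption)
        return self.fc_2(h).squeeze(-1)   # per-frame quality scores q_t, shape (batch, T)

scores = TemporalQualityHead()(torch.randn(1, 300, 4096))  # e.g. a 300-frame video
print(scores.shape)  # torch.Size([1, 300])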
Example 6
On the basis of embodiment 5, this embodiment further describes the method for obtaining the final quality score of the video in step 4, specifically:
step 4.1, obtaining the approximate subjective quality score q'_t at the frame moment by linearly combining the memory quality element and the current quality element:
q'_t = r · l_t + (1 − r) · m_t
wherein r is a super parameter balancing the contributions of the memory quality element and the current quality element;
step 4.2, carrying out temporal global average pooling on the approximate quality scores q'_t to obtain the final video quality score Q:
Q = (1 / T) · Σ_{t=1}^{T} q'_t
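The temporal pooling of steps 3.5 to 4.2 can be sketched in a few lines of Python/NumPy as follows. Consistent with the description above, the memory element takes the minimum score over the previous frames, the current element is a softmin-weighted average over the current and following frames, and the two are mixed with weight r before temporal global average pooling; the window length s = 12, the value r = 0.5 and the boundary handling are illustrative assumptions only.

import numpy as np

def video_quality_score(q, s=12, r=0.5):
    # q: 1-D array of per-frame quality scores q_t; s, r: hyper-parameters as described above.
    T = len(q)
    approx = np.empty(T)
    for t in range(T):
        prev = q[max(0, t - s):t]                    # step 3.5: previous-frame window
        l_t = prev.min() if prev.size else q[t]      # memory quality element (lowest previous score)
        nxt = q[t:min(t + s, T - 1) + 1]             # step 3.6: current and subsequent frames
        w = np.exp(-nxt) / np.exp(-nxt).sum()        # softmin weights: low scores receive large weights
        m_t = (w * nxt).sum()                        # current quality element
        approx[t] = r * l_t + (1 - r) * m_t          # step 4.1: linear combination
    return approx.mean()                             # step 4.2: temporal global average pooling

print(video_quality_score(np.random.rand(300)))      # final quality score Q for a 300-frame example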
The invention can be well realized based on any of the embodiments 1-6, and the quality score of a video segment can be accurately obtained.
It should be noted that, in the description of the embodiments of the present invention, unless explicitly specified and limited otherwise, the terms "disposed," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; may be directly connected or indirectly connected through an intermediate medium. The specific meaning of the above terms in the present invention will be understood in detail by those skilled in the art; the accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Although embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the application, and that variations, modifications, alternatives, and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the application.

Claims (10)

1. A video quality assessment method based on content-aware fusion features, comprising:
step 1, constructing a multidirectional differential second-order differential Gaussian filter characteristic extraction module for extracting input image characteristics;
step 2, building a residual feature extraction network model based on a multi-direction differential second-order differential Gaussian filter feature extraction module and a depth convolution neural network, and inputting video frame by frame into the residual feature extraction network model to obtain content perception features of each frame of image;
step 3, reducing the dimension of the content perception characteristics, inputting the content perception characteristics into a gate control recurrent neural network GRU, and modeling long-term dependency relationship to obtain quality elements and weights of the video at different moments;
and 4, determining the final quality score of the video based on the quality elements and weights at different moments.
2. The method for evaluating video quality based on content aware fusion feature according to claim 1, wherein the substeps of step 1 are:
step 1.1, constructing a multidirectional differential second-order differential Gaussian kernel and a directional derivative thereof;
and 1.2, performing convolution operation on the input image and the multi-direction second-order differential Gaussian directional derivative to finish characteristic information extraction.
3. The video quality evaluation method based on content aware fusion feature according to claim 1 or 2, wherein the sub-steps of step 2 are:
step 2.1, frame-by-frame splitting is carried out on an input video to obtain T RGB three-channel color images;
step 2.2, uniformly scaling the obtained image to 224 pixels by 224 pixels;
step 2.3, outputting the image obtained in the step 2.2 through a 2D convolution layer to obtain the image characteristics with the dimension of 112 × 112 × 64;
step 2.4, inputting the image obtained in the step 2.2 into the multi-directional differential second-order differential Gaussian filter characteristic extraction module for characteristic extraction, fusing the extracted characteristic with the characteristic output in the step 2.3, wherein the dimension of the fused characteristic is 112 × 112 × 72, and restoring the channel number to 64 dimensions by a convolution operation on the fused characteristic;
step 2.5, sending the 64-channel fused characteristic into a max pooling layer, the dimension of the output characteristic being 56 × 56 × 64;
step 2.6, establishing a Bottleneck convolution structure, and inputting the output characteristic of the step 2.5 into the Bottleneck convolution structure to obtain the output characteristic W_t, wherein the characteristic W_t comprises a plurality of characteristic maps and t ranges from 1 to T;
step 2.7, carrying out spatial global pooling on each characteristic map in the characteristic W_t, and obtaining the content perception characteristic through the joint operation of spatial global average pooling and spatial global standard deviation pooling.
4. The video quality evaluation method based on content aware fusion feature according to claim 3, wherein in the step 2.5, the kernel size of the max pooling layer is 3×3, the step size is 2, and the filling dimension is 1.
5. The video quality evaluation method based on content aware fusion feature according to claim 3, wherein in the step 2.6, the specific process of establishing the Bottleneck convolution structure is as follows:
step 2.6.1, setting a 2D convolution layer Conv_2D_2, wherein the number of convolution kernels is C_1, the convolution kernel size is 1×1, the step size is 1, and the filling dimension is 0;
step 2.6.2, setting a 2D convolution layer Conv_2D_3, wherein the number of convolution kernels is C_1, the convolution kernel size is 7×7, the step size is 1, and the filling dimension is 1;
step 2.6.3, setting a 2D convolution layer Conv_2D_4, wherein the number of convolution kernels is C_2, the convolution kernel size is 1×1, the step size is 1, and the filling dimension is 0;
step 2.6.4, sequentially connecting the 2D convolution layers Conv_2D_2, Conv_2D_3 and Conv_2D_4 to obtain a convolution module, named the Bottleneck-A structure;
step 2.6.5, setting the numbers of convolution kernels of the three 2D convolution layers in the Bottleneck-A structure to 2C_1, 2C_1, 2C_2 respectively to obtain the Bottleneck-B structure; similarly, setting the numbers of convolution kernels to 4C_1, 4C_1, 4C_2 and 8C_1, 8C_1, 8C_2 to obtain the Bottleneck-C and Bottleneck-D structures;
step 2.6.6, connecting 3 Bottleneck-A structures, 4 Bottleneck-B structures, 6 Bottleneck-C structures and 3 Bottleneck-D structures in sequence to obtain the Bottleneck convolution structure.
6. The video quality evaluation method based on content aware fusion feature according to claim 1 or 2, wherein the sub-steps of step 3 are:
step 3.1, performing dimension reduction on the content perception feature through the full connection layer FC_1 to obtain a dimension reduction feature;
step 3.2, sending the dimension-reduced characteristics into a gate control recurrent neural network GRU, which can integrate and adjust the characteristics and learn long-term dependencies;
step 3.3, taking the hidden state of the GRU network as the integrated characteristic, and calculating the hidden state at time t to obtain the integrated characteristic;
step 3.4, inputting the integrated characteristic into a full connection layer FC_2 to obtain the quality score at time t;
step 3.5, taking the lowest quality score among the previous frames as the memory quality element at time t;
step 3.6, constructing the current quality element at the t-th frame, and weighting the quality scores of the next few frames so as to assign a larger weight to frames with low quality scores.
7. The method for evaluating video quality based on content aware fusion feature according to claim 6, wherein in the step 3.5, the memory quality elements are:
l_t = min_{k ∈ V_prev} q_k, where V_prev = {max(1, t − s), …, t − 1}
wherein l_t represents the memory quality element, V_prev represents the index set of the relevant previous moments, q_t and q_k respectively represent the quality scores at time t and time k, and s is a super parameter related to time t.
8. The method for evaluating video quality based on content aware fusion feature according to claim 7, wherein in the step 3.6, the current quality element is:
m_t = Σ_{k ∈ V_next} q_k · w_k^t
w_k^t = e^{−q_k} / Σ_{j ∈ V_next} e^{−q_j}
wherein m_t is the current quality element, w_k^t is the weight defined using the softmin function, V_next represents the index set of the relevant subsequent moments, e represents the natural constant, and s is a super parameter related to time t.
9. The method for evaluating video quality based on content aware fusion feature according to claim 6, wherein the sub-steps of step 4 are:
step 4.1, linearly combining the memory quality element with the current quality element to obtain the approximate quality score at the subjective frame moment;
step 4.2, carrying out temporal global average pooling on the approximate quality score to obtain the final video quality score.
10. The method for evaluating video quality based on content aware fusion feature according to claim 9, wherein in the step 4.1, the approximate quality score calculating method is as follows:
q'_t = r · l_t + (1 − r) · m_t
wherein q'_t represents the approximate quality score, l_t represents the memory quality element, m_t represents the current quality element, and r is a super parameter that balances the contributions of the memory quality element and the current quality element.
CN202310343979.1A 2023-04-03 2023-04-03 Video quality evaluation method based on content perception fusion characteristics Active CN116071691B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310343979.1A CN116071691B (en) 2023-04-03 2023-04-03 Video quality evaluation method based on content perception fusion characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310343979.1A CN116071691B (en) 2023-04-03 2023-04-03 Video quality evaluation method based on content perception fusion characteristics

Publications (2)

Publication Number Publication Date
CN116071691A true CN116071691A (en) 2023-05-05
CN116071691B CN116071691B (en) 2023-06-23

Family

ID=86171795

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310343979.1A Active CN116071691B (en) 2023-04-03 2023-04-03 Video quality evaluation method based on content perception fusion characteristics

Country Status (1)

Country Link
CN (1) CN116071691B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140044197A1 (en) * 2012-08-10 2014-02-13 Yiting Liao Method and system for content-aware multimedia streaming
CN111833246A (en) * 2020-06-02 2020-10-27 天津大学 Single-frame image super-resolution method based on attention cascade network
CN112784698A (en) * 2020-12-31 2021-05-11 杭州电子科技大学 No-reference video quality evaluation method based on deep spatiotemporal information
CN113554599A (en) * 2021-06-28 2021-10-26 杭州电子科技大学 Video quality evaluation method based on human visual effect
US20220101564A1 (en) * 2020-09-25 2022-03-31 Adobe Inc. Compressing digital images utilizing deep learning-based perceptual similarity
CN115511858A (en) * 2022-10-08 2022-12-23 杭州电子科技大学 Video quality evaluation method based on novel time sequence characteristic relation mapping

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140044197A1 (en) * 2012-08-10 2014-02-13 Yiting Liao Method and system for content-aware multimedia streaming
CN111833246A (en) * 2020-06-02 2020-10-27 天津大学 Single-frame image super-resolution method based on attention cascade network
US20220101564A1 (en) * 2020-09-25 2022-03-31 Adobe Inc. Compressing digital images utilizing deep learning-based perceptual similarity
CN112784698A (en) * 2020-12-31 2021-05-11 杭州电子科技大学 No-reference video quality evaluation method based on deep spatiotemporal information
CN113554599A (en) * 2021-06-28 2021-10-26 杭州电子科技大学 Video quality evaluation method based on human visual effect
CN115511858A (en) * 2022-10-08 2022-12-23 杭州电子科技大学 Video quality evaluation method based on novel time sequence characteristic relation mapping

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ROGER IMMICH等: "Adaptive motion-aware FEC-based mechanism to ensure video transmission", 《2014 IEEE SYMPOSIUM ON COMPUTERS AND COMMUNICATIONS (ISCC)》, pages 1 - 6 *
TAN LU等: "A Novel Contractive GAN Model for a Unified Approach Towards Blind Quality Assessment of Images from Heterogeneous Sources", 《ISVC 2020: ADVANCES IN VISUAL COMPUTING》, pages 27 - 38 *
张坤源: "Research on no-reference video quality assessment based on visual masking and attention mechanism", 《China Master's Theses Full-text Database (Information Science and Technology)》, no. 04, pages 138 - 930 *
贺然: "Research on video quality assessment methods based on video content perception", 《China Master's Theses Full-text Database (Information Science and Technology)》, no. 02, pages 136 - 622 *

Also Published As

Publication number Publication date
CN116071691B (en) 2023-06-23

Similar Documents

Publication Publication Date Title
CN110555434B (en) Method for detecting visual saliency of three-dimensional image through local contrast and global guidance
Yang et al. 3D panoramic virtual reality video quality assessment based on 3D convolutional neural networks
CN108230278B (en) Image raindrop removing method based on generation countermeasure network
WO2021022929A1 (en) Single-frame image super-resolution reconstruction method
CN102507592B (en) Fly-simulation visual online detection device and method for surface defects
CN109671023A (en) A kind of secondary method for reconstructing of face image super-resolution
CN110349087B (en) RGB-D image high-quality grid generation method based on adaptive convolution
CN110059728B (en) RGB-D image visual saliency detection method based on attention model
CN110263813B (en) Significance detection method based on residual error network and depth information fusion
CN112288627B (en) Recognition-oriented low-resolution face image super-resolution method
CN111612708B (en) Image restoration method based on countermeasure generation network
CN107944437B (en) A kind of Face detection method based on neural network and integral image
CN111709900A (en) High dynamic range image reconstruction method based on global feature guidance
CN113112416B (en) Semantic-guided face image restoration method
CN114863236A (en) Image target detection method based on double attention mechanism
CN110009700B (en) Convolutional neural network visual depth estimation method based on RGB (red, green and blue) graph and gradient graph
CN114897742B (en) Image restoration method with texture and structural features fused twice
CN110717921A (en) Full convolution neural network semantic segmentation method of improved coding and decoding structure
CN113554599A (en) Video quality evaluation method based on human visual effect
CN111882516B (en) Image quality evaluation method based on visual saliency and deep neural network
CN107909565A (en) Stereo-picture Comfort Evaluation method based on convolutional neural networks
CN116071691B (en) Video quality evaluation method based on content perception fusion characteristics
CN113411566A (en) No-reference video quality evaluation method based on deep learning
CN111524060B (en) System, method, storage medium and device for blurring portrait background in real time
CN115953330B (en) Texture optimization method, device, equipment and storage medium for virtual scene image

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant