CN113610862B - Screen content image quality assessment method

Screen content image quality assessment method

Info

Publication number: CN113610862B
Authority: CN (China)
Prior art keywords: image, region, text, screen content, value
Legal status: Active (granted)
Application number: CN202110831904.9A
Other languages: Chinese (zh)
Other versions: CN113610862A (en)
Inventor
王同罕
廖静
何月顺
周书民
徐洪珍
李祥
何剑锋
贾惠珍
李广
Current Assignee: East China Institute of Technology
Original Assignee: East China Institute of Technology
Priority date / filing date: 2021-07-22
Application filed by East China Institute of Technology
Priority to CN202110831904.9A
Publication of CN113610862A (application): 2021-11-05
Application granted; publication of CN113610862B: 2023-08-01

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/136 Segmentation; Edge detection involving thresholding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30168 Image quality inspection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30 Computing systems specially adapted for manufacturing


Abstract

The invention relates to the technical field of image processing and discloses a screen content image quality assessment method comprising the following steps: segmenting the screen content image into a text region and an image region; extracting texture features and image structural features from the image region; extracting sharpness and text structural features from the text region; inputting the texture features, image structural features, sharpness, text structural features, and subjective quality scores of screen content images into LIBSVM software for training to obtain a quality assessment model; and inputting a screen content image to be assessed, which, after processing, is fed into the quality assessment model to obtain its quality score. The invention can perceive image quality, can dynamically monitor and adjust an image processing system to output high-quality images according to the image's quality score, and provides a more effective basis for parameter optimization of real-time client communication systems.

Description

Screen content image quality assessment method
Technical Field
The invention relates to the technical field of image processing, in particular to a screen content image quality assessment method.
Background
At present, with the development of the internet and multimedia technologies, real-time image communication systems and screen-sharing technologies have matured, so the internet is filled with large numbers of screen content images, and how to evaluate their quality has become a troublesome problem. Quality assessment of screen content images plays a large role in image communication and real-time multi-client communication systems: an assessment algorithm can measure the quality of the current image so that the parameters of the image transmission system can be optimized to improve performance. To agree better with human visual perception, screen content image assessment methods extract quality-related features from the image. Feature extraction methods fall into two main classes: traditional hand-crafted extraction, in which several features (such as natural scene statistics and edge structure features) are computed from the image according to prior knowledge, and deep-learning-based methods, which obtain effective quality features automatically through training and derive quality scores from them. The feature extraction method determines, to some extent, the efficiency and time complexity of the algorithm. Assessment methods are also divided into three types according to whether an original reference image is available: full-reference, reduced-reference, and no-reference quality assessment methods; the type of method constrains the application scenarios. A screen content image comprises text and image regions, but conventional assessment methods mostly start from the image as a whole and do not consider the large differences in how different regions are perceived by the human eye.
Several methods exist. The first is a stacked autoencoder (SAE, Stacked AutoEncoders) method based on graphic and text regions. It uses a fast document layout analysis algorithm based on a convolutional neural network (CNN, Convolutional Neural Networks) to divide the content of the image into blocks, inputs them to a 1-D CNN model that classifies them as text, table, or picture, and then extracts quality-aware features from the text regions and image regions separately; two different SAEs are trained by an unsupervised method to extract the quality-aware features of these two kinds of regions, the features and their corresponding subjective scores are input into two regressors, each regressor produces one predicted score, and a weighted model finally combines the two predicted scores into the final perceived quality score of the test SCI. Using a CNN model to classify the image content greatly increases the complexity of the algorithm, and the human eye is mainly interested in text regions and image regions; classifying the content into text, table, and picture classes only to merge them back into text and image regions makes the classification step somewhat superfluous.
Second, the CNN-SQE method improves quality prediction performance by a fuzzy classification of the screen content image into plain-text, computer-graphics/cartoon, and natural-image regions and by estimating quality on each type of region separately. It operates in three stages: (1) image segmentation; (2) quality assessment of each segmented region; (3) quality combination. The method distinguishes computer-graphics/cartoon, plain-text, and natural-image regions, but all content of a screen content image is digitally generated by a computer, so such fine-grained classification is unnecessary.
Third, a method based on structural features and uncertainty weighting (SFUW, Structure Features And Uncertainty Weighting) first divides the screen content image (SCI, Screen Content Image) into text and image regions, then extracts gradient information of the text region as structural features together with luminance features and obtains the visual quality of the text region by computing the structural similarity of image blocks; the visual quality of the text and image regions is then fused by an uncertainty weighting method based on perceptual theory to obtain the final quality score. This method requires the original reference image, which is difficult to obtain in practice, so the algorithm is limited in application; how the weights should be set must also be considered.
In summary, because information storage and transmission are constrained by the transmission equipment and by external electronic interference, transmitted images are contaminated to some degree, so in most cases an undamaged original reference image cannot be obtained, and full-reference methods are therefore limited in application. Meanwhile, a screen content image consists mainly of text and image regions, and dividing it into too many region types tends to increase the complexity of the algorithm and reduce computational efficiency, so the choice of image segmentation method is particularly important.
Disclosure of Invention
To remedy the above drawbacks of the prior art, an object of the present invention is to provide a screen content image quality assessment method. The method first segments the screen content image into text and image regions, extracts feature vectors that characterize image quality according to the characteristics of each region, and finally trains a predictive model from quality-aware features to SCI visual quality using support vector regression (SVR, Support Vector Regression) with a radial basis function (RBF, Radial Basis Function) kernel. This efficient and convenient assessment algorithm can perceive image quality, can dynamically monitor and adjust an image processing system to output high-quality images according to the predicted quality score, and provides a more effective basis for parameter optimization of real-time client communication systems.
In order to achieve the above purpose, the present invention provides the following technical solutions: a screen content image quality assessment method comprising the steps of:
(1) Constructing a screen content image database;
(2) Executing a text segmentation function on the screen content image, and segmenting the screen content image into a text region and an image region;
(3) Executing an image area quality evaluation function on the image area, and extracting texture features and image structure features of the image area;
(4) Executing a text region quality assessment function on the text region, and extracting the sharpness and text structural features of the text region;
(5) Randomly selecting screen content images from the screen content image database one thousand times, and inputting the texture features, image structural features, sharpness, text structural features, and subjective quality scores of the selected screen content images into LIBSVM software for training to obtain a quality assessment model;
(6) Inputting a screen content image to be assessed, processing it through steps (2), (3), and (4), and inputting its texture features, image structural features, sharpness, and text structural features into the quality assessment model to obtain the quality score.
Further, step (2) specifically comprises the following steps: first, a first threshold is dynamically set and all maximally stable extremal regions are found by

$$v(i)=\frac{|Q_{i+\Delta}\setminus Q_{i-\Delta}|}{|Q_i|},$$

where $Q_i$ denotes a connected region when the first threshold is $i$, $\Delta$ denotes a small change of the first threshold, and $v(i)$ is the rate of change of region $Q_i$ when the first threshold is $i$; if $v(i)$ is smaller than the given first threshold, region $Q_i$ is taken to be a maximally stable extremal region. Secondly, the following are set: a second threshold on the eccentricity of the ellipse having the same standard second-order central moments as the region, a third threshold on the Euler number, fourth and fifth thresholds on the ratio of the number of pixels in the region to the total pixels of its bounding box, and a sixth threshold on the proportion of the region's pixels within its convex hull. These quantities are computed for each maximally stable extremal region, and a first text region is determined when the computed eccentricity is greater than the second threshold, the Euler number is smaller than the third threshold, the ratio of region pixels to bounding-box pixels is lower than the fourth threshold or greater than the fifth threshold, and the convex-hull pixel proportion is below the sixth threshold. Then a seventh threshold on the stroke-width variation rate is set and the stroke-width variation rate of each first text region is computed; when the variation rate is greater than the seventh threshold, a second text region is confirmed. Finally, the second text regions of all maximally stable extremal regions are extracted and merged as the text region, and the remaining regions of the screen content image are merged as the image region.
Further, the step (3) specifically includes the following steps:
s1: texture features of an image region are extracted by, firstThe Scharr operator calculates a gradient map g (i, j) of the image region and normalizes the gradient map:wherein [ among others ]]To get the whole operation, g max The maximum value of the original gradient value is L, and the normalized maximum gray level number is L; then, the gray scale f (i, j) of the image area is normalized: />f max Is the gray maximum value in the original gray map; then, a gray-gradient co-occurrence matrix M is constructed, the horizontal increment is a gradient value, the vertical increment is a gray value, and the origin is positioned at the sitting vertex of the matrix. M is defined as M (i, j) = # g (M, N) = i, f (M, N) = j, m=0, 1, 2..m-1, n=0, 1, 2..n-1 }, where M x N is the size of the gradient and gray map, # { } is expressed as the number of elements in the set, and finally extracting the statistical features of the gray-gradient co-occurrence matrix comprises: gradient entropy->Gray entropyEnergy->Gray scale mean valueGradient mean->Standard deviation of gradientGray standard deviation->As texture features of image areas, wherein the total number of occurrences of (i, j) is normalizedThe probability of occurrence P (i, j);
s2: extracting image structural features of an image area, firstly, partitioning the image area into n multiplied by n partial image blocks with equal size, performing partial two-dimensional discrete cosine transform on each image block to obtain DCT coefficients, then fitting the DCT coefficients by using a generalized Gaussian distribution model, obtaining shape parameters gamma of the image block after fitting, taking the average value of the gamma values of the lowest 10% as a first structural feature, taking the average value of all gamma values as a second structural feature, and then calculating frequency change coefficientsWhere σ|X| is the variance of the block, μ|X| is the mean of the block, taken +.>As a third structural feature, the maximum 10% of the mean value of>As a fourth structural feature, then, to acquire direction information from the partial image block, the DCT coefficient block is divided into low, medium, and high 3 frequency bands, and then the average energy in each frequency band is calculated: />Wherein n is a positive integer, sigma n For the variance of band n, the ratio of sub-band energies is calculated: />R is taken n Taking the highest 10% of the mean value of R as the fifth structural feature n As a sixth structural feature, finally, in order to extract the direction information, the DCT coefficients are divided into 3 parts in 3 directions according to the vertical direction of the radial frequency variation, and then the frequency variation coefficient +_ in 3 parts is calculated>Calculate->The variance of (2), the mean of the highest 10% of the differences was taken as the seventh structural feature, and +.>As an eighth structural feature.
Further, the step (4) specifically includes the following steps:
s1: extracting the definition of a text region, firstly, filtering in x and y directions, normalizing the filtered image compared with the maximum value in the filtered image, and when the normalized pixel point value is greater than a preset threshold value (such as 0.0001), taking the pixel point as a possible edge pixel, and then calculating the difference delta DoM of median filtering image difference in the horizontal direction and the vertical direction respectively, wherein the horizontal direction is as follows: ΔDoM x (i,j)=[I M (i+2,j)-I M (i,j)]-[I M (i,j)-I M (i-2,j)]Vertical direction: ΔDoM y (i,j)=[I M (i,j+2)-I M (i,j)]-[I M (i,j)-I M (i,j-2)]Wherein I M (i, j) is the gray value of the median filtered image at pixel (i, j), using the difference of deviation 2, the sharpness in the x-direction at pixel (i, j) is defined as:the same applies to the definition calculation in the y-direction, wherein Σ i-w≤k≤i+w |ΔDoM x (k, j) indicates summing ΔDoM over a window of size 2w+1, normalizing contrast at edges, Σ i-w≤k≤i+w I (k, j) -I (k-1, j) is the contrast at a window size of 2w+1, when S x If (i, j) is greater than the preset threshold, the pixel point at (i, j) is clear, and finally, the definition of the image of the region is defined as:wherein: /># sharpPixels is the number of clear pixels and# edgePixels is the number of edge pixels;
s2: extracting text structural features of a text region, firstly, calculating a gradient map GM of the text region, and calculating gradients at image pixels (i, j) as follows:wherein the method comprises the steps of h represents the gradient operator and,representing a convolution operation. A local binary pattern LBP of rotation invariance is then calculated on the gradient map GM,where delta represents a unified metric, U represents the number of neighboring pixels, S represents the radius value of the field, ρ is defined as a threshold function,G k ,G C the GM values, expressed as center coordinates and their neighborhoods, are then computed, and it is observed that GMLBP may contain u+2 different modes, which may be combined into one bin of the histogram, set U to 8, so that the histogram has 10 bins in total, and are computed separately at three scales, the original image, the downsampled image with a downsampling factor of 2, and the downsampled image with a downsampling factor of 4, so that 30 text structural features are extracted in total.
The beneficial effects of the invention are as follows: a segmentation method based on digital text divides the screen content image into a text region and an image region; for the text region, the sharpness of the region and the text structural features of the gradient domain are extracted as its feature vector; for the image region, statistical features extracted from the gray-gradient co-occurrence matrix serve as texture features and structural features extracted in the DCT domain characterize the region's feature vector; and a regression model is trained by the SVM method to obtain more accurate quality scores. The method can assess screen content images on the internet quickly and efficiently and, according to the assessed quality, provides a more reliable basis for subsequent directions such as image quality optimization, denoising, and fusion.
Drawings
Fig. 1 is a screen content image evaluation flow of a screen content image quality evaluation method of the present invention.
Fig. 2 is a text segmentation flow of a screen content image quality evaluation method according to the present invention.
Fig. 3 is a block frequency division diagram of DCT coefficients in a screen content image quality evaluation method according to the present invention.
Fig. 4 is a graph showing the division of DCT coefficients according to radial frequency variation in a screen content image quality evaluation method according to the present invention.
Detailed Description
In order that the above-recited objects, features, and advantages of the present invention may be more clearly understood, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments. In the following description, numerous specific details are set forth to facilitate a thorough understanding of the present invention; however, the invention may be practiced in other ways than those described herein, and it is therefore not limited to the specific embodiments disclosed below.
As shown in fig. 1-2, a screen content image quality evaluation method includes the steps of:
(1) Constructing a screen content image database, installing MATLAB R2016a, and creating a MATLAB (.m) test file;
(2) Performing a text segmentation function on the screen content image. To segment text regions effectively, a text localization method for born-digital (BD) text is used: this method performs text extraction on born-digital text images, and the text in screen content images is mostly born-digital text, which differs substantially from document text and scene text, so using this method makes text-region localization more targeted and more accurate. First, a first threshold is dynamically set and all maximally stable extremal regions (MSER, Maximally Stable Extremal Regions) are found by

$$v(i)=\frac{|Q_{i+\Delta}\setminus Q_{i-\Delta}|}{|Q_i|},$$

where $Q_i$ denotes a connected region when the first threshold is $i$, $\Delta$ denotes a small change of the first threshold, and $v(i)$ is the rate of change of region $Q_i$ when the first threshold is $i$; if $v(i)$ is smaller than the given first threshold, region $Q_i$ is taken to be a maximally stable extremal region. Secondly, the second threshold on the eccentricity of the ellipse having the same standard second-order central moments as the region is set to 0.995, the third threshold on the Euler number to -4, the fourth and fifth thresholds on the ratio of the number of pixels in the region to the total pixels of its bounding box to 0.2 and 0.9, and the sixth threshold on the proportion of the region's pixels within its convex hull to 0.3. These quantities are computed for each maximally stable extremal region, and a first text region is determined when the computed eccentricity is greater than the second threshold, the Euler number is smaller than the third threshold, the ratio of region pixels to bounding-box pixels is lower than the fourth threshold or greater than the fifth threshold, and the convex-hull pixel proportion is below the sixth threshold. Then the seventh threshold on the stroke-width variation rate is set to 0.3 and the stroke-width variation rate of each first text region is computed; when the variation rate is greater than the seventh threshold, a second text region is confirmed. Finally, the second text regions of all maximally stable extremal regions are extracted and merged as the text region, and the remaining regions of the screen content image are merged as the image region;
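As a concrete illustration, the following is a minimal Python sketch of this segmentation stage, assuming OpenCV's MSER detector and scikit-image's region properties as stand-ins for the implementation. The threshold values are the ones stated above and the selection logic follows the description literally; names such as segment_text_and_image are illustrative, and the stroke-width-variation check (threshold 0.3) is only indicated, not implemented.

    import cv2
    import numpy as np
    from skimage.measure import label, regionprops

    def segment_text_and_image(gray):
        """Return boolean masks (text_mask, image_mask) for a uint8 grayscale image."""
        mser = cv2.MSER_create()                  # MSER candidate regions
        regions, _ = mser.detectRegions(gray)

        # Rasterize all MSER candidates into one mask (this merges nested
        # MSERs, which is a simplification of the per-region processing).
        mser_mask = np.zeros(gray.shape, dtype=bool)
        for pts in regions:                       # pts is an N x 2 array of (x, y)
            mser_mask[pts[:, 1], pts[:, 0]] = True

        text_mask = np.zeros_like(mser_mask)
        for prop in regionprops(label(mser_mask)):
            # Geometric filters with the patent's stated threshold values.
            is_text = (prop.eccentricity > 0.995 and
                       prop.euler_number < -4 and
                       (prop.extent < 0.2 or prop.extent > 0.9) and
                       prop.solidity < 0.3)
            if is_text:
                rr, cc = prop.coords[:, 0], prop.coords[:, 1]
                text_mask[rr, cc] = True

        # A stroke-width-variation check (threshold 0.3) would further
        # confirm these candidates; it is omitted here for brevity.
        image_mask = ~text_mask
        return text_mask, image_mask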
(3) Performing an image region quality assessment function on an image region, comprising the steps of:
s1: extracting texture features of the image area; the image region contains a large number of textures and structures, so we use the statistical features of the gray-gradient co-occurrence matrix as the texture features of the image region, first, calculate the gradient map of the image region by Scharr operator, and normalize the gradient map:wherein [ among others ]]To get the whole operation, g max The maximum value of the original gradient value is L, and the normalized maximum gray level number is L; then, the gray scale map of the image area is normalized: />f max Is the gray maximum value in the original gray map; then, a gray-gradient co-occurrence matrix M is constructed, the horizontal increment is a gradient value, the vertical increment is a gray value, and the origin is positioned at the sitting vertex of the matrix. M is defined as M (i, j) = # g (M, N) = i, f (M, N) = j, m=0, 1, 2..m-1, n=0, 1, 2..n-1 }, where M x N is the size of the gradient and gray map, # { } is expressed as the number of elements in the set, and finally extracting the statistical features of the gray-gradient co-occurrence matrix comprises: gradient entropy->Gray entropyEnergy->Gray scale mean valueGradient mean->Standard deviation of gradientGray standard deviation->As a texture feature of the image area, wherein the total number of occurrences of (i, j) is normalized to the probability of occurrence P (i, j);
s2: extracting image structural features of the image area; image region has a lot of structural information besides texture information, so that the image structural features of the image region need to be extracted for quality assessment features, firstly, the image region is divided into n×n partial image blocks with equal size, and each image block is subjected to partial two-dimensional discrete cosine transform (DCT, discrete Cosine Tansform) to obtain DCT coefficients, then the DCT coefficients are fitted by using generalized Gaussian distribution (GGD, generalized Gaussian Distribution) model, after the fitting, the shape parameters gamma of the image block are obtained, the average value of the gamma value of the lowest 10% is taken as the first structural feature, the average value of all gamma values is taken as the second structural feature, which is pooling, the prior study shows that the pooling can improve the correlation with subjective quality perception, the following operation is the same, and then the frequency change coefficient is calculatedWhere σ|X| is the variance of the block, μ|X| is the mean of the block, taken +.>As a third structural feature, the maximum 10% of the mean value of>As a fourth structural feature, after that, in order to acquire direction information from the partial image block, the DCT coefficient block is divided into low, medium, and high 3 frequency bands, the hatching represents the division of three frequency bands as shown in fig. 3, and then the average energy in each frequency band is calculated: />Wherein n is a positive integer, sigma n For the variance of band n, the ratio of sub-band energies is calculated:/>R is taken n Taking the highest 10% of the mean value of R as the fifth structural feature n As a sixth structural feature, finally, in order to extract direction information, the DCT coefficients are divided into 3 parts in 3 directions according to the vertical direction of radial frequency variation, divided as shown by hatching in fig. 4, and then the frequency variation coefficients in 3 directions are calculatedCalculate->The variance of (2), the mean of the highest 10% of the differences was taken as the seventh structural feature, and +.>As an eighth structural feature;
(4) Executing a text region quality assessment function on the text region, comprising the steps of:
s1: extracting the definition of the text region; since the sharpness of a text region affects the visual perception quality of human eyes, we have to make a measure of the sharpness of the text region, we use the difference Δdom (Difference Of Differences In Grayscale Values Of A Median-filtered Image) of median filtered Image difference as a measure feature of whether the edge is sharp, which can determine whether the edge is sharp by whether the slope changes rapidly, firstly, filter in x and y directions, and normalize the maximum value in the filtered Image, and when the normalized pixel value is greater than a preset threshold value of 0.0001, the pixel is used as a possible edge pixel, and then calculate the difference Δdom of median filtered Image difference in horizontal direction and vertical direction, respectively: ΔDoM x (i,j)=[I M (i+2,j)-I M (i,j)]-[I M (i,j)-I M (i-2,j)]Vertical direction: ΔDoM y (i,j)=[I M (i,j+2)-I M (i,j)]-[I M (i,j)-I M (i,j-2)]Wherein I M (i, j) is the gray value of the median filtered image at pixel (i, j) and in order to make the variation of the edge intensity more stable and thus use the difference of deviation 2, the sharpness in x-direction at pixel (i, j) is defined as:the same applies to the definition calculation in the y-direction, wherein Σ i-w≤k≤i+w |ΔDoM x (k, j) indicates summing ΔDoM over a window of size 2w+1, normalizing contrast at edges, Σ i-w≤k≤i+w I (k, j) -I (k-1, j) is the contrast at a window size of 2w+1, when S x If (i, j) is greater than the preset threshold, the pixel point at (i, j) is clear, and finally, the definition of the image of the region is defined as:wherein: /># sharpPixels is the number of clear pixels and# edgePixels is the number of edge pixels;
s2: extracting text structural features of a text region, firstly, calculating a gradient map GM of the text region, and calculating gradients at image pixels (i, j) as follows:wherein the method comprises the steps of h represents the gradient operator and,representing a convolution operation. A local binary pattern LBP of rotation invariance is then calculated on the gradient map,where delta represents a unified metric, U represents the number of neighboring pixels, S represents the radius value of the field, ρ is defined as a threshold function,G k ,G C the GM values expressed as center coordinates and their neighborhoods, then calculate the GMLBP histogram, observe that GMLBP may contain u+2 different modes, which may be combined into one bin of the histogram, set U to 8, so that the histogram has 10 bins in total, and calculate each at three scales, the three scales being the original image, the downsampled image with a downsampling factor of 2, the downsampled image with a downsampling factor of 4, so that 30 text structural features are extracted in total;
(5) Downloading the LIBSVM software package and setting the parameters of the SVM model: the model is an SVR regression model and the kernel function type is the RBF kernel. Screen content images are randomly selected from the screen content image database one thousand times: each time, 80% of the images in the database are randomly selected as the training set and 20% as the test set, the texture features, image structural features, sharpness, text structural features, and subjective quality scores of the training images are input into LIBSVM software for training, regression training on the training-set data yields a quality assessment model, and the test set is used to verify the model. Each time, correlation coefficients between the scores estimated by the quality assessment model on the test set and the subjective quality scores are computed, including SROCC, PLCC, KRCC, and RMSE; these coefficients reflect the error and the correlation between the model's scores and the subjective quality scores and serve as evaluation indices of the algorithm's quality. After the one thousand random trainings, the median of each coefficient is taken as its final value;
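A minimal sketch of this training and validation loop. scikit-learn's SVR with an RBF kernel is used here as a stand-in for LIBSVM (an epsilon-SVR with RBF kernel is the direct equivalent), the SVR hyperparameters are illustrative, and X, y denote the feature matrix from steps (2) to (4) and the subjective quality scores (MOS):

    import numpy as np
    from scipy.stats import pearsonr, spearmanr, kendalltau
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVR

    def train_quality_model(X, y, n_splits=1000):
        srocc, plcc, krcc, rmse = [], [], [], []
        for seed in range(n_splits):
            Xtr, Xte, ytr, yte = train_test_split(
                X, y, test_size=0.2, random_state=seed)   # 80% train / 20% test
            model = SVR(kernel='rbf', C=100, gamma='scale', epsilon=0.1)
            model.fit(Xtr, ytr)
            pred = model.predict(Xte)
            srocc.append(spearmanr(pred, yte).correlation)
            plcc.append(pearsonr(pred, yte)[0])
            krcc.append(kendalltau(pred, yte).correlation)
            rmse.append(np.sqrt(np.mean((pred - yte) ** 2)))
        # Median of each index over the random splits, as described above;
        # the returned model is simply the one from the last split.
        indices = {'SROCC': srocc, 'PLCC': plcc, 'KRCC': krcc, 'RMSE': rmse}
        return model, {k: float(np.median(v)) for k, v in indices.items()}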
(6) Inputting a screen content image to be assessed, processing it through steps (2), (3), and (4), and inputting its texture features, image structural features, sharpness, and text structural features into the quality assessment model to obtain the quality score.
The above describes only preferred embodiments of the present invention and is not intended to limit it; those skilled in the art may make various modifications and variations to the present invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (2)

1. A screen content image quality evaluation method, comprising the steps of:
(1) Constructing a screen content image database;
(2) Executing a text segmentation function on the screen content image, and segmenting the screen content image into a text region and an image region;
(3) Executing an image area quality evaluation function on the image area, and extracting texture features and image structure features of the image area;
(4) Executing a text region quality assessment function on the text region, and extracting the sharpness and text structural features of the text region;
(5) Randomly selecting screen content images from the screen content image database one thousand times, and inputting the texture features, image structural features, sharpness, text structural features, and subjective quality scores of the selected screen content images into LIBSVM software for training to obtain a quality assessment model;
(6) Inputting a screen content image to be assessed, processing it through steps (2), (3), and (4), and inputting its texture features, image structural features, sharpness, and text structural features into the quality assessment model to obtain the quality score;
wherein step (2) specifically comprises the following steps: first, a first threshold is dynamically set and all maximally stable extremal regions are found by

$$v(i)=\frac{|Q_{i+\Delta}\setminus Q_{i-\Delta}|}{|Q_i|},$$

where $Q_i$ denotes a connected region when the first threshold is $i$, $\Delta$ denotes a small change of the first threshold, and $v(i)$ is the rate of change of region $Q_i$ when the first threshold is $i$; if $v(i)$ is smaller than the given first threshold, region $Q_i$ is taken to be a maximally stable extremal region. Secondly, the following are set: a second threshold on the eccentricity of the ellipse having the same standard second-order central moments as the region, a third threshold on the Euler number, fourth and fifth thresholds on the ratio of the number of pixels in the region to the total pixels of its bounding box, and a sixth threshold on the proportion of the region's pixels within its convex hull. These quantities are computed for each maximally stable extremal region, and a first text region is determined when the computed eccentricity is greater than the second threshold, the Euler number is smaller than the third threshold, the ratio of region pixels to bounding-box pixels is lower than the fourth threshold or greater than the fifth threshold, and the convex-hull pixel proportion is below the sixth threshold. Then a seventh threshold on the stroke-width variation rate is set and the stroke-width variation rate of each first text region is computed; when the variation rate is greater than the seventh threshold, a second text region is confirmed. Finally, the second text regions of all maximally stable extremal regions are extracted and merged as the text region, and the remaining regions of the screen content image are merged as the image region;
the step (3) specifically comprises the following steps:
s1: extracting texture features of an image region, firstly, calculating a gradient map of the image region through a Scharr operator, and normalizing the gradient map:wherein [ among others ]]To get the whole operation, g max The maximum value of the original gradient value is L, and the normalized maximum gray level number is L; then, the gray scale map of the image area is normalized:f max is the gray maximum value in the original gray map; then, constructing a gray-gradient co-occurrence matrix M, horizontally increasing the matrix to be a gradient value, vertically increasing the matrix to be a gray value, and positioning an origin at a sitting vertex of the matrix; m is defined as M (i, j) = # g (M, N) = i, f (M, N) = j, m=0, 1, 2..m-1, n=0, 1, 2..n-1 }, where M x N is the size of the gradient and gray map, # { } is expressed as the number of elements in the set, and finally extracting the statistical features of the gray-gradient co-occurrence matrix comprises: gradient entropy->Gray entropyEnergy->Gray scale mean valueGradient mean->Standard deviation of gradientGray standard deviation->As a texture feature of the image area, wherein the total number of occurrences of (i, j) is normalized to the probability of occurrence P (i, j);
s2: extracting image structure characteristics of an image area, firstly, partitioning the image area into n multiplied by n partial image blocks with equal size, carrying out partial two-dimensional discrete cosine transform on each image block to obtain DCT coefficients, then fitting the DCT coefficients by using a generalized Gaussian distribution model, obtaining shape parameters gamma of the image blocks after fitting, and obtaining the minimum 10 percent of the shape parameters gammaThe average value of the gamma values is used as a first structural feature, the average value of all the gamma values is used as a second structural feature, and then the frequency change coefficient is calculatedWhere σ|X| is the variance of the block, μ|X| is the mean of the block, taken +.>As a third structural feature, the maximum 10% of the mean value of>As a fourth structural feature, then, to acquire direction information from the partial image block, the DCT coefficient block is divided into low, medium, and high 3 frequency bands, and then the average energy in each frequency band is calculated: />Wherein n is a positive integer, sigma n For the variance of band n, the ratio of sub-band energies is calculated: />R is taken n Taking the highest 10% of the mean value of R as the fifth structural feature n As a sixth structural feature, finally, in order to extract the direction information, the DCT coefficients are divided into 3 parts in 3 directions according to the vertical direction of the radial frequency variation, and then the frequency variation coefficients +_ in 3 directions are calculated>Calculate->The variance of (2), the mean of the highest 10% of the differences was taken as the seventh structural feature, and +.>As an eighth structural feature.
2. The screen content image quality assessment method according to claim 1, wherein step (4) comprises the following steps:
s1: extracting the definition of a text region, firstly, filtering in x and y directions, normalizing the filtered image compared with the maximum value in the filtered image, and when the normalized pixel point value is larger than a preset threshold value, taking the pixel point as a possible edge pixel, and then calculating the difference delta DoM of median filtering image difference in the horizontal direction and the vertical direction respectively, wherein the horizontal direction is as follows: ΔDoM x (i,j)=[I M (i+2,j)-I M (i,j)]-[I M (i,j)-I M (i-2,j)]Vertical direction: ΔDoM y (i,j)=[I M (i,j+2)-I M (i,j)]-[I M (i,j)-I M (i,j-2)]Wherein I M (i, j) is the gray value of the median filtered image at pixel (i, j), using the difference of deviation 2, the sharpness in the x-direction at pixel (i, j) is defined as:the same applies to the definition calculation in the y-direction, wherein Σ i-w≤k≤i-w |ΔDoM x (k, j) indicates summing ΔDoM over a window of size 2w+1, normalizing contrast at edges, Σ i-w≤k≤i+w I (k, j) -I (k-1, j) is the contrast at a window size of 2w+1, when S x If (i, j) is greater than the preset threshold, the pixel point at (i, j) is clear, and finally, the definition of the image of the region is defined as: />Wherein:# sharpPixels is the number of clear pixels and# edgePixels is the number of edge pixels;
S2: extracting text structural features of a text region, firstly, calculating a gradient map GM of the text region, and calculating gradients at image pixels (i, j) as follows:wherein the method comprises the steps ofh represents the gradient operator,>representing convolution operation; a local binary pattern LBP of rotation invariance is then calculated on the gradient map,where delta represents a unified metric, U represents the number of neighboring pixels, S represents the radius value of the field, ρ is defined as a threshold function,G k ,G C the GM values, expressed as center coordinates and their neighborhoods, are then computed, and it is observed that GMLBP may contain u+2 different modes, which may be combined into one bin of the histogram, set U to 8, so that the histogram has 10 bins in total, and are computed separately at three scales, the original image, the downsampled image with a downsampling factor of 2, and the downsampled image with a downsampling factor of 4, so that 30 text structural features are extracted in total.
CN202110831904.9A (priority date 2021-07-22; filing date 2021-07-22): Screen content image quality assessment method. Active; granted as CN113610862B (en).

Priority Applications (1)

Application Number: CN202110831904.9A; Priority Date: 2021-07-22; Filing Date: 2021-07-22; Title: Screen content image quality assessment method

Publications (2)

CN113610862A (en): published 2021-11-05
CN113610862B (en): published 2023-08-01 (grant)

Family

ID=78305174

Family Applications (1)

Application Number: CN202110831904.9A; Title: Screen content image quality assessment method; Status: Active

Country Status (1)

Country: CN; Publication: CN113610862B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114067006B (en) * 2022-01-17 2022-04-08 湖南工商大学 Screen content image quality evaluation method based on discrete cosine transform
CN116578763B (en) * 2023-07-11 2023-09-15 卓谨信息科技(常州)有限公司 Multisource information exhibition system based on generated AI cognitive model
CN116664551B (en) * 2023-07-21 2023-10-31 深圳市长荣科机电设备有限公司 Display screen detection method, device, equipment and storage medium based on machine vision
CN116863492B (en) * 2023-09-04 2023-11-21 山东正禾大教育科技有限公司 Mobile digital publishing system

Citations (6)

Publication number Priority date Publication date Assignee Title
CN107123122A (en) * 2017-04-28 2017-09-01 深圳大学 Non-reference picture quality appraisement method and device
CN108596906A (en) * 2018-05-10 2018-09-28 嘉兴学院 It is a kind of to refer to screen image quality evaluating method entirely based on sparse locality preserving projections
WO2018195891A1 (en) * 2017-04-28 2018-11-01 深圳大学 Method and apparatus for evaluating quality of non-reference image
EP3422254A1 (en) * 2017-06-29 2019-01-02 Samsung Electronics Co., Ltd. Method and apparatus for separating text and figures in document images
CN110400307A (en) * 2019-07-29 2019-11-01 青岛大学 A kind of screen picture method for evaluating quality based on area differentiation
CN111047618A (en) * 2019-12-25 2020-04-21 福州大学 Multi-scale-based non-reference screen content image quality evaluation method

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US8515171B2 (en) * 2009-01-09 2013-08-20 Rochester Institute Of Technology Methods for adaptive and progressive gradient-based multi-resolution color image segmentation and systems thereof


Non-Patent Citations (1)

Title
Screen image quality assessment based on the difference of sparse representation features; Zhao Hongmeng et al.; Journal of Qingdao University (Natural Science Edition); Vol. 32, No. 4; full text *

Also Published As

Publication number: CN113610862A (en); publication date: 2021-11-05


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant