WO2018058090A1 - Method for no-reference image quality assessment - Google Patents

Method for no-reference image quality assessment

Info

Publication number
WO2018058090A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
electronic image
computing
values
data structure
Application number
PCT/US2017/053393
Other languages
French (fr)
Inventor
Dapeng Oliver Wu
Ruigang FANG
Original Assignee
University Of Florida Research Foundation Incorporated
Application filed by University Of Florida Research Foundation Incorporated filed Critical University Of Florida Research Foundation Incorporated
Publication of WO2018058090A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30168 Image quality inspection

Definitions

  • An NR scheme is disclosed which combines the best features of both the AS and NAS methods. This is accomplished by developing three artifact-specific metrics and nonlinearly combining them.
  • The effect of a single type of artifact (i.e., blockiness, blurriness, or noisiness) on the Laplace distribution is similar across images and independent of image content.
  • BNB metrics: blockiness, noisiness, and blurriness
  • HVS: human visual system
  • Section 2 of this disclosure explores several key properties of the Laplace distribution for natural scene images, which are the basis for the design of our method.
  • In Section 3 we describe the three BNB metrics.
  • In Section 4 we verify our metrics using two aspects of experimentation and provide detailed results. The algorithm developed and used to perform supervised learning to predict the perceptual value is discussed in Section 5.
  • Section 7 concludes this paper with final remarks.
  • Property 1: For the difference set, D, of any natural scene image, any two equally down-sampled subsets, D_1 and D_2, have the same statistical properties.
  • FIG. 2 displays an experimental result: the original image is shown in FIG. 2A, and the Laplace distributions of sets I_0 and I_1, having variances 130.56 and 131.05, respectively, are shown in FIGs. 2B and 2C. A small numpy check of this property is sketched below.
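  • As a minimal sketch (our own illustration, not code from the disclosure), Property 1 can be checked by splitting the difference set D into two equally down-sampled subsets and comparing their variances:

```python
import numpy as np

def property1_check(image: np.ndarray) -> tuple:
    """Compare variances of two equally down-sampled subsets of the difference set D."""
    gray = image.astype(np.float64)
    d = (gray[:, 1:] - gray[:, :-1]).ravel()  # difference set D (horizontal neighbors)
    d1, d2 = d[0::2], d[1::2]                 # two equally down-sampled subsets of D
    return d1.var(), d2.var()                 # nearly equal for a natural scene image
```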
  • Property 2: For any two pixels of a natural scene image, the difference in pixel values follows a Laplace distribution that is related to the spatial distance between the pixels; an increased distance corresponds to a larger variance in the Laplace distribution.
  • FIG. 3B visualizes the Laplace distributions for several such spatial distances d between pixels.
  • Property 3: After convolving a natural scene image with a low-pass filter, f_1, the difference between values of the same pixel in the original image and the processed image will also follow a Laplace distribution.
  • f_1 is a simple low-pass filter which can both lessen Gaussian noise and blur an image. Processing an image with this filter causes the variance of the difference of two adjacent pixel values to decrease.
  • Setting a threshold in the distribution, P is defined as the probability that the difference of adjacent pixel values is larger than the threshold (FIG. 4B). P becomes smaller when the image is convolved with f_1, as the sketch below illustrates.
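  • The following sketch (our own) measures the threshold probability P of Property 3 before and after low-pass filtering; a 3x3 mean filter is used as a stand-in for f_1, whose exact coefficients are not reproduced in this text:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def threshold_probability(image: np.ndarray, threshold: float = 10.0) -> tuple:
    """Return P before and after low-pass filtering, where P is the probability
    that an adjacent-pixel difference exceeds the threshold (FIG. 4B).

    A 3x3 mean filter stands in for the low-pass filter f_1; per Property 3,
    P is expected to shrink after filtering."""
    def p_of(img: np.ndarray) -> float:
        d = np.abs(img[:, 1:] - img[:, :-1])  # adjacent-pixel differences
        return float(np.mean(d > threshold))

    gray = image.astype(np.float64)
    return p_of(gray), p_of(uniform_filter(gray, size=3))
```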
  • Processing I with f_1 produces a filtered value x'_0 for the center pixel x_0 of the 3x3 window of FIG. 4A. The difference between x_0 and x'_0 follows a new Laplace distribution with zero mean and a variance proportional to that of the adjacent-pixel difference distribution.
  • Property 4: By processing a natural scene image with a high-pass filter f_2 (a 3x3 kernel whose first row is 0, -1, 0), any pixel value from the processed image will also follow a Laplace distribution.
  • f_2 is another important tool, which can be used to find the high-frequency content of images.
  • The pixel x_i is relabeled x'_i after filtering the image I with f_2. The value x'_0 will follow another Laplace distribution with zero mean and a variance proportional to that of the adjacent-pixel difference distribution.
  • As can be observed in FIG. 5A, although different image content produces various specific relationships between V and blurriness, we notice a regularity: increasing blurriness is accompanied by decreasing V. Having identified this regularity, we use V as a feature to model the blurriness of images containing the same content, since it does not necessarily differentiate blurriness well for images with different content. To provide a more robust feature for handling different image content, we adopt an alternative feature: V - V_1. Given any image I, we blur it using f_1 to obtain the blurred image I_1. We form V_1 by subsequently filtering I_1 with f_2 and calculating the variance of the resulting pixel values.
  • V - V_1 is a better blurriness feature than V since it reduces the scale along the feature axis; however, it is still not robust to image content. Normalizing V - V_1 by V, we obtain our desired blurriness feature, as defined in (7).
  • A visualization of this feature is shown in FIG. 5C: the curves representing different image content have a more regular relationship between blurriness and our blurriness feature. This denser coupling signifies that this feature can help alleviate the image content issue discussed earlier.
  • In Section 4 we use the LIVE image database to show that our feature provides a better characterization of blurriness. Independent of the content of an image, we observe a decrease in our blurriness feature as the blurriness of the image increases.
  • We obtain the variance of the coefficients, V, by processing a noisy image I with filter f_2. Further processing I with filter f_1 yields I_1, a denoised version of I.
  • V_1 is calculated as the variance of the pixel values after processing I_1 with f_2. Since I_1 has less noise than image I, V will be larger than V_1.
  • V - V_1 serves as a noise feature, which maintains a good response to noise for images with the same content. Experimentation verifies that the larger the value of V - V_1, the noisier the image. For images containing different content, the value of V - V_1 will not always be the same even when images have the same noisiness or level of human perception. To address this problem, we apply normalization to obtain the noisiness feature defined in (8). A combined sketch of the blurriness and noisiness features appears below.
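  • A combined sketch of these features (our own illustration): a 3x3 mean filter stands in for f_1, and a 3x3 Laplacian kernel, consistent with the partial first row 0, -1, 0 quoted under Property 4 but otherwise an assumption, stands in for f_2:

```python
import numpy as np
from scipy.ndimage import convolve, uniform_filter

# Stand-in kernels: a 3x3 Laplacian for the high-pass f_2 (assumed; only its
# first row, 0 -1 0, is quoted under Property 4) and a 3x3 mean for f_1.
F2 = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]], dtype=np.float64)

def bn_features(image: np.ndarray) -> tuple:
    """Return (blurriness feature per (7), raw noisiness quantity V - V_1)."""
    gray = image.astype(np.float64)
    v = convolve(gray, F2).var()        # V: variance after high-pass filtering
    i1 = uniform_filter(gray, size=3)   # I_1: low-pass filtered version of I
    v1 = convolve(i1, F2).var()         # V_1: variance of f_2(I_1)
    return (v - v1) / v, v - v1         # (7) normalizes by V; (8) normalizes V - V_1
```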
  • Blockiness appears at a block boundary as a byproduct of encoding, decoding, or transmission. If block-like artifacts appear in a frame, the statistical relationship between two adjacent pixels in the same block will differ from that of two adjacent pixels from different blocks, violating the equality predicted by Property 1. To make use of this statistical property, the image is partitioned into b_s x b_s blocks and sampled in the horizontal and vertical directions as shown in (9) and (10). Methods for constructing these two types of down-sampling are shown below.
  • In FIG. 6, the dark symbols inside a grid correspond to pixels in the resulting sampled sub-images; different symbols correspond to different sub-images.
  • Another two data sets, D_1 and D_2, can be obtained by taking the difference of sub-images s_7 and s_6 and the difference of sub-images s_0 and s_7, respectively. If blockiness is not present in the image, the pixel values of data sets D_1 and D_2 should follow a similar Laplace distribution. If we set the same threshold in the two Laplace distributions and let N_i denote the number of pixels in D_i that exceed the threshold, the values of N_1 and N_2 should be close for a non-blocky image.
  • A tuning parameter is introduced to allow f_blockiness to be tailored for a variety of situations; it is chosen to be 1 in our experimentation. For a non-blocky image, the value of f_blockiness should be close to one, and introducing blockiness into a frame will increase the value of f_blockiness.
  • In FIG. 7, the five curves represent five different images. We randomly add different percentages of blockiness, denoted λ, while measuring the value of f_blockiness; f_blockiness increases with respect to increases in λ. The relationship between f_blockiness and λ is modeled as a quadratic function, as shown in (12). A hedged sketch of a blockiness feature of this kind follows.
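  • Since the exact form of (11) is not reproduced in this text, the sketch below shows one plausible ratio-style blockiness feature consistent with the described behavior: it compares the rates at which adjacent-pixel differences exceed a threshold across block boundaries versus inside blocks, is close to one for a non-blocky image, and grows as blockiness is introduced:

```python
import numpy as np

def blockiness_feature(image: np.ndarray, block: int = 8,
                       threshold: float = 10.0, alpha: float = 1.0) -> float:
    """Plausible ratio-style blockiness feature; the disclosure's exact (11) may differ.

    Compares rates at which horizontal adjacent-pixel differences exceed a
    threshold across block boundaries versus inside blocks; close to 1 for a
    non-blocky image. alpha is the tuning parameter (1 in the experiments)."""
    gray = image.astype(np.float64)
    d = np.abs(gray[:, 1:] - gray[:, :-1])    # horizontal adjacent differences
    cols = np.arange(d.shape[1])
    boundary = (cols % block) == (block - 1)  # differences straddling a block edge
    n1 = np.mean(d[:, boundary] > threshold)  # exceedance rate across boundaries
    n2 = np.mean(d[:, ~boundary] > threshold) # exceedance rate inside blocks
    return (n1 + alpha) / (n2 + alpha)
```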
  • A good distortion metric should offer a delineation between clear and distorted images. For the distorted images we employ Gaussian blur, white noise, and JPEG images from the LIVE image database, which correspond to the blurriness, noisiness, and blockiness metrics, respectively. The metric values are calculated for both clear and distorted images and displayed in FIG. 8.
  • In FIG. 8A, the dashed line is the density of the blurriness metric for 85 clear images and the solid line is the density for the same number of Gaussian-blurred images. These two densities are estimated with a Gaussian kernel from the histograms of the clear and Gaussian-blurred images, respectively. The overlap of the two densities is relatively small, so our blurriness metric is a good candidate for classifying between clear and blurry images; the sketch below reproduces this density comparison.
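  • The density comparison of FIG. 8 can be reproduced with a Gaussian kernel density estimate; this sketch (our own) returns the overlap area of the two densities, where a small overlap indicates good separation:

```python
import numpy as np
from scipy.stats import gaussian_kde

def density_overlap(clear_vals: np.ndarray, distorted_vals: np.ndarray,
                    grid_points: int = 256) -> float:
    """Overlap area of Gaussian-kernel density estimates for two metric samples.

    A small overlap indicates the metric separates clear from distorted images well."""
    lo = min(clear_vals.min(), distorted_vals.min())
    hi = max(clear_vals.max(), distorted_vals.max())
    xs = np.linspace(lo, hi, grid_points)
    p = gaussian_kde(clear_vals)(xs)      # density of the metric over clear images
    q = gaussian_kde(distorted_vals)(xs)  # density over distorted images
    return float(np.trapz(np.minimum(p, q), xs))
```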
  • Table 4: Comparison of SROCC for three metrics. In Table 4, we calculate the metric values for all blurry, noisy, and blocky images from the LIVE image database and obtain their Spearman rank-order correlation coefficient (SROCC). Mylene [4] measured the same distortions (noisiness, blockiness, and blurriness), so we compare the SROCC of our metrics with those of Mylene's model. The comparison reveals that our metrics are more correlated with human perception.
  • Codebook Construction: One element of the codebook is a vector of four values, the blurriness, noisiness, and blockiness metrics plus a perceptual value, denoted (C_i,1, C_i,2, C_i,3, C_i,4) for entry i.
  • In the codebook model there are four important parameters: p, q, r, and k. The first three parameters represent the distance weights for the three artifact types, and the fourth is the number of nearest neighbors used for prediction.
  • FIG. 13 shows, schematically, an illustrative computer 5000 on which any aspect of the present disclosure may be implemented.
  • the computer 5000 includes a processing unit 5001 having one or more processors and a non-transitory computer-readable storage medium 5002 that may include, for example, volatile and/or non-volatile memory.
  • the memory 5002 may store one or more instructions to program the processing unit 5001 to perform any of the functions described herein.
  • the computer 5000 may also include other types of non-transitory computer-readable media, such as storage 5005 (e.g., one or more disk drives), in addition to the system memory 5002.
  • storage 5005 may also store one or more application programs and/or external components used by application programs (e.g., software libraries), which may be loaded into the memory 5002.
  • the computer 5000 may have one or more input devices and/or output devices, such as devices 5006 and 5007 illustrated in FIG. 13. These devices can be used, among other things, to present a user interface. Examples of output devices that can be used to provide a user interface include printers or display screens for visual presentation of output and speakers or other sound generating devices for audible presentation of output. Examples of input devices that can be used for a user interface include keyboards and pointing devices, such as mice, touch pads, and digitizing tablets. As another example, the input devices 5007 may include a microphone for capturing audio signals, and the output devices 5006 may include a display screen for visually rendering, and/or a speaker for audibly rendering, recognized text.
  • the computer 5000 may also comprise one or more network interfaces (e.g., the network interface 5010) to enable communication via various networks (e.g., the network 5020).
  • networks include a local area network or a wide area network, such as an enterprise network or the Internet.
  • Such networks may be based on any suitable technology and may operate according to any suitable protocol and may include wireless networks, wired networks or fiber optic networks.
  • the quality score is computed for an image to be displayed.
  • a collection of similar images may be displayed on the same device, such as may occur, for example, when a stream of images is displayed as a video.
  • the quality score may be computed for one or more images in the collection.
  • Those quality scores may be used to select parameters of image processing or display, which may be applied to the collection of images.
  • processors may be implemented as integrated circuits, with one or more processors in an integrated circuit component, including commercially available integrated circuit components known in the art by names such as CPU chips, GPU chips, microprocessor, microcontroller, or co-processor.
  • a processor may be implemented in custom circuitry, such as an ASIC, or semicustom circuitry resulting from configuring a programmable logic device.
  • a processor may be a portion of a larger circuit or semiconductor device, whether commercially available, semi-custom or custom.
  • some commercially available microprocessors have multiple cores such that one or a subset of those cores may constitute a processor.
  • a processor may be implemented using circuitry in any suitable format.
  • a computer may be embodied in any of a number of forms, such as a rack-mounted computer, a desktop computer, a laptop computer, or a tablet computer. Additionally, a computer may be embedded in a device not generally regarded as a computer but with suitable processing capabilities, including a Personal Digital Assistant (PDA), a smart phone or any other suitable portable or fixed electronic device.
  • PDA Personal Digital Assistant
  • a computer may have one or more input and output devices. These devices can be used, among other things, to present a user interface. Examples of output devices that can be used to provide a user interface include printers or display screens for visual presentation of output and speakers or other sound generating devices for audible presentation of output. Examples of input devices that can be used for a user interface include keyboards, and pointing devices, such as mice, touch pads, and digitizing tablets. As another example, a computer may receive input information through speech recognition or in other audible format. In the embodiment illustrated, the input/output devices are illustrated as physically separate from the computing device. In some embodiments, however, the input and/or output devices may be physically integrated into the same unit as the processor or other elements of the computing device. For example, a keyboard might be implemented as a soft keyboard on a touch screen. Alternatively, the input/output devices may be entirely disconnected from the computing device, and functionally integrated through a wireless connection.
  • Such computers may be interconnected by one or more networks in any suitable form, including as a local area network or a wide area network, such as an enterprise network or the Internet.
  • networks may be based on any suitable technology and may operate according to any suitable protocol and may include wireless networks, wired networks or fiber optic networks.
  • the various methods or processes outlined herein may be coded as software that is executable on one or more processors that employ any one of a variety of operating systems or platforms. Additionally, such software may be written using any of a number of suitable programming languages and/or programming or scripting tools, and also may be compiled as executable machine language code or intermediate code that is executed on a framework or virtual machine.
  • the invention may be embodied as a computer readable storage medium (or multiple computer readable media) (e.g., a computer memory, one or more floppy discs, compact discs (CD), optical discs, digital video disks (DVD), magnetic tapes, flash memories, circuit configurations in Field Programmable Gate Arrays or other semiconductor devices, or other tangible computer storage medium) encoded with one or more programs that, when executed on one or more computers or other processors, perform methods that implement the various embodiments of the invention discussed above.
  • a computer readable storage medium may retain information for a sufficient time to provide computer-executable instructions in a non-transitory form.
  • Such a computer readable storage medium or media can be transportable, such that the program or programs stored thereon can be loaded onto one or more different computers or other processors to implement various aspects of the present invention as discussed above.
  • the term "computer-readable storage medium" encompasses only a computer-readable medium that can be considered to be a manufacture (i.e., article of manufacture) or a machine.
  • the invention may be embodied as a computer readable medium other than a computer-readable storage medium, such as a propagating signal.
  • code means any type of computer code or set of computer-executable instructions that can be employed to program a computer or other processor to implement various aspects of the present invention as discussed above. Additionally, it should be appreciated that according to one aspect of this embodiment, one or more computer programs that when executed perform methods of the present invention need not reside on a single computer or processor, but may be distributed in a modular fashion amongst a number of different computers or processors to implement various aspects of the present invention.
  • Computer-executable instructions may be in many forms, such as program modules, executed by one or more computers or other devices.
  • program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
  • functionality of the program modules may be combined or distributed as desired in various embodiments.
  • data structures may be stored in computer-readable media in any suitable form.
  • data structures may be shown to have fields that are related through location in the data structure. Such relationships may likewise be achieved by assigning storage for the fields with locations in a computer-readable medium that conveys relationship between the fields.
  • any suitable mechanism may be used to establish a relationship between information in fields of a data structure, including through the use of pointers, tags or other mechanisms that establish relationship between data elements.
  • the invention may be embodied as a method, of which an example has been provided.
  • the acts performed as part of the method may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different than illustrated, which may include performing some acts simultaneously, even though shown as sequential acts in illustrative embodiments.
  • a reference to "A and/or B", when used in conjunction with open-ended language such as “comprising” can refer, in one embodiment, to A only (optionally including elements other than B); in another embodiment, to B only (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc.
  • the phrase "at least one,” in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements.
  • At least one of A and B can refer, in one embodiment, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Analysis (AREA)

Abstract

Aspects of the present disclosure are related to methods for automated image quality assessment without the use of a reference image. In some embodiments, quantitative measurements for specific artifacts that impact image quality such as blurriness, noisiness and blockiness (BNB) metrics are used to form a vector to represent the quality of a given electronic image. Based on this vector, entries in a data structure are selected. In some embodiments, a k-Nearest Neighbors algorithm (k-NN) is used to map the vector of BNB metrics of the electronic image to a human perception score based on vector differences between the quantitative measurements for the electronic image and similar quantitative measurements for known images to which human perception scores have been assigned. The human perception scores for the selected entries in the data set may then be combined to yield a quality score for the electronic image, emulating a quality score that would be assigned by human image evaluators.

Description

METHOD FOR NO-REFERENCE IMAGE QUALITY ASSESSMENT
RELATED APPLICATION
[0001] This application claims priority to and the benefit of U.S. Provisional Patent Application Number 62/399,985, filed September 26, 2016, the entire contents of which are incorporated herein by reference.
BACKGROUND
[0002] The present disclosure is generally related to the field of image quality assessment for electronic images processed in computing devices.
[0003] In recent years, both video and image media comprising electronic images have become a widely popular form of internet traffic and are displayed on a variety of electronic devices to media consumers. A media consumer's experience during the consumption of video/image media can be negatively impacted by distortions induced by compression and/or transmission losses over the internet or other computer networks. The media consumer's experience of viewing video/image media may also be impacted by characteristics of the electronic device displaying the media to the consumer, such as display pixel resolution and range of color gamut.
[0004] If a source image, or another image, is available as a reference for what the electronic image is supposed to look like without the effects of compression, transmission, or other processes that impact image quality, an automated assessment of image quality for the electronic image might be made by comparing the electronic image with the reference image. When there is no access to a source image to act as a reference, a no-reference image quality assessment may be performed to assess the quality of the electronic image. Heretofore, such no-reference image quality assessment techniques have not adequately reflected the image quality as perceived by a user.
SUMMARY
[0005] According to some embodiments, a computer-implemented method for assessing quality of an electronic image is disclosed. The method comprises computing values for a plurality of attributes of the electronic image; selecting a plurality of entries from a computer data structure. Each entry of the data structure comprises values of the plurality of attributes and a score. The selecting comprises selecting based on a vector difference of the values of the plurality of attributes of the electronic image relative to the values of the plurality of attributes of entries in the data structure. The method further comprises computing a quality score for the electronic image as a combination of the scores of the selected entries from the data structure.
[0006] According to some embodiments, a non-transitory computer readable medium is provided, comprising computer readable instructions that, when executed by a processor, cause the processor to perform a method. The method comprises the acts of computing values for a plurality of attributes of the electronic image and selecting a plurality of entries from a computer data structure. Each entry of the data structure comprises values of the plurality of attributes and a score. The selecting comprises selecting based on a vector difference of the values of the plurality of attributes of the electronic image relative to the values of the plurality of attributes of entries in the data structure. The method further comprises computing a quality score for the electronic image as a combination of the scores of the selected entries from the data structure.
BRIEF DESCRIPTION OF DRAWINGS
[0007] The accompanying drawings are not intended to be drawn to scale. In the drawings, each identical or nearly identical component that is illustrated in various figures is represented by a like numeral. For purposes of clarity, not every component may be labeled in every drawing. In the drawings:
[0008] FIG. 1A is an image;
[0009] FIG. 1B is a data plot illustrating the Laplace distribution of the image of FIG. 1A;
[0010] FIG. 2A is an image;
[0011] FIGs. 2B, 2C are data plots illustrating two sub-image Laplace distributions of the image of FIG. 2A;
[0012] FIG. 3A is an image;
[0013] FIG. 3B shows data plots illustrating the relationship between pixel distance and the variance of the Laplace distribution for the image of FIG. 3A;
[0014] FIG. 4A shows an illustrative image with a 3x3 window with variable names denoting the image intensity at the corresponding positions;
[0015] FIG. 4B shows a data plot illustrating a Laplace distribution with a threshold and the probability of being over the threshold;
[0016] FIGs. 5A-C show data plots illustrating the relationship between blurriness and three features;
[0017] FIG. 6 shows a schematic diagram illustrating the relationship between frame sampling structure for vertical direction;
[0018] FIG. 7 shows a data plot illustrating the relationship between blockiness feature value and blockiness percentage;
[0019] FIGs. 8A-C are data plots illustrating the comparison between clear and distorted images;
[0020] FIGs. 9A-D are exemplary images with Gaussian blur;
[0021] FIGs. 10A-D are exemplary images with white noise;
[0022] FIGs. 11A-D are exemplary images with block artifacts;
[0023] FIG. 12 shows a data plot illustrating BNB performance for the LIVE image database;
[0024] FIG. 13 is a schematic diagram of an illustrative computer 5000 on which any aspect of the present disclosure may be implemented.
DETAILED DESCRIPTION
[0025] The inventors have recognized and appreciated techniques to improve the automated computation of image quality scores without a reference image. Quality scores as computed herein may accurately reflect human perception of the quality of an electronic image. As these techniques can be performed without a reference image, they may be applied in settings in which arbitrary images are processed, including many settings in which images are transmitted for storage or display in a computer network, such as the Internet. The image quality scores may be used, for example, to automatically select parameters of image processing or display in a way that reduces computer resources without unacceptably degrading image quality as perceived by a human viewer of the images.
[0026] In accordance with some embodiments, a plurality of image quality attributes for a given image can be quantified to create a vector of metrics for the attributes. The vector of metrics may be used to select entries from a data structure storing multiple examples of those metrics linked to human perception scores. The selected entries may be used to compute a quality score representing a human perception of quality of the given image. The entries in the data set may be precomputed based on a set of images, such as a known library of images used in image processing research, but do not have to represent the same scene as the electronic image being processed. Accordingly, the quality score may be computed without the use of a reference image for the electronic image being processed. Moreover, as the image quality attributes may be computed without human input, the process of deriving an image quality score may be applied in settings in which automated processing is desirable.
[0027] The data structure may act as a "codebook" for processing other images, which are potentially unrelated to the images used to create the codebook. According to some aspects of the present application, to accurately reflect human perceptions of image quality, the codebook may be constructed using a library of images with known human perception scores. The codebook may be a data structure with a plurality of entries, each entry comprising a vector of metrics for a plurality of image quality attributes as well as a human perception score assigned to a sample image. For example, the codebook may be constructed using a library of popular images as sample images. Each entry in the codebook comprises a vector of metrics for a sample image in the library and a human perception score obtained by rating the sample image with real humans. However, it should be appreciated that each entry in the codebook need not correspond to any single sample image. In some embodiments, for example, each entry in the codebook may reflect an average of multiple sample images that were distorted to produce the same metrics of image quality. In other embodiments, for example, the codebook may be created heuristically based on observations of human sensitivity to variations in the metrics.
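As a minimal sketch of one possible realization of such a codebook (the entry layout and helper names below are our own, not taken from the disclosure):

```python
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class CodebookEntry:
    """One codebook entry: a metric vector plus a human perception score."""
    metrics: List[float]  # e.g. [blurriness, noisiness, blockiness]
    score: float          # human perception score for the sample image

def build_codebook(metric_vectors: Sequence[Sequence[float]],
                   human_scores: Sequence[float]) -> List[CodebookEntry]:
    """Pair each sample image's metric vector with its human perception score."""
    return [CodebookEntry(list(m), s) for m, s in zip(metric_vectors, human_scores)]
```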
[0028] In some embodiments, the codebook may be constructed to reflect conditions under which the image is to be presented to a human viewer. For example, the human perception score for each sample image in the library may correspond substantially to the sample image displayed on a particular class of electronic devices. For example, the codebook may be constructed by rating a library of images on smartphone screens, high resolution flat-panel TVs, or computer monitors to reflect human perception scores when viewed on the respective class of devices, and different codebooks may be used in different automated systems, depending on the intended display format or, in some embodiments, a codebook may be selected automatically based on hardware or other configuration information obtained from a device to display an image.
[0029] According to some aspects of the present application, to assess the quality of an electronic image, a vector of metrics for a plurality of attributes is computed for the electronic image and compared with the vectors for entries in the codebook. Entries from the codebook may be selected for further processing based on similarity between the vector for the electronic image and the vectors for entries in the codebook. In some embodiments, a vector distance is computed between the vector of the electronic image and the vectors of entries in the codebook. A number of "nearest neighbor" entries from the codebook with vector distances smaller than a certain threshold may be selected, as they represent the group of sample images most similar to the electronic image. The human perception scores of the selected codebook entries may then be used to synthesize a human perception score for the electronic image that best represents how a human user would perceive its quality.
[0030] In some embodiments, the human perception score for the electronic image may be computed by a weighted average of human perception scores of the selected codebook entries. In one example, a higher weight is assigned to a codebook entry with a smaller vector distance to the electronic image, such that more similar sample images are given more weight in the comparison.
[0031] The vector distance may be computed in any suitable fashion, such as a Euclidean distance. In one example, the vector distance is a weighted Euclidean distance. A sketch of this nearest-neighbor lookup is given below.
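A minimal Python sketch of the lookup described above. It assumes inverse-distance weighting of the selected scores; the disclosure describes weighting closer entries more heavily but does not commit to this exact form:

```python
import math
from typing import Sequence, Tuple

def quality_score(image_metrics: Sequence[float],
                  codebook: Sequence[Tuple[Sequence[float], float]],
                  weights: Sequence[float],
                  k: int = 5, eps: float = 1e-9) -> float:
    """Predict a perception score for an image from its metric vector.

    codebook: (metric_vector, human_score) pairs; weights: per-attribute
    distance weights (e.g. p, q, r for the three BNB metrics)."""
    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

    # Select the k entries nearest to the image in metric space.
    nearest = sorted(codebook, key=lambda e: dist(image_metrics, e[0]))[:k]

    # Inverse-distance weighting: closer entries contribute more to the score.
    inv = [1.0 / (dist(image_metrics, m) + eps) for m, _ in nearest]
    return sum(w * s for w, (_, s) in zip(inv, nearest)) / sum(inv)
```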
[0032] The inventors have recognized and appreciated that any image quality attribute may be used in assessing image quality according to aspects of the present application. In some embodiments, quantitative measurements for specific artifacts that affect image quality may be used. In one non-limiting example, the values for each of blurriness, noisiness and blockiness (hereinafter also referred to as "BNB") measurements may be used to form the vector of metrics to assess the quality of a given electronic image using a codebook of vectors of BNB metrics and human perception scores of known sample images, as discussed in the sections above. The BNB metrics quantify the blurriness, noisiness and blockiness of a given image, which are considered three critical factors affecting users' quality of experience (QoE).
[0033] In some embodiments, to construct a metric for each BNB artifact, features for each type of artifact are first extracted from the changing Laplace distribution, and then the quantitative relationship between the feature value and the variation of the artifact is identified. This method is rooted in the observation that, for any image, the difference between any two adjacent pixel values follows a generalized Laplace distribution with zero mean. This Laplace distribution changes differently when the image experiences various types of artifacts such as BNB.
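This underlying Laplace observation can be illustrated with a short numpy sketch (our own, not from the disclosure). For a zero-mean Laplace distribution, the maximum-likelihood scale is simply the mean absolute deviation, and the variance is twice the squared scale:

```python
import numpy as np

def adjacent_difference_laplace(image: np.ndarray) -> tuple:
    """Fit a zero-mean Laplace distribution to horizontal adjacent-pixel differences.

    Returns (scale b, variance); for a Laplace distribution the variance is
    2 * b**2, and the ML estimate of b is the mean absolute deviation."""
    gray = image.astype(np.float64)
    diffs = (gray[:, 1:] - gray[:, :-1]).ravel()  # horizontal neighbor differences
    b = float(np.mean(np.abs(diffs)))             # ML scale estimate under zero mean
    return b, 2.0 * b * b
```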
[0034] As a specific example, starting with a data structure storing human perception scores of a popular image database with corresponding metrics for images in that database, a k-Nearest Neighbors algorithm (k-NN) is used to map a vector of three BNB metrics of an electronic image to a human perception score. The computation of BNB metrics and the k-NN approach require less computation than more complex no-reference image quality assessment methods and may lead to cost savings by reducing the processor computing resources required.
[0035] The human perception score may be used to indicate the quality of a given image displayed on an electronic device as perceived by an end user of the electronic device. In some applications, the given image may be part of media transmitted via the internet for consumption by the end user, such as a still image of a video. The no-reference human perception score assessed from the image may be used in real time by the media supplier to gauge the quality of media delivered to the end user. For example, a media supplier may adjust encoding parameters and/or increase the transmission data rate if image quality is poor. In another example, a media supplier may determine that the image quality as perceived by the end user on a portable electronic device with a low-resolution display is sufficiently high and proceed to reduce the media transmission data rate, e.g., by down-sampling to a lower resolution, to save related bandwidth and storage costs without affecting the user's experience. A schematic example of such a control decision follows.
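By way of illustration only, a sketch of such a decision; the threshold values and scaling factors are hypothetical, as the disclosure does not prescribe them:

```python
def adapt_bitrate(perception_score: float, bitrate_kbps: int,
                  low: float = 40.0, high: float = 80.0) -> int:
    """Adjust transmission bitrate from a no-reference quality score.

    low/high are hypothetical thresholds on a 0-100 perception scale."""
    if perception_score < low:    # quality poor: spend more bits
        return int(bitrate_kbps * 1.25)
    if perception_score > high:   # quality ample: save bandwidth and storage
        return int(bitrate_kbps * 0.8)
    return bitrate_kbps           # quality acceptable: hold steady
```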
[0036] Accordingly, quality scores as described herein may be computed by a sending computing device preparing information for transmission to a receiving device. The sending device may use a codebook selected based on information about the display conditions sent by the receiving device to compute the quality scores. Alternatively or additionally, an image quality score may be computed by the receiving device, and transmitted to the sending device for use by the sending device in selecting parameters of the images to be sent. In some embodiments, the receiving device may select the image parameters from the quality score and send these parameters instead of or in addition to the quality score or may use the quality scores to otherwise control the display of images. However, it should be appreciated that the techniques introduced above and discussed in greater detail below may be implemented in any of numerous ways, as the techniques are not limited to any particular manner of implementation. Examples of details of implementation are provided herein solely for illustrative purposes. Furthermore, the techniques disclosed herein may be used individually or in any suitable combination, as aspects of the present disclosure are not limited to the use of any particular technique or combination of techniques.
[0037] 1 Introduction
[0038] The two kinds of image quality assessment methods are subjective and objective. For subjective methods, image quality is evaluated by human observers. While this approach can offer accurate scoring, the scores are noisy, expensive, and time-consuming to acquire. These drawbacks make subjective assessment impractical for real-time applications. For objective methods, the goal is to provide computational models which automatically predict perceptual image quality. Typically, there are two main approaches to conducting objective assessments: content inspection and network-based measurement [22]. The former operates on the decoded content and can be used to design metrics ranging from simple pixel-to-pixel comparisons to sophisticated Human Visual System (HVS) frame-level artifact analysis. The latter aims to predict the multimedia quality level based on information gathered from network conditions and packets, without accessing the decoded video.
[0039] Decoded frame assessment offers three different methods to judge image quality based on the availability of the original image: full-reference (FR), reduced-reference (RR), and no-reference (NR). FR methods have access to the original image, which may provide a means to offer certain connections to human visual perception using mean squared error (MSE), peak signal to noise ratio (PSNR), or the structural similarity index (SSIM) [17]. Reduced reference entropic differencing (RRED) [20] is one example of an RR assessment scheme, which only has access to partial information from the original image [1]. Since reference image information is often not available, both FR and RR methods have a limited application range. NR methods handle the instances where information regarding the original image is unavailable. With the rise of scenarios that do not offer a mechanism for access to information from the original video/image, no-reference image quality assessment is both an essential and urgently needed technique.
[0040] No-reference techniques have been researched in the literature. These methods can be classified into two categories: 1) Artifact-Specific (AS) methods that measure the effect of specific artifacts such as blockiness [21], blurriness [10], noisiness [15], or ringing [25] on image quality, and 2) Non-Artifact-Specific (NAS) methods that do not measure the effect of specific artifacts on image quality. NAS methods are based on the idea of Natural Scene Statistics (NSS), which assumes that natural (undistorted) images occupy a small subspace of the space of all possible images. Using NSS, the quality of a test image can be represented by modeling its distance to the subspace of natural images.
[0041] AS methods are based on the assumption that distortions of an image are caused by specific artifacts; hence it is straightforward to take a divide-and-conquer approach, i.e., modeling the effect of each individual artifact on image quality and combining the effects of individual artifacts into a single image quality score (the key idea behind all AS methods). The advantage of AS methods therefore lies in directly identifying and quantifying the physical causes of image distortion. However, existing AS methods are not able to characterize the complicated interactions among multiple artifacts. These algorithms may perform well if a test image experiences only one type of artifact, but in reality an image may experience a mixture of multiple artifacts. In the Human Visual System, image quality perception is affected by nonlinear interactions among multiple artifacts. Since existing AS methods use a linear weighted sum to combine multiple artifact metrics into one single quality score [3], their performance in characterizing these nonlinear relationships is not satisfactory.
[0042] In contrast, NAS methods are inherently independent of specific types of artifacts, since they derive features from various transformed domains, such as Wavelet [26], DCT [16], Spatial [11], Curvelet [7], and Gradient [24], which are all non-artifact-specific. In most cases the features are entropies or statistical parameters of the transformed coefficients. After feature extraction, NAS methods utilize more complex projection techniques when transforming from feature vectors to quality scores, such as Support Vector Regression (SVR) and Neural Network Regression (NNR) [9], as opposed to linear weighted-sum methods. Recently, NAS methods have excelled in image quality assessment (IQA) due to their superior performance on several popular image databases. However, it remains unknown whether these NAS techniques would work well in other instances, since there is no evidence that the extracted features describe the image space completely. Without a complete description of the total image space, NSS-based methods still perform with uncertainty.
[0043] NAS methods have recently moved to the forefront of image quality assessment. In [13], the BIQI method was proposed, which is a two-step no-reference image quality assessment framework. Given a distorted image, the first step performs the wavelet transform and extracts features for estimating the presence of a set of distortions, including those introduced by JPEG, JPEG2000, white noise, Gaussian blur, and fast fading. The probability of each distortion in the image is then estimated. This first step is considered a classification step. The second step evaluates the quality of the image across each of these distortions by applying support vector regression on the wavelet coefficients. Although BIQI considers image distortion, the features it uses are derived from NSS. SSEQ [8], CurveletQA [7], and DIIVINE [14] utilize the same type of two-step framework as BIQI; however, their features come from the spectral entropy and local spatial domain, the curvelet domain, and the wavelet domain, respectively. A method proposed in [11], BRISQUE, derives features from the empirical distribution of locally normalized luminance values and their products under a spatial natural scene statistic model. These features are then used in support vector regression to map image features to an image quality score. Unlike the two-step framework of BIQI, BRISQUE belongs to a one-step framework which does not require distortion classification. The methods in [16] and [24] are similar to BRISQUE in that regard. The major difference among these three one-step methods is the feature space. In [16], the authors extract features from the Discrete Cosine Transform (DCT) domain, whereas in [24] the authors utilize the joint statistics of two types of commonly used local contrast features: 1) the gradient magnitude (GM) map and 2) the Laplacian of Gaussian (LOG) response. In [6], image patches are taken as input and a Convolutional Neural Network (CNN) model is designed to predict the image quality score. This technique works directly in the spatial domain of the input without the need to hand-craft features, as is the case with most existing methods. A blind image quality assessment model was derived in [12] that only makes use of measurable deviations from statistical regularities observed in natural images, without training on human-rated distorted images or having any exposure to distorted images in general. In summary, the various NR methods have different advantages and disadvantages.
[0044] In the present disclosure, an NR scheme is provided which combines the best features of both the AS and NAS methods. This is accomplished by developing three artifact-specific metrics and nonlinearly combining them. First, we employ the Natural Scene Statistics image property that the difference of two adjacent pixel values in an image follows a generalized Laplace distribution with zero mean and variance σ [5]. We observe that although different images may have different values of σ, the effect of a single type of artifact (i.e., blockiness, blurriness, or noisiness) on the Laplace distribution is similar and independent of image content. We leverage this image-content-invariant property to design metrics for three types of artifacts: blockiness, noisiness, and blurriness (hereinafter referred to as BNB metrics), which are considered the three most important types of artifacts induced by image compression and transmission. Second, when combining these three metrics, we abandon the usual inductive curve-fitting approaches, since we do not possess the information required to determine the exact relationship between these three artifacts and the human visual system (HVS) in general. Instead, we apply the transductive k-Nearest Neighbor algorithm to map the three BNB metrics of an image to a human perception score. We apply our scheme to the LIVE image quality assessment database [18, 19]. Our experimental results reveal a high correlation between the quality score obtained by our scheme and the provided subjective quality score.
[0045] Section 2 of this disclosure explores several key properties of the Laplace distribution for natural scene images, which are the basis for the design of our method. In that section, we use experimental results to show the relationships between the variance of the Laplace distribution and pixel distance, low-pass filtering, and high-pass filtering, which explain why we use this natural scene property to assess image quality. In Section 3, we describe the three BNB metrics. In Section 4, we verify our metrics using two aspects of experimentation and provide detailed results. The algorithm developed and used to perform supervised learning to predict the perceptual value is discussed in Section 5. We compare our results on the LIVE database with existing models in Section 6. Finally, Section 7 concludes this disclosure with final remarks.
[0046] 2 Laplace distribution characterization
[0047] For any pixel p(i, j) from a natural grayscale image I of size m × n, where i < m and j < n, we construct two values sh = p(i, j) − p(i, j + 1) and sv = p(i, j) − p(i + 1, j). A difference set, D, is constructed as the set of all such values s.
We gather salient features from this difference set towards developing useful metrics. We observe that, across different combinations of i and j, two such values s are independent and follow the same Laplace distribution with zero mean and variance σ, so long as they are not taken at the same pixel location. Separate images can present different statistical properties (σ in particular). Extensive experimentation supports these observations, with FIG. 1B showing one such verification result. There are many properties related to Laplace distribution characterization which are important and useful in assessing image quality. We explore four properties in particular, which motivate the design of our metrics, and include representative experimental results.
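As a purely illustrative aid, the construction of the difference set D and a quick check of its Laplace character may be sketched in NumPy as follows; the array img is assumed to be a two-dimensional grayscale image loaded elsewhere, and the ratio test relies on the standard fact that a Laplace(0, b) variable has variance 2b² and mean absolute value b.

```python
import numpy as np

def difference_set(img):
    """All horizontal and vertical adjacent-pixel differences (the set D)."""
    img = img.astype(np.float64)
    s_h = (img[:, :-1] - img[:, 1:]).ravel()  # p(i, j) - p(i, j + 1)
    s_v = (img[:-1, :] - img[1:, :]).ravel()  # p(i, j) - p(i + 1, j)
    return np.concatenate([s_h, s_v])

# For a Laplace(0, b) variable, var = 2*b^2 and E|x| = b, so the ratio
# var / (2 * E|x|^2) should be close to 1 for natural images.
# d = difference_set(img)
# print(d.mean(), d.var(), d.var() / (2.0 * np.mean(np.abs(d)) ** 2))
```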
[0048] Property 1: For the difference set, D, of any natural scene image, any two equally down-sampled sub-sets, D1 and D2, have the same statistical properties.
[0049] In order to support this property, the following experiment was designed:
[0050] I0(i) = I(3i), i = 0, 1, ..., ⌊m/3⌋ − 1 (1)
[0051] I1(i) = I(3i + 1), i = 0, 1, ..., ⌊m/3⌋ − 1 (2)
[0052] I2(i) = I(3i + 2), i = 0, 1, ..., ⌊m/3⌋ − 1 (3)
[0053] For a given image, I, we down-sample horizontally into 3 sub-images, I0, I1, and I2, obtained by taking every third row of I using (1), (2), and (3). With these sub-images, we create two difference sets D1 = I1 − I0 and D2 = I2 − I1. The statistical properties of these two sets are approximately the same. FIG. 2 displays an experimental result, showing the original image in FIG. 2A and the Laplace distributions of sets D1 and D2, having variances 130.56 and 131.05 respectively, in FIG. 2B.
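A minimal sketch of this experiment, under the row-interleaving reading of (1) through (3) above, is as follows; img is again assumed to be a grayscale array loaded elsewhere.

```python
import numpy as np

def property1_check(img):
    """Split the rows of img into three interleaved sub-images (eqs. (1)-(3))
    and compare the two cross-sub-image difference sets D1 and D2."""
    img = img.astype(np.float64)
    n_rows = (img.shape[0] // 3) * 3        # trim so the sub-images align
    i0, i1, i2 = img[0:n_rows:3], img[1:n_rows:3], img[2:n_rows:3]
    d1 = (i1 - i0).ravel()                  # D1 = I1 - I0
    d2 = (i2 - i1).ravel()                  # D2 = I2 - I1
    return d1.var(), d2.var()               # should be approximately equal
```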
[0054] Property 2: For any two pixels of a natural scene image, the difference in pixel values follows a Laplace distribution that is related to the spatial distance between the pixels; an increased distance corresponds to a larger variance in the Laplace distribution.
[0055] The distance between any two separate pixels p(i, j) and p(k, l) is defined as the Euclidean distance of the two pixel positions as shown in (4).
[0056] d(p(i, j), p(k, l)) = √((k − i)² + (l − j)²) (4)
[0057] In the last experiment, results were shown for pixel differences with d = 1, i.e., adjacent pixels. Our next experiment further exhibits this property by showing the effect on the variance as the distance increases. FIG. 3B visualizes the Laplace distributions for several such increases in the spatial distance d between pixels.
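For illustration, the distance experiment can be sketched with horizontal differences only; the returned variances are expected to grow with d, per Property 2.

```python
import numpy as np

def variance_vs_distance(img, distances=(1, 2, 4, 8)):
    """Variance of horizontal pixel differences as the spatial distance grows."""
    img = img.astype(np.float64)
    # For each d, difference every pixel with the pixel d columns to its right.
    return {d: (img[:, :-d] - img[:, d:]).var() for d in distances}
```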
[0058] Property 3: After convolving a natural scene image with a low-pass filter, f1, the difference of values of the same pixel in the original image and the processed image will also follow a Laplace distribution.
[0059] f1 = (1/9) [1 1 1; 1 1 1; 1 1 1] (5)
[0060] In (5), we introduce f1, a simple 3 × 3 averaging low-pass filter which can both lessen Gaussian noise and blur an image. Processing an image with this filter causes the variance of the difference of two adjacent pixel values to decrease. As shown in FIG. 4B, if we set a threshold λ in the distribution and define P as the probability that the difference of adjacent pixel values is larger than the threshold, P becomes smaller when the image is convolved with f1. For a given pixel x0 of an image I with pixel difference variance σ, processing I with f1 yields x0'. Since x0 − x0' equals one ninth of the sum of the eight neighbor differences, each with variance close to σ, the difference between x0 and x0' follows a new Laplace distribution with zero mean and variance of approximately 8σ/81.
[0061] Property 4: By processing a natural scene image with a high-pass filter f2, any pixel value from the processed image will also follow a Laplace distribution.
[0062] f2 = (1/4) [0 −1 0; −1 4 −1; 0 −1 0] (6)
[0063] In (6), f2 is another very important tool, which can be used to find the high-frequency content of images. For the structure shown in FIG. 4A, the pixel x0 is labeled x0' after filtering the image I with f2. Since x0' equals one quarter of the sum of the four adjacent-pixel differences at x0, each with variance close to σ, the value of x0' will follow another Laplace distribution with zero mean and variance of approximately σ/4.
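A short sketch of Properties 3 and 4 together is given below. The 1/4 normalization of f2 follows the reconstruction of (6) above; it cancels in the ratio-based metrics of Section 3, so the qualitative behavior is the same either way. As before, img is assumed to be a grayscale array.

```python
import numpy as np
from scipy.ndimage import convolve

F1 = np.full((3, 3), 1.0 / 9.0)                       # low-pass 3x3 average, eq. (5)
F2 = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]],  # high-pass Laplacian, eq. (6)
              dtype=np.float64) / 4.0                 # 1/4 normalization assumed

def properties_3_and_4(img):
    """Return Var(img - lowpass(img)) and Var(highpass(img))."""
    img = img.astype(np.float64)
    diff_var = (img - convolve(img, F1)).var()  # Property 3: small Laplace variance
    hp_var = convolve(img, F2).var()            # Property 4: the V reused by the metrics
    return diff_var, hp_var
```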
[0064] 3 NR Artifact Metrics
[0065] When testing image quality, we must consider the effects of image content. Many existing methods do not consider these effects when building metrics, which leads to instability and low accuracy. Although some methods try to mitigate this concern, they do so by simply reducing the content using 1-D filters. The use of 1-D filters causes two issues: (i) they do not remove image content well, (ii) they cause a loss of potentially valuable image information for quality assessment. To avoid these issues, we aim to be mindful of the content information by extracting features that are independent of the image content and related to image artifacts.
[0066] 3.1 Blurriness Metric
[0067] Many existing blurriness metrics are based on the idea that blurring reduces the sharpness of image edges. These metrics usually first find all edge points and their related edge directions, then calculate their average width [4]. Such metrics can be problematic. First, defining an edge is challenging: different definitions can change the calculated edge extent, and thereby the average width value, and certain definitions may not even yield edges in an image. Additionally, blurriness affects more than the high-frequency components of pixels in an image; for example, a clear image with fewer high-frequency components may have a larger average edge width than a blurred image containing more high-frequency components.
[0068] With our method, we propose a framework utilizing all pixels rather than only the edge points within a frame. As previously mentioned in Property 4, for any clear or blurred image filtered using f2, the pixel values of the filtered image can be approximately modeled by a Laplace distribution. We now use V to denote the variance of this Laplace distribution. Convolving an image with f1 blurs the image, decreasing the value of V. The more times an image is processed using f1, the blurrier the image becomes and the smaller the value of V. In FIG. 5, different curves are shown for different images with various image content. The horizontal axis represents the blurriness (the number of times the filter f1 was applied to the image), and the vertical axis signifies the variance V.
[0069] As can be observed in FIG. 5A, although different image content produces various specific relationships between V and blurriness, we notice a regularity: increasing blurriness is accompanied by a decreasing V. Having identified this regularity, we could use V as a feature to model the blurriness of images containing the same content, but it does not necessarily differentiate blurriness well for images with different content. To provide a more robust feature for handling different image content, we adopt an alternative feature: V − V1. Given any image I, we blur it using f1 to obtain the blurred image I1. We form V1 by subsequently filtering I1 with f2 and calculating the variance of the resulting pixel values.
[0070] As shown in FIG. 5B, V − V1 is a better blurriness feature than V, since it reduces the scale along the feature axis; however, it is still not robust to image content. Normalizing V − V1 by V, we obtain our desired blurriness feature, γ1, as defined in (7).
[0071] γ1 = (V − V1) / V (7)
[0072] A visualization of this feature is shown in FIG. 5C. The curves representing different image content exhibit a more regular relationship between blurriness and our blurriness feature. This tighter coupling signifies that the feature can help alleviate the image content issue discussed earlier.
[0073] In Section 4, we use the LIVE image database to show that our feature provides a better characterization of blurriness. Independent of the content of an image, we observe a decrease in our blurriness feature, γ1, as the blurriness of an image increases.
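Put together, the blurriness feature of (7) reduces to a few lines; the sketch below is illustrative and again assumes the 1/4-normalized f2 reconstructed in (6), which cancels in the ratio.

```python
import numpy as np
from scipy.ndimage import convolve

F1 = np.full((3, 3), 1.0 / 9.0)
F2 = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]], dtype=np.float64) / 4.0

def blurriness_feature(img):
    """gamma1 = (V - V1) / V per (7); decreases as the image gets blurrier."""
    img = img.astype(np.float64)
    v = convolve(img, F2).var()                  # V: variance of high-pass response
    v1 = convolve(convolve(img, F1), F2).var()   # V1: same, after one blurring pass
    return (v - v1) / v
```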
[0074] 3.2 Noisiness Metric
[0075] We use and improve upon the following intuitive idea to design our noisiness metric. Suppose an image-denoising method is used to remove the noise from two separately distorted versions, with differing noise levels, of the same original frame; the version with the larger difference between itself and its denoised result is the noisier of the two.
[0076] To utilize and analyze this idea, we again assume that the difference between any two adjacent pixels follows a Laplace distribution and that pixel values contain additive, independent Gaussian noise. This assumption follows from the natural scene properties discussed in Section 2, and for generality and ease of computation we retain the average filter f1, which can remove part of the Gaussian noise.
[0077] As with the blurriness metric, we obtain the variance of the coefficients, V, by processing a noisy image I with the filter f2. Further processing I with the filter f1 yields I1, a denoised version of I. Again, V1 is calculated as the variance of the pixel values after processing I1 with f2. Since I1 has less noise than image I, V will be larger than V1. We employ V − V1 as a noise feature, which maintains a good response to noise for images with the same content. Experimentation verifies that the larger the value of V − V1, the noisier the image. For images containing different content, the value of V − V1 will not always be the same, even when the images have the same noisiness or level of human perception. To address this problem, we apply normalization to obtain (8).
[0078] γ2 = (V − V1) / V1 (8)
[0079] After normalization, the value of γ2 increases as the level of noise increases, or alternatively, as the level of human perception decreases, for any type of image content. We verify this property of γ2 in Section 4 using noisy images from the LIVE image database. Although the blurriness and noisiness metrics are defined in the same fashion, they have different influences on image quality assessment. The distinction is further elaborated using experimental results in Section 4.
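A companion sketch for the noisiness feature follows; under the reconstruction of (8) above, the only change from the blurriness sketch is the normalization denominator.

```python
import numpy as np
from scipy.ndimage import convolve

F1 = np.full((3, 3), 1.0 / 9.0)        # average filter, used here as a denoiser
F2 = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]], dtype=np.float64) / 4.0

def noisiness_feature(img):
    """gamma2 = (V - V1) / V1 per (8) as reconstructed; grows with noise level."""
    img = img.astype(np.float64)
    v = convolve(img, F2).var()                  # response of the noisy image
    v1 = convolve(convolve(img, F1), F2).var()   # response after denoising with F1
    return (v - v1) / v1
```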
[0080] 3.3 Blockiness Metric
[0081] Blockiness appears at block boundaries as a byproduct of encoding, decoding, or transmission. If block-like artifacts are present in a frame, the statistical relationship between two adjacent pixels in the same block will differ from that of two adjacent pixels from different blocks, in contrast to what Property 1 predicts for an undistorted image. To make use of this statistical property, the image is partitioned into bs × bs blocks and sampled in the horizontal and vertical directions as shown in (9) and (10). Methods for constructing these two types of down-sampling are shown below.
[0082] 1. Horizontally down-sample the frame to get sub-sampled images Ih,k, where row k is taken from each bs × bs block:
[0083] Ih,k(i, j) = I(bs·i + k, j), i = 0, 1, ..., ⌊(m − k)/bs⌋ (9)
[0084] 2. Vertically down-sample the frame to get sub-sampled images Iv,k, where column k is taken from each bs × bs block:
[0085] Iv,k(i, j) = I(i, bs·j + k), j = 0, 1, ..., ⌊(n − k)/bs⌋ (10)
[0086] The size of a block can be adjusted according to application requirements; in our work, we use bs = 8. An example of the vertical sampling structure is shown in FIG. 6. The dark symbols inside a grid correspond to pixels in the resulting sampled sub-images, with different symbols corresponding to different sub-images. Two further data sets, D1 and D2, can be obtained by taking the difference of sub-images s7 and s6 and the difference of sub-images s0 and s7, respectively. If blockiness is not present in the image, the pixel values of the data sets D1 and D2 should follow a similar Laplace distribution. If we set the same threshold in the two Laplace distributions, βi(v) represents the number of pixels in Di that are larger than the threshold. It is apparent that the values of β1(v) and β2(v) should be close for a non-blocky image. However, if blockiness is present in the frame, the boundary pixel values are similar to the contiguous pixels within the same block but different from the contiguous pixels in other blocks; it is then straightforward to observe that β1(v) will decrease while β2(v) increases. This causes the ratio of β2(v) to β1(v) to become larger. We can construct the analogous ratio for the horizontal case and observe that it also increases. We combine these two directional ratios to obtain the following expression for blockiness:
[Equation (11), rendered as an image in the source, defines f_blockiness by combining the vertical ratio β2(v)/β1(v) and the horizontal ratio β2(h)/β1(h) through a tuning parameter ζ.]
[0087] In (11), ζ is introduced as a tuning parameter that allows f_blockiness to be tailored to a variety of situations; it is chosen to be 1 in our experimentation. For a frame without blockiness, the value of f_blockiness should be close to one, and introducing blockiness into a frame will increase the value of f_blockiness. In FIG. 7, the five curves represent five different images. For each image, we randomly add different percentages of blockiness, which we call γ3, while measuring the value of f_blockiness. We observe that f_blockiness increases with respect to increases in γ3. The relationship between f_blockiness and γ3 is modeled as a quadratic function, as shown in (12). The noise about the quadratic curve stems from our method of randomly inserting block artifacts: as we increase the probability of insertion, we inadvertently increase the probability of spatially coincident block artifacts. The curves appear to converge to a value of 1 as γ3 approaches 0, which verifies our assumption. Because of the different statistical properties of separate images, each image will have a different quadratic function relating its response to blockiness, which should not be ignored when performing quality assessment. For any image I we can quickly calculate f_blockiness, which we will now refer to as f. We randomly add 5% and 10% blockiness into the original frame separately to obtain two new images and calculate their values of f as f5% and f10%, respectively. The values of the parameters a and b of the quadratic function and of γ3 for I can be obtained by solving (12) through (14), where f, f5%, and f10% are known.
[0088] f_blockiness = f = a·γ3² + b·γ3 + 1 (12)
[0089] f5% = a·(γ3 + 5)² + b·(γ3 + 5) + 1 (13)
[0090] f10% = a·(γ3 + 10)² + b·(γ3 + 10) + 1 (14)
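The core of the metric, the boundary-versus-interior exceedance ratio β2(v)/β1(v), can be sketched for the vertical direction as follows. The threshold value and the use of absolute differences are illustrative assumptions; the horizontal ratio and the quadratic solve of (12) through (14) would be built on top of this in the same way.

```python
import numpy as np

def blockiness_ratio_vertical(img, bs=8, thresh=10.0):
    """Vertical-direction ratio beta2 / beta1 for bs x bs block boundaries.

    D1 holds differences between the last two columns inside each block;
    D2 holds differences across the block boundary. A blocky image pushes
    the ratio above 1.
    """
    img = img.astype(np.float64)
    n = (img.shape[1] // bs) * bs
    s6 = img[:, bs - 2:n:bs]                 # second-to-last column of each block
    s7 = img[:, bs - 1:n:bs]                 # last column of each block
    s0 = img[:, bs:n:bs]                     # first column of the next block
    m = s0.shape[1]                          # trim to the number of boundaries
    d1 = np.abs(s7[:, :m] - s6[:, :m]).ravel()   # within-block differences
    d2 = np.abs(s0 - s7[:, :m]).ravel()          # cross-boundary differences
    beta1 = np.count_nonzero(d1 > thresh)
    beta2 = np.count_nonzero(d2 > thresh)
    return beta2 / max(beta1, 1)             # guard against an empty count
```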
[0091] 4 Metric Verification
[0092] In the previous section, we proposed three distortion metrics. We now validate their feasibility through extensive experimentation.
[0093] 4.1 Classification between clear and distorted images
[0094] A good distortion metric should offer a delineation between clear and distorted images. For this verification, we collect 85 distortion-free images from several image databases. For the distorted images, we employ the Gaussian blur, white noise, and JPEG images from the LIVE image database, which correspond to the blurriness (γ1), noisiness (γ2), and blockiness (γ3) metrics, respectively. The metric values are calculated for both clear and distorted images and displayed in FIG. 8.
[0095] In FIG. 8A, the dashed line is the density of γ1 for the 85 clear images, and the solid line is the density of γ1 for the same number of Gaussian blur images. These two densities are estimated with a Gaussian kernel using the γ1 histograms of the clear and Gaussian blur images, respectively. As can be observed in FIG. 8A, the overlap of the two densities is relatively small, so our blurriness metric is a good candidate for classifying between clear and blurry images. We obtain similar results for our noisiness (γ2) and blockiness (γ3) metrics, shown in FIG. 8B and FIG. 8C respectively, using dashed lines for the densities of clear images and solid lines for the densities of noisy and blocky images.
[0096] 4.2 Relationship between Blurriness and Noisiness
[0097] Previously, we introduced our blurriness and noisiness metrics, which are defined in an equivalent way. The noisiness metric, however, provides a smaller value for a better quality image, whereas the blurriness metric offers a larger value for a better quality image. The noisiness (blurriness) metric value of a clear image lies between the metric values of a completely blurred image and a completely noisy image. For better use of these two metrics, we first estimate the range R of the noisiness (blurriness) metric value for clear images through extensive experimentation and then refine the two metrics into (15) and (16) below.
[0098] γ1 = (V − V1)/V, if (V − V1)/V ≤ max(R); γ1 = 0, otherwise (15)
[0099] γ2 = (V − V1)/V1, if (V − V1)/V1 ≥ min(R); γ2 = 0, otherwise (16)
[0100] 4.3 Correlation between metric value and human perception
[0101] For a distortion metric, it is important to have agreement between the metric value obtained from an image and the corresponding human perception score. As a simple verification, we list 4 images for each distortion and give their related metric values and human-perceived scores, as displayed in Tables 1, 2, and 3. The human-perceived scores of an image are in the range [0, 100], with a larger score signifying a poorer quality image.
[0102] Table 1: Blurriness and human perception for blurry images
[0106] Table 4: Comparison of Srocc for three metrics
[0108] In Table 4, we calculate the metric values for all blurry, noisy, and blocky images from the LIVE image database and obtain their Spearman rank-order correlation coefficients (Srocc). The model of Farias and Mitra [4] measures the same distortions (noisiness, blockiness, and blurriness), so we compare the Srocc of the metrics from our model with those of that model. The comparison reveals that our metrics are more strongly correlated with human perception.
[0109] 5 Overall Perceptual Value Estimation
[0110] If an image is distorted by one or more artifacts, the overall distortion can be measured as a combination of the distortions due to the individual artifacts. In general, there are many ways to combine features into a quality assessment metric. The weighted Minkowski metric used by [3] is shown in (17); when p = 1, it reduces to a linear combination:
Q = (α·Blockiness^p + β·Blurriness^p + γ·Noisiness^p)^(1/p) (17)
[0111] Another metric, (18), is given by [23] as:
Q = α + β·Blockiness^γ1 · Blurriness^γ2 · Noisiness^γ3 (18)
[0112] The parameters of these two models can be estimated using curve fitting and subjective data. One critical problem with these models is that the best curve-fitting function is not known a priori. Because of the difficulty of determining the interplay between the three artifacts and how they influence human perception, reasonable functions can only be found greedily (one by one), and the resulting accuracies are limited; we were not able to find a parametric method that forms a model with universally valid results. For this reason, we propose the use of non-parametric methods to predict the human perceptual score. We employ the codebook method, with the following algorithm.
[0113] 1. Codebook Construction: One element of the codebook is a vector including four values: γ1, γ2, γ3, and the perceptual value, denoted (Ci,1, Ci,2, Ci,3, Ci,4).
Given a training image and its matching human perceptual score, we map the score to one element of the codebook by using our proposed three artifact metrics to extract the related feature values. In our experiments, we employ the LIVE image database. Because we focus on significant blockiness, blurriness, and noisiness artifacts, we utilize the JPEG, Gaussian blur, and white noise portions of the database to build our codebook.
[0114] 2. Neighborhood Construction: For any test image I, the artifact metric values (Iγ1, Iγ2, Iγ3) are calculated and used to find the k nearest neighbors in the codebook. We define the distance between a test image and the images in the codebook as a weighted Euclidean distance, assigning different weights to different artifacts. This distance between the test image I and the codebook element Ci, where p, q, and r are the weights for the three artifact metric values, is shown in (19):
d(I, Ci) = √(Mi,1 + Mi,2 + Mi,3) (19), where Mi,1 = p·(Iγ1 − Ci,1)², Mi,2 = q·(Iγ2 − Ci,2)², and Mi,3 = r·(Iγ3 − Ci,3)².
[0115] 3. Perceptual Score Prediction: After finding the k nearest neighbors, we use the k perceptual values provided by the codebook to predict the test image's perceptual value. Because the distances of the k nearest neighbors differ, it is not necessarily reasonable to simply use the mean of their values as the prediction. With this consideration, we assign weights related to the distances. Supposing di is the distance between the test image I and the neighboring codebook element Ci, the weight wi of Ci is defined in (20). With these weights, we predict the test image perceptual value Pt using (21).
[0116] wi = (1/di) / Σj=1..k (1/dj) (20)
[0117] Pt = Σi=1..k wi · Ci,4 (21)
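The three steps above fit in a short function; this is a minimal sketch of (19) through (21), in which the small eps term is a practical guard added here to avoid division by zero when a test image coincides with a codebook entry.

```python
import numpy as np

def predict_score(test_feats, codebook, weights, k=5, eps=1e-12):
    """k-NN perceptual score prediction per eqs. (19)-(21).

    codebook:   (N, 4) array of rows (gamma1, gamma2, gamma3, score);
    weights:    (p, q, r) per-artifact distance weights;
    test_feats: (gamma1, gamma2, gamma3) for the test image.
    """
    feats, scores = codebook[:, :3], codebook[:, 3]
    w = np.asarray(weights, dtype=np.float64)
    dists = np.sqrt(((feats - np.asarray(test_feats)) ** 2 * w).sum(axis=1))  # (19)
    nn = np.argsort(dists)[:k]                    # indices of the k nearest
    inv = 1.0 / (dists[nn] + eps)                 # inverse-distance weights, (20)
    wi = inv / inv.sum()
    return float((wi * scores[nn]).sum())         # weighted prediction, (21)
```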
[0118] 6 Experimental Results
[0119] We use the images from the LIVE database to verify our model. Some examples of images used from this database are shown in FIGs. 9, 10, and 11. The experimental details and results are given below.
[0120] 6.1 Experiment Description
[0121] Because our focus is on the three significant artifact types, only 581 images from this dataset are available to test our model. We were unable to divide these images into separate training and testing sets, since building a codebook requires a large number of training images. Because of this limitation, we apply a one-vs-all procedure: select one held-out image as test data, construct the codebook and train the model parameters using the rest of the images, predict the perceptual value of the test image using the codebook and the trained model, and repeat this process until every image has been used for prediction once. Using this method, we ultimately obtain 581 test images, which is much more satisfactory. In our codebook model, there are four important parameters: p, q, r, and k. The first three represent the distance weights for the three artifact types, and the fourth is the number of nearest neighbors used for prediction. We utilize genetic algorithms to find the most suitable parameters in this large parameter space.
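The one-vs-all loop can be sketched as below, reusing the predict_score sketch from Section 5; the genetic-algorithm search over (p, q, r, k) is not shown and would simply call this function as its fitness measure.

```python
import numpy as np
from scipy.stats import spearmanr

def one_vs_all_srocc(codebook, weights, k):
    """Leave-one-out evaluation over the whole codebook, as described above."""
    preds = []
    for i in range(len(codebook)):
        rest = np.delete(codebook, i, axis=0)          # train on all other images
        preds.append(predict_score(codebook[i, :3], rest, weights, k))
    rho, _ = spearmanr(preds, codebook[:, 3])          # rank correlation with truth
    return rho
```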
[0122] 6.2 Experiment Results
[0123] The predicted results for the 581 images are presented in FIG. 12. This representation allows for the visualization of outliers in the results. We believe the outliers originate from three causes:
[0124] 1. Only a small number of images in the database have perceptual values in the range [0, 20].
[0125] 2. The database is not sufficiently large to construct a well-represented codebook.
[0126] 3. Some perceptual values provided by the database seem unreliable; for example, the JPEG images "img28" and "img30" are nearly indistinguishable, yet the difference between their perceptual values is 23.
[0127] We expect better prediction results from the use of a larger data set, which would help mitigate these three outlier causes.
[0128] 6.3 Comparison Study
[0129] Ensuring the same computational environment and datasets, we compare the results of our proposed model with the best existing no-reference image quality assessment methods: BLIINDS-II [16], NIQE [12], BRISQUE [11], and BIQI [13]. In Table 5, Srocc is the Spearman rank-order correlation coefficient and Pear is the Pearson correlation coefficient; for both, the larger the coefficient value, the stronger the correlation. Time is the number of seconds of computation needed to process an image, from feature extraction to quality score prediction. The comparison results in Table 5 show that our BNB method achieves better correlation with human perception than the existing methods. To predict the quality of an image, our BNB method also requires the least computation time, better allowing for real-time applications. It should be appreciated that the algorithm's computation time can be decreased further: our method uses statistical information from a rather large pixel space (about 400,000 pixels for one experimental image), and using fewer pixels could still provide statistically relevant information without a loss in performance. In other words, we could use far fewer representative blocks to process the assessment. Accordingly, when we discuss processing of an image, it should be understood that, unless otherwise indicated by context, the processing may be on the entire image or a relevant part of the image.
[0130] Table 5: Comparison on the LIVE image database
[0131] As discussed above, image quality scores as described herein - though they represent human perception of an image - may be computed without human input about image quality once the data structure acting as the codebook has been created. Accordingly, these techniques may be executed by hardware and/or software components in one or more computing devices that process images for display, including computing devices that transmit images for display over a network or that receive images from another device for display. FIG. 13 shows, schematically, an illustrative computer 5000 on which any aspect of the present disclosure may be implemented. In the embodiment shown in FIG. 13, the computer 5000 includes a processing unit 5001 having one or more processors and a non-transitory computer-readable storage medium 5002 that may include, for example, volatile and/or non-volatile memory. The memory 5002 may store one or more instructions to program the processing unit 5001 to perform any of the functions described herein. The computer 5000 may also include other types of non-transitory computer-readable medium, such as storage 5005 (e.g., one or more disk drives) in addition to the system memory 5002. The storage 5005 may also store one or more application programs and/or external components used by application programs (e.g., software libraries), which may be loaded into the memory 5002.
[0132] The computer 5000 may have one or more input devices and/or output devices, such as devices 5006 and 5007 illustrated in FIG. 13. These devices can be used, among other things, to present a user interface. Examples of output devices that can be used to provide a user interface include printers or display screens for visual presentation of output and speakers or other sound generating devices for audible presentation of output. Examples of input devices that can be used for a user interface include keyboards and pointing devices, such as mice, touch pads, and digitizing tablets. As another example, the input devices 5007 may include a microphone for capturing audio signals, and the output devices 5006 may include a display screen for visually rendering, and/or a speaker for audibly rendering, recognized text.
[0133] As shown in FIG. 13, the computer 5000 may also comprise one or more network interfaces (e.g., the network interface 5010) to enable communication via various networks (e.g., the network 5020). Examples of networks include a local area network or a wide area network, such as an enterprise network or the Internet. Such networks may be based on any suitable technology and may operate according to any suitable protocol and may include wireless networks, wired networks or fiber optic networks.
[0134] Having thus described several aspects of at least one embodiment of this invention, it is to be appreciated that various alterations, modifications, and improvements will readily occur to those skilled in the art.
[0135] For example, it was described that the quality score is computed for an image to be displayed. However, it should be appreciated that in many settings, a collection of similar images may be displayed on the same device, such as may occur, for example, when a stream of images is displayed as a video. In some embodiments, the quality score may be computed for one or more images in the collection. Those quality scores may be used to select parameters of image processing or display, which may be applied to the collection of images.
[0136] Such alterations, modifications, and improvements are intended to be part of this disclosure, and are intended to be within the spirit and scope of the invention. Further, though advantages of the present invention are indicated, it should be appreciated that not every embodiment of the invention will include every described advantage. Some embodiments may not implement any features described as advantageous herein. Accordingly, the foregoing description and drawings are by way of example only.
[0137] The above-described embodiments of the present invention can be implemented in any of numerous ways. When implemented in software, the software code can be executed on any suitable processor or collection of processors, whether provided in a single computer or distributed among multiple computers. Such processors may be implemented as integrated circuits, with one or more processors in an integrated circuit component, including commercially available integrated circuit components known in the art by names such as CPU chips, GPU chips, microprocessor, microcontroller, or co-processor. Alternatively, a processor may be implemented in custom circuitry, such as an ASIC, or semicustom circuitry resulting from configuring a programmable logic device. As yet a further alternative, a processor may be a portion of a larger circuit or semiconductor device, whether commercially available, semi-custom or custom. As a specific example, some commercially available microprocessors have multiple cores such that one or a subset of those cores may constitute a processor. Though, a processor may be implemented using circuitry in any suitable format.
[0138] Further, it should be appreciated that a computer may be embodied in any of a number of forms, such as a rack-mounted computer, a desktop computer, a laptop computer, or a tablet computer. Additionally, a computer may be embedded in a device not generally regarded as a computer but with suitable processing capabilities, including a Personal Digital Assistant (PDA), a smart phone or any other suitable portable or fixed electronic device.
[0139] Also, a computer may have one or more input and output devices. These devices can be used, among other things, to present a user interface. Examples of output devices that can be used to provide a user interface include printers or display screens for visual presentation of output and speakers or other sound generating devices for audible presentation of output. Examples of input devices that can be used for a user interface include keyboards, and pointing devices, such as mice, touch pads, and digitizing tablets. As another example, a computer may receive input information through speech recognition or in other audible format. In the embodiment illustrated, the input/output devices are illustrated as physically separate from the computing device. In some embodiments, however, the input and/or output devices may be physically integrated into the same unit as the processor or other elements of the computing device. For example, a keyboard might be implemented as a soft keyboard on a touch screen. Alternatively, the input/output devices may be entirely disconnected from the computing device, and functionally integrated through a wireless connection.
[0140] Such computers may be interconnected by one or more networks in any suitable form, including as a local area network or a wide area network, such as an enterprise network or the Internet. Such networks may be based on any suitable technology and may operate according to any suitable protocol and may include wireless networks, wired networks or fiber optic networks.
[0141] Also, the various methods or processes outlined herein may be coded as software that is executable on one or more processors that employ any one of a variety of operating systems or platforms. Additionally, such software may be written using any of a number of suitable programming languages and/or programming or scripting tools, and also may be compiled as executable machine language code or intermediate code that is executed on a framework or virtual machine.
[0142] In this respect, the invention may be embodied as a computer readable storage medium (or multiple computer readable media) (e.g., a computer memory, one or more floppy discs, compact discs (CD), optical discs, digital video disks (DVD), magnetic tapes, flash memories, circuit configurations in Field Programmable Gate Arrays or other semiconductor devices, or other tangible computer storage medium) encoded with one or more programs that, when executed on one or more computers or other processors, perform methods that implement the various embodiments of the invention discussed above. As is apparent from the foregoing examples, a computer readable storage medium may retain information for a sufficient time to provide computer-executable instructions in a non-transitory form. Such a computer readable storage medium or media can be transportable, such that the program or programs stored thereon can be loaded onto one or more different computers or other processors to implement various aspects of the present invention as discussed above. As used herein, the term "computer-readable storage medium" encompasses only a computer-readable medium that can be considered to be a manufacture (i.e., article of manufacture) or a machine. Alternatively or additionally, the invention may be embodied as a computer readable medium other than a computer-readable storage medium, such as a propagating signal.
[0143] The terms "code", "program" or "software" are used herein in a generic sense to refer to any type of computer code or set of computer-executable instructions that can be employed to program a computer or other processor to implement various aspects of the present invention as discussed above. Additionally, it should be appreciated that according to one aspect of this embodiment, one or more computer programs that when executed perform methods of the present invention need not reside on a single computer or processor, but may be distributed in a modular fashion amongst a number of different computers or processors to implement various aspects of the present invention.
[0144] Computer-executable instructions may be in many forms, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Typically the functionality of the program modules may be combined or distributed as desired in various embodiments.
[0145] Also, data structures may be stored in computer-readable media in any suitable form. For simplicity of illustration, data structures may be shown to have fields that are related through location in the data structure. Such relationships may likewise be achieved by assigning storage for the fields with locations in a computer-readable medium that conveys relationship between the fields. However, any suitable mechanism may be used to establish a relationship between information in fields of a data structure, including through the use of pointers, tags or other mechanisms that establish relationship between data elements.
[0146] Various aspects of the present invention may be used alone, in combination, or in a variety of arrangements not specifically discussed in the embodiments described in the foregoing and is therefore not limited in its application to the details and arrangement of components set forth in the foregoing description or illustrated in the drawings. For example, aspects described in one embodiment may be combined in any manner with aspects described in other embodiments.
[0147] Also, the invention may be embodied as a method, of which an example has been provided. The acts performed as part of the method may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different than illustrated, which may include performing some acts simultaneously, even though shown as sequential acts in illustrative embodiments.
[0148] The indefinite articles "a" and "an," as used herein in the specification and in the claims, unless clearly indicated to the contrary, should be understood to mean "at least one."
[0149] The phrase "and/or," as used herein in the specification and in the claims, should be understood to mean "either or both" of the elements so conjoined, i.e., elements that are conjunctively present in some cases and disjunctively present in other cases. Multiple elements listed with "and/or" should be construed in the same fashion, i.e., "one or more" of the elements so conjoined. Other elements may optionally be present other than the elements specifically identified by the "and/or" clause, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, a reference to "A and/or B", when used in conjunction with open-ended language such as "comprising" can refer, in one embodiment, to A only (optionally including elements other than B); in another embodiment, to B only (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc.
[0150] As used herein in the specification and in the claims, the phrase "at least one," in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements.
This definition also allows that elements may optionally be present other than the elements specifically identified within the list of elements to which the phrase "at least one" refers, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, "at least one of A and B" (or, equivalently, "at least one of A or B," or, equivalently "at least one of A and/or B") can refer, in one embodiment, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc.
[0151] Use of ordinal terms such as "first," "second," "third," etc., in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed, but are used merely as labels to distinguish one claim element having a certain name from another element having a same name (but for use of the ordinal term) to distinguish the claim elements.
[0152] Also, the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. The use of "including," "comprising," or "having," "containing," "involving," and variations thereof herein, is meant to encompass the items listed thereafter and equivalents thereof as well as additional items.
[0153] LIST OF REFERENCES
[0154] The following references are hereby incorporated by reference in their entireties for all they teach.
[0155] [1] Guangquan Cheng, Jincai Huang, Zhong Liu, and Cheng Lizhi. Image quality assessment using natural image statistics in gradient domain. AEU - International Journal of Electronics and Communications, 65(5):392 - 397, 2011.
[0156] [2] Ruigang Fang and Dapeng Wu. No-reference image quality assessment based on bnb measurement. In Proceedings of IEEE China Summit & International Conference on Signal and Information Processing (ChinaSIP), pages 528-532, 2013.
[0157] [3] M.C.Q. Farias, S.K. Mitra, and J.M. Foley. Perceptual contributions of blocky, blurry and noisy artifacts to overall annoyance. IEEE International Conference on Multimedia and Expo, 1:I-529-32, 2003.
[0158] [4] M.C.Q. Farias and S.K. Mitra. No-reference video quality metric based on artifact measurements. IEEE International Conference on Image Processing, 3:III-141-4, Sept. 2005. [0159] [5] Jinggang Huang and D. Mumford. Statistics of natural images and models. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1:541-547, 1999.
[0160] [6] Le Kang, Peng Ye, Yi Li, and D. Doermann. Convolutional neural networks for no-reference image quality assessment. In 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1733-1740, June 2014.
[0161] [7] Lixiong Liu, Hongping Dong, Hua Huang, and A.C. Bovik. No-reference image quality assessment in curvelet domain. Signal Processing: Image Communication, 29(4):494 -505, 2014.
[0162] [8] Lixiong Liu, Bao Liu, Hua Huang, and A.C. Bovik. No-reference image quality assessment based on spatial and spectral entropies. Signal Processing: Image Communication, 29(8):856 -863, 2014.
[0163] [9] Chaofeng Li, A.C. Bovik, and Xiaojun Wu. Blind image quality assessment using a general regression neural network. IEEE Transactions on Neural Networks, 22(5):793-799, May 2011.
[0164] [10] P. Marziliano, F. Dufaux, S. Winkler, and T. Ebrahimi. A no-reference perceptual blur metric. IEEE International Conference on Image Processing, 3:III-57-III-60, 2002.
[0165] [11] A. Mittal, A.K. Moorthy, and A.C. Bovik. No-reference image quality assessment in the spatial domain. IEEE Transactions on Image Processing, 21(12):4695 -4708, Dec 2012.
[0166] [12] A. Mittal, R. Soundararajan, and A.C. Bovik. Making a completely blind image quality analyzer. IEEE Signal Processing Letters, 20(3):209-212, Mar. 2013.
[0167] [13] A.K. Moorthy and A.C. Bovik. A two-step framework for constructing blind image quality indices. IEEE Signal Processing Letters, 17(5):513 -516, May 2010.
[0168] [14] A.K. Moorthy and A.C. Bovik. Blind image quality assessment: From natural scene statistics to perceptual quality. IEEE Transactions on Image Processing, 20(12):3350-3364, Dec 2011.
[0169] [15] K. Rank, M. Lendl, and R. Unbehauen. Estimation of image noise variance. IEE Proceedings on Vision, Image and Signal Processing, 146(2):80-84, Aug. 1999. [0170] [16] M.A. Saad, A.C. Bovik, and C. Charrier. Blind image quality assessment: A natural scene statistics approach in the DCT domain. IEEE Transactions on Image Processing, 21(8):3339-3352, Aug. 2012.
[0171] [17] R. Serral-Gracia, E. Cerqueira, M. Curado, M. Yannuzzi, E. Monteiro, and X. Masip-Bruin. An overview of quality of experience measurement challenges for video applications in IP networks. Wired/Wireless Internet Communications, 6074:252-263, April 2010.
[0172] [18] H.R. Sheikh, M.F. Sabir, and A.C. Bovik. A statistical evaluation of recent full reference image quality assessment algorithms. IEEE Transactions on Image Processing, 15(11):3440 -3451, nov. 2006.
[0173] [19] H.R. Sheikh, Z. Wang, L. Cormack, and A.C. Bovik. LIVE image quality assessment database release 2. http://live.ece.utexas.edu/research/quality, 2006.
[0174] [20] R. Soundararajan and A.C. Bovik. RRED indices: Reduced reference entropic differencing for image quality assessment. IEEE Transactions on Image Processing, 21(2):517-526, Feb. 2012.
[0175] [21] Zhou Wang, A.C. Bovik, and B.L. Evans. Blind measurement of blocking artifacts in images. IEEE International Conference on Image Processing, 3:981-984, 2000.
[0176] [22] Zhou Wang, A.C. Bovik, H.R. Sheikh, and E.P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13:600 -612, 2004.
[0177] [23] Zhou Wang, H.R. Sheikh, and A.C. Bovik. No-reference perceptual quality assessment of JPEG compressed images. IEEE International Conference on Image Processing, 1:I-477-I-480, 2002.
[0178] [24] W. Xue, X. Mou, L. Zhang, A.C. Bovik, and X. Feng. Blind image quality assessment using joint statistics of gradient magnitude and laplacian features. IEEE Transactions on Image Processing, 23(11), Nov 2014.
[0179] [25] X. Feng and J.P. Allebach. Measurement of ringing artifacts in JPEG images. Proc. SPIE, 6076:74-83, January 2006.
[0180] [26] Peng Ye and D. Doermann. No-reference image quality assessment based on visual codebook. IEEE International Conference on Image Processing (ICIP), pages 3089 -3092, sept. 2011.

Claims

CLAIMS What is claimed is:
1. A computer-implemented method for assessing quality of an electronic image, the method comprising:
computing values for a plurality of attributes of the electronic image;
selecting a plurality of entries from a computer data structure, wherein:
each entry of the data structure comprises values of the plurality of attributes and a score; and
selecting comprises selecting based on a vector difference of the values of the plurality of attributes of the electronic image relative to the values of the plurality of attributes of entries in the data structure; and
computing a quality score for the electronic image as a combination of the scores of the selected entries from the data structure.
2. The method of claim 1, wherein:
computing the quality score comprises computing a weighted average of the selected entries.
3. The method of claim 2, wherein:
a weight for computing the weighted average of the selected entries is based on the vector difference between the values of the plurality of attributes of the electronic image relative to the values of the plurality of attributes of an entry in the data structure.
4. The method of claim 1, wherein:
the vector difference is a weighted Euclidean distance.
5. The method of claim 1, wherein:
the score in each entry of the data structure is indicative of human perception of image quality of one or more sample images.
6. The method of claim 5, wherein:
the electronic image is a compressed image transmitted over an internet computer network for displaying on a user device, and an amount of compression of the compressed image is automatically selected based in part on the quality score.
7. The method of claim 6, wherein:
the electronic image is a still image of a video.
8. The method of claim 6, wherein:
scores in entries of the data structure are indicative of human perception of image quality of sample images when displayed on a type of electronic device similar to the user device.
9. The method of claim 1, wherein:
the plurality of attributes comprise blurriness, noisiness and blockiness.
10. A non-transitory computer readable medium comprising computer readable instructions that when executed by a processor, cause the processor to perform a method comprising the acts of:
(A) computing values for a plurality of attributes of the electronic image;
(B) selecting a plurality of entries from a computer data structure, wherein:
each entry of the data structure comprises values of the plurality of attributes and a score; and
selecting comprises selecting based on a vector difference of the values of the plurality of attributes of the electronic image relative to the values of the plurality of attributes of entries in the data structure; and
(C) computing a quality score for the electronic image as a combination of the scores of the selected entries from the data structure.
11. The non-transitory computer readable medium of claim 10, wherein:
computing the quality score comprises computing a weighted average of the selected entries.
12. The non-transitory computer readable medium of claim 10, wherein:
a weight for computing the weighted average of the selected entries is based on the vector difference between the values of the plurality of attributes of the electronic image relative to the values of the plurality of attributes of an entry in the data structure.
13. The non-transitory computer readable medium of claim 10, wherein:
the vector difference is a weighted Euclidean distance.
14. The non-transitory computer readable medium of claim 10, wherein:
the score in each entry of the data structure is indicative of human perception of image quality of one or more sample images.
15. The non-transitory computer readable medium of claim 14, wherein:
the electronic image is a compressed image transmitted over a computer network for displaying on a user device, and an amount of compression of the compressed image is automatically selected based in part on the quality score.
16. The non-transitory computer readable medium of claim 15, wherein:
the electronic image is a still image of a video.
17. The non-transitory computer readable medium of claim 15, wherein:
scores in entries of the data structure are indicative of human perception of image quality of sample images when displayed on a type of electronic device similar to the user device.
18. The non-transitory computer readable medium of claim 10, wherein:
the plurality of attributes comprise blurriness, noisiness and blockiness.
19. The non-transitory computer readable medium of claim 18, wherein act (A) comprises computing a value for blurriness of the electronic image and wherein computing a value for blurriness comprises: (D) determining a variance V of a Laplace distribution of differences between intensity values of all adjacent pairs of pixels in the electronic image;
(E) applying a low pass filter to the electronic image;
(F) applying a high pass filter to the result of act (E);
(G) determining a variance V1 of a Laplace distribution of differences between intensity values of all adjacent pairs of pixels in the result of act (F);
(H) computing the value for blurriness based on the result of act (G).
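As a rough illustration of acts (D) through (H), the sketch below estimates each variance by fitting a zero-mean Laplace distribution to pooled adjacent-pixel differences, uses a 3x3 mean filter for the low-pass stage, forms the high-pass output as a low-pass residual, and maps V1 and V to a blurriness value via their ratio. The kernels and the final ratio are assumptions made only for illustration; the claim fixes just the order of operations.

import numpy as np
from scipy.ndimage import convolve

def laplace_variance(img):
    # Pool horizontal and vertical neighbour differences; for a zero-mean
    # Laplace distribution with scale b (estimated here by the mean absolute
    # difference), the variance is 2 * b**2.
    d = np.concatenate([np.diff(img, axis=0).ravel(),
                        np.diff(img, axis=1).ravel()])
    b = np.abs(d).mean()
    return 2.0 * b ** 2

MEAN3 = np.full((3, 3), 1.0 / 9)  # assumed 3x3 averaging kernel

def blurriness(img):
    img = img.astype(np.float64)
    v = laplace_variance(img)        # act (D): variance V of the raw image
    lp = convolve(img, MEAN3)        # act (E): low-pass filter
    hp = lp - convolve(lp, MEAN3)    # act (F): high-pass filter applied to the low-pass result
    v1 = laplace_variance(hp)        # act (G): variance V1 of the filtered image
    return v1 / (v + 1e-12)          # act (H): one plausible mapping of V1 (and V) to a blurriness value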
20. The non-transitory computer readable medium of claim 19, wherein act (A) comprises computing a value for noisiness of the electronic image and wherein computing a value for noisiness of the electronic image comprises:
(I) determining a variance V of a Laplace distribution of differences between intensity values of all adjacent pairs of pixels in the electronic image;
(J) applying a high pass filter to the electronic image;
(K) applying a low pass filter to the result of act (J);
(L) determining a variance V1 of a Laplace distribution of differences between intensity values of all adjacent pairs of pixels in the result of act (K);
(M) computing the value for noisiness based on the result of act (L).
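Acts (I) through (M) mirror the blurriness pipeline with the filter order reversed, so a sketch can reuse laplace_variance, MEAN3, and convolve from the previous example; again the kernels and the closing ratio are assumptions rather than claimed specifics.

def noisiness(img):
    img = img.astype(np.float64)
    v = laplace_variance(img)          # act (I): variance V of the raw image
    hp = img - convolve(img, MEAN3)    # act (J): high-pass filter first
    lp = convolve(hp, MEAN3)           # act (K): then low-pass the high-pass result
    v1 = laplace_variance(lp)          # act (L): variance V1 of the filtered image
    return v1 / (v + 1e-12)            # act (M): one plausible mapping of V1 (and V) to a noisiness value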
21. The non-transitory computer readable medium of claim 20, wherein act (A) comprises computing a value for blockiness of the electronic image and wherein computing a value for blockiness of the electronic image comprises:
(N) determining a number of pixels β1 with intensity above a threshold in a first sub-image of the electronic image;
(O) determining a number of pixels β2 with intensity above the threshold in a second sub-image of the electronic image;
(P) computing the value for blockiness based at least in part on a ratio between β1 and β2.
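Claim 21 fixes only the counting and the ratio; it does not say here how the two sub-images are formed. One plausible reading, sketched below, takes them from the horizontal difference image of a picture assumed to be coded on an 8x8 block grid, so that "intensity" becomes difference magnitude: pixels on the grid form the first sub-image and all remaining pixels the second. The grid period and the threshold are assumptions.

import numpy as np

def blockiness(img, threshold=10.0, block=8):
    g = np.abs(np.diff(img.astype(np.float64), axis=1))  # horizontal difference magnitudes
    on_grid = g[:, block - 1::block]           # first sub-image: columns on the assumed block boundaries
    mask = np.ones(g.shape[1], dtype=bool)
    mask[block - 1::block] = False
    off_grid = g[:, mask]                      # second sub-image: all remaining columns
    beta1 = int((on_grid > threshold).sum())   # act (N): pixels above threshold in sub-image 1
    beta2 = int((off_grid > threshold).sum())  # act (O): pixels above threshold in sub-image 2
    return beta1 / max(beta2, 1)               # act (P): blockiness from the ratio between beta1 and beta2

A strongly blocky image concentrates large differences on the coding grid, pushing the ratio above what an undistorted image would produce.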

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201662399985P 2016-09-26 2016-09-26
US62/399,985 2016-09-26

Publications (1)

Publication Number Publication Date
WO2018058090A1 (en) 2018-03-29

Family

ID=61691105

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2017/053393 WO2018058090A1 (en) 2016-09-26 2017-09-26 Method for no-reference image quality assessment

Country Status (1)

Country Link
WO (1) WO2018058090A1 (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100040152A1 (en) * 2004-07-15 2010-02-18 At&T Intellectual Property I.L.P. Human Factors Based Video Compression
US20120155765A1 (en) * 2010-12-21 2012-06-21 Microsoft Corporation Image quality assessment
US20130293725A1 (en) * 2012-05-07 2013-11-07 Futurewei Technologies, Inc. No-Reference Video/Image Quality Measurement with Compressed Domain Features
US20150288874A1 (en) * 2012-10-23 2015-10-08 Ishay Sivan Real time assessment of picture quality
US20140270464A1 (en) * 2013-03-15 2014-09-18 Mitek Systems, Inc. Systems and methods for assessing standards for mobile image quality
WO2016146038A1 (en) * 2015-03-13 2016-09-22 Shenzhen University System and method for blind image quality assessment

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020014862A1 (en) * 2018-07-17 2020-01-23 深圳大学 No-reference image quality evaluation system and method
US20200118029A1 (en) * 2018-10-14 2020-04-16 Troy DeBraal General Content Perception and Selection System.
US10730293B1 (en) 2019-02-27 2020-08-04 Ricoh Company, Ltd. Medium classification mechanism
CN113519165A (en) * 2019-03-01 2021-10-19 皇家飞利浦有限公司 Apparatus and method for generating image signal
CN110070539A (en) * 2019-04-28 2019-07-30 重庆大学 Image quality evaluating method based on comentropy
CN111179245A (en) * 2019-12-27 2020-05-19 成都中科创达软件有限公司 Image quality detection method, device, electronic equipment and storage medium
CN111179245B (en) * 2019-12-27 2023-04-21 成都中科创达软件有限公司 Image quality detection method, device, electronic equipment and storage medium
CN111489333B (en) * 2020-03-31 2022-06-03 天津大学 No-reference night natural image quality evaluation method
CN111489333A (en) * 2020-03-31 2020-08-04 天津大学 No-reference night natural image quality evaluation method
CN111583213A (en) * 2020-04-29 2020-08-25 西安交通大学 Image generation method based on deep learning and no-reference quality evaluation
CN111583213B (en) * 2020-04-29 2022-06-07 西安交通大学 Image generation method based on deep learning and no-reference quality evaluation
CN111507426A (en) * 2020-04-30 2020-08-07 中国电子科技集团公司第三十八研究所 No-reference image quality grading evaluation method and device based on visual fusion characteristics
CN111507426B (en) * 2020-04-30 2023-06-02 中国电子科技集团公司第三十八研究所 Non-reference image quality grading evaluation method and device based on visual fusion characteristics
CN111652854A (en) * 2020-05-13 2020-09-11 中山大学 No-reference image quality evaluation method based on image high-frequency information
CN111862000A (en) * 2020-06-24 2020-10-30 天津大学 Image quality evaluation method based on local average characteristic value
CN111862000B (en) * 2020-06-24 2022-03-15 天津大学 Image quality evaluation method based on local average characteristic value
CN113450319A (en) * 2021-06-15 2021-09-28 宁波大学 KLT (karhunen-Loeve transform) technology-based super-resolution reconstruction image quality evaluation method
CN113784113A (en) * 2021-08-27 2021-12-10 中国传媒大学 No-reference video quality evaluation method based on short-term and long-term time-space fusion network and long-term sequence fusion network
CN114445386A (en) * 2022-01-29 2022-05-06 泗阳三江橡塑有限公司 PVC pipe quality detection and evaluation method and system based on artificial intelligence
CN114445386B (en) * 2022-01-29 2023-02-24 泗阳三江橡塑有限公司 PVC pipe quality detection and evaluation method and system based on artificial intelligence
CN114782422B (en) * 2022-06-17 2022-10-14 电子科技大学 SVR feature fusion non-reference JPEG image quality evaluation method
CN114782422A (en) * 2022-06-17 2022-07-22 电子科技大学 SVR feature fusion non-reference JPEG image quality evaluation method

Similar Documents

Publication Publication Date Title
WO2018058090A1 (en) Method for no-reference image quality assessment
Li et al. Which has better visual quality: The clear blue sky or a blurry animal?
Zhang et al. C-DIIVINE: No-reference image quality assessment based on local magnitude and phase statistics of natural scenes
Chandler Seven challenges in image quality assessment: past, present, and future research
Shen et al. Hybrid no-reference natural image quality assessment of noisy, blurry, JPEG2000, and JPEG images
Bovik Automatic prediction of perceptual image and video quality
US8660364B2 (en) Method and system for determining a quality measure for an image using multi-level decomposition of images
Ciancio et al. No-reference blur assessment of digital pictures based on multifeature classifiers
Vu et al. ViS 3: An algorithm for video quality assessment via analysis of spatial and spatiotemporal slices
Li et al. Content-partitioned structural similarity index for image quality assessment
Lin et al. Perceptual visual quality metrics: A survey
Liang et al. No-reference perceptual image quality metric using gradient profiles for JPEG2000
Kim et al. Gradient information-based image quality metric
WO2018186991A1 (en) Assessing quality of images or videos using a two-stage quality assessment
Wang et al. Quaternion representation based visual saliency for stereoscopic image quality assessment
US11310475B2 (en) Video quality determination system and method
Fang et al. BNB method for no-reference image quality assessment
WO2011097696A1 (en) Method and system for determining a quality measure for an image using a variable number of multi-level decompositions
WO2014070489A1 (en) Recursive conditional means image denoising
Wang et al. Gradient-based no-reference image blur assessment using extreme learning machine
US11960996B2 (en) Video quality assessment method and apparatus
Jin et al. Full RGB just noticeable difference (JND) modelling
Soundararajan et al. Machine vision quality assessment for robust face detection
Banitalebi-Dehkordi et al. An image quality assessment algorithm based on saliency and sparsity
Mittal et al. No-reference approaches to image and video quality assessment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 17854111
Country of ref document: EP
Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 17854111
Country of ref document: EP
Kind code of ref document: A1