CN116071623A - Model training method, image-based processing method, device, equipment and medium - Google Patents


Publication number
CN116071623A
CN116071623A (application CN202310020434.7A)
Authority
CN
China
Prior art keywords
preset
image data
characteristic information
sample
image quality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310020434.7A
Other languages
Chinese (zh)
Inventor
严丽
王志龙
李雪
罗海平
李旭莉
王伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202310020434.7A priority Critical patent/CN116071623A/en
Publication of CN116071623A publication Critical patent/CN116071623A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS; G06 COMPUTING, CALCULATING OR COUNTING
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/776 Validation; Performance evaluation
    • G06V 10/764 Using classification, e.g. of video objects
    • G06V 10/771 Feature selection, e.g. selecting representative features from a multi-dimensional feature space
    • G06V 10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G06N 20/00 Machine learning


Abstract

The disclosure provides a model training method, an image-based processing method, a device, equipment and a medium, and relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision, machine learning and image processing. The specific implementation scheme is as follows: preset sample characteristic information corresponding to a training sample pair is acquired, together with a sample label, wherein the preset sample characteristic information comprises statistical values of preset visual indexes corresponding to basic sample image data and comparison sample image data respectively, as well as difference information between their statistical values for the same preset visual index, and the sample label represents a subjective evaluation score of the image quality of the comparison sample image data relative to the basic sample image data; the preset sample characteristic information is input into a preset image quality evaluation model, and the model is trained according to its output result and the sample label. With this technical scheme, image quality scores can be predicted accurately, manual evaluation cost is reduced, and scoring efficiency is improved.

Description

Model training method, image-based processing method, device, equipment and medium
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular, to the field of computer vision, machine learning, and image processing techniques.
Background
Currently, viewing media content such as pictures, videos and live broadcasts is an important way for users to acquire information, and the image quality of that content is an important factor affecting user experience. To improve image quality, the user's perception of an image must be quantified accurately, that is, image quality must be evaluated accurately.
Disclosure of Invention
The disclosure provides a model training method, an image-based processing method, a device, equipment, and a medium.
According to an aspect of the present disclosure, there is provided a training method of an image quality evaluation model, including:
acquiring preset sample characteristic information corresponding to a training sample pair, and acquiring sample labels corresponding to the training sample pair, wherein the training sample pair comprises basic sample image data and control sample image data, the image data comprises video data and/or picture data, the preset sample characteristic information comprises statistical values of preset visual indexes corresponding to the basic sample image data and the control sample image data respectively, and first difference information of the statistical values of the basic sample image data and the control sample image data for the same preset visual indexes, and the sample labels are used for representing subjective evaluation scores of image quality of the control sample image data compared with the basic sample image data;
Inputting the preset sample characteristic information into a preset image quality evaluation model to obtain an output result of the preset image quality evaluation model;
and training the preset image quality evaluation model according to the output result of the preset image quality evaluation model and the sample label.
According to another aspect of the present disclosure, there is provided an image-based processing method including:
acquiring preset characteristic information corresponding to an image data pair, wherein the image data pair comprises basic image data and comparison image data, the image data comprises video data and/or picture data, and the preset characteristic information comprises statistical values of preset visual indexes corresponding to the basic image data and the comparison image data respectively, and first difference information of the statistical values of the basic image data and the comparison image data for the same preset visual index;
and inputting the preset characteristic information into an image quality evaluation model to obtain an output result of the image quality evaluation model, wherein the image quality evaluation model is obtained by adopting the training method according to any embodiment of the disclosure, and the output result of the image quality evaluation model is used for representing a predicted evaluation score of the image quality of the comparison image data compared with the basic image data.
According to another aspect of the present disclosure, there is provided an image-based processing method including:
acquiring original characteristic information of image data to be processed, wherein the original characteristic information comprises a statistic value of a preset visual index corresponding to the image data to be processed;
constructing information to be input according to the original characteristic information, wherein the information to be input comprises the original characteristic information, and further comprises first difference information of statistical values of the same preset visual index in the original characteristic information and candidate characteristic information, and the candidate characteristic information comprises characteristic information obtained after adjustment on the basis of the original characteristic information;
inputting the information to be input into an image quality evaluation model to obtain an output result of the image quality evaluation model, wherein the image quality evaluation model is obtained by adopting the training method disclosed by any embodiment of the disclosure;
screening target characteristic information from the information to be input according to the output result of the image quality evaluation model;
and carrying out corresponding processing on the image data to be processed by utilizing the target characteristic information so that the processed target image data meets the target characteristic information.
According to another aspect of the present disclosure, there is provided a training apparatus of an image quality evaluation model, including:
the characteristic information acquisition module is used for acquiring preset sample characteristic information corresponding to a training sample pair, wherein the training sample pair comprises basic sample image data and control sample image data, the image data comprises video data and/or picture data, the preset sample characteristic information comprises statistical values of preset visual indexes corresponding to the basic sample image data and the control sample image data respectively, and first difference information of the statistical values of the basic sample image data and the control sample image data for the same preset visual indexes;
the sample label acquisition module is used for acquiring a sample label corresponding to the training sample pair, and the sample label is used for representing subjective evaluation scores of image quality of the control sample image data compared with the basic sample image data;
the characteristic information input module is used for inputting the characteristic information of the preset sample into a preset image quality evaluation model to obtain an output result of the preset image quality evaluation model;
and the model training module is used for training the preset image quality evaluation model according to the output result of the preset image quality evaluation model and the sample label.
According to another aspect of the present disclosure, there is provided an image-based processing apparatus including:
the information acquisition module is used for acquiring preset characteristic information corresponding to an image data pair, wherein the image data pair comprises basic image data and comparison image data, the image data comprises video data and/or picture data, and the preset characteristic information comprises statistical values of preset visual indexes corresponding to the basic image data and the comparison image data respectively, and first difference information of the statistical values of the basic image data and the comparison image data for the same preset visual index;
the information input module is configured to input the preset characteristic information into an image quality evaluation model to obtain an output result of the image quality evaluation model, where the image quality evaluation model is obtained by using the training device according to any embodiment of the disclosure, and the output result of the image quality evaluation model is used to represent a predicted evaluation score of the image quality of the comparison image data compared with the basic image data.
According to another aspect of the present disclosure, there is provided an image-based processing apparatus including:
the device comprises an original characteristic information acquisition module, a processing module and a processing module, wherein the original characteristic information acquisition module is used for acquiring original characteristic information of image data to be processed, and the original characteristic information comprises a statistic value of a preset visual index corresponding to the image data to be processed;
The information construction module is used for constructing information to be input according to the original characteristic information, wherein the information to be input comprises the original characteristic information, and further comprises first difference information of the original characteristic information and candidate characteristic information for the same statistical value of the preset visual index, and the candidate characteristic information comprises characteristic information obtained after adjustment on the basis of the original characteristic information;
the to-be-input information input module is used for inputting the to-be-input information into an image quality evaluation model to obtain an output result of the image quality evaluation model, wherein the image quality evaluation model is obtained by using the training device disclosed by any embodiment of the disclosure;
the target characteristic information screening module is used for screening target characteristic information from the information to be input according to the output result of the image quality evaluation model;
and the image processing module is used for carrying out corresponding processing on the image data to be processed by utilizing the target characteristic information so that the processed target image data meets the target characteristic information.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any embodiment of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any embodiment of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of any embodiment of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flowchart of a training method of an image quality assessment model provided according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of another method of training an image quality assessment model provided in accordance with an embodiment of the present disclosure;
FIG. 3 is a flow chart of an image-based processing method provided in accordance with an embodiment of the present disclosure;
fig. 4 is a schematic view of an application scenario provided according to an embodiment of the present disclosure;
FIG. 5 is a flow chart of another image-based processing method provided in accordance with an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a training device for an image quality assessment model according to an embodiment of the present disclosure;
fig. 7 is a schematic structural view of an image-based processing apparatus provided according to an embodiment of the present disclosure;
fig. 8 is a schematic structural view of another image-based processing apparatus provided according to an embodiment of the present disclosure;
fig. 9 is a block diagram of an electronic device for implementing the methods of embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a flowchart of a training method for an image quality evaluation model according to an embodiment of the present disclosure, which is applicable to a case of training an image quality evaluation model. The method can be executed by a training device of an image quality evaluation model, and the device can be realized by adopting a hardware and/or software mode and can be configured in electronic equipment. Referring to fig. 1, the method specifically includes the following:
s101, acquiring preset sample characteristic information corresponding to a training sample pair and acquiring sample labels corresponding to the training sample pair, wherein the training sample pair comprises basic sample image data and control sample image data, the image data comprises video data and/or picture data, the preset sample characteristic information comprises statistical values of preset visual indexes corresponding to the basic sample image data and the control sample image data respectively, and first difference information of the statistical values of the basic sample image data and the control sample image data for the same preset visual indexes, and the sample labels are used for representing subjective evaluation scores of image quality of the control sample image data compared with the basic sample image data;
s102, inputting the characteristic information of the preset sample into a preset image quality evaluation model to obtain an output result of the preset image quality evaluation model;
S103, training the preset image quality evaluation model according to the output result of the preset image quality evaluation model and the sample label.
For the evaluation of image quality, user perception is quantified mainly in two ways: subjective evaluation and objective evaluation. In subjective evaluation, observers make visual judgments on the picture quality of the images being compared through an evaluation tool, and a Mean Opinion Score (MOS) is calculated. A higher MOS indicates that the transformed image is better than the original image; values generally range from 0 to 100, and a score of 50 indicates that the two images are basically equivalent in quality. However, different people may apply different quality criteria, subjective bias is likely, and the labor cost is high. Objective evaluation schemes generally use objective indicators to fit user perception, such as the Visual Multimethod Assessment Fusion (VMAF) indicator, which fuses visual information fidelity (VIF), a detail loss measure (DLM), and a temporal information/motion indicator (TI, based on the average difference between co-located pixels of adjacent frames); with such fused indicators it is difficult to evaluate the influence of a single kind of visual information on the MOS score. The inventors found that after a visual index of an image is adjusted, the image quality perceived by the user may improve, yielding a higher MOS score. Therefore, in the embodiments of the disclosure, model input is constructed from visual-index-related information, and a model is used to fit the association between that information and the subjective evaluation score, so that the model can predict the image quality score in place of human visual judgment.
In the embodiment of the disclosure, a certain number of pictures or videos may be selected in advance for determining the training sample pairs. A training sample pair comprises basic sample image data and control sample image data; the control sample image data may be obtained by adjusting the values of one or more visual indexes on the basis of the basic sample image data, so the two may correspond to the same image content. A visual index is an index that affects a user's subjective perception when viewing an image, and may include, for example, hue, saturation, lightness (brightness), contrast, color, and texture. Hue can be understood as the intensity of the primary colors in the various image color modes, with levels ranging from 0 to 255: a level of 255 corresponds to white and a level of 0 to black. Saturation can be understood as the concentration of the colors of an image; higher saturation looks more vivid, lower saturation looks duller. Brightness can be understood as the degree of brightness of the light in a scene or image. Contrast can be understood as the difference between different colors: the larger the contrast, the sharper the image appears, while a small contrast means the differences between colors are small. Color can be understood as the value of each color channel in the various image color modes. Texture can be understood as a visual feature reflecting homogeneous phenomena in an image; it represents the slowly varying or periodic structural arrangement of object surfaces and can be used to describe the surface properties of the scene corresponding to an image or image region.
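To make the indexes above concrete, here is a minimal sketch of computing hue, lightness, and saturation for a single pixel with Python's standard `colorsys` module; the 0-360 and 0-100 scalings and the function name are illustrative choices, not mandated by the text.

```python
import colorsys

def pixel_visual_indexes(r, g, b):
    """Hue, lightness, and saturation for one RGB pixel (channels 0-255)."""
    h, l, s = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
    return {"hue": h * 360.0, "lightness": l * 100.0, "saturation": s * 100.0}

# pure red: hue 0, lightness 50, saturation 100
indexes = pixel_visual_indexes(255, 0, 0)
```

In practice these per-pixel values would be aggregated over frames or regions before being fed to the statistics described below.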
For example, after the training sample pair is determined, a sample label corresponding to the training sample pair may be determined manually. The sample label represents a subjective evaluation score of the image quality of the control sample image data compared with the basic sample image data, and may specifically be a MOS score. For example, for a training sample pair, a certain number (e.g., 5 to 7) of raters each give a score, and the average of those scores is taken as the corresponding MOS score, that is, the sample label.
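The labeling step above can be sketched as a simple average; the 5-to-7-rater setup comes from the text, while the function name is illustrative.

```python
def mos_label(rater_scores):
    """Average several raters' subjective scores (0-100 scale) into a MOS label.

    A score around 50 means the control image is roughly equivalent to the
    basic image; higher means the control image looks better.
    """
    if not rater_scores:
        raise ValueError("at least one rater score is required")
    return sum(rater_scores) / len(rater_scores)

# e.g. five raters comparing one training sample pair
label = mos_label([55, 60, 62, 58, 65])
```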
The preset sample characteristic information input into the preset image quality evaluation model includes statistical values of preset visual indexes corresponding to the basic sample image data and the control sample image data respectively. Optionally, the preset visual index includes at least one of hue, saturation, brightness, contrast, color, and texture. Selecting visual indexes that are easy to quantify and adjust for quantifying the association with the MOS score improves the accuracy of model fitting, and makes the model output convenient to use as a reference for image adjustment, so that images with higher image quality scores can be obtained. Optionally, the preset visual indexes include hue, saturation, brightness, contrast, color and texture simultaneously, which has the advantage that the visual characteristics of the image can be measured more comprehensively. The type of statistical value may include at least one of a maximum value, an average value, and a minimum value, and may further include, for example, a variance, a standard deviation, and the like. Optionally, the maximum, average and minimum values are all included, balancing feature richness against computational efficiency. A statistical value may be computed over multiple video frames, for example the average hue of a plurality of video frames; it may also be computed over pixels or image regions, for example the maximum brightness over a plurality of pixels or image regions.
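The per-index statistics described above can be sketched as follows (pure Python; in practice the input values might come from per-frame or per-region measurements, and the dict layout is an assumption).

```python
def index_statistics(values):
    """Maximum, average, and minimum of one visual index across frames or regions."""
    return {
        "max": max(values),
        "mean": sum(values) / len(values),
        "min": min(values),
    }

# e.g. brightness measured on four video frames
brightness_stats = index_statistics([120.0, 130.0, 110.0, 140.0])
```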
In the embodiment of the disclosure, the preset sample characteristic information further includes first difference information of the statistical values of the basic sample image data and the control sample image data for the same preset visual index, which enriches the dimensions of the model's input features and improves the accuracy of model fitting. The first difference information may include a difference value, specifically the difference between statistical values of the same type for the same preset visual index, for example the difference between the maximum brightness of the basic sample image data and the maximum brightness of the control sample image data.
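The first difference information can be sketched as a per-type subtraction over one visual index's statistics; the control-minus-basic sign convention and dict layout are assumptions, as the text does not fix them.

```python
def first_difference_features(basic_stats, ctrl_stats):
    """Per-type differences (control minus basic) for one preset visual index.

    basic_stats / ctrl_stats: dicts like {"max": ..., "mean": ..., "min": ...}.
    """
    return {k: ctrl_stats[k] - basic_stats[k] for k in basic_stats}

diff = first_difference_features(
    {"max": 140.0, "mean": 125.0, "min": 110.0},   # basic brightness stats
    {"max": 150.0, "mean": 130.0, "min": 112.0},   # control brightness stats
)
```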
For example, the model fitting may be abstracted as a regression problem, and the preset image quality evaluation model may be a machine learning model, specifically a decision tree model or a neural network model. Optionally, the preset image quality evaluation model includes an eXtreme Gradient Boosting (XGBoost) model, which has the advantages of good learning effect and high training speed.
In the training scheme of the image quality evaluation model provided by the embodiments of the disclosure, the input of the preset image quality evaluation model is the preset sample characteristic information, which comprises the statistical values of the preset visual indexes corresponding to the basic sample image data and the control sample image data respectively, together with the first difference information of their statistical values for the same preset visual index. Because the preset sample characteristic information fully represents the visual characteristics of the two groups of sample data and the differences between them, the preset image quality evaluation model can accurately fit the association between the characteristic information and the subjective evaluation score, and can therefore accurately predict the image quality score of a video or picture, reducing manual evaluation cost and improving scoring efficiency.
In an alternative embodiment, the preset sample characteristic information further includes at least one of the following: second difference information of the basic sample image data between different types of statistical values of the same preset visual index; third difference information of the control sample image data between different types of statistical values of the same preset visual index; and fourth difference information between the second difference information and the third difference information. This further enriches the dimensions of the model's input features and further improves the accuracy of model fitting. Alternatively, the difference information may be a difference value. For example, still taking brightness as an example, the second difference information may include the difference between two types of brightness statistics (e.g., the maximum and the minimum) of the basic sample image data; the third difference information may include the corresponding difference for the control sample image data; and the fourth difference information may include the difference between those two differences.
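The additional difference terms can be sketched like this; the choice of max-minus-min as the pair of "different types of statistical values" is an illustrative assumption, since the text leaves the pairing open.

```python
def extra_difference_features(basic_stats, ctrl_stats):
    """Second, third, and fourth difference information for one visual index."""
    second = basic_stats["max"] - basic_stats["min"]  # spread within basic data
    third = ctrl_stats["max"] - ctrl_stats["min"]     # spread within control data
    fourth = second - third                           # difference of the two spreads
    return second, third, fourth

features = extra_difference_features(
    {"max": 140.0, "min": 110.0},   # basic brightness stats
    {"max": 150.0, "min": 112.0},   # control brightness stats
)
```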
In an alternative embodiment, the model setting parameters of the XGBoost model include at least one of: the booster is gbtree; the objective function is reg:gamma; the loss function is the mean absolute error; gamma is 0.1; the maximum depth of the decision tree is 8; lambda takes a value of 3; the seed value is 1000. Setting the XGBoost model parameters reasonably balances the training efficiency and training effect of the model and improves the accuracy of model prediction.
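The listed settings map naturally onto an XGBoost native-API parameter dict; treating the mean-absolute-error loss as `eval_metric: "mae"` is an assumed reading, and the dict below is a sketch rather than the patent's exact configuration.

```python
# Hypothetical parameter dict mirroring the settings listed above; in a real
# setup it would be passed to xgboost.train(xgb_params, dtrain, ...).
xgb_params = {
    "booster": "gbtree",       # tree-based booster
    "objective": "reg:gamma",  # gamma regression objective
    "eval_metric": "mae",      # mean absolute error as the evaluation loss
    "gamma": 0.1,              # minimum loss reduction required for a split
    "max_depth": 8,            # maximum decision-tree depth
    "lambda": 3,               # L2 regularisation weight
    "seed": 1000,              # random seed for reproducibility
}
```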
In an optional embodiment, the acquiring the preset sample characteristic information corresponding to the training sample pair includes: determining an image type to be trained, and acquiring preset sample characteristic information corresponding to a training sample pair corresponding to the image type; the step of inputting the preset sample characteristic information into a preset image quality evaluation model to obtain an output result of the preset image quality evaluation model comprises the following steps: and inputting the preset sample characteristic information into a preset image quality evaluation model corresponding to the image type to obtain an output result of the preset image quality evaluation model. The method has the advantage that the accuracy of model prediction can be further improved by performing targeted training on different types of image data.
The classification of image types is not limited and can be set according to actual requirements. For example, images may be classified according to the objects they contain, such as persons, still objects, and scenery; or according to their purpose, such as film and television, daily shooting, surveillance, and news.
In an alternative embodiment, the preset sample characteristic information includes a plurality of items. After training the preset image quality evaluation model according to the output result of the preset image quality evaluation model and the sample label, the method further includes: determining a target image quality evaluation model according to the training result; and determining an importance ranking of each item of preset sample characteristic information according to the order of tree nodes in the target image quality evaluation model from shallow to deep. The advantage of this arrangement is that, in addition to outputting predicted image quality scores, the trained model structure can provide richer information, so that the factors influencing image quality can be understood more comprehensively, which helps to guide adjustment of the preset visual indexes and thereby obtain image data with higher MOS scores.
Fig. 2 is a flowchart of another training method for an image quality evaluation model according to an embodiment of the disclosure, where the image data includes video data; this embodiment is optimized on the basis of the foregoing alternative embodiments. As shown in fig. 2, the method may include:
S201, determining the image type to be trained, and acquiring preset sample characteristic information corresponding to a training sample pair of that image type.
Wherein the training sample pair includes base sample image data and control sample image data, and the image data includes video data. The preset sample characteristic information includes: statistical values of preset visual indexes corresponding to the base sample image data and the control sample image data respectively; first difference information between same-type statistical values of the base sample image data and the control sample image data for the same preset visual index; second difference information between different types of statistical values of the base sample image data for the same preset visual index; third difference information between different types of statistical values of the control sample image data for the same preset visual index; and fourth difference information between the second difference information and the third difference information. The preset visual indexes include hue, saturation, brightness, contrast, color, and texture, and the types of statistical values include the maximum value, the average value, and the minimum value.
Illustratively, table 1 shows specific contents of preset sample feature information.
Table 1 preset sample characteristic information
[Table 1 appears only as an image in the source; it lists the feature columns formed by combining each preset visual index with each statistic type and the difference suffixes explained below.]
Where the suffix _average represents an average value, _max represents a maximum value, _min represents a minimum value, -DIFF represents the first difference information (a difference value), _diff1 represents the second difference information, _diff2 represents the third difference information, and _gap represents the fourth difference information.
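Under the naming conventions just listed, the feature-column layout of Table 1 (whose body survives only as an image in the source) might be reconstructed as follows; the `_ctrl` suffix for control-sample statistics is our assumption:

```python
from itertools import product

INDICES = ["hue", "saturation", "brightness", "contrast", "colorfulness", "texture"]
STATS = ["average", "max", "min"]

def feature_columns():
    cols = []
    for idx, st in product(INDICES, STATS):
        cols.append(f"{idx}_{st}")        # statistic of the base sample
        cols.append(f"{idx}_{st}_ctrl")   # statistic of the control sample (suffix assumed)
        cols.append(f"{idx}_{st}-DIFF")   # first difference information
    for idx in INDICES:
        # second, third, and fourth difference information per index
        cols.extend([f"{idx}_diff1", f"{idx}_diff2", f"{idx}_gap"])
    return cols
```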
S202, acquiring a sample label corresponding to the training sample pair.
The sample label is used to represent the subjective evaluation score, that is, the MOS value, of the image quality of the control sample image data compared with the base sample image data.
S203, inputting the characteristic information of the preset sample into a preset image quality evaluation model corresponding to the image type, and obtaining an output result of the preset image quality evaluation model.
The preset image quality evaluation model is an XGBoost model. The model setting parameters of the XGBoost model include: the booster is gbtree; the objective function is reg:gamma; the loss function is the mean absolute error loss; gamma is 0.1; the maximum depth of the decision tree is 8; lambda takes a value of 3; and the seed value is 1000.
S204, training the preset image quality evaluation model according to the output result of the preset image quality evaluation model and the sample label.
In the training process, the preset image quality evaluation model may be trained with the objective of minimizing the objective function until a preset training stop condition is satisfied; the specific training stop condition may be set according to actual requirements.
S205, determining a target image quality evaluation model according to the training result.
For example, after the preset training cutoff condition is satisfied, the resulting model may be determined as the trained target image quality evaluation model. In one embodiment, the obtained MAE value reaches 6.74, so the error between the predicted and true values of the model is small; taking a MOS scale divided into 20-point gears as an example, the predicted value basically falls into the same gear as the true value, which verifies the accuracy of the model.
S206, determining importance degree ranking of each item of preset sample characteristic information according to the sequence from shallow to deep of tree nodes in the target image quality evaluation model.
Illustratively, in the tree structure of the trained XGBoost model, the tree nodes closer to the root, that is, the shallower tree nodes, may be regarded as corresponding to the features with greater influence on the MOS score, and the importance ranking of each item of preset sample characteristic information is determined according to the order of tree nodes from shallow to deep.
Taking the trained target image quality evaluation model corresponding to one image type as an example, the top 7 preset sample characteristic information items by importance are, in order: hue_max-DIFF, hue_average-DIFF, saturation_max-DIFF, colorfulness_average-DIFF, hue_min-DIFF, contrast_average-DIFF, and color_max-DIFF. It can be seen that the further constructed features, built on the basis of the statistical values of the preset visual indexes, have a large influence on user perception and are helpful for model fitting. For a single video, saturation and hue also strongly differentiate MOS scores.
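The shallow-to-deep importance ranking described above can be sketched as follows; the list of (feature name, depth) node pairs is our simplification of what a real tree dump (for example xgboost's `get_dump` output) would provide:

```python
def rank_by_shallowest_split(nodes):
    """Rank features by the shallowest tree depth at which each first appears
    as a split; shallower splits indicate greater influence on the MOS score."""
    best_depth = {}
    for feature, depth in nodes:
        if feature not in best_depth or depth < best_depth[feature]:
            best_depth[feature] = depth
    return sorted(best_depth, key=best_depth.get)
```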
According to the training scheme of the image quality evaluation model provided by the embodiment of the disclosure, the following items are input into the preset image quality evaluation model corresponding to the image type: the statistical values of the preset visual indexes corresponding to the base sample image data and the control sample image data respectively; the first difference information between the statistical values of the base sample image data and the control sample image data for the same preset visual index; the difference information between different types of statistical values of a single group of sample image data for the same preset visual index; and the difference between those two items of difference information. The preset sample characteristic information thus fully characterizes the visual features of the two groups of sample data and the differences between them, so that the preset image quality evaluation model can accurately fit the association between the characteristic information of the corresponding image type and the subjective evaluation score, and can therefore accurately predict the image quality score of a video or picture, reducing the cost of manual evaluation and improving scoring efficiency. In addition, richer information can be provided by the trained model structure, so that the factors influencing image quality can be understood more comprehensively, which helps to guide adjustment of the preset visual indexes and thereby obtain image data with higher MOS scores.
Fig. 3 is a flowchart of an image-based processing method provided according to an embodiment of the present disclosure, which is applicable to a case of evaluating image quality using an image quality evaluation model. The method may be performed by an image-based processing device, which may be implemented in hardware and/or software, and may be configured in an electronic device. Referring to fig. 3, the method specifically includes the following:
S301, acquiring preset characteristic information corresponding to an image data pair, wherein the image data pair includes base image data and contrast image data, the image data includes video data and/or picture data, and the preset characteristic information includes statistical values of preset visual indexes corresponding to the base image data and the contrast image data respectively, and first difference information of the statistical values of the base image data and the contrast image data for the same preset visual index;
S302, inputting the preset characteristic information into the image quality evaluation model to obtain an output result of the image quality evaluation model, wherein the output result of the image quality evaluation model is used to represent a predicted evaluation score of the image quality of the contrast image data compared with the base image data.
The image quality evaluation model is obtained by adopting the training method disclosed by any embodiment of the disclosure.
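Steps S301 and S302 above can be sketched as follows; the statistic dictionaries and the `_ctrl` / `-DIFF` naming are our assumptions, and any object with a `predict` method stands in for the trained model:

```python
def predict_mos(model, base_stats, ctrl_stats):
    """Assemble the preset characteristic information for one image data pair
    (statistics of both sides plus first difference information) and score it."""
    feats = {}
    for name, base_value in base_stats.items():  # e.g. {"hue_max": 0.9, ...}
        ctrl_value = ctrl_stats[name]
        feats[name] = base_value                 # statistic of the base image data
        feats[f"{name}_ctrl"] = ctrl_value       # statistic of the contrast image data
        feats[f"{name}-DIFF"] = base_value - ctrl_value  # first difference information
    return model.predict(feats)
```

In practice, `model` would be the trained XGBoost regressor wrapped to accept such a feature dictionary.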
For example, adjustment of image data may be involved in the production stage, the transmission stage, and the consumption stage of the image data, and image quality evaluation may be performed on the adjusted image data. Fig. 4 is a schematic view of an application scenario provided according to an embodiment of the present disclosure. As shown in fig. 4, taking video data as an example: parameters (corresponding to the preset visual indexes) may be adjusted in the shooting stage, and the shot video frames may be changed by adjusting the values of one or more preset visual indexes (for example, by adding a beauty effect or a filter to the camera, where a mapping relationship may exist between the beauty effect or filter and the values of the preset visual indexes) to obtain an adjusted shot video; that is, the base image data includes the shot image data, and the contrast image data includes the image data obtained by adjusting the shot image. The video shot by the shooting device is generally encoded before being stored or transmitted, and decoded before being delivered to the consumer, and the preset visual indexes may be optimized and adjusted during encoding and decoding; that is, the base image data includes the image data before encoding and decoding, and the contrast image data includes the image data obtained by adjusting it. The encoded and decoded video may be presented to the user through a player in the consumption stage, and the image data may be adjusted before display; for example, for image quality problems that may arise after repeated transmission or transcoding, colors or contrast may be restored or adjusted through player parameters to improve the image quality. That is, the base image data includes the image data before display, and the contrast image data includes the image data obtained by adjusting the image data before display.
Optionally, if in the model training stage the preset sample characteristic information further includes at least one of the second difference information, the third difference information, and the fourth difference information, the preset characteristic information may include at least one of the following: second difference information between different types of statistical values of the base image data for the same preset visual index; third difference information between different types of statistical values of the contrast image data for the same preset visual index; and fourth difference information between the second difference information and the third difference information.
Optionally, if the image quality evaluation model is trained by classification according to image type in the training stage, the preset characteristic information is input into the image quality evaluation model corresponding to the image type to which the image data belongs.
According to the image-based processing scheme provided by the embodiment of the disclosure, the image quality of the adjusted image data is accurately evaluated using the image quality evaluation model. At each stage of the image data, it can be accurately determined whether the adjusted image data has better image quality, whether the image quality enhancement effect meets expectations, or whether the adjustment processing causes a loss of image quality. This reduces the cost of manual evaluation and accurately assesses the influence of image adjustment on user perception.
Fig. 5 is a flowchart of another image-based processing method provided according to an embodiment of the present disclosure, which is applicable to a case of guiding image adjustment using an image quality evaluation model. The method may be performed by an image-based processing device, which may be implemented in hardware and/or software, and may be configured in an electronic device. Referring to fig. 5, the method specifically includes the following:
S501, acquiring original characteristic information of image data to be processed, wherein the original characteristic information includes a statistical value of a preset visual index corresponding to the image data to be processed;
S502, constructing information to be input according to the original characteristic information, wherein the information to be input includes the original characteristic information and further includes first difference information of statistical values for the same preset visual index between the original characteristic information and candidate characteristic information, and the candidate characteristic information includes characteristic information obtained after adjustment on the basis of the original characteristic information;
S503, inputting the information to be input into the image quality evaluation model to obtain an output result of the image quality evaluation model;
the image quality evaluation model is obtained by adopting the training method disclosed by any embodiment of the disclosure.
S504, screening target characteristic information from the information to be input according to an output result of the image quality evaluation model;
S505, performing corresponding processing on the image data to be processed by using the target characteristic information, so that the processed target image data meets the target characteristic information.
As described above, adjustment of image data may be involved in the production stage, the transmission stage, and the consumption stage; accordingly, the image data to be processed may include shot image data, image data before encoding and decoding, image data before display, and the like. Before adjustment, multiple sets of candidate characteristic information may be determined according to the original characteristic information of the image data to be processed, that is, the possible adjusted characteristic information is estimated, and the information to be input to the image quality evaluation model is constructed (the number of sets of information to be input is generally consistent with the number of sets of candidate characteristic information). The finally adopted target characteristic information is then selected according to the image quality scores output by the image quality evaluation model; the target characteristic information may be understood as the information to be input whose image quality score is the highest, or higher than a preset score threshold, in the output results of the model. The values of the corresponding preset visual indexes in the image data to be processed are then adjusted so that the characteristic information determined from the adjusted target image data is consistent with the target characteristic information, or the difference between them is within a preset difference range, thereby obtaining image data with better image quality.
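Screening the target characteristic information (step S504) can be sketched as below; the early return on a preset score threshold mirrors the "higher than a preset score threshold" option described above, and the model is again any object with a `predict` method:

```python
def select_target_features(model, candidates, score_threshold=None):
    """Return the candidate information-to-input with the highest predicted
    image quality score, or the first candidate clearing the threshold."""
    best_candidate, best_score = None, float("-inf")
    for candidate in candidates:
        score = model.predict(candidate)
        if score_threshold is not None and score >= score_threshold:
            return candidate
        if score > best_score:
            best_candidate, best_score = candidate, score
    return best_candidate
```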
Alternatively, a grid search (Grid Search) or the like may be used to search within the preset adjustment range corresponding to the original characteristic information to determine the candidate characteristic information. Different preset visual indexes may correspond to different preset adjustment ranges, which may be set in advance according to actual requirements.
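A minimal grid-search sketch under assumed names; each adjustable index is stepped through its preset adjustment range and combined with the untouched original feature values:

```python
from itertools import product

def grid_candidates(original, adjust_ranges, steps=3):
    """Enumerate candidate characteristic information by stepping each adjustable
    preset visual index through its preset adjustment range (a simple grid search)."""
    names = list(adjust_ranges)
    axes = []
    for name in names:
        lo, hi = adjust_ranges[name]
        axes.append([lo + (hi - lo) * i / (steps - 1) for i in range(steps)])
    for combo in product(*axes):
        candidate = dict(original)        # non-adjusted indexes stay as-is
        candidate.update(zip(names, combo))
        yield candidate
```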
According to the image-based processing scheme provided by the embodiment of the disclosure, the image quality of adjusted image data can be accurately estimated in advance using the image quality evaluation model. At each stage of the image data, the adjustment results can be accurately estimated so as to obtain better image quality, and the adjustment parameters whose image quality enhancement effect meets expectations can be selected, so that the image to be adjusted can be adjusted in a targeted manner. On the basis of ensuring that the image quality of the adjusted image data is improved, repeated adjustment operations are reduced, image adjustment efficiency is improved, and parameter tuning cost is reduced.
In an optional embodiment, if during the model training phase, the preset sample feature information further includes at least one of the second difference information, the third difference information, and the fourth difference information, the information to be input may include at least one of the following: second difference information of different types of statistical values of the image data to be processed for the same preset visual index; third difference information of the candidate feature information on different types of statistical values of the same preset visual index; fourth difference information of the second difference information and the third difference information.
In an alternative embodiment, if the image quality evaluation model performs classification training according to the image type in the training stage, the information to be input is input into the image quality evaluation model corresponding to the image type to which the image data to be processed belongs.
In an optional implementation manner, the constructing of the information to be input according to the original characteristic information includes: constructing the information to be input according to the importance ranking of each item of information to be input corresponding to the image quality evaluation model, the preset adjustment range corresponding to each item of information to be input, and the original characteristic information. The advantage of this arrangement is that the importance ranking guides which feature parameters are adjusted first, so that the target characteristic information can be found quickly, further improving image adjustment efficiency.
For example, as illustrated above, if the information item to be input with the highest importance ranking is hue_max-DIFF, hue_max-DIFF may be adjusted first when constructing the information to be input, so that the MOS score output by the model can be improved quickly.
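Ordering the adjustable items by the importance ranking, as suggested above, could be sketched as follows (all names are ours):

```python
def adjustment_order(importance_ranking, adjustable_items):
    """Place higher-importance items (e.g. hue_max-DIFF) first so that
    high-impact candidates are constructed and scored early."""
    rank = {name: i for i, name in enumerate(importance_ranking)}
    # Unranked items fall to the end.
    return sorted(adjustable_items, key=lambda n: rank.get(n, len(importance_ranking)))
```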
Fig. 6 is a schematic structural diagram of a training device for an image quality evaluation model according to an embodiment of the present disclosure, which is applicable to a case of training an image quality evaluation model. The device can be realized in a hardware and/or software mode and can be configured in electronic equipment. Referring to fig. 6, the training device 600 for the image quality evaluation model specifically includes:
The feature information obtaining module 601 is configured to obtain preset sample feature information corresponding to a training sample pair, where the training sample pair includes basic sample image data and control sample image data, the image data includes video data and/or picture data, the preset sample feature information includes statistics values of preset visual indicators corresponding to the basic sample image data and the control sample image data, respectively, and first difference information of statistics values of the basic sample image data and the control sample image data for the same preset visual indicators;
a sample label obtaining module 602, configured to obtain a sample label corresponding to the training sample pair, where the sample label is used to represent a subjective evaluation score of an image quality of the control sample image data compared to the base sample image data;
the feature information input module 603 is configured to input the feature information of the preset sample into a preset image quality evaluation model, so as to obtain an output result of the preset image quality evaluation model;
the model training module 604 is configured to train the preset image quality evaluation model according to the output result of the preset image quality evaluation model and the sample label.
According to the training scheme of the image quality evaluation model provided by the embodiment of the disclosure, the preset sample characteristic information input into the preset image quality evaluation model includes the statistical values of preset visual indexes corresponding to the base sample image data and the control sample image data respectively, and the first difference information of the statistical values of the base sample image data and the control sample image data for the same preset visual index. The preset sample characteristic information fully characterizes the visual features of the two groups of sample data and the differences between them, so that the preset image quality evaluation model can accurately fit the association between the characteristic information and the subjective evaluation score, and can therefore accurately predict the image quality score of a video or picture, reducing the cost of manual evaluation and improving scoring efficiency.
In an alternative embodiment, the preset visual index includes at least one of hue, saturation, brightness, contrast, color, and texture.
In an alternative embodiment, the type of the statistical value includes at least one of a maximum value, an average value, and a minimum value.
In an alternative embodiment, the preset sample characteristic information further includes at least one of the following:
Second difference information of the base sample image data for different types of statistical values of the same preset visual index;
third difference information of the control sample image data for different types of statistical values of the same preset visual index;
fourth difference information of the second difference information and the third difference information.
In an alternative embodiment, the preset image quality assessment model includes an extreme gradient boost tree XGBoost model.
In an alternative embodiment, the model setting parameters of the XGBoost model include at least one of:
the booster is gbtree; the objective function is reg:gamma; the loss function is the mean absolute error loss; gamma is 0.1; the maximum depth of the decision tree is 8; lambda takes a value of 3; and the seed value is 1000.
In an optional implementation manner, the feature information acquisition module is specifically configured to: determining an image type to be trained, and acquiring preset sample characteristic information corresponding to a training sample pair corresponding to the image type;
the characteristic information input module is specifically used for: and inputting the preset sample characteristic information into a preset image quality evaluation model corresponding to the image type to obtain an output result of the preset image quality evaluation model.
In an alternative embodiment, the preset sample characteristic information includes a plurality of items; the apparatus further comprises:
the model determining module is used for determining a target image quality evaluation model according to a training result after training the preset image quality evaluation model according to an output result of the preset image quality evaluation model and the sample label;
and the ranking determining module is used for determining the importance ranking of the preset sample characteristic information according to the sequence from shallow to deep of the tree nodes in the target image quality evaluation model.
Fig. 7 is a schematic structural diagram of an image-based processing apparatus according to an embodiment of the present disclosure, which is applicable to a case of evaluating image quality using an image quality evaluation model. The device can be realized in a hardware and/or software mode and can be configured in electronic equipment. Referring to fig. 7, the image-based processing apparatus 700 specifically includes:
an information obtaining module 701, configured to obtain preset feature information corresponding to an image data pair, where the image data pair includes base image data and reference image data, the image data includes video data and/or picture data, the preset feature information includes statistics of preset visual indicators corresponding to the base image data and the reference image data, respectively, and first difference information of statistics of the base image data and the reference image data for the same preset visual indicators;
An information input module 702, configured to input the preset feature information into an image quality evaluation model, to obtain an output result of the image quality evaluation model, where the image quality evaluation model is obtained by using the training device according to any embodiment of the disclosure, and the output result of the image quality evaluation model is used to represent a predicted evaluation score of the image quality of the reference image data compared with the base image data.
According to the image-based processing scheme provided by the embodiment of the disclosure, the image quality of the adjusted image data is accurately evaluated using the image quality evaluation model. At each stage of the image data, it can be accurately determined whether the adjusted image data has better image quality, whether the image quality enhancement effect meets expectations, or whether the adjustment processing causes a loss of image quality. This reduces the cost of manual evaluation and accurately assesses the influence of image adjustment on user perception.
Fig. 8 is a schematic structural diagram of another image-based processing apparatus provided according to an embodiment of the present disclosure, which is applicable to a case of guiding image adjustment using an image quality evaluation model. The device can be realized in a hardware and/or software mode and can be configured in electronic equipment. Referring to fig. 8, the image-based processing apparatus 800 specifically includes:
An original feature information obtaining module 801, configured to obtain original feature information of image data to be processed, where the original feature information includes a statistical value of a preset visual index corresponding to the image data to be processed;
the information construction module 802 is configured to construct information to be input according to the original feature information, where the information to be input includes the original feature information, and further includes first difference information of statistical values for the same preset visual index in the original feature information and candidate feature information, and the candidate feature information includes feature information obtained after adjustment based on the original feature information;
the to-be-input information input module 803 is configured to input the to-be-input information into an image quality evaluation model, to obtain an output result of the image quality evaluation model, where the image quality evaluation model is obtained by using the training device according to any embodiment of the disclosure;
the target feature information screening module 804 is configured to screen target feature information from the information to be input according to an output result of the image quality evaluation model;
and the image processing module 805 is configured to perform corresponding processing on the image data to be processed by using the target feature information, so that the target image data obtained after the processing meets the target feature information.
According to the image-based processing scheme provided by the embodiment of the disclosure, the image quality of adjusted image data can be accurately estimated in advance using the image quality evaluation model. At each stage of the image data, the adjustment results can be accurately estimated so as to obtain better image quality, and the adjustment parameters whose image quality enhancement effect meets expectations can be selected, so that the image to be adjusted can be adjusted in a targeted manner. On the basis of ensuring that the image quality of the adjusted image data is improved, repeated adjustment operations are reduced, image adjustment efficiency is improved, and parameter tuning cost is reduced.
In an alternative embodiment, the information construction module is specifically configured to:
and according to the importance degree sequence of each item of information to be input corresponding to the image quality evaluation model, a preset adjustment range corresponding to each item of information to be input and the original characteristic information, constructing the information to be input.
In the technical solutions of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of the user's personal information involved all comply with the provisions of relevant laws and regulations and do not violate public order and good morals.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 9 shows a schematic block diagram of an example electronic device 900 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the apparatus 900 includes a computing unit 901 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other by a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
Various components in device 900 are connected to I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, or the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, an optical disk, or the like; and a communication unit 909 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunications networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 901 performs the respective methods and processes described above, for example, a training method of an image quality evaluation model or an image-based processing method. For example, in some embodiments, the training method of the image quality assessment model or the image-based processing method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the training method of the image quality evaluation model or the image-based processing method described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the training method of the image quality evaluation model or the image-based processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here above can be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs, where the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, which is a host product in the cloud computing service system and overcomes the defects of high management difficulty and weak service scalability found in traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system or a server that incorporates a blockchain.
Artificial intelligence is the discipline that studies how to make a computer mimic certain human thought processes and intelligent behaviors (e.g., learning, reasoning, thinking, planning), and involves both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision technology, speech recognition technology, natural language processing technology, machine learning/deep learning technology, big data processing technology, knowledge graph technology, and the like.
Cloud computing refers to a technical system in which an elastically extensible pool of shared physical or virtual resources is accessed through a network, where the resources may include servers, operating systems, networks, software, applications, storage devices, and the like, and may be deployed and managed in an on-demand, self-service manner. Cloud computing technology can provide efficient and powerful data processing capabilities for technical applications and model training in fields such as artificial intelligence and blockchain.
It should be understood that steps of the various forms of flow shown above may be reordered, added, or deleted. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions provided by the present disclosure are achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (24)

1. A training method of an image quality evaluation model comprises the following steps:
acquiring preset sample characteristic information corresponding to a training sample pair, and acquiring sample labels corresponding to the training sample pair, wherein the training sample pair comprises basic sample image data and control sample image data, the image data comprises video data and/or picture data, the preset sample characteristic information comprises statistical values of preset visual indexes corresponding to the basic sample image data and the control sample image data respectively, and first difference information of the statistical values of the basic sample image data and the control sample image data for the same preset visual indexes, and the sample labels are used for representing subjective evaluation scores of image quality of the control sample image data compared with the basic sample image data;
inputting the preset sample characteristic information into a preset image quality evaluation model to obtain an output result of the preset image quality evaluation model;
and training the preset image quality evaluation model according to the output result of the preset image quality evaluation model and the sample label.
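As an illustration of the input built in these steps, the following sketch assembles one row of preset sample characteristic information (the per-index statistics of both sample images plus the first difference information) for one training sample pair. The index names, array shapes, and label value are assumptions for illustration only; the resulting feature row would then be fed, together with its subjective evaluation score, to a regressor such as XGBoost.

```python
import numpy as np

def index_stats(index_values):
    # max / average / min statistics of one preset visual index
    return np.array([index_values.max(), index_values.mean(), index_values.min()])

def pair_features(base_indexes, control_indexes):
    # base_indexes / control_indexes: dicts mapping an assumed preset visual
    # index name (e.g. "hue", "saturation", "brightness") to its per-pixel values
    base = np.concatenate([index_stats(v) for v in base_indexes.values()])
    control = np.concatenate([index_stats(v) for v in control_indexes.values()])
    first_diff = control - base  # first difference information, per index and statistic
    return np.concatenate([base, control, first_diff])

rng = np.random.default_rng(0)
indexes = ("hue", "saturation", "brightness")
base = {k: rng.random((32, 32)) for k in indexes}   # basic sample image data (stand-in)
ctrl = {k: rng.random((32, 32)) for k in indexes}   # control sample image data (stand-in)
features = pair_features(base, ctrl)  # one row of preset sample characteristic information
label = 0.7  # assumed subjective evaluation score for this training sample pair
```

With three indexes and three statistic types, the row has 9 base statistics, 9 control statistics, and 9 first-difference values (27 features in total).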
2. The method of claim 1, wherein the preset visual indicators comprise at least one of hue, saturation, brightness, contrast, color, and texture.
3. The method of any of claims 1-2, wherein the type of statistical value comprises at least one of a maximum value, an average value, and a minimum value.
4. A method according to any one of claims 1-3, wherein the pre-set sample characteristic information further comprises at least one of:
second difference information of the base sample image data for different types of statistical values of the same preset visual index;
third difference information of the control sample image data for different types of statistical values of the same preset visual index;
fourth difference information of the second difference information and the third difference information.
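A minimal sketch of the three additional difference items listed above, assuming each preset visual index is summarized by (max, average, min) statistics; the concrete statistic values are invented for illustration.

```python
import numpy as np

def stat_type_diffs(stats):
    # stats = (max, average, min) of one preset visual index; the returned
    # values are differences between different statistic types of that index
    mx, avg, mn = stats
    return np.array([mx - avg, avg - mn, mx - mn])

base_stats = np.array([0.9, 0.5, 0.1])      # basic sample image data (assumed values)
control_stats = np.array([0.8, 0.55, 0.2])  # control sample image data (assumed values)

second_diff = stat_type_diffs(base_stats)     # second difference information
third_diff = stat_type_diffs(control_stats)   # third difference information
fourth_diff = second_diff - third_diff        # fourth difference information
```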
5. The method of any of claims 1-4, wherein the preset image quality evaluation model comprises an extreme gradient boosting tree (XGBoost) model.
6. The method of claim 5, wherein the model setting parameters of the XGBoost model comprise at least one of:
the booster is gbtree; the objective function is reg:gamma; the loss function is the mean absolute error loss; gamma is 0.1; the maximum depth of the decision tree is 8; lambda takes a value of 3; the seed value is 1000.
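These settings can be written as a parameter dictionary using the key names of the public XGBoost API; this is a sketch of how they would typically be expressed, and the training call in the trailing comment is hypothetical (it assumes a prepared `dtrain` DMatrix).

```python
# Claim-6 model settings expressed with standard XGBoost parameter names.
xgb_params = {
    "booster": "gbtree",       # the booster is gbtree
    "objective": "reg:gamma",  # the objective function is reg:gamma
    "eval_metric": "mae",      # mean absolute error loss
    "gamma": 0.1,              # minimum loss reduction for a further split
    "max_depth": 8,            # maximum depth of a decision tree
    "lambda": 3,               # L2 regularization weight
    "seed": 1000,              # random seed
}
# Hypothetical use: booster = xgboost.train(xgb_params, dtrain, num_boost_round=100)
```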
7. The method according to any one of claims 1-6, wherein the acquiring the preset sample characteristic information corresponding to the training sample pair comprises:
determining an image type to be trained, and acquiring preset sample characteristic information corresponding to a training sample pair corresponding to the image type;
the step of inputting the preset sample characteristic information into a preset image quality evaluation model to obtain an output result of the preset image quality evaluation model comprises the following steps:
and inputting the preset sample characteristic information into a preset image quality evaluation model corresponding to the image type to obtain an output result of the preset image quality evaluation model.
8. The method of any of claims 5-7, wherein the preset sample characteristic information comprises a plurality of items; after training the preset image quality evaluation model according to the output result of the preset image quality evaluation model and the sample label, the method further comprises:
determining a target image quality evaluation model according to the training result;
and determining an importance ranking of each item of preset sample characteristic information according to the order, from shallow to deep, of the tree nodes in the target image quality evaluation model.
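A sketch of this ranking rule: an item of preset sample characteristic information is ranked by the shallowest tree node at which it appears, so items splitting at shallower nodes rank as more important. The split list below is a hand-written stand-in for a dump of the trained model's trees, and the feature names are illustrative.

```python
from collections import defaultdict

def rank_by_node_depth(splits):
    # splits: (feature_name, node_depth) pairs collected from every tree of
    # the trained model; each feature is ranked by its shallowest split depth
    shallowest = defaultdict(lambda: float("inf"))
    for feature, depth in splits:
        shallowest[feature] = min(shallowest[feature], depth)
    return sorted(shallowest, key=shallowest.__getitem__)

# assumed stand-in for the split dump of a trained tree model
splits = [
    ("first_diff_brightness_avg", 0),
    ("base_saturation_max", 2),
    ("control_hue_min", 1),
    ("base_saturation_max", 3),
]
ranking = rank_by_node_depth(splits)
# ranking[0] is the most important item of preset sample characteristic information
```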
9. An image-based processing method, comprising:
acquiring preset characteristic information corresponding to an image data pair, wherein the image data pair comprises basic image data and contrast image data, the image data comprises video data and/or picture data, the preset characteristic information comprises statistical values of preset visual indexes corresponding to the basic image data and the contrast image data respectively, and first difference information of the statistical values of the basic image data and the contrast image data for the same preset visual indexes;
inputting the preset characteristic information into an image quality evaluation model to obtain an output result of the image quality evaluation model, wherein the image quality evaluation model is obtained by adopting the training method according to any one of claims 1-8, and the output result of the image quality evaluation model is used for representing a predicted evaluation score of the image quality of the contrast image data compared with the basic image data.
10. An image-based processing method, comprising:
acquiring original characteristic information of image data to be processed, wherein the original characteristic information comprises a statistic value of a preset visual index corresponding to the image data to be processed;
constructing information to be input according to the original characteristic information, wherein the information to be input comprises the original characteristic information, and further comprises first difference information of statistical values of the same preset visual index in the original characteristic information and candidate characteristic information, and the candidate characteristic information comprises characteristic information obtained after adjustment on the basis of the original characteristic information;
inputting the information to be input into an image quality evaluation model to obtain an output result of the image quality evaluation model, wherein the image quality evaluation model is obtained by adopting the training method according to any one of claims 1-8;
screening target characteristic information from the information to be input according to an output result of the image quality evaluation model;
and carrying out corresponding processing on the image data to be processed by utilizing the target characteristic information so that the processed target image data meets the target characteristic information.
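The flow above can be sketched as a search over candidate characteristic information within a preset adjustment range, scoring each candidate with the evaluation model and keeping the best one. The scoring function below is an assumed stand-in for the trained image quality evaluation model, and the adjustment range and step count are invented for illustration.

```python
import numpy as np

def score(original_stats, candidate_stats):
    # stand-in for the trained model applied to the to-be-input information
    # [original, candidate, candidate - original]; this toy version simply
    # prefers candidates whose statistics are close to 0.6
    return -np.abs(candidate_stats - 0.6).sum()

def screen_target(original_stats, adjust_range=0.1, steps=5):
    # enumerate candidate characteristic information within the preset
    # adjustment range and keep the highest-scoring candidate
    best, best_score = None, -np.inf
    for off in np.linspace(-adjust_range, adjust_range, steps):
        candidate = original_stats + off
        s = score(original_stats, candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best  # target characteristic information

original = np.array([0.9, 0.5, 0.1])  # statistics of the image data to be processed
target = screen_target(original)
# the image to be processed would then be adjusted so that it satisfies `target`
```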
11. The method of claim 10, wherein the constructing information to be input from the raw characteristic information comprises:
and constructing the information to be input according to the importance ranking of each item of information to be input corresponding to the image quality evaluation model, a preset adjustment range corresponding to each item of information to be input, and the original characteristic information.
12. An image quality evaluation model training apparatus comprising:
the characteristic information acquisition module is used for acquiring preset sample characteristic information corresponding to a training sample pair, wherein the training sample pair comprises basic sample image data and control sample image data, the image data comprises video data and/or picture data, the preset sample characteristic information comprises statistical values of preset visual indexes corresponding to the basic sample image data and the control sample image data respectively, and first difference information of the statistical values of the basic sample image data and the control sample image data for the same preset visual indexes;
the sample label acquisition module is used for acquiring a sample label corresponding to the training sample pair, and the sample label is used for representing subjective evaluation scores of image quality of the control sample image data compared with the basic sample image data;
the characteristic information input module is used for inputting the preset sample characteristic information into a preset image quality evaluation model to obtain an output result of the preset image quality evaluation model;
and the model training module is used for training the preset image quality evaluation model according to the output result of the preset image quality evaluation model and the sample label.
13. The apparatus of claim 12, wherein the preset visual indicators comprise at least one of hue, saturation, brightness, contrast, color, and texture.
14. The apparatus of any of claims 12-13, wherein the type of statistic includes at least one of a maximum value, an average value, and a minimum value.
15. The apparatus of any of claims 12-14, wherein the preset sample characteristic information further comprises at least one of:
second difference information of the base sample image data for different types of statistical values of the same preset visual index;
third difference information of the control sample image data for different types of statistical values of the same preset visual index;
fourth difference information of the second difference information and the third difference information.
16. The apparatus of any of claims 12-15, wherein the preset image quality evaluation model comprises an extreme gradient boosting tree (XGBoost) model.
17. The apparatus of claim 16, wherein the model setting parameters of the XGBoost model comprise at least one of:
the booster is gbtree; the objective function is reg:gamma; the loss function is the mean absolute error loss; gamma is 0.1; the maximum depth of the decision tree is 8; lambda takes a value of 3; the seed value is 1000.
18. The apparatus according to any one of claims 12-17, wherein,
the characteristic information acquisition module is specifically used for: determining an image type to be trained, and acquiring preset sample characteristic information corresponding to a training sample pair corresponding to the image type;
the characteristic information input module is specifically used for: and inputting the preset sample characteristic information into a preset image quality evaluation model corresponding to the image type to obtain an output result of the preset image quality evaluation model.
19. The apparatus of any of claims 16-18, wherein the preset sample characteristic information comprises a plurality of items; the apparatus further comprises:
the model determining module is used for determining a target image quality evaluation model according to a training result after training the preset image quality evaluation model according to an output result of the preset image quality evaluation model and the sample label;
and the ranking determining module is used for determining an importance ranking of each item of preset sample characteristic information according to the order, from shallow to deep, of the tree nodes in the target image quality evaluation model.
20. An image-based processing apparatus comprising:
the information acquisition module is used for acquiring preset characteristic information corresponding to an image data pair, wherein the image data pair comprises basic image data and contrast image data, the image data comprises video data and/or picture data, the preset characteristic information comprises statistical values of preset visual indexes corresponding to the basic image data and the contrast image data respectively, and first difference information of the statistical values of the basic image data and the contrast image data for the same preset visual indexes;
the information input module is used for inputting the preset characteristic information into an image quality evaluation model to obtain an output result of the image quality evaluation model, wherein the image quality evaluation model is obtained by using the training apparatus according to any one of claims 12-19, and the output result of the image quality evaluation model is used for representing a predicted evaluation score of the image quality of the contrast image data compared with the basic image data.
21. An image-based processing apparatus comprising:
the device comprises an original characteristic information acquisition module, a processing module and a processing module, wherein the original characteristic information acquisition module is used for acquiring original characteristic information of image data to be processed, and the original characteristic information comprises a statistic value of a preset visual index corresponding to the image data to be processed;
the information construction module is used for constructing information to be input according to the original characteristic information, wherein the information to be input comprises the original characteristic information, and further comprises first difference information of the statistical values of the original characteristic information and candidate characteristic information for the same preset visual index, and the candidate characteristic information comprises characteristic information obtained after adjustment on the basis of the original characteristic information;
the to-be-input information input module is used for inputting the to-be-input information into an image quality evaluation model to obtain an output result of the image quality evaluation model, wherein the image quality evaluation model is obtained by using the training device according to any one of claims 12-19;
the target characteristic information screening module is used for screening target characteristic information from the information to be input according to the output result of the image quality evaluation model;
and the image processing module is used for carrying out corresponding processing on the image data to be processed by utilizing the target characteristic information so that the processed target image data meets the target characteristic information.
22. The apparatus of claim 21, wherein the information construction module is specifically configured to:
and constructing the information to be input according to the importance ranking of each item of information to be input corresponding to the image quality evaluation model, a preset adjustment range corresponding to each item of information to be input, and the original characteristic information.
23. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-11.
24. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-11.
CN202310020434.7A 2023-01-06 2023-01-06 Model training method, image-based processing method, device, equipment and medium Pending CN116071623A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310020434.7A CN116071623A (en) 2023-01-06 2023-01-06 Model training method, image-based processing method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310020434.7A CN116071623A (en) 2023-01-06 2023-01-06 Model training method, image-based processing method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN116071623A true CN116071623A (en) 2023-05-05

Family

ID=86178101

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310020434.7A Pending CN116071623A (en) 2023-01-06 2023-01-06 Model training method, image-based processing method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN116071623A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117974571A (en) * 2023-12-29 2024-05-03 航天科工(北京)空间信息应用股份有限公司 Remote sensing image evaluation model training method, remote sensing image evaluation method and device

Similar Documents

Publication Publication Date Title
US9706111B2 (en) No-reference image and video quality evaluation
CN111654746B (en) Video frame insertion method and device, electronic equipment and storage medium
CN107507144B (en) Skin color enhancement processing method and device and image processing device
CN111090778B (en) Picture generation method, device, equipment and storage medium
CN103440674B (en) A kind of rapid generation of digital picture wax crayon specially good effect
CN112328345B (en) Method, apparatus, electronic device and readable storage medium for determining theme colors
Zuo et al. Screen content image quality assessment via convolutional neural network
CN109389569B (en) Monitoring video real-time defogging method based on improved DehazeNet
CN113297937B (en) Image processing method, device, equipment and medium
WO2020108010A1 (en) Video processing method and apparatus, electronic device and storage medium
US9940543B2 (en) Control of computer vision pre-processing based on image matching using structural similarity
CN116071623A (en) Model training method, image-based processing method, device, equipment and medium
CN113436105A (en) Model training and image optimization method and device, electronic equipment and storage medium
CN111768377A (en) Image color evaluation method and device, electronic equipment and storage medium
CN111754492A (en) Image quality evaluation method and device, electronic equipment and storage medium
US8503822B2 (en) Image quality evaluation system, method, and program utilizing increased difference weighting of an area of focus
CN116668843A (en) Shooting state switching method and device, electronic equipment and storage medium
US10026201B2 (en) Image classifying method and image displaying method
CN111738949B (en) Image brightness adjusting method and device, electronic equipment and storage medium
CN113988294A (en) Method for training prediction network, image processing method and device
CN114092359A (en) Screen-splash processing method and device and electronic equipment
CN114005059A (en) Video transition detection method and device and electronic equipment
CN114266803A (en) Image processing method, image processing device, electronic equipment and storage medium
Qiu et al. High dynamic range image compression based on the multi-peak S-shaped tone curve
CN113691866B (en) Video processing method, device, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination