CN114513655A - Live video quality evaluation method, video quality adjustment method and related device - Google Patents


Info

Publication number
CN114513655A
Authority
CN
China
Prior art keywords
video
live
code stream
quality
model
Prior art date
Legal status
Pending
Application number
CN202210182069.5A
Other languages
Chinese (zh)
Inventor
刘杰洪
Current Assignee
Guangzhou Cubesili Information Technology Co Ltd
Original Assignee
Guangzhou Cubesili Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Cubesili Information Technology Co Ltd filed Critical Guangzhou Cubesili Information Technology Co Ltd
Priority to CN202210182069.5A priority Critical patent/CN114513655A/en
Publication of CN114513655A publication Critical patent/CN114513655A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 17/00 Diagnosis, testing or measuring for television systems or their details
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N 21/23418 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

The application provides a live video quality evaluation method, a video quality adjustment method, and a related device. The live video quality evaluation method includes: obtaining a video code stream to be evaluated; extracting code stream characteristic information from the video code stream to be evaluated; and inputting the code stream characteristic information into a pre-established video quality evaluation model to obtain a quality score of the live video. The video quality evaluation model is obtained by training at least one machine learning model, taking code stream characteristic information extracted from historical video data of live broadcast scenes as input and real video scoring data calculated from the historical video data as output. On one hand, the method is a no-reference method based on code stream information and can efficiently and conveniently calculate the quality score of a live video in a live scene, so that the video quality of the video code stream is monitored; on the other hand, because at least one machine learning model is adopted, the accuracy of the live video quality calculation can be improved.

Description

Live video quality evaluation method, video quality adjustment method and related device
Technical Field
The application relates to the technical field of network live broadcasting, and in particular to a live video quality evaluation method, a live video quality adjustment method, corresponding apparatuses, a terminal device or server, and a computer-readable storage medium.
Background
With the continuous development and integration of network and multimedia technologies, streaming media video services relying on the internet (e.g., webcast live streaming) have gradually become an important part of people's daily life and work. In a webcast, users enter a live broadcast room through their respective terminal devices to watch the anchor's performance. The quality of the live video is an important factor affecting the live broadcast effect: if a video quality problem occurs during live broadcasting and cannot be adjusted or improved in time, the live broadcast effect is directly affected and viewer satisfaction is seriously reduced. Therefore, it is important to evaluate the quality of the live video accurately.
Existing video quality evaluation methods can be classified into three categories according to their degree of dependence on the source video data (i.e., the video before encoding): full-reference, partial-reference, and no-reference methods. In a live scene, however, source video image data is often difficult to acquire, so the full-reference and partial-reference evaluation methods are not suitable. The no-reference method does not depend on the source video data at all; the video quality is evaluated using only the encoded video. Existing no-reference methods are mainly based on pixel information: the encoded code stream is first decoded to obtain pixel-level images, characteristic information is extracted from those images, and the image quality is then evaluated.
At present, the video quality evaluation method commonly used in live scenes is therefore the pixel-information-based method. However, this method can be performed only after one or more whole frames have been fully decoded, which is inefficient for live scenes with high real-time requirements.
Disclosure of Invention
In view of this, embodiments of the present application provide a live video quality evaluation method, a live video quality adjustment method, corresponding apparatuses, a device, and a storage medium.
In a first aspect, an embodiment of the present application provides a live video quality evaluation method, including:
acquiring a video code stream to be evaluated; the video code stream to be evaluated is formed by coding a live broadcast source video needing video quality evaluation;
extracting code stream characteristic information from the video code stream to be evaluated, and inputting the code stream characteristic information into a pre-established video quality evaluation model to obtain quality scores of live videos;
the video quality evaluation model is obtained by taking code stream characteristic information extracted from historical video data of a live broadcast scene as input and taking real video scoring data calculated according to the historical video data as output to train at least one machine learning model.
In a second aspect, an embodiment of the present application provides a method for adjusting quality of a live video, where the method includes:
acquiring a live video stream, and extracting a plurality of frames of video code streams from the live video stream;
calculating the quality score of the live video by adopting the live video quality evaluation method of the first aspect according to a plurality of frames of the video code stream;
when the quality score exceeds a preset quality score range, judging that the quality of the live video is abnormal;
counting the abnormal times of the quality of the live video in a preset time period, and adjusting the related parameters of the live video when the abnormal times exceed a preset threshold value so as to complete the adjustment of the quality of the live video.
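The second-aspect steps above can be sketched as a small monitoring loop. The score range, window length, and abnormality threshold below are illustrative assumptions (the application leaves the preset values open), and the class and method names are hypothetical:

```python
import time
from collections import deque


class LiveQualityAdjuster:
    """Sketch of the second-aspect adjustment logic: a score outside the
    preset range counts as one quality abnormality, and when the number of
    abnormalities inside a sliding time window exceeds a preset threshold,
    the live parameters should be adjusted. All preset values here are
    illustrative, not values specified by the application."""

    def __init__(self, score_range=(60.0, 100.0), window_seconds=30.0,
                 abnormal_threshold=3):
        self.score_range = score_range
        self.window_seconds = window_seconds
        self.abnormal_threshold = abnormal_threshold
        self._abnormal_times = deque()  # timestamps of abnormal scores

    def report_score(self, score, now=None):
        """Record one quality score; return True when the abnormality count
        in the current window exceeds the threshold (i.e. adjust now)."""
        now = time.time() if now is None else now
        low, high = self.score_range
        if not (low <= score <= high):  # score outside preset range: abnormal
            self._abnormal_times.append(now)
        # drop abnormalities that have left the sliding time window
        while self._abnormal_times and now - self._abnormal_times[0] > self.window_seconds:
            self._abnormal_times.popleft()
        return len(self._abnormal_times) > self.abnormal_threshold
```

In use, the caller would feed each quality score produced by the first-aspect evaluation method into `report_score` and, on a `True` return, adjust the relevant live parameters (bit rate, resolution, encoder preset, etc.).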
In a third aspect, an embodiment of the present application provides a live video quality evaluation apparatus, where the apparatus includes:
the video code stream acquisition module is used for acquiring a video code stream to be evaluated; the video code stream to be evaluated is formed by coding a live broadcast source video needing video quality evaluation;
the code stream characteristic information extraction module is used for extracting code stream characteristic information from the video code stream to be evaluated;
the quality score obtaining module is used for inputting the code stream characteristic information into a pre-established video quality evaluation model so as to obtain the quality score of the live video;
the video quality evaluation model is obtained by taking code stream characteristic information extracted from historical video data of a live broadcast scene as input and taking real video scoring data calculated according to the historical video data as output to train at least one machine learning model.
In a fourth aspect, an embodiment of the present application provides an apparatus for adjusting quality of a live video, where the apparatus includes:
the live video stream acquisition module is used for acquiring a live video stream;
the video code stream extraction module is used for extracting a plurality of frames of video code streams from the live video stream;
the quality score calculation module is used for extracting code stream characteristic information from a plurality of frames of video code streams, and inputting the code stream characteristic information into a pre-established video quality evaluation model to calculate the quality score of the live video;
the video quality evaluation model is obtained by taking code stream characteristic information extracted from historical video data of a live broadcast scene as input and taking real video scoring data calculated according to the historical video data as output to train at least one machine learning model;
the abnormity judgment module is used for judging that the quality of the live video is abnormal when the quality score exceeds a preset quality score range;
and the video quality adjusting module is used for counting the abnormal times of the quality of the live video in a preset time period, and adjusting the related parameters of the live video when the abnormal times exceed a preset threshold value so as to complete the adjustment of the quality of the live video.
In a fifth aspect, an embodiment of the present application provides a terminal device or a server, including: a memory; one or more processors coupled with the memory; one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, and wherein the one or more applications are configured to perform the live video quality evaluation method provided by the first aspect and/or the live video quality adjustment method provided by the second aspect.
In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium, where a program code is stored in the computer-readable storage medium, and the program code may be invoked by a processor to execute the live video quality evaluation method provided in the first aspect and/or the live video quality adjustment method provided in the second aspect.
The method, the device, the equipment and the computer readable storage medium for evaluating the quality of the live video acquire a video code stream to be evaluated; the video code stream to be evaluated is formed by coding a live broadcast source video needing video quality evaluation; extracting code stream characteristic information from a video code stream to be evaluated, and inputting the code stream characteristic information into a pre-established video quality evaluation model to obtain quality scores of live videos; the video quality evaluation model is obtained by taking code stream characteristic information extracted from historical video data of a live broadcast scene as input and real video scoring data calculated according to the historical video data as output to train at least one machine learning model.
The live video quality evaluation method is a no-reference method based on code stream information: the characteristic information is extracted directly from the encoded or transcoded video code stream, so video frames do not need to be fully decoded and the source video does not need to participate. The quality score of a live video can therefore be calculated efficiently and conveniently in a live scene, realizing video quality monitoring of the video code stream. On the other hand, because the method adopts at least one machine learning model, the accuracy of the live video quality calculation can be improved.
The live video quality adjustment method, apparatus, device, and computer-readable storage medium acquire a live video stream and extract a plurality of frames of video code stream from it; calculate the quality score of the live video from those frames by using the live video quality evaluation method of the first aspect; judge that the quality of the live video is abnormal when the quality score exceeds a preset quality score range; and count the number of quality abnormalities within a preset time period, adjusting the relevant parameters of the live video when that number exceeds a preset threshold, so as to complete the adjustment of the live video quality. Because the above live video quality evaluation method is adopted, the quality score of the live video can be calculated rapidly and accurately; comparing the calculated score with the preset quality score range then determines rapidly and accurately whether the live video stream has a problem, and when it does, the live video can be adjusted in time.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only embodiments of the present application; for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
Fig. 1 is a schematic view of an application scenario of a live video quality evaluation method and/or a live video quality adjustment method provided in an embodiment of the present application;
fig. 2 is a schematic flowchart of a live video quality evaluation method according to an embodiment of the present application;
FIG. 3 is a diagram illustrating a 64 × 64 coding unit according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a cross-validation method according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a stacking model provided in an embodiment of the present application;
fig. 6 is a schematic structural diagram of a stacking model fusion method according to an embodiment of the present application;
fig. 7 is a schematic flowchart of establishing a video quality evaluation model according to an embodiment of the present application;
fig. 8 is a flowchart illustrating a method for adjusting quality of a live video according to an embodiment of the present application;
fig. 9 is a diagram illustrating a preset video quality score range according to an embodiment of the present application;
fig. 10 is a block diagram of a live video quality evaluation apparatus according to an embodiment of the present application;
fig. 11 is a block diagram of an apparatus for adjusting quality of live video according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of a terminal device provided in an embodiment of the present application;
fig. 13 is a schematic structural diagram of a computer-readable storage medium provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below, and it should be understood that the described embodiments are only a part of the embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
For more detailed explanation of the present application, a live video quality evaluation method, an apparatus, a terminal device, and a computer storage medium provided in the present application are specifically described below with reference to the accompanying drawings.
Referring to fig. 1, fig. 1 is a schematic diagram of an application scenario of the live video quality evaluation method provided by an embodiment of the present application. The application scenario includes a server 102, at least one anchor terminal 104, and a client 106. A network is provided between the server 102, the anchor terminal 104, and the client 106, and serves as the medium for the communication links between them. The network may include various connection types, such as wired links, wireless communication links, or fiber-optic cables. The server 102 communicates with the anchor terminal 104 and the client 106 to provide live services to the anchor terminal 104 and/or the client 106. For example, the anchor terminal 104 may send the live video stream of a live room to the server 102, and a user may access the server 102 through the client 106 to watch the live video of that room. For another example, the server 102 may send a notification message to a user's client 106 when the user subscribes to a live room. The live video stream may be a video stream currently being broadcast on the live platform, or a complete video stream formed after the broadcast has finished.
In some implementation scenarios, the anchor terminal 104 and the client 106 may be used interchangeably. For example, an anchor may use the anchor terminal 104 to provide live video services to viewers, or, as a user, to watch live video provided by other anchors. Likewise, a user may use the client 106 to watch live video provided by an anchor of interest, or may act as an anchor and provide live video services to other viewers.
In this embodiment, the anchor terminal 104 and the client 106 are both terminal devices, and may be various electronic devices with a display screen, including but not limited to smart phones, personal digital assistants, tablet computers, personal computers, notebook computers, virtual reality terminal devices, augmented reality terminal devices, and the like. The anchor terminal 104 and the client 106 may have internet products installed for providing internet live services, for example, apps, Web pages, applets, and the like, used on computers or smart phones and related to internet live services.
It is understood that the application scenario shown in fig. 1 is only one possible example; in other possible embodiments, the application scenario may include only some of the components shown in fig. 1 or may also include other components. For example, the application scenario may further include a video capture terminal 108 for capturing the anchor's live video frames, where the video capture terminal 108 may be directly installed or integrated in the anchor terminal 104, or may be independent of it; the embodiment is not limited herein.
It should be understood that there may be many clients 106 and anchor terminals 104, whether only a few or tens or hundreds; the number and type of clients are not limited in the embodiments of the present application. For convenience of description, only one client 106, one anchor terminal 104, and one server 102 are taken as an example.
In webcasting, live video quality (i.e., video picture quality) is an important factor affecting the live broadcast effect. If a video quality problem that occurs during live broadcasting cannot be adjusted and improved in time, the satisfaction of the audience watching the broadcast is seriously reduced. The quality of the live video currently pushed by an anchor is determined by preset broadcast parameters at different levels, and broadcasting is started with these presets under different network environments and client device environments. In actual live broadcasting, however, the video quality of the pushed video stream may for various reasons deviate greatly from the preset broadcast parameters or even become abnormal. The live video quality is then often poor, yet during the broadcast the problem cannot be discovered and the video quality adjusted in time, so the live broadcast effect suffers; at present, such problems can only be found by checking logs afterwards. Similarly, the video watched at the client 106 is obtained by transcoding the video pushed by the anchor terminal 104 on the server 102 according to different configurations, so the quality of the live video watched at the client 106 depends on the transcoded video stream and, to a large extent, on the physical decoding device used by the viewer. Timely monitoring of the picture quality of the video stream actually watched at the client 106 is therefore an important measure for ensuring that viewers have a good live watching experience. Hence, evaluating and adjusting live video quality in webcasting is very important.
Based on the above, the embodiments of the present application provide a live video quality evaluation method and a live video quality adjustment method. As can be seen from the webcast application scenario, live video quality needs to be monitored mainly at two nodes: the live video stream pushed by the anchor terminal and the video stream decoded at the viewer end. The live video quality evaluation method and the live video quality adjustment method can therefore be used on the server 102 and the client 106; that is, either the server 102 or the client 106 in fig. 1 can execute the live video quality evaluation method and/or the live video quality adjustment method provided in the embodiments of the present application, which is not limited herein.
Based on this, an embodiment of the present application provides a live video quality evaluation method. Referring to fig. 2, fig. 2 is a schematic flowchart of a live video quality evaluation method according to an embodiment of the present application. Taking its application to the server or a client in fig. 1 as an example, the method includes the following steps:
step S110, obtaining a video code stream to be evaluated.
The video code stream to be evaluated is formed by coding a live broadcast source video needing video quality evaluation.
Specifically, in a network live scene, the video stream generated by an anchor user during live broadcasting at the anchor terminal is recorded as the live source video. After the live source video is obtained, it is encoded to form a video code stream, which is then pushed to the server. After receiving the video code stream pushed by the anchor, the server can transcode it for different clients, converting the pushed code stream into video code streams of other formats or parameters. The client can pull video code streams with different parameter configurations from the server and decode them to obtain and display the video pictures. In this process, when live broadcast development or operation and maintenance personnel want to know the quality of the live video, live video code streams can be collected in real time or at regular intervals and recorded as video code streams to be evaluated, and the quality scores of the live video are then determined from these code streams.
Optionally, the video code stream may be collected directly from the anchor terminal (i.e., the stream formed after the live source video is encoded); or the video code stream pushed by the anchor, or the transcoded video code stream, may be extracted from the server; or the pulled video code stream may be extracted from the client.
And step S120, extracting code stream characteristic information from the video code stream to be evaluated, and inputting the code stream characteristic information into a pre-established video quality evaluation model to obtain the quality score of the live video.
The video quality evaluation model is obtained by taking code stream characteristic information extracted from historical video data of a live broadcast scene as input and real video scoring data calculated according to the historical video data as output to train at least one machine learning model.
Specifically, the code stream characteristic information consists of parameters related to coding, typically information about the coding units, including but not limited to the coding unit type, coding unit size, partition mode of the coding unit, prediction type of the coding unit, motion vector of the coding unit, and quantization coefficient of the coding unit. This characteristic information, which can be extracted directly from the code stream, captures the temporal, spatial, and quantization properties of the source video during encoding, and is an important factor that directly reflects the picture quality of the corresponding video frames.
A video code stream usually includes a plurality of frames, and each frame contains coding units. A coding unit is the basic block into which a frame is partitioned for encoding and processing. Optionally, there are various sizes and types of coding units: the size may be 64 × 64 (the structure of which is shown in fig. 3), 32 × 32, 16 × 16, 8 × 8, 32 × 16, 16 × 8, and the like, and the types of coding units include I-frame coding units, B-frame coding units, and P-frame coding units.
In this embodiment, for more convenient calculation, after the code stream characteristic information (i.e., the information related to the coding units) is extracted from the video code stream, it is sorted and aggregated statistically to obtain a higher-dimensional data feature matrix. This matrix includes, but is not limited to: the encoder type, rate-control mode, frame rate, resolution, bit rate, encoder preset, I-frame interval, and type of the encoded segment; the proportions, within one frame, of coding units of each size, of coding units of each type, of intra-frame prediction units of each size, of inter-frame prediction units of each size, of skip-type prediction units of each size, and of the different coding-unit partition modes; the statistics of the quantization coefficients of the coding units in one frame (average value, median, 1/4 quantile, 3/4 quantile, maximum, minimum, standard deviation, variance); the proportion of different types of prediction units in a frame; and statistics related to motion vectors (average length, maximum in the x/y direction, and proportion of zero-length vectors).
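As a concrete illustration of this statistics step, the sketch below computes the per-frame quantization-coefficient statistics and the coding-unit size proportions listed above. The function names and the flat input layout are illustrative assumptions; only the statistics themselves come from the description:

```python
from collections import Counter

import numpy as np


def qp_statistics(qp_values):
    """Per-frame statistics of the coding units' quantization coefficients:
    mean, median, 1/4 and 3/4 quantiles, max, min, standard deviation, variance."""
    a = np.asarray(qp_values, dtype=float)
    return {
        "mean": float(a.mean()),
        "median": float(np.median(a)),
        "q25": float(np.quantile(a, 0.25)),
        "q75": float(np.quantile(a, 0.75)),
        "max": float(a.max()),
        "min": float(a.min()),
        "std": float(a.std()),
        "var": float(a.var()),
    }


def cu_size_proportions(cu_sizes):
    """Proportion of coding units of each size within one frame."""
    counts = Counter(cu_sizes)
    total = sum(counts.values())
    return {size: n / total for size, n in counts.items()}
```

Analogous proportion features would be computed for coding-unit types, prediction types, and partition modes, and the per-frame results concatenated into the higher-dimensional feature matrix described above.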
The video quality evaluation model is trained in advance on at least one machine learning model, using code stream characteristic information extracted from historical video data of live scenes and the real video scoring data (i.e., video quality scores) calculated from that historical data, and therefore reflects the mapping between code stream characteristic information and video quality scores. When the quality of a live video needs to be evaluated, the code stream characteristic information extracted from its video code stream can thus be input into the model to calculate the quality score of the live video, and this score intuitively and accurately reflects the video quality.
A machine learning model, in general, is produced by loading large amounts of data into a computer program and selecting a model form that fits the data, so that the computer can make predictions. The model is created by algorithms that range from simple equations (such as the equation of a straight line) to very complex logical/mathematical systems, and the prediction accuracy of machine learning models is high.
In an alternative embodiment, one or more machine learning models can be selected to form the video quality evaluation model. When several machine learning models are selected, they are usually of different types, so that the strengths of the different models are combined and the resulting video quality evaluation model predicts more accurate quality scores for the live video.
Optionally, the machine learning models include, but are not limited to, boosting-tree-based models (e.g., the xgboost and lgbm models), neural networks, deep learning models, support vector machines (SVM), and the like.
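A minimal sketch of how such a model could be trained and then used for scoring, using scikit-learn's GradientBoostingRegressor as a stand-in for the boosting-tree models named above, and synthetic feature/score data in place of the real historical live data:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for the training set: each row is a code-stream feature
# vector (e.g. bit rate, frame rate, mean QP, skip-CU proportion), each target
# a reference quality score. Real training would use features extracted from
# historical live code streams and scores computed from the source video.
X = rng.uniform(size=(300, 4))
y = 100.0 - 40.0 * X[:, 2] + 10.0 * X[:, 0] + rng.normal(scale=1.0, size=300)

model = GradientBoostingRegressor(random_state=0)
model.fit(X, y)

# Inference: feed the feature vector of a code stream to be evaluated.
score = float(model.predict(X[:1])[0])
```

In a multi-model (stacking) setup as in figs. 5 and 6, several such base models would be trained and a meta-model fitted on their predictions; the single-model version above is only the simplest case.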
The method, the device, the equipment and the computer readable storage medium for evaluating the quality of the live video acquire a video code stream to be evaluated; the video code stream to be evaluated is formed by coding a live broadcast source video needing video quality evaluation; extracting code stream characteristic information from a video code stream to be evaluated, and inputting the code stream characteristic information into a pre-established video quality evaluation model to obtain quality scores of live videos; the video quality evaluation model is obtained by taking code stream characteristic information extracted from historical video data of a live broadcast scene as input and real video scoring data calculated according to the historical video data as output to train at least one machine learning model.
The live video quality evaluation method is a no-reference method based on code stream information: the characteristic information is extracted directly from the coded or transcoded video code stream, so the video frames do not need to be fully decoded and the source video does not need to participate. The quality score of the live video can therefore be calculated efficiently and conveniently in a live scene, realizing video quality monitoring of the video code stream. On the other hand, because the method adopts at least one machine learning model, the accuracy of the live video quality calculation can be improved.
Further, a specific implementation of the training of the video quality evaluation model is given, and is described as follows:
in one embodiment, the video quality assessment model is obtained by:
s1: acquiring historical video data of a live scene; the historical video data comprises historical live broadcast source videos and historical video code streams.
In particular, a large amount of video data of live scenes is required when training the video quality evaluation model. In this embodiment, historical video data from real live scenes is collected, including historical live source videos and the coded historical video code streams. The historical live source video data generally needs to cover the various types of scene found in live broadcasting, such as dynamic scenes, static scenes, highly dynamic scenes, game scenes and the like. The historical video code streams can be obtained either by encoding the collected historical source video data with the various coding parameters used in real live scenes, yielding code streams for the historical live source videos under different combinations of coding parameters, or by directly pulling the code streams pushed by anchor terminals in online live scenes. The encoding parameters may include the encoder type, rate mode, frame rate, resolution, code rate, encoder gear, I-frame interval, and other parameters.
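Enumerating the coding-parameter combinations can be done mechanically. The sketch below builds one encode command per combination; the parameter grid, file naming, and the ffmpeg/libx265 invocation are illustrative assumptions, not the patent's actual tooling:

```python
from itertools import product

# Hypothetical parameter grid for offline transcoding of historical source videos.
PARAM_GRID = {
    "bitrate_kbps": [800, 1500, 3000],
    "framerate":    [15, 24, 30],
    "resolution":   ["1280x720", "1920x1080"],
    "preset":       ["fast", "medium"],      # stands in for the "encoder gear"
}

def encode_commands(src_path):
    """Yield one (hypothetical) ffmpeg command per encoding-parameter combination."""
    keys = list(PARAM_GRID)
    for values in product(*(PARAM_GRID[k] for k in keys)):
        p = dict(zip(keys, values))
        yield ["ffmpeg", "-i", src_path,
               "-c:v", "libx265",
               "-b:v", f"{p['bitrate_kbps']}k",
               "-r", str(p["framerate"]),
               "-s", p["resolution"],
               "-preset", p["preset"],
               f"{src_path}.{p['bitrate_kbps']}k_{p['framerate']}fps.h265"]
```

Each resulting code stream is then paired with its source video for ground-truth scoring.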
S2: and extracting the characteristics of the historical video code stream to obtain code stream characteristic information of each frame of video picture.
After the historical video code stream is collected, feature extraction needs to be performed on the historical video code stream to obtain code stream feature information, wherein the code stream feature information can be recorded as feature vectors.
The code stream characteristic information is mainly the information related to the coding units, so extracting it amounts to extracting the related information of all coding units of each frame in the video code stream.
S3: and calculating the real video scoring data of each frame of video picture based on the historical live broadcast source video and the historical video code stream.
Specifically, the video quality evaluation model is mainly used to evaluate video quality, that is, it outputs the quality score of the video, so the target vector of the video quality evaluation model may be the real video scoring data (i.e. the image quality index value) of each frame of video picture.
Alternatively, the image quality index value may be a subjective image quality score (MOS), or an objective image quality index such as PSNR or SSIM. In this embodiment, the objective index PSNR may be used to represent the real video scoring data of the historical video code stream. The calculation process is as follows: the image quality scoring index is calculated frame by frame from the historical source video data and the historical video code stream together. That is, the historical video code stream is decoded to obtain a decoded video with the same number of frames as the historical source video; then, with the source video as reference, the real image quality score (i.e. the real video scoring data) of each frame of the decoded video is calculated frame by frame using a full-reference video evaluation method. In this way, the real video scoring data of each frame of video picture corresponds one-to-one with the feature vector (i.e. the code stream characteristic information) of that frame.
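A minimal sketch of the frame-by-frame PSNR ground truth described above, assuming 8-bit frames already decoded into numpy arrays (the real pipeline would decode the code stream first):

```python
import numpy as np

def psnr(ref, dec, peak=255.0):
    """PSNR of one decoded frame against the reference source frame."""
    ref = np.asarray(ref, dtype=np.float64)
    dec = np.asarray(dec, dtype=np.float64)
    mse = np.mean((ref - dec) ** 2)
    if mse == 0:
        return float("inf")              # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)

def score_stream(ref_frames, dec_frames):
    """Per-frame ground-truth scores, aligned one-to-one with the feature vectors."""
    return [psnr(r, d) for r, d in zip(ref_frames, dec_frames)]
```

The list returned by `score_stream` becomes the target vector paired with the per-frame feature matrix.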
S4: and dividing the code stream characteristic information and the real video scoring data into a training set and a testing set according to a preset proportion.
In one embodiment, dividing the code stream characteristic information and the real video scoring data into a training set and a test set according to a preset proportion includes: analyzing the correlation between each code stream characteristic information and corresponding real video scoring data; and screening code stream characteristic information with the correlation greater than or equal to a preset value and real video scoring data, and dividing the code stream characteristic information and the real video scoring data into a training set and a testing set according to a preset proportion.
After the feature vector (i.e., code stream feature information) and the target vector (i.e., real video scoring data) of each frame of video picture are obtained, the feature vector of each frame and the corresponding real video scoring data can be trained by using a machine learning model to find a potential mapping relation, so as to obtain a video quality evaluation model.
The feature vectors (i.e., code stream feature information) and the target vectors of the video pictures obtained by calculation can form a data set with huge data scale, and the data set can be randomly divided into a training set and a test set according to a preset proportion. The preset proportion may be a preset value, and may be specifically determined according to actual requirements in the model training process.
In this embodiment, the training set and the test set can be randomly divided according to a ratio of 0.85:0.15. Only training set data is used in the training process; the test set data is never used for training and serves only to evaluate the model's predictions.
In addition, before the training set is adopted to train the machine learning model, necessary data analysis can be carried out on the data set, so that the data characteristics of the data set can be better presented, and the subsequent machine learning model training can be facilitated to obtain a better prediction effect. Specifically, the correlation and distribution relationship between the feature vector (i.e., code stream feature information) and the target vector (i.e., real video scoring data) can be analyzed, the feature vector and the target vector with high correlation can be screened out, and the feature vector and the target vector with low correlation can be removed, so that the model complexity can be reduced.
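The correlation screening and proportional split described above might be sketched as follows (the correlation threshold, seed, and feature names are illustrative assumptions):

```python
import numpy as np

def select_and_split(X, y, feature_names, corr_min=0.1, test_ratio=0.15, seed=0):
    """Drop features weakly correlated with the target, then split 0.85/0.15."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    # Keep only features whose absolute Pearson correlation with y is high enough.
    keep = [j for j in range(X.shape[1])
            if abs(np.corrcoef(X[:, j], y)[0, 1]) >= corr_min]
    Xk = X[:, keep]
    # Random split into training and test sets at the preset proportion.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(len(y) * test_ratio)
    test, train = idx[:n_test], idx[n_test:]
    kept_names = [feature_names[j] for j in keep]
    return (Xk[train], y[train]), (Xk[test], y[test]), kept_names
```

Reducing the feature set this way lowers model complexity before training begins.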
S5: and (3) training at least one machine learning model by adopting code stream characteristic information and real video scoring data in the training set, and updating parameters of each machine learning model to form an initial video quality evaluation model.
Specifically, each machine learning model can be trained and fitted individually using the code stream characteristic information and real video scoring data in the training set, with the parameters of the machine learning model updated continuously during training. That is, through a series of parameter tuning steps, a number of independent models are obtained that map the potential relationship between the feature vectors and the target vectors with high prediction accuracy; these are recorded as the initial video quality evaluation model.
S6: and testing the initial video quality evaluation model by adopting the code stream characteristic information and the real video scoring data in the test set, and determining the video quality evaluation model according to the test result.
After the initial video quality evaluation model is obtained, the initial video quality evaluation model can be tested by adopting code stream characteristic information and real video scoring data in the test set, so that the prediction accuracy of the initial video quality evaluation model is determined; when the prediction accuracy score reaches a preset value, the initial video quality evaluation model is proved to be optimal, and therefore a video quality evaluation model is obtained; if the prediction accuracy score does not reach the preset value, the initial video quality evaluation model does not reach the optimal value, and usually, the initial video quality evaluation model needs to be further trained until the effect is optimal, so that the video quality evaluation model is obtained. The method can quickly and accurately train the model so as to carry out corresponding calculation by using the model when the quality of the live video is evaluated, and has the advantages of convenient operation, high efficiency and high accuracy.
In one embodiment, in step S5, training at least one machine learning model using the codestream feature information and the real video scoring data in the training set, and updating parameters of each machine learning model to form an initial video quality evaluation model includes: dividing the training set into a plurality of folds; for a number of folds, training at least one machine learning model using one fold at a time as a validation set, the remaining folds as training data, until all folds are used as validation sets; calculating model fitting effect index values after each time of machine learning model training; and selecting the model parameter corresponding to the minimum model fitting effect index value as the optimal parameter of the machine learning model to form an initial video quality evaluation model.
Specifically, when the machine learning model is trained with the code stream characteristic information and real video scoring data in the training set, a cross-validation (CV) method may be used to avoid overfitting the model. In the cross-validation method, the training set is divided equally into several small subsets (each subset is marked as a fold), and the modeling process is then executed on the different subsets to obtain a fitting-effect index for each subset model (which can be expressed by the mean absolute error, MAE).
For ease of understanding, a detailed embodiment is given. Referring to fig. 4, the process of training the model with the cross-validation method is as follows: the Data set (i.e. All Data) is first divided into a Training set (i.e. Training Data) and a test set (i.e. Test Data); the training set is then divided into a plurality of subsets (also called folds, i.e. Fold1 to Fold5 in fig. 4, where only 5 folds are shown), and the machine learning model is trained using these subsets. In the first model training, the first fold (Fold1) is used as the validation set and the other folds (Fold2 to Fold5) as training data, and the fitting-effect index value of the first training (MAE1) is calculated; in the second model training, the second fold (Fold2) is used as the validation set and the remaining folds (i.e. Fold1 and Fold3 to Fold5) as training data, and the fitting-effect index of the second training (MAE2) is calculated; this process is repeated, one fold at a time as the validation set, until all folds have been used as the validation set. Finally, the fitting-effect index values (i.e. the MAEs) are compared, and the model parameters corresponding to the minimum MAE are selected as the optimal parameters of the machine learning model.
In a preferred embodiment, a 5-fold cross-validation method can be adopted, and the key parameters of the machine learning model can be tuned one by one with a mature hyperparameter tuning tool, so that the model trained over multiple rounds of cross-validation attains the minimum RMSE (root mean square error), thereby obtaining the initial video quality evaluation model.
It should be noted that, when there are multiple machine learning models, it is necessary to train each machine learning model separately by using a cross validation method.
The cross validation method is adopted to train the machine learning model, so that overfitting of the model can be avoided, and the initial quality evaluation model obtained after training is more accurate.
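The fold loop of fig. 4 can be sketched as below. For self-containment, a least-squares linear model stands in for the real learner (the patent's models are xgboost/lgbm), and MAE is computed per fold:

```python
import numpy as np

def kfold_mae(X, y, fit, predict, k=5, seed=0):
    """K-fold CV: each fold is the validation set exactly once; returns per-fold MAE."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    maes = []
    for i in range(k):
        val = folds[i]                                    # this round's validation fold
        trn = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[trn], y[trn])
        pred = predict(model, X[val])
        maes.append(float(np.mean(np.abs(pred - y[val]))))
    return maes

# Stand-in learner: ordinary least squares with an intercept.
def fit_linear(X, y):
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict_linear(w, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ w
```

The parameter set that yields the smallest fold MAE would then be retained, as described above.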
In one embodiment, in executing step S6, determining a video quality evaluation model according to the test result includes: when a plurality of machine learning models are available, obtaining a plurality of trained machine learning models according to the test result; constructing a stacking model by taking a plurality of trained machine learning models as a first layer model and a regression model as a second layer model; dividing the training set into a plurality of folds; selecting one fold as verification data, and training a first layer model by adopting the other folds to obtain a plurality of corresponding video quality prediction models; respectively calculating the verification data by adopting a plurality of corresponding video quality prediction models to obtain a plurality of video quality prediction values; repeating the steps of training the first layer model and calculating a video quality prediction value until all folds are used as validation data; and training the second layer model by adopting each video quality predicted value and real video scoring data in the verification data corresponding to each video quality predicted value to obtain a video quality evaluation model.
Specifically, when a plurality of machine learning models are adopted, each model outputs a predicted quality score for the live video from the input code stream characteristic information, giving a plurality of predicted quality scores that are usually not equal; the final quality score of the live video therefore needs to be determined from these predicted values.
Based on this, a method of stacking model fusion may be employed to fit multiple machine learning models. Among them, Stacking is a method of ensemble learning in machine learning. The integration has the advantages that different models can learn different characteristics of data, and the result after fusion can be better represented and can make up for deficiencies. The structure of the Stacking Model is shown in fig. 5, and mainly includes two layers, which are respectively marked as a first layer Model and a second layer Model, where Model _1 to Model _ n in fig. 5 represent the first layer Model, and may also be referred to as a base Model; model _ (n +1) represents the second layer Model, which may also be referred to as the upper layer Model. The first layer model is mainly used for feature transformation, and the second layer model is mainly used for result prediction.
Optionally, the first layer model is typically some strong model, such as Xgboost, LightGBM, Random Forest, GBDT, ExtraTrees, etc. The second layer model is typically some simple model such as a linear regression, logistic regression model, etc.
In addition, the training process of the Stacking requires that the outputs (i.e., data _ out _1 to data _ out _ N) of the first layer models (i.e., Model _1 to Model _ N) are merged into the input of the second layer models (i.e., Model _ (N +1)), i.e., the number of models N of the first layer determines the input feature dimension of the second layer Model _ (N + 1).
In addition, each model in the first layer model of Stacking makes a K-Fold prediction (i.e., K-Fold cross validation), and the Stacking training process is shown in FIG. 6. As can be seen from FIG. 6, all data obtained after K-Fold training of the first layer Model are concatenated together to form a training set of Model _ (n + 1).
In this embodiment, a 5-Fold cross-validation method may be employed to train the Stacking model. The specific process is as follows: the xgboost model and the lgbm model are used as the two first-layer models of the stacking fusion, and a linear regression model as the second-layer model. In each round of training, the training set is randomly divided into 5 folds; 4 folds are used as training data to fit the xgboost and lgbm models of the first layer, giving the corresponding prediction models; the two prediction models then each predict the live video quality score of the remaining 1 fold of data, yielding two video quality prediction values. A new training set is formed by combining these video quality prediction values with the real image quality values (i.e. the real video scoring data) of that fold. Fusion prediction is then performed on this basis: the upper-layer linear regression model fits a regression from the quality prediction values given by the two first-layer models to the real video scoring data. The model finally obtained, which fuses the two first-layer predictions into the quality score of the live video that is ultimately output, is recorded as the video quality evaluation model.
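A sketch of this stacking procedure, showing out-of-fold first-layer predictions feeding a linear second layer. Two simple linear learners stand in for xgboost and lgbm so the sketch stays self-contained:

```python
import numpy as np

def _lin_fit(X, y):
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def _lin_pred(w, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ w

# Two stand-in base learners (the patent uses xgboost and lgbm here):
BASES = [
    (lambda X, y: _lin_fit(X, y),      lambda m, X: _lin_pred(m, X)),       # linear
    (lambda X, y: _lin_fit(X ** 2, y), lambda m, X: _lin_pred(m, X ** 2)),  # quadratic
]

def fit_stacking(X, y, k=5, seed=0):
    """Out-of-fold base predictions become the training set of the second-layer model."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    oof = np.zeros((len(y), len(BASES)))           # out-of-fold prediction matrix
    for i in range(k):
        val = folds[i]
        trn = np.concatenate([folds[j] for j in range(k) if j != i])
        for b, (fit, pred) in enumerate(BASES):
            oof[val, b] = pred(fit(X[trn], y[trn]), X[val])
    base_models = [fit(X, y) for fit, _ in BASES]  # refit the bases on all data
    meta = _lin_fit(oof, y)                        # second-layer linear regression
    return base_models, meta

def predict_stacking(model, X):
    base_models, meta = model
    X = np.asarray(X, float)
    P = np.column_stack([pred(m, X) for m, (_, pred) in zip(base_models, BASES)])
    return _lin_pred(meta, P)
```

Note how the number of first-layer models fixes the second layer's input dimension, exactly as described for fig. 5.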
After the video quality evaluation model is obtained, for a video code stream whose quality needs to be evaluated, it is only necessary to perform feature extraction on it to obtain the code stream characteristic information and then input that information into the video quality evaluation model; the quality score of the corresponding live video is thus obtained, and this score reflects the quality of the frame images.
The established video quality evaluation model can be more accurate by adopting the stacking model fusion method, so that the quality score of the live video calculated by adopting the video quality evaluation model is more accurate.
In order to facilitate understanding of the establishment process of the video quality evaluation model, a detailed embodiment is given. Referring to fig. 7, the process of establishing the video quality evaluation model includes: 1. Acquiring the original video: collecting a historical live broadcast source video (i.e. the original video) in a live scene. 2. Video coding: carrying out h265 video coding on the original video at different code rates and frame rates to form h265 video code streams (i.e. the historical video code streams). 3. Feature extraction and image quality index calculation: extracting code stream features from each frame of video picture of the h265 video code stream as the feature vectors of the model, and calculating the real image quality index value (i.e. the psnr value) of each frame of video picture of the h265 video code stream from the original video data as the target vector of the model. 4. Preliminary data analysis: performing preliminary analysis, including correlation analysis and variable distribution analysis, on the data matrix consisting of the feature vectors and target vectors. 5. Model training: randomly dividing the data matrix consisting of the feature vectors and target vectors into a training set and a test set in a certain proportion, and training the machine learning models (such as the xgboost and lgbm models) with the training set data to form the base models. 6. Model fusion and optimization: applying stacking model fusion to the base models to obtain the video quality evaluation model.
In addition, the embodiment of the application also provides a method for adjusting the quality of live video. Referring to the drawings, fig. 8 is a schematic flowchart of a method for adjusting the quality of live video according to an embodiment of the present application. Taking the method as applied to the server or the client in fig. 1 as an example, it includes the following steps:
step S810, acquiring a live video stream, and extracting a plurality of frames of video code streams from the live video stream.
And S820, calculating the quality score of the live video by adopting the method in the embodiment of the live video quality evaluation method according to the video code streams of the frames.
Specifically, in live webcasting, development and maintenance personnel can monitor and adjust the quality of the live video. In a live network scene, this monitoring generally needs to check whether the quality of the live video is normal both in the video code stream pushed to the server by the anchor terminal and in the video code stream pulled from the server by the client, and to adjust the quality of the live video when it is abnormal. Monitoring and adjustment can therefore be carried out in the server for the anchor terminal and on the client for the client side; accordingly, the method for adjusting the quality of the live video in this embodiment is mainly executed in the server and/or the client.
The specific process is as follows: at intervals, the live video stream is acquired and a plurality of frames of video code stream are extracted from it; the quality score of the live video is then calculated from these frames using the method in the embodiment of the live video quality evaluation method.
In step S830, when the quality score exceeds the preset quality score range, it is determined that the quality of the live video is abnormal.
After the quality score of the live video is calculated, it can be compared with the preset quality score range to judge whether it falls within that range. When the quality score is within the preset quality score range, the quality of the live video is normal; when it is not, the quality of the live video is abnormal.
Next, an embodiment of determining the preset quality score range is given, and the following description is provided:
in one embodiment, the preset quality score range is obtained by: acquiring source video streams, the encoding parameters and the network parameters of a live scene; encoding the source video streams with each set of encoding parameters to obtain the corresponding video code streams; for each video code stream, calculating its actual video quality under each set of network parameters with a full-reference video quality evaluation method, taking the corresponding source video stream as reference; and determining the preset quality score range according to the actual video quality of each video code stream.
Specifically, the preset quality score range can be obtained by collecting and analysing a large number of live source videos of live scenes together with the encoded video data (i.e. video code streams). The specific process is as follows: live source videos (i.e. unencoded yuv video data), coding parameters, and network parameters of a certain number of live scenes are collected; the live source videos are encoded under the different combinations of coding parameters to obtain the video code streams; with the live source videos as reference, the actual video quality corresponding to each combination of coding parameters and network parameters (i.e. network condition parameters) is obtained one by one using an existing full-reference video quality evaluation method; and the preset quality score range is derived from these actual video qualities, giving the result shown in fig. 9, which shows the preset quality score ranges under different combinations of coding parameters and network parameters.
The network condition parameters include bandwidth, delay, packet loss rate, and the like.
Step S880, counting the abnormal times of the quality of the live video in a preset time period, and adjusting the related parameters of the live video when the abnormal times exceed a preset threshold value so as to complete the adjustment of the quality of the live video.
When the number of abnormalities reaches the preset threshold within a period of time, it is judged that the current video stream has a picture quality problem; at this point, the relevant parameters of the live video can be adjusted, thereby completing the adjustment of the live video quality.
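The windowed anomaly counting might look like the following sketch; the score range, window length, and threshold are illustrative assumptions:

```python
import time
from collections import deque

class QualityMonitor:
    """Count out-of-range quality scores in a sliding window; signal adjustment on excess."""

    def __init__(self, score_range=(30.0, 50.0), window_s=60.0, max_abnormal=5):
        self.low, self.high = score_range      # preset quality score range (PSNR-like, assumed)
        self.window_s = window_s               # preset time period
        self.max_abnormal = max_abnormal       # preset threshold of abnormal occurrences
        self._abnormal = deque()               # timestamps of abnormal scores

    def observe(self, score, now=None):
        """Record one quality score; return True when live-video parameters should be adjusted."""
        now = time.monotonic() if now is None else now
        if not (self.low <= score <= self.high):
            self._abnormal.append(now)
        while self._abnormal and now - self._abnormal[0] > self.window_s:
            self._abnormal.popleft()           # drop anomalies that left the window
        return len(self._abnormal) > self.max_abnormal
```

A True return would drive the parameter adjustment (or log alarm) described in the following embodiments.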
The method for adjusting the quality of the live video provided by the embodiment of the application acquires a live video stream and extracts a plurality of frames of video code stream from it; calculates the quality score of the live video from the plurality of frames using the live video quality evaluation method of the first aspect; judges that the quality of the live video is abnormal when the quality score exceeds the preset quality score range; and counts the number of abnormalities in the quality of the live video within a preset time period, adjusting the relevant parameters of the live video when the number exceeds a preset threshold, so as to complete the adjustment of the live video quality. Because the method adopts the live video quality evaluation method, the quality score of the live video can be calculated quickly and accurately; by comparing the calculated quality score with the preset quality score range, whether the live video stream has a problem can be determined quickly and accurately, and when a problem occurs, the live video stream can be adjusted in time.
Further, several embodiments of adjusting the relevant parameters of the live broadcast are given, and are described in detail as follows:
in one embodiment, adjusting the relevant parameters of the live video comprises: adjusting the encoding parameters of the anchor terminal or restarting an encoder; wherein the encoding parameters include one or more of an encoder type, a rate mode, a frame rate, a resolution, an encoding gear, a code rate, an encoder gear, an I-frame interval, and an encoding slice type.
Specifically, for the anchor terminal, if the current video quality is detected to be abnormal, the live video needs to be adjusted. The specific adjustment may be to adjust the encoding parameters of the anchor terminal, or to restart the encoder. Adjusting the encoding parameters of the anchor terminal may mean trying to issue different encoding parameters to the anchor terminal, chosen from the allowable configuration combinations obtained by the server under the current network parameters according to the preset configuration and the preset quality score range, so as to improve the video picture quality.
By adopting the mode, the video quality can be adjusted in time when the live video quality abnormality occurs at the anchor terminal.
In one embodiment, adjusting the relevant parameters of the live video comprises: and the client pulls the video code streams with different parameter configurations or restarts the decoder.
For the client, if the current video quality is detected to be abnormal and the live video needs to be adjusted, the client can, based on its current network condition and a preset configuration strategy, automatically pull video code streams with different parameter configurations or restart the decoder, so as to improve the video picture quality.
By adopting the mode, the video quality can be adjusted in time when the client side has the abnormal live video quality.
In one embodiment, the method for adjusting the quality of the live video further comprises: and triggering a log alarm when the abnormal times exceed a preset threshold.
Specifically, whether on the anchor side or the client side, when the number of occurrences of abnormal live video quality exceeds the preset threshold, a log alarm can be triggered. For the anchor terminal, the log alarm mainly records the video stream id, the coding parameters, the actual video quality score, the preset video quality score and other related state information at the current moment, so that development and operations personnel can subsequently locate and analyse the problem quickly and accurately.
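A sketch of such a log alarm record; the field names and JSON layout are illustrative assumptions:

```python
import json
import logging

logger = logging.getLogger("live_quality")

def raise_quality_alarm(stream_id, encode_params, actual_score, preset_range):
    """Log the problem context so ops staff can locate and analyse the fault quickly."""
    record = {
        "event": "live_video_quality_abnormal",
        "stream_id": stream_id,
        "encode_params": encode_params,        # encoder type, bitrate, framerate, ...
        "actual_score": actual_score,
        "preset_range": list(preset_range),
    }
    logger.warning(json.dumps(record, ensure_ascii=False))
    return record
```

The same record could also be attached to the client-side feedback mentioned below.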
For the client, the triggering of the alarm log mainly feeds back and records the problem site and related state information so that the client can pull different video code streams.
It should be understood that, although the steps in the flowcharts of fig. 2 and 8 are shown in sequence as indicated by the arrows, the steps are not necessarily performed in sequence as indicated by the arrows. The steps are not performed in the exact order shown and described, and may be performed in other orders, unless explicitly stated otherwise. Moreover, at least some of the steps in fig. 2 and 8 may include multiple sub-steps or multiple stages that are not necessarily performed at the same time, but may be performed at different times, and the order of performing the sub-steps or stages is not necessarily sequential, but may be performed alternately or alternately with other steps or at least some of the sub-steps or stages of other steps.
The embodiment disclosed in the application describes a live video quality evaluation method in detail, and the method disclosed in the application can be implemented by devices in various forms, so that the application also discloses a live video quality evaluation device corresponding to the method, and a detailed description is given below for a specific embodiment.
Please refer to fig. 10, which is a device for evaluating quality of live video according to an embodiment of the present application, and the device mainly includes:
a video code stream obtaining module 1002, configured to obtain a video code stream to be evaluated; the video code stream to be evaluated is formed by coding a live broadcast source video needing video quality evaluation.
A code stream characteristic information extracting module 1004, configured to extract code stream characteristic information from the video code stream to be evaluated.
A quality score obtaining module 1006, configured to input the code stream feature information into a pre-established video quality evaluation model to obtain a quality score of the live video. In an embodiment, the video quality evaluation model is obtained by training at least one machine learning model using, as input, code stream feature information extracted from historical video data of a live broadcast scene and, as output, real video scoring data calculated from the historical video data; the apparatus further includes:
the historical video data acquisition module is used for acquiring historical video data of a live scene; the historical video data comprises a historical live broadcast source video and a historical video code stream.
And the characteristic extraction module is used for extracting the characteristics of the historical video code stream to obtain the code stream characteristic information of each frame of video picture.
The real video scoring data calculation module is used for calculating real video scoring data of each frame of video picture based on the historical live broadcast source video and the historical video code stream;
the data set dividing module is used for dividing the code stream characteristic information and the real video scoring data into a training set and a testing set according to a preset proportion;
the initial video quality evaluation model forming module is used for training at least one machine learning model by adopting code stream characteristic information and real video scoring data in the training set and updating parameters of each machine learning model to form an initial video quality evaluation model;
and the video quality evaluation model determining module is used for testing the initial video quality evaluation model by adopting the code stream characteristic information and the real video scoring data in the test set and determining the video quality evaluation model according to the test result.
In one embodiment, the initial video quality evaluation model forming module is used for dividing the training set into a plurality of folds; training at least one machine learning model using one fold at a time as the validation set and the remaining folds as training data, until every fold has served as the validation set; calculating a model fitting effect index value after each round of training; and selecting the model parameters corresponding to the smallest model fitting effect index value as the optimal parameters of the machine learning model, so as to form the initial video quality evaluation model.
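The per-fold training and parameter-selection loop above can be sketched as follows, using a closed-form ridge regressor as a stand-in machine learning model and RMSE as the model fitting effect index; both choices are assumptions, since the text does not fix a model type or index:

```python
import numpy as np

def kfold_select(X, y, alphas, n_folds=5, seed=0):
    """For each candidate parameter, train on all-but-one fold and
    validate on the held-out fold until every fold has been the
    validation set; return the parameter with the smallest mean RMSE."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, n_folds)
    best_alpha, best_rmse = None, np.inf
    for alpha in alphas:                      # candidate model parameters
        errs = []
        for k in range(n_folds):              # each fold validates once
            val = folds[k]
            trn = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            # closed-form ridge fit on the training folds
            A = X[trn].T @ X[trn] + alpha * np.eye(X.shape[1])
            w = np.linalg.solve(A, X[trn].T @ y[trn])
            pred = X[val] @ w
            errs.append(np.sqrt(np.mean((pred - y[val]) ** 2)))
        rmse = float(np.mean(errs))           # the fitting effect index
        if rmse < best_rmse:
            best_alpha, best_rmse = alpha, rmse
    return best_alpha, best_rmse
```

With gradient-boosted or tree models the same loop applies; only the inner fit/predict calls change.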
In one embodiment, the video quality evaluation model determining module is configured to, when there are multiple machine learning models, obtain multiple trained machine learning models according to the test result; constructing a stacking model by taking a plurality of trained machine learning models as a first layer model and a regression model as a second layer model; dividing the training set into a plurality of folds; selecting one fold as verification data, and training a first layer model by adopting the other folds to obtain a plurality of corresponding video quality prediction models; respectively calculating the verification data by adopting a plurality of corresponding video quality prediction models to obtain a plurality of video quality prediction values; repeating the steps of training the first layer model and calculating a video quality prediction value until all folds are used as validation data; and training the second layer model by adopting each video quality predicted value and real video scoring data in the verification data corresponding to each video quality predicted value to obtain a video quality evaluation model.
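A minimal sketch of the two-layer stacking procedure described above, with ordinary least squares standing in for both the first-layer learners and the second-layer regression model; the actual model types are not fixed by the text:

```python
import numpy as np

def fit_lsq(X, y):
    # Ordinary least-squares fit; returns a predict function.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return lambda Z: Z @ w

def stack_train(X, y, base_fits, n_folds=3):
    """First layer: build out-of-fold predictions, one fold serving as
    validation data at a time. Second layer: fit a regression model on
    those predictions against the real scoring data."""
    folds = np.array_split(np.arange(len(X)), n_folds)
    oof = np.zeros((len(X), len(base_fits)))
    for k, val in enumerate(folds):
        trn = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        for m, fit in enumerate(base_fits):
            # predict the held-out fold with a model trained on the rest
            oof[val, m] = fit(X[trn], y[trn])(X[val])
    meta = fit_lsq(oof, y)                    # second-layer model
    finals = [fit(X, y) for fit in base_fits] # refit first layer on all data
    return lambda Z: meta(np.column_stack([f(Z) for f in finals]))
```

The returned callable is the stacked video quality evaluation model: base predictions in, combined score out.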
In one embodiment, the data set dividing module is configured to analyze the correlation between each piece of code stream feature information and the corresponding real video scoring data; screen out the code stream feature information whose correlation is greater than or equal to a preset value, together with its real video scoring data; and divide the screened data into a training set and a test set according to a preset ratio.
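The correlation screening step might look like the following sketch, using the absolute Pearson coefficient and a hypothetical cutoff `min_corr` for the "preset value":

```python
import numpy as np

def screen_features(features, scores, min_corr=0.3):
    """Keep only the feature columns whose absolute Pearson correlation
    with the real scoring data reaches `min_corr` (an assumed cutoff).
    `features` is (n_frames, n_features); returns kept column indices."""
    keep = []
    for j in range(features.shape[1]):
        col = features[:, j]
        if np.std(col) == 0:
            continue                          # constant feature: undefined r
        r = np.corrcoef(col, scores)[0, 1]
        if abs(r) >= min_corr:
            keep.append(j)
    return keep
```

The surviving columns and their scores would then be split into training and test sets by the preset ratio.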
For the specific definition of the live video quality evaluation device, reference may be made to the above definition of the method, which is not repeated here. The various modules in the above-described apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules can be embedded in hardware form in, or independent of, a processor in the terminal device, or can be stored in software form in a memory of the terminal device, so that the processor can invoke and execute the operations corresponding to the modules.
The embodiments disclosed in the present application describe a method for adjusting live video quality in detail. The disclosed method can be implemented by devices in various forms, so the present application also discloses a live video quality adjustment device corresponding to the method, which is described in detail below with reference to specific embodiments.
Please refer to fig. 11, which is a device for adjusting quality of live video according to an embodiment of the present application, and the device mainly includes:
and a live video stream obtaining module 112, configured to obtain a live video stream.
And a video code stream extraction module 114, configured to extract several frames of video code stream from the live video stream.
The quality score calculation module 116 is used for extracting code stream characteristic information from a plurality of frames of video code streams, and inputting the code stream characteristic information into a pre-established video quality evaluation model to calculate the quality score of the live video;
the video quality evaluation model is obtained by taking code stream characteristic information extracted from historical video data of a live broadcast scene as input and real video scoring data calculated according to the historical video data as output to train at least one machine learning model;
and the abnormity judgment module 118 is used for judging that the quality of the live video is abnormal when the quality score exceeds a preset quality score range.
The video quality adjusting module 1110 is configured to count the number of times the quality of the live video is abnormal within a preset time period, and to adjust the relevant parameters of the live video when the number of abnormalities exceeds a preset threshold, thereby completing the adjustment of the live video quality.
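The windowed anomaly counting performed by this module can be sketched as below; the window length, score range, and threshold are illustrative placeholders rather than values fixed by the text:

```python
from collections import deque

class QualityMonitor:
    """Count out-of-range quality scores inside a sliding time window
    and report when the count crosses the preset threshold."""

    def __init__(self, score_range=(60.0, 100.0), window_s=10.0, threshold=3):
        self.lo, self.hi = score_range        # preset quality score range
        self.window_s = window_s              # preset time period (seconds)
        self.threshold = threshold            # preset abnormality threshold
        self.abnormal_times = deque()

    def observe(self, t, score):
        """Feed one (timestamp, quality score) pair; return True when
        the live parameters should be adjusted."""
        if not (self.lo <= score <= self.hi):
            self.abnormal_times.append(t)     # record an abnormal event
        while self.abnormal_times and t - self.abnormal_times[0] > self.window_s:
            self.abnormal_times.popleft()     # expire events outside window
        return len(self.abnormal_times) > self.threshold
```

On the anchor side a True result would drive the encoder-parameter adjustment; on the client side it would drive pulling a differently configured code stream or restarting the decoder.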
In one embodiment, the video quality adjusting module 1110 is configured to adjust encoding parameters of the anchor or restart the encoder; wherein the encoding parameters include one or more of an encoder type, a rate mode, a frame rate, a resolution, an encoding gear, a code rate, an encoder gear, an I-frame interval, and an encoding slice type.
In one embodiment, the video quality adjustment module 1110 is configured to pull video streams with different parameter configurations or restart the decoder.
In one embodiment, the apparatus further comprises:
and the data acquisition module is used for acquiring the source video stream, each coding parameter and each network parameter of the live broadcast scene.
And the encoding module is used for encoding the source video stream by adopting each encoding parameter so as to obtain each video code stream.
And the actual video quality calculation module is used for calculating the actual video quality of each video code stream under each network parameter by referring to the corresponding source video stream and adopting a full-parameter video quality evaluation method.
And the preset quality grading range determining module is used for determining the preset quality grading range according to the actual video quality of each video code stream.
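The derivation of the preset quality score range might be sketched as follows, with per-frame PSNR standing in for the full-reference ("full-parameter") quality metric and a hypothetical safety margin; real deployments might use VMAF or SSIM instead:

```python
import numpy as np

def psnr(ref, dist, peak=255.0):
    """Full-reference quality of one frame against its source frame."""
    mse = np.mean((ref.astype(np.float64) - dist.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)

def preset_score_range(ref_frames, encoded_variants, margin=1.0):
    """Score every encoded variant of the same source stream against the
    reference, then take [min - margin, max + margin] as the acceptance
    range. `margin` is an illustrative assumption."""
    scores = [
        np.mean([psnr(r, d) for r, d in zip(ref_frames, frames)])
        for frames in encoded_variants
    ]
    return min(scores) - margin, max(scores) + margin
```

Each variant here corresponds to one combination of encoding and network parameters from the data acquisition module.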
In one embodiment, the apparatus further comprises:
and the log alarm module is used for triggering log alarm when the abnormal times exceed a preset threshold value.
The specific definition of the adjusting device for the live video quality can be referred to the above definition of the method, and is not described herein again. The various modules in the above-described apparatus may be implemented in whole or in part by software, hardware, and combinations thereof. The modules can be embedded in a hardware form or independent of a processor in the terminal device, and can also be stored in a memory in the terminal device in a software form, so that the processor can call and execute operations corresponding to the modules.
Referring to fig. 12, fig. 12 is a block diagram illustrating a structure of a terminal device or a server according to an embodiment of the present disclosure. The terminal device or server 120 may be a computer device. The terminal device 120 or server in the present application may include one or more of the following components: a processor 122, a memory 124, and one or more applications, wherein the one or more applications may be stored in the memory 124 and configured to be executed by the one or more processors 122, the one or more applications configured to perform the methods described in the above-described live video quality evaluation method embodiments, and/or to perform the methods described in the above-described live video quality adjustment method embodiments.
Processor 122 may include one or more processing cores. The processor 122 connects the various parts of the terminal device 120 using various interfaces and lines, and performs the various functions of the terminal device 120 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 124 and by calling data stored in the memory 124. Optionally, the processor 122 may be implemented in hardware using at least one of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 122 may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, user interface, application programs, and so on; the GPU is responsible for rendering and drawing display content; and the modem handles wireless communication. It is understood that the modem may also not be integrated into the processor 122 and may instead be implemented by a separate communication chip.
The Memory 124 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). The memory 124 may be used to store instructions, programs, code sets, or instruction sets. The memory 124 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, or an image playing function), instructions for implementing the various method embodiments described above, and the like. The data storage area may store data created by the terminal device 120 during use, and the like.
Those skilled in the art will appreciate that the structure shown in fig. 12 is a block diagram of only the portion of the structure relevant to the present application and does not limit the terminal device to which the present application is applied; a particular terminal device may include more or fewer components than shown in the drawings, combine certain components, or have a different arrangement of components.
In summary, the terminal device provided in the embodiment of the present application is used to implement the corresponding live video quality evaluation method in the foregoing method embodiment, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Referring to fig. 13, a block diagram of a computer-readable storage medium according to an embodiment of the present disclosure is shown. The computer-readable storage medium 130 stores program codes that can be called by a processor to execute the methods described in the above embodiments of the live video quality evaluation method and/or the methods described in the above embodiments of the live video quality adjustment method.
The computer-readable storage medium 130 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read-only memory), an EPROM, a hard disk, or a ROM. Optionally, the computer-readable storage medium 130 includes a non-transitory computer-readable storage medium. The computer-readable storage medium 130 has storage space for program code 132 that performs any of the method steps described above. The program code can be read from or written into one or more computer program products. The program code 132 may, for example, be compressed in a suitable form.
In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," "some examples," and the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, as well as the features of different embodiments or examples, provided there is no contradiction.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (14)

1. A live video quality evaluation method is characterized by comprising the following steps:
acquiring a video code stream to be evaluated; the video code stream to be evaluated is formed by coding a live broadcast source video needing video quality evaluation;
extracting code stream characteristic information from the video code stream to be evaluated, and inputting the code stream characteristic information into a pre-established video quality evaluation model to obtain quality scores of live videos;
the video quality evaluation model is obtained by taking code stream characteristic information extracted from historical video data of a live broadcast scene as input and taking real video scoring data calculated according to the historical video data as output to train at least one machine learning model.
2. The method of claim 1, wherein the video quality assessment model is obtained by:
acquiring historical video data of a live scene; the historical video data comprises a historical live broadcast source video and a historical video code stream;
extracting the characteristics of the historical video code stream to obtain code stream characteristic information of each frame of video picture;
calculating real video scoring data of each frame of video picture based on the historical live broadcast source video and the historical video code stream;
dividing each code stream characteristic information and each real video scoring data into a training set and a testing set according to a preset proportion;
training at least one machine learning model by using the code stream characteristic information and the real video scoring data in the training set, and updating parameters of each machine learning model to form an initial video quality evaluation model;
and testing the initial video quality evaluation model by adopting the code stream characteristic information and the real video scoring data in the test set, and determining the video quality evaluation model according to a test result.
3. The method of claim 2, wherein the training at least one of the machine learning models using the codestream feature information and the real video scoring data in the training set, and updating parameters of each of the machine learning models to form an initial video quality evaluation model comprises:
dividing the training set into a number of folds;
for a number of folds, training at least one of the machine learning models using one fold at a time as a validation set, the remaining folds as training data, until all folds are used as validation sets;
calculating model fitting effect index values after each time of machine learning model training;
and selecting the model parameter corresponding to the minimum model fitting effect index value as the optimal parameter of the machine learning model to form the initial video quality evaluation model.
4. The method according to claim 2 or 3, wherein the determining the video quality evaluation model according to the test result comprises:
when a plurality of machine learning models are available, obtaining a plurality of trained machine learning models according to a test result;
constructing a stacking model by taking the trained machine learning models as a first-layer model and a regression model as a second-layer model;
dividing the training set into a plurality of folds;
selecting one fold as verification data, and training the first layer model by adopting the other folds to obtain a plurality of corresponding video quality prediction models;
calculating the verification data by adopting a plurality of corresponding video quality prediction models respectively to obtain a plurality of video quality prediction values;
repeating the steps of training the first layer model and calculating the video quality prediction until all folds are used as validation data;
and training the second layer model by adopting each video quality predicted value and real video scoring data in the verification data corresponding to each video quality predicted value to obtain the video quality evaluation model.
5. The method of claim 2, wherein the dividing each of the codestream feature information and each of the real video scoring data into a training set and a testing set according to a preset ratio comprises:
analyzing the correlation between each code stream characteristic information and corresponding real video scoring data;
and screening code stream characteristic information with the correlation greater than or equal to a preset value and real video scoring data, and dividing the code stream characteristic information and the real video scoring data into a training set and a testing set according to a preset proportion.
6. A method for adjusting the quality of live video is characterized by comprising the following steps:
acquiring a live video stream, and extracting a plurality of frames of video code streams from the live video stream;
calculating, for a plurality of frames of the video code stream, the quality score of the live video by adopting the live video quality evaluation method according to any one of claims 1 to 5;
when the quality score exceeds a preset quality score range, judging that the quality of the live video is abnormal;
counting the abnormal times of the quality of the live video in a preset time period, and adjusting the related parameters of the live video when the abnormal times exceed a preset threshold value so as to complete the adjustment of the quality of the live video.
7. The method of claim 6, wherein the adjusting the relevant parameters of the live video comprises:
adjusting the encoding parameters of the anchor terminal or restarting the encoder;
wherein the encoding parameters include one or more of an encoder type, a rate mode, a frame rate, a resolution, an encoding gear, a code rate, an encoder gear, an I-frame interval, and an encoding slice type.
8. The method of claim 6, wherein the adjusting the relevant parameters of the live video comprises:
and the client pulls the video code streams with different parameter configurations or restarts the decoder.
9. The method according to any one of claims 6-8, wherein the preset quality score range is obtained by:
acquiring a source video stream, each encoding parameter and each network parameter of a live broadcast scene;
encoding the source video stream by adopting each encoding parameter to obtain each video code stream;
for each video code stream, referring to the corresponding source video stream, and calculating the actual video quality of each video code stream under each network parameter by adopting a full-parameter video quality evaluation method;
and determining a preset quality grading range according to the actual video quality of each video code stream.
10. The method of claim 6, further comprising:
and triggering a log alarm when the abnormal times exceed a preset threshold value.
11. A live video quality evaluation apparatus, characterized in that the apparatus comprises:
the video code stream acquisition module is used for acquiring a video code stream to be evaluated; the video code stream to be evaluated is formed by coding a live broadcast source video needing video quality evaluation;
the code stream characteristic information extraction module is used for extracting code stream characteristic information from the video code stream to be evaluated;
the quality score obtaining module is used for inputting the code stream characteristic information into a pre-established video quality evaluation model so as to obtain the quality score of the live video;
the video quality evaluation model is obtained by taking code stream characteristic information extracted from historical video data of a live broadcast scene as input and taking real video scoring data calculated according to the historical video data as output to train at least one machine learning model.
12. An apparatus for adjusting quality of live video, the apparatus comprising:
the live video stream acquisition module is used for acquiring a live video stream;
the video code stream extraction module is used for extracting a plurality of frames of video code streams from the live video stream;
the quality score calculation module is used for extracting code stream characteristic information from a plurality of frames of video code streams, and inputting the code stream characteristic information into a pre-established video quality evaluation model to calculate the quality score of the live video;
the video quality evaluation model is obtained by taking code stream characteristic information extracted from historical video data of a live broadcast scene as input and taking real video scoring data calculated according to the historical video data as output to train at least one machine learning model;
the abnormity judgment module is used for judging that the quality of the live video is abnormal when the quality score exceeds a preset quality score range;
and the video quality adjusting module is used for counting the abnormal times of the quality of the live video in a preset time period, and adjusting the related parameters of the live video when the abnormal times exceed a preset threshold value so as to complete the adjustment of the quality of the live video.
13. A terminal device or server, comprising:
a memory; one or more processors coupled with the memory; one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the method of any one of claims 1 to 5, and/or the method of any one of claims 6 to 10.
14. A computer-readable storage medium, in which program code is stored, the program code being invocable by a processor to perform the method according to any one of claims 1 to 5, and/or the method according to any one of claims 6 to 10.
CN202210182069.5A 2022-02-25 2022-02-25 Live video quality evaluation method, video quality adjustment method and related device Pending CN114513655A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210182069.5A CN114513655A (en) 2022-02-25 2022-02-25 Live video quality evaluation method, video quality adjustment method and related device


Publications (1)

Publication Number Publication Date
CN114513655A true CN114513655A (en) 2022-05-17

Family

ID=81553257

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210182069.5A Pending CN114513655A (en) 2022-02-25 2022-02-25 Live video quality evaluation method, video quality adjustment method and related device

Country Status (1)

Country Link
CN (1) CN114513655A (en)


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116170606A (en) * 2021-11-24 2023-05-26 *** Communication Co., Ltd. Research Institute Live broadcast quality assessment method, device, electronic equipment and medium
CN115379259A (en) * 2022-08-18 2022-11-22 百度在线网络技术(北京)有限公司 Video processing method and device, electronic equipment and storage medium
CN115379259B (en) * 2022-08-18 2024-04-26 百度在线网络技术(北京)有限公司 Video processing method, device, electronic equipment and storage medium
CN116886943A (en) * 2023-07-11 2023-10-13 上海百秋智尚网络服务有限公司 Method, system, equipment and storage medium for automatically acquiring live video highlight segment
CN116866674A (en) * 2023-08-10 2023-10-10 广州阿凡提电子科技有限公司 Live image interaction information processing method and system applying artificial intelligence
CN116866674B (en) * 2023-08-10 2024-02-27 广州阿凡提电子科技有限公司 Live image interaction information processing method, system and medium applying artificial intelligence
CN117615211A (en) * 2023-12-05 2024-02-27 书行科技(北京)有限公司 Live broadcast optimization method, device, equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination