CN112804587B - Video quality inspection method and apparatus based on a viewer-count sequence, and computer device - Google Patents

Video quality inspection method and apparatus based on a viewer-count sequence, and computer device

Info

Publication number
CN112804587B
CN112804587B
Authority
CN
China
Prior art keywords
video
watching
current
time
people
Prior art date
Legal status
Active
Application number
CN202011622844.1A
Other languages
Chinese (zh)
Other versions
CN112804587A (en)
Inventor
许丹
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202011622844.1A
Publication of CN112804587A
Application granted
Publication of CN112804587B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/442 Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N 21/44204 Monitoring of content usage, e.g. the number of times a movie has been viewed, copied or the amount which has been watched
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/442 Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N 21/44213 Monitoring of end-user related data
    • H04N 21/44222 Analytics of user selections, e.g. selection of programs or purchase activity


Abstract

The invention discloses a video quality inspection method, apparatus, computer device, and storage medium based on a viewer-count sequence, relating to artificial intelligence technology. The method comprises: obtaining the target video training subset and target prediction sub-model to which current video data belongs; feeding the current video feature vector corresponding to the current video data into the target prediction sub-model to calculate the current-video viewer-count decay rate; obtaining the actual viewer count of the current video data and calculating the estimated current-video viewer-count time series from the corresponding viewer-count decay-rate curve; and subtracting the estimated viewer-count time series from the actual viewer-count time series to obtain a trend time series. The average viewer-count decay rate of an input video of a given category is predicted from its video features, and the trend time series obtained by comparison with the actual viewer-count decay is then used as the video evaluation parameter, so the evaluation process is simple and the results are accurate.

Description

Video quality inspection method and apparatus based on a viewer-count sequence, and computer device
Technical Field
The invention relates to the technical field of intelligent decision making in artificial intelligence, and in particular to a video quality inspection method, apparatus, computer device, and storage medium based on a viewer-count sequence.
Background
The number of times a video is viewed, much like the audience ratings tracked by traditional television media, is often used as an important measure of how successful a video production is. In practice, however, a video's play count depends largely on factors such as title appeal, topic popularity, and promotion strength reflected in comment counts, which differ from metrics that genuinely reflect video quality.
Unlike the play count, the way the number of viewers fluctuates over the course of playback better reflects changes in the audience's interest and concentration, and thus reflects video quality from a more fundamental angle. For example, a user who opens a video playing page and finds that the content does not match the title will leave quickly, whereas if the video quality is high, a greater proportion of viewers will watch it to the end.
At present, quality evaluation of a video is generally performed on the basis of temporal and spatial evaluation parameters. Evaluating from the two dimensions of time domain and space domain reflects the influence of evaluation parameters of different dimensions on video quality fairly comprehensively, and such parameters accord well with the human visual system's perception of video quality; however, the evaluation process is complex and the accuracy of the results is not high.
Disclosure of Invention
Embodiments of the invention provide a video quality inspection method, apparatus, computer device, and storage medium based on a viewer-count sequence, aiming to solve the problems in the prior art that evaluating a video on the basis of temporal and spatial evaluation parameters makes the evaluation process complex and yields results of low accuracy.
In a first aspect, an embodiment of the invention provides a video quality inspection method based on a viewer-count sequence, comprising:
obtaining the video duration corresponding to each item of video data in a video training set, and dividing the video training set into video training subsets corresponding to the number of video groups according to the video durations and a preset set of video duration thresholds;
obtaining the viewer-count time series corresponding to each item of video data in each video training subset, and calculating the viewer-count decay rate corresponding to each viewer-count time series by triple exponential smoothing;
obtaining the video portrait corresponding to each item of video data in each video training subset, and obtaining the video feature vector corresponding to each item of video data from its video portrait, the video portrait comprising a video category feature, a video duration feature, an online date feature, an instructor feature, and an audience group feature;
performing model training on each video training subset to obtain a prediction sub-model corresponding to each video training subset, the prediction sub-model being used to predict the viewer-count decay rate from the video feature vector corresponding to an item of video data;
if current video data is received, obtaining the current video duration corresponding to the current video data, and obtaining, according to the current video duration and the video duration intervals corresponding to the respective video training subsets, the target video training subset to which the current video data belongs and the target prediction sub-model corresponding to the target video training subset;
feeding the current video feature vector corresponding to the current video data into the target prediction sub-model to obtain the current-video viewer-count decay rate corresponding to the current video data;
obtaining the actual current-video viewer-count time series corresponding to the current video data, obtaining the actual viewer-count base corresponding to the current video data, and computing, from the actual viewer-count base and the decay-rate curve corresponding to the current-video viewer-count decay rate, the estimated current-video viewer-count time series corresponding to the current video data; and
subtracting the estimated current-video viewer-count time series from the actual current-video viewer-count time series to obtain a trend time series.
In a second aspect, an embodiment of the invention provides a video quality inspection apparatus based on a viewer-count sequence, comprising:
a video training set grouping unit, configured to obtain the video duration corresponding to each item of video data in a video training set, and divide the video training set into video training subsets corresponding to the number of video groups according to the video durations and a preset set of video duration thresholds;
a first decay rate acquisition unit, configured to obtain the viewer-count time series corresponding to each item of video data in each video training subset, and calculate the viewer-count decay rate corresponding to each viewer-count time series by triple exponential smoothing;
a video feature vector acquisition unit, configured to obtain the video portrait corresponding to each item of video data in each video training subset, and obtain the video feature vector corresponding to each item of video data from its video portrait, the video portrait comprising a video category feature, a video duration feature, an online date feature, an instructor feature, and an audience group feature;
a prediction sub-model training unit, configured to perform model training on each video training subset to obtain a prediction sub-model corresponding to each video training subset, the prediction sub-model being used to predict the viewer-count decay rate from the video feature vector corresponding to an item of video data;
a current video data grouping unit, configured to, if current video data is received, obtain the current video duration corresponding to the current video data, and obtain, according to the current video duration and the video duration intervals corresponding to the respective video training subsets, the target video training subset to which the current video data belongs and the target prediction sub-model corresponding to the target video training subset;
a second decay rate acquisition unit, configured to feed the current video feature vector corresponding to the current video data into the target prediction sub-model to obtain the current-video viewer-count decay rate corresponding to the current video data;
an estimated sequence acquisition unit, configured to obtain the actual current-video viewer-count time series corresponding to the current video data, obtain the actual viewer-count base corresponding to the current video data, and compute, from the actual viewer-count base and the decay-rate curve corresponding to the current-video viewer-count decay rate, the estimated current-video viewer-count time series corresponding to the current video data; and
a trend time series acquisition unit, configured to subtract the estimated current-video viewer-count time series from the actual current-video viewer-count time series to obtain a trend time series.
In a third aspect, an embodiment of the invention further provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the video quality inspection method based on a viewer-count sequence according to the first aspect.
In a fourth aspect, an embodiment of the invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the video quality inspection method based on a viewer-count sequence according to the first aspect.
Embodiments of the invention provide a video quality inspection method, apparatus, computer device, and storage medium based on a viewer-count sequence. The method comprises: obtaining the target video training subset and target prediction sub-model to which current video data belongs; feeding the current video feature vector corresponding to the current video data into the target prediction sub-model to calculate the current-video viewer-count decay rate; obtaining the actual viewer count of the current video data and calculating the estimated current-video viewer-count time series from the corresponding viewer-count decay-rate curve; and subtracting the estimated viewer-count time series from the actual viewer-count time series to obtain a trend time series. The average viewer-count decay rate of an input video of a given category is predicted from its video features, and the trend time series obtained by comparison with the actual viewer-count decay is then used as the video evaluation parameter, so the evaluation process is simple and the results are accurate.
Drawings
To illustrate the technical solutions of the embodiments of the invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below depict only some embodiments of the invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an application scenario of a video quality inspection method based on a viewer-count sequence according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a video quality inspection method based on a viewer-count sequence according to an embodiment of the present invention;
Fig. 3 is a schematic block diagram of a video quality inspection apparatus based on a viewer-count sequence according to an embodiment of the present invention;
Fig. 4 is a schematic block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Referring to fig. 1 and fig. 2, fig. 1 is a schematic diagram of an application scenario of a video quality inspection method based on a viewer-count sequence according to an embodiment of the present invention, and fig. 2 is a schematic flowchart of that method. The video quality inspection method based on a viewer-count sequence is applied to a server and is executed by application software installed in the server.
As shown in FIG. 2, the method includes steps S101 to S108.
S101, obtain the video duration corresponding to each item of video data in a video training set, and divide the video training set into video training subsets corresponding to the number of video groups according to the video durations and a preset set of video duration thresholds.
In this embodiment, the video training set may include a set of short videos (for example, videos whose duration does not exceed 15 minutes are classified as the short-video type), a set of medium videos (for example, videos whose duration is between 15 and 45 minutes are classified as the medium-video type), and a set of long videos (for example, videos whose duration exceeds 45 minutes are classified as the long-video type). In a specific implementation, the video training set in this application consists of knowledge-training videos, which are uploaded to the server after being recorded by an instructor, or are stored automatically by the server while the instructor broadcasts a knowledge-training course live.
Based on the above example, the number of video groups may be set to 3, the preset first video duration threshold to 15 minutes, and the preset second video duration threshold to 45 minutes; the first and second thresholds form the video duration threshold set. Once the number of video groups and the video duration threshold set are known, each item of video data can be assigned to the corresponding video training subset according to its video duration, as sketched below.
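As a rough illustration, the grouping step can be written as follows. This is a minimal Python sketch under the 15/45-minute thresholds above; the function name and data layout are assumptions for illustration and are not part of the patent.

    from typing import Dict, List

    def split_training_set(durations: List[float],
                           thresholds: List[float]) -> Dict[int, List[int]]:
        """Assign each video (by index) to one of len(thresholds)+1 duration groups."""
        cuts = sorted(thresholds)
        subsets: Dict[int, List[int]] = {k: [] for k in range(len(cuts) + 1)}
        for idx, duration in enumerate(durations):
            group = sum(duration > t for t in cuts)  # 0: short, 1: medium, 2: long
            subsets[group].append(idx)
        return subsets

    # Durations in minutes; thresholds 15 and 45 give short / medium / long subsets.
    print(split_training_set([10, 30, 60], [15, 45]))  # {0: [0], 1: [1], 2: [2]}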
S102, obtain the viewer-count time series corresponding to each item of video data in each video training subset, and calculate the viewer-count decay rate corresponding to each viewer-count time series by triple exponential smoothing.
In this embodiment, suppose the video training set is divided into a first video training subset (the set of short videos), a second video training subset (the set of medium videos), and a third video training subset (the set of long videos). Since each video training subset includes a plurality of items of video data, the viewer-count time series corresponding to each item of video data can be obtained first. For a clearer understanding of the technical solution, it is described in detail with the number of video groups being 3, the first video duration threshold being 15 minutes, and the second video duration threshold being 45 minutes.
In an embodiment, obtaining the viewer-count time series corresponding to each item of video data in each video training subset comprises:
obtaining the video duration type corresponding to the video training subset;
if the video training subset corresponds to the first video duration type, obtaining a preset first video clip duration and a preset first interval time, clipping from each item of video data in the video training subset the first-target-duration video data corresponding to the first video clip duration, obtaining the viewer count at the left endpoint of each time sub-interval after each item of first-target-duration video data is divided according to the first interval time, and normalizing to obtain the viewer-count time series corresponding to each item of video data in the video training subset;
if the video training subset corresponds to the second video duration type, obtaining a preset second video clip duration and a preset second interval time, clipping from each item of video data in the video training subset the second-target-duration video data corresponding to the second video clip duration, obtaining the viewer count at the left endpoint of each time sub-interval after each item of second-target-duration video data is divided according to the second interval time, and normalizing to obtain the viewer-count time series corresponding to each item of video data in the video training subset;
if the video training subset corresponds to the third video duration type, obtaining a preset third video clip duration and a preset third interval time, clipping from each item of video data in the video training subset the third-target-duration video data corresponding to the third video clip duration, obtaining the viewer count at the left endpoint of each time sub-interval after each item of third-target-duration video data is divided according to the third interval time, and normalizing to obtain the viewer-count time series corresponding to each item of video data in the video training subset.
In this embodiment, the principle for computing the viewer-count time series is the same whether the video training subset corresponds to the first, second, or third video duration type; the process is therefore described for the case where the video training subset corresponds to the second video duration type.
Because the video durations of the video data in the subset of the second video duration type all exceed 15 minutes, the preset second video clip duration is 15 minutes and the preset second interval time is 10 seconds. Taking second 0 of each item of video data as the starting point, the time interval of the second-target-duration video data clipped from each item of video data in the subset is 0-15 minutes, and dividing it at the 10-second interval yields 90 time sub-intervals per item. Taking one item of second-target-duration video data as an example, the viewer count at the left endpoint of each of its 90 time sub-intervals is obtained, forming an initial viewer-count time series. The maximum and minimum of the initial series are then obtained, and the series is normalized by min-max normalization, whose formula is x' = (x - x_min) / (x_max - x_min), giving the viewer-count time series corresponding to that item of second-target-duration video data. The viewer-count time series corresponding to every item of video data in the video training subset can be obtained by the same process, as sketched below.
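The following minimal Python sketch shows this sampling-and-normalization step for one item of the second duration type (90 ten-second sub-intervals over 0-15 minutes); the array raw_counts and the function name are illustrative stand-ins, not names from the patent.

    import numpy as np

    def viewer_count_series(raw_counts: np.ndarray) -> np.ndarray:
        """Min-max normalize an initial viewer-count series: x' = (x - min) / (max - min)."""
        lo, hi = float(raw_counts.min()), float(raw_counts.max())
        if hi == lo:                     # constant series: avoid division by zero
            return np.ones_like(raw_counts, dtype=float)
        return (raw_counts - lo) / (hi - lo)

    # Viewer counts at the left endpoint of each of the 90 sub-intervals.
    raw_counts = np.linspace(1000, 400, 90)
    series = viewer_count_series(raw_counts)   # values now lie in [0, 1]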
In an embodiment, calculating the viewer-count decay rate corresponding to each viewer-count time series by triple exponential smoothing comprises:
calculating the viewer-count decay curve corresponding to each viewer-count time series by additive triple exponential smoothing; and
obtaining the viewer-count decay rate corresponding to each viewer-count decay curve by the least-squares method.
In this embodiment, since triple exponential smoothing comes in additive and multiplicative variants, this application adopts the additive variant. The formulas for additive triple exponential smoothing are:

    s_i = α(x_i - c_{i-L}) + (1 - α)(s_{i-1} + t_{i-1})
    t_i = β(s_i - s_{i-1}) + (1 - β)t_{i-1}
    c_i = γ(x_i - s_i) + (1 - γ)c_{i-L}
    x̂_{i+h} = s_i + h·t_i + c_{i-L+1+((h-1) mod L)}

where α, β, and γ all take values in [0, 1] and are the three smoothing parameters to be solved from the viewer-count time series, denoted the first, second, and third smoothing parameters respectively; s_i denotes the smoothed value at step i; t_i denotes the trend factor; c_i denotes the seasonal index; x_i denotes the actual datum in the viewer-count time series corresponding to s_i; L is the season length; and h is a positive integer.
Solving for α, β, and γ from each viewer-count time series via the additive triple exponential smoothing formulas yields the viewer-count decay function corresponding to that time series. The viewer-count decay rate corresponding to the decay function is then calculated by the least-squares method.
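As a concrete illustration, the fit can be sketched with statsmodels' Holt-Winters implementation; the season length of 6 samples (one minute of 10-second points) and the use of the fitted curve's least-squares slope as the scalar decay rate are assumptions for illustration, since the patent fixes neither.

    import numpy as np
    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    def viewer_decay_rate(series: np.ndarray, season_len: int = 6) -> float:
        model = ExponentialSmoothing(series, trend="add", seasonal="add",
                                     seasonal_periods=season_len).fit()
        smoothed = model.fittedvalues            # the smoothed decay curve
        t = np.arange(len(smoothed))
        slope, _ = np.polyfit(t, smoothed, 1)    # least-squares straight-line fit
        return -slope                            # positive rate for a declining series

    rate = viewer_decay_rate(series)             # `series` from the sketch above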
S103, obtain the video portrait corresponding to each item of video data in each video training subset, and obtain the video feature vector corresponding to each item of video data from its video portrait; the video portrait comprises a video category feature, a video duration feature, an online date feature, an instructor feature, and an audience group feature.
In this embodiment, in addition to the viewer-count time series, a video portrait can be obtained for each item of video data in each video training subset. The video portrait of an item of video data may include the following sub-portraits:
a basic-attribute sub-portrait, including the video duration, video category, online date, and so on;
an instructor sub-portrait, representing basic characteristics of the instructor such as age, gender, and instructor grade, together with aggregated statistics such as the number of live broadcasts and the number of views;
an audience-group sub-portrait, including audience age, audience level, region, and so on.
Once the basic-attribute sub-portrait, the instructor sub-portrait, and the audience-group sub-portrait corresponding to each item of video data are known, the video feature vector corresponding to each item of video data in the video training set can be obtained from its video portrait. More specifically, when obtaining the video feature vector, the values of the video category feature, video duration feature, and online date feature in the basic-attribute sub-portrait are obtained first, then the values of the instructor feature in the instructor sub-portrait, then the values of the audience-group feature in the audience-group sub-portrait, and finally these values are concatenated in order to form the video feature vector corresponding to the item of video data.
The video duration, video category, and online date in the basic-attribute sub-portrait are attribute data of the video data and can be extracted directly from it. The age, gender, and instructor grade in the instructor sub-portrait are likewise attribute data of the video data and can be extracted directly; the number of live broadcasts and the number of views in the instructor sub-portrait are statistical data of the video data and can be extracted directly from its statistics. The audience age, audience level, and region in the audience-group sub-portrait are statistical data of the video data and can also be extracted directly from its statistics. When an item of video data is uploaded to the server, its attribute data can be set, and statistics-collection instrumentation points can be configured to gather the statistical data.
For example, denote one item of video data in a certain video training subset as video data A. Its video category feature corresponds to 3 (where 3 denotes a category of videos related to class C), its video duration feature corresponds to 30 (the duration of video data A is 30 minutes), and its online date feature corresponds to 20191201 (video data A went online on December 1, 2019). Its instructor feature corresponds to [31 1 5 30 1000] (the 5 tags of the instructor sub-portrait of video data A correspond to age 31, gender male, instructor grade 5, 30 live broadcasts, and 1000 views, where gender male corresponds to 1 and gender female corresponds to 2), and the vector corresponding to its audience-group feature is [22 2 4403] (the average audience age of video data A is 22, the average audience level is 2, and the audience region is concentrated in Shenzhen, Guangdong, whose region code is 4403). Concatenating these values in order yields the video feature vector corresponding to video data A.
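The concatenation just described can be sketched as follows in Python; the field values mirror the video data A example above and are purely illustrative.

    import numpy as np

    basic_attrs = [3, 30, 20191201]      # category, duration (minutes), online date
    instructor  = [31, 1, 5, 30, 1000]   # age, gender (1 = male), grade, lives, views
    audience    = [22, 2, 4403]          # average age, average level, region code

    # Concatenate the sub-portrait values in order to form the feature vector.
    feature_vector = np.array(basic_attrs + instructor + audience, dtype=float)
    # -> [3, 30, 20191201, 31, 1, 5, 30, 1000, 22, 2, 4403]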
S104, perform model training on each video training subset to obtain a prediction sub-model corresponding to each video training subset; the prediction sub-model is used to predict the viewer-count decay rate from the video feature vector corresponding to an item of video data.
In this embodiment, once the video feature vector and the corresponding viewer-count decay rate have been obtained for each item of video data in each video training subset, one prediction sub-model can be trained on each video training subset. In particular, the prediction sub-model may be a multiple linear regression model.
For example, suppose the video training set is divided into a first video training subset (the set of short videos), a second video training subset (the set of medium videos), and a third video training subset (the set of long videos). After the video feature vector and the corresponding viewer-count decay rate of each item of video data are obtained, the video feature vectors of the video data in the first video training subset are used as inputs, and the corresponding viewer-count decay rates as outputs, to train the first prediction sub-model. The second and third prediction sub-models are obtained by the same process.
For example, each prediction sub-model may be a multiple linear regression model of the following form:

    y = b_0 + b_1·x_1 + b_2·x_2 + … + b_p·x_p + e

where x_1, x_2, …, x_p are the non-random explanatory variables of the multiple linear regression model (the values of the video feature vector corresponding to each item of video data are taken in order as x_1, x_2, …, x_p); b_0, b_1, …, b_p are the regression coefficients to be trained; and e is the random error term to be trained. Since the values of x_1, x_2, …, x_p and y are known, a corresponding multiple linear regression model can be trained on each video training subset to serve as its prediction sub-model.
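The per-subset training can be sketched as follows; scikit-learn's LinearRegression is one convenient choice of fitter, and the random stand-in data and function name are assumptions for illustration only.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    def train_submodels(subsets):
        """subsets: list of (X, y) pairs, one per duration group, where X has
        shape (n_videos, n_features) and y holds the viewer-count decay rates."""
        return [LinearRegression().fit(X, y) for X, y in subsets]

    # Stand-in data: three duration groups, 50 videos each, 11 features each.
    rng = np.random.default_rng(0)
    subsets = [(rng.random((50, 11)), rng.random(50)) for _ in range(3)]
    submodels = train_submodels(subsets)
    predicted_rate = submodels[1].predict(rng.random((1, 11)))  # medium-video model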
S105, if current video data is received, obtain the current video duration corresponding to the current video data, and obtain, according to the current video duration and the video duration intervals corresponding to the respective video training subsets, the target video training subset to which the current video data belongs and the target prediction sub-model corresponding to the target video training subset.
In this embodiment, the video training set has been divided into a plurality of video training subsets by video duration as described in step S101: for example, videos whose duration does not exceed 15 minutes form the first video training subset (first video duration type), videos whose duration is between 15 and 45 minutes form the second video training subset (second video duration type), and videos whose duration exceeds 45 minutes form the third video training subset (third video duration type). The first video training subset corresponds to the first prediction sub-model, the second to the second prediction sub-model, and the third to the third prediction sub-model.
If the current video duration obtained is 30 minutes, it belongs to the second video duration type, so the target video training subset to which the current video data belongs is the second video training subset, and the target prediction sub-model corresponding to that subset is the second prediction sub-model. Determining the prediction sub-model by video duration in this way selects a model that predicts the decay rate more accurately, instead of applying one and the same prediction model indiscriminately.
S106, feed the current video feature vector corresponding to the current video data into the target prediction sub-model to obtain the current-video viewer-count decay rate corresponding to the current video data.
In this embodiment, after the prediction sub-model corresponding to the current video data is determined by the current video duration, the current video feature vector corresponding to the current video data is obtained and input into the target prediction sub-model, yielding the corresponding current-video viewer-count decay rate.
S107, obtain the actual current-video viewer-count time series corresponding to the current video data, obtain the actual viewer-count base corresponding to the current video data, and compute, from the actual viewer-count base and the decay-rate curve corresponding to the current-video viewer-count decay rate, the estimated current-video viewer-count time series corresponding to the current video data.
In this embodiment, suppose the server determines that the duration of the current video data belongs to the second video duration type. After the current video data is obtained, the procedure for clipping second-target-duration video data can be followed: taking second 0 as the starting point, the time interval of the current second-target-duration video data is 0-15 minutes, and dividing it at the 10-second interval yields 90 time sub-intervals; the viewer count at the left endpoint of each of the 90 sub-intervals is obtained, forming the current initial viewer-count time series. The maximum and minimum of this series are then obtained, and the series is normalized by min-max normalization (x' = (x - x_min) / (x_max - x_min)) to give the actual current-video viewer-count time series corresponding to the current second-target-duration video data.
In an embodiment, obtaining the actual viewer-count base corresponding to the current video data in step S107 and computing the estimated current-video viewer-count time series from the actual viewer-count base and the decay-rate curve corresponding to the current-video viewer-count decay rate comprises:
obtaining the current-video viewer-count decay-rate curve from the current-video viewer-count decay rate;
obtaining the target video clip duration corresponding to the target video training subset, and clipping the corresponding target portion of the current-video viewer-count decay-rate curve according to the target video clip duration to obtain the current target decay-rate curve;
obtaining the target interval time corresponding to the target video training subset, and, starting from the origin, sampling points from the current target decay-rate curve in sequence at the target interval time to form the current initial time series; and
multiplying each sequence value in the current initial time series by the actual viewer-count base to form the estimated current-video viewer-count time series.
In this embodiment, once the current-video viewer-count decay rate is obtained, the corresponding current-video viewer-count decay-rate curve can be derived from it in reverse. In this curve, the left starting point defaults to a horizontal-axis value of 0 (the horizontal axis represents time) and a vertical-axis value of 1 (the vertical axis is a dimensionless normalized value), after which the curve is a gradually decaying decreasing function.
Because the current video data is of the same video duration type as the target video training subset, the target video clip duration and target interval time corresponding to the target video training subset can be obtained, and points are then sampled in sequence from the current target decay-rate curve (taking the ordinate value at each target point), starting from the origin and spaced at the target interval time, to form the current initial time series. Finally, every sequence value in the current initial time series is multiplied by the actual viewer-count base to form the estimated current-video viewer-count time series. In this way the estimated current-video viewer count can be derived from the predicted current-video viewer-count decay rate.
S108, subtract the estimated current-video viewer-count time series from the actual current-video viewer-count time series to obtain a trend time series.
In this embodiment, if the actual current-video viewer-count time series is denoted X(t), the estimated current-video viewer-count time series is denoted Y(t), and the trend time series is denoted Z(t), then:

    Z(t) = X(t) - Y(t)

The trend time series calculated in this way can be used as a reference series for evaluating video quality, and may then be transmitted back to the terminal that sent the current video data. Steps S107 and S108 are sketched together below.
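A minimal sketch of S107-S108: sample the decay-rate curve every 10 seconds over the 15-minute clip, scale by the actual viewer-count base to get Y(t), and subtract from the actual series X(t). The exponential shape exp(-rate·t) is an assumption for illustration; the patent only requires a decreasing curve starting at (0, 1).

    import numpy as np

    def estimated_series(decay_rate: float, base_count: float,
                         clip_minutes: float = 15, step_seconds: float = 10):
        t = np.arange(0, clip_minutes * 60, step_seconds)  # sample times (seconds)
        curve = np.exp(-decay_rate * t / 60.0)             # starts at 1, then decays
        return base_count * curve                          # Y(t)

    Y = estimated_series(decay_rate=0.05, base_count=1200.0)
    X = Y + np.random.default_rng(1).normal(0, 20, size=Y.shape)  # stand-in actual
    Z = X - Y                                              # trend time series Z(t)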
In an embodiment, step S108 is followed by:
obtaining the actual viewer-count decay rate corresponding to the actual current-video viewer-count time series, and subtracting the current-video viewer-count decay rate from the actual viewer-count decay rate to obtain a comparison result.
In this embodiment, the actual viewer-count decay rate corresponding to the actual current-video viewer-count time series can be calculated by the same procedure used in step S102 to calculate a decay rate from a time series. The current-video viewer-count decay rate is then subtracted from the actual viewer-count decay rate to obtain the comparison result.
In an embodiment, obtaining the actual viewer-count decay rate corresponding to the actual current-video viewer-count time series and subtracting the current-video viewer-count decay rate from it to obtain the comparison result is further followed by:
if the comparison result is less than 0, labeling the current video data with a first video grade value;
if the comparison result is equal to 0, labeling the current video data with a second video grade value; and
if the comparison result is greater than 0, labeling the current video data with a third video grade value.
In this embodiment, the comparison result is interpreted as follows:
if the comparison result is less than 0, the video is of relatively high quality among videos of the same type. A small actual decay rate means the audience drains away more slowly; in other words, viewers who enter the video stay longer than the average for videos of this type, which shows that its quality is above the average level. The first video grade value can therefore be applied, for example 3;
if the comparison result is equal to 0, the video reaches the average level of similar videos, and the second video grade value can be applied, for example 2;
if the comparison result is greater than 0, the video is of lower quality than similar videos, and the third video grade value can be applied, for example 1.
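The labeling rule above reduces to a small function; the grade values 3/2/1 follow the examples in the text, and the function name is illustrative.

    def video_grade(actual_rate: float, predicted_rate: float) -> int:
        comparison = actual_rate - predicted_rate
        if comparison < 0:
            return 3   # first grade: viewers leave more slowly than average
        if comparison == 0:
            return 2   # second grade: average for its category
        return 1       # third grade: below the category average

    print(video_grade(0.03, 0.05))  # 3 -> above-average quality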
In an embodiment, step S108 is also followed by:
obtaining the positive-value intervals, negative-value intervals, and zero-value intervals in the trend time series.
In this embodiment, a positive-value interval in the trend time series is a run of consecutive positive sequence values. The time point corresponding to the leftmost sequence value of such a run is taken as the start time of the positive-value interval, and the time point corresponding to the rightmost sequence value as its end time; once the start and end times are known, the positive-value time interval corresponding to the positive-value interval is obtained. By the same procedure, the negative-value intervals and negative-value time intervals, and the zero-value intervals and zero-value time intervals, in the trend time series can be obtained.
Within a positive-value time interval, the video audience declines slowly or hardly drains away at all, indicating a high-quality video segment. Within a zero-value time interval, the audience declines at the rate expected for the category, indicating a medium-quality segment. Within a negative-value time interval, the audience decline accelerates, indicating a low-quality segment. A sketch of the interval extraction follows.
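A minimal sketch of extracting the positive / zero / negative runs from Z(t); each run is returned as an inclusive (start_index, end_index) pair, which maps to a time interval given the 10-second sampling step. The tolerance parameter is an assumption to absorb floating-point noise.

    import numpy as np

    def sign_intervals(Z: np.ndarray, tol: float = 1e-9):
        signs = np.where(Z > tol, 1, np.where(Z < -tol, -1, 0))
        runs = {1: [], 0: [], -1: []}
        start = 0
        for i in range(1, len(signs) + 1):
            if i == len(signs) or signs[i] != signs[start]:
                runs[int(signs[start])].append((start, i - 1))
                start = i
        return runs   # runs[1]: positive, runs[0]: zero, runs[-1]: negative

    intervals = sign_intervals(np.array([0.2, 0.1, 0.0, -0.3, -0.1, 0.4]))
    # -> {1: [(0, 1), (5, 5)], 0: [(2, 2)], -1: [(3, 4)]}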
In an embodiment, obtaining the positive-value intervals, negative-value intervals, and zero-value intervals in the trend time series is further followed by:
marking key video time nodes on the current video data for the positive-value time intervals corresponding to the positive-value intervals, the zero-value time intervals corresponding to the zero-value intervals, and the negative-value time intervals corresponding to the negative-value intervals, to obtain marked video data.
In this embodiment, since the positive-value, negative-value, and zero-value time intervals are now known, key time nodes can be marked over the complete time interval corresponding to the current video data, so that other users can consult these key time nodes and fast-forward the video to the high-quality segments corresponding to the positive-value time intervals. Marking key time nodes on the video effectively segments it with the trend time series as reference.
In an embodiment, obtaining the positive-value intervals, negative-value intervals, and zero-value intervals in the trend time series is further followed by:
clipping the corresponding first target video data from the current video data according to the positive-value time interval corresponding to a positive-value interval; and
performing speech recognition on the first target video data to obtain a corresponding first target text.
In this embodiment, since the positive-value time interval is known, the corresponding high-quality video segment can be clipped out as the first target video data for subsequent speech-to-text conversion.
Speech recognition can be performed on the audio data in the first target video data by a number of methods to obtain the first target text; for example, by means of an N-gram model.
In a specific implementation, topic extraction can then be performed on the first target text. For example, in practice the text corresponding to 2 minutes of speech at a normal speaking rate runs to about 300 characters and can be classified as "long text". For long text, topics can be extracted with an LDA topic model; in a training scenario, a topic can be understood as a "knowledge point" (a sketch is given below).
The knowledge points corresponding to the first target text can be understood as the knowledge points the audience likes, and can be pushed to the user side as recommendation data.
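The LDA step can be sketched with gensim as one possible implementation (the patent names LDA but no library); the tokenized documents below are stand-ins, and real use on recognized Chinese text would first require word segmentation.

    from gensim import corpora
    from gensim.models import LdaModel

    docs = [["viewer", "retention", "curve", "quality"],
            ["regression", "model", "decay", "rate"]]
    dictionary = corpora.Dictionary(docs)
    corpus = [dictionary.doc2bow(doc) for doc in docs]

    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=10)
    for topic_id, words in lda.print_topics(num_words=4):
        print(topic_id, words)   # each topic ~ one candidate "knowledge point"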
This application uses a machine learning model to predict the average viewer-count decay rate of an input video from its video features, and then obtains a trend time series by comparison with the actual viewer-count decay, so the trend time series can serve as an evaluation parameter for overall video quality; the evaluation process is simple and the evaluation results are accurate.
An embodiment of the invention further provides a video quality inspection apparatus based on a viewer-count sequence, configured to execute any embodiment of the foregoing video quality inspection method based on a viewer-count sequence. Referring to fig. 3, fig. 3 is a schematic block diagram of a video quality inspection apparatus based on a viewer-count sequence according to an embodiment of the present invention. The video quality inspection apparatus 100 based on a viewer-count sequence may be configured in a server.
As shown in fig. 3, the video quality inspection apparatus 100 based on a viewer-count sequence includes: a video training set grouping unit 101, a first decay rate acquisition unit 102, a video feature vector acquisition unit 103, a prediction sub-model training unit 104, a current video data grouping unit 105, a second decay rate acquisition unit 106, an estimated sequence acquisition unit 107, and a trend time series acquisition unit 108.
The video training set grouping unit 101 is configured to obtain the video duration corresponding to each item of video data in a video training set, and divide the video training set into video training subsets corresponding to the number of video groups according to the video durations, the preset number of video groups, and a preset set of video duration thresholds.
In this embodiment, the video training set may include a set of short videos (for example, videos whose duration does not exceed 15 minutes are classified as the short-video type), a set of medium videos (for example, videos whose duration is between 15 and 45 minutes are classified as the medium-video type), and a set of long videos (for example, videos whose duration exceeds 45 minutes are classified as the long-video type). In a specific implementation, the video training set in this application consists of knowledge-training videos, which are uploaded to the server after being recorded by an instructor, or are stored automatically by the server while the instructor broadcasts a knowledge-training course live.
Based on the above example, the number of video groups may be set to 3, the preset first video duration threshold to 15 minutes, and the preset second video duration threshold to 45 minutes; the first and second thresholds form the video duration threshold set. Once the number of video groups and the video duration threshold set are known, each item of video data can be assigned to the corresponding video training subset according to its video duration.
The first decay rate acquisition unit 102 is configured to obtain the viewer-count time series corresponding to each item of video data in each video training subset, and calculate the viewer-count decay rate corresponding to each viewer-count time series by triple exponential smoothing.
In this embodiment, suppose the video training set is divided into a first video training subset (the set of short videos), a second video training subset (the set of medium videos), and a third video training subset (the set of long videos). Since each video training subset includes a plurality of items of video data, the viewer-count time series corresponding to each item of video data can be obtained first. For a clearer understanding of the technical solution, it is described in detail with the number of video groups being 3, the first video duration threshold being 15 minutes, and the second video duration threshold being 45 minutes.
In an embodiment, the first attenuation rate obtaining unit 102 includes:
the video duration type acquisition unit is used for acquiring the video duration type corresponding to the video training subset;
the first calculating unit is used for acquiring a preset first video interception duration and a preset first interval time if the video training subset corresponds to the first video duration type, intercepting first target duration video data corresponding to the first video interception duration from each video data in the video training subset, acquiring the number of video watching people at the left endpoint of each time sub-interval after each first target duration video data is divided according to the first interval time, and normalizing to obtain the video watching people number time sequence corresponding to each video data in the video training subset;
the second calculating unit is used for acquiring a preset second video interception duration and a preset second interval time if the video training subset corresponds to the second video duration type, intercepting second target duration video data corresponding to the second video interception duration from each video data in the video training subset, acquiring the number of video watching people at the left endpoint of each time sub-interval after each second target duration video data is divided according to the second interval time, and normalizing to obtain the video watching people number time sequence corresponding to each video data in the video training subset;
and the third calculating unit is used for acquiring a preset third video interception duration and a preset third interval time if the video training subset corresponds to the third video duration type, intercepting third target duration video data corresponding to the third video interception duration from each video data in the video training subset, acquiring the number of video watching people at the left endpoint of each time sub-interval after each third target duration video data is divided according to the third interval time, and normalizing to obtain the video watching people number time sequence corresponding to each video data in the video training subset.
In this embodiment, the principle for obtaining the video watching people number time sequence is the same whether the video training subset corresponds to the first, second or third video duration type; the process is therefore described for the case where the video training subset corresponds to the second video duration type.
Because the video duration of each video data in the video training subset of the second video duration type exceeds 15 minutes, the preset second video interception duration is 15 minutes and the preset second interval time is 10 seconds. Taking the 0th second of each video data as the starting point, the time interval of the second target duration video data corresponding to each video data in the video training subset is 0 to 15 minutes, and after division according to the 10-second second interval time, each second target duration video data is divided into 90 time sub-intervals. Taking one of the second target duration video data as an example, the number of video watching people corresponding to the left endpoint of each of its 90 time sub-intervals is obtained to form an initial video watching people number time sequence. The maximum value and the minimum value of the initial video watching people number time sequence are then obtained, and the sequence is normalized according to the min-max normalization method, whose formula is x' = (x − x_min)/(x_max − x_min), to obtain the video watching people number time sequence corresponding to that second target duration video data. By repeating the above process, the video watching people number time sequence corresponding to each video data in the video training subset can be obtained.
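A minimal sketch of this sampling and normalization step, where the viewer-count accessor and all names are assumptions:

```python
import numpy as np

# Illustrative sketch: sample the viewer count at the left endpoint of each
# 10-second sub-interval over the first 15 minutes, then min-max normalize.
# viewer_count_at(t) is an assumed accessor returning the viewer count at second t.

def viewer_series(viewer_count_at, capture_minutes=15, interval_seconds=10):
    n = capture_minutes * 60 // interval_seconds          # 90 sub-intervals
    raw = np.array([viewer_count_at(i * interval_seconds) for i in range(n)],
                   dtype=float)
    lo, hi = raw.min(), raw.max()
    if hi == lo:                                          # guard: constant series
        return np.zeros_like(raw)
    return (raw - lo) / (hi - lo)                         # x' = (x - x_min)/(x_max - x_min)
```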
In an embodiment, the first attenuation rate obtaining unit 102 includes:
the attenuation curve acquisition unit is used for calculating, by an additive (cumulative) third-order exponential smoothing method, the video watching people number attenuation curves respectively corresponding to the video watching people number time sequences;
and the least square unit is used for acquiring the video watching people number attenuation rate corresponding to each video watching people number attenuation curve by a least square method.
In the present embodiment, since third-order exponential smoothing has both an additive (cumulative) form and a multiplicative form, the present application adopts the additive third-order exponential smoothing method. The formulas corresponding to the additive third-order exponential smoothing method are as follows:

S_i = α(X_i − C_{i−L}) + (1 − α)(S_{i−1} + T_{i−1})

T_i = β(S_i − S_{i−1}) + (1 − β)T_{i−1}

C_i = γ(X_i − S_i) + (1 − γ)C_{i−L}

F_{i+h} = S_i + h·T_i + C_{i−L+1+((h−1) mod L)}

wherein α, β and γ all take values in [0,1] and are the three smoothing parameters that need to be solved from the video watching people number time sequence, denoted respectively the first smoothing parameter, the second smoothing parameter and the third smoothing parameter; S_i represents the value of the i-th step after smoothing; T_i represents the trend factor; C_i represents the seasonal index, with L the length of the season; X_i represents the actual data in the video watching people number time sequence corresponding to S_i; h is a positive integer; and F_{i+h} is the value forecast h steps ahead. Taking each video watching people number time sequence as the basis, the smoothing parameters α, β and γ of the additive third-order exponential smoothing formulas can be solved, thereby obtaining the video watching people number attenuation function corresponding to each video watching people number time sequence. The video watching people number attenuation rate corresponding to each attenuation function is then calculated by the least square method.
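Additive third-order exponential smoothing of this kind is what statsmodels implements as Holt-Winters ExponentialSmoothing; a hedged sketch of fitting it and reading the attenuation rate off the smoothed curve follows. The seasonal period and the slope-as-rate reading are assumptions, not prescriptions of the patent:

```python
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Illustrative sketch: fit additive third-order (Holt-Winters) exponential
# smoothing to a normalized viewer-count series; alpha, beta and gamma are
# solved internally during fitting. The attenuation rate is then estimated
# as the least-squares slope of the smoothed curve.

def viewer_decay_rate(series, seasonal_periods=9):   # season length L is assumed
    fitted = ExponentialSmoothing(series, trend="add", seasonal="add",
                                  seasonal_periods=seasonal_periods).fit().fittedvalues
    t = np.arange(len(fitted))
    slope, _intercept = np.polyfit(t, fitted, 1)     # least-squares line fit
    return -slope                                    # positive when the audience decays
```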
A video feature vector obtaining unit 103, configured to obtain a video portrait corresponding to each video data in each video training subset, and obtain, through the video portrait corresponding to each video data, the video feature vector corresponding to each video data in the video training subset; the video portrait comprises a video category feature, a video duration feature, an online date feature, an instructor feature and an audience group feature.
In this embodiment, besides the video watching people number time sequence, the video portrait of each video data in each video training subset can also be acquired. The video portrait of a piece of video data may include the following sub-portraits:
a basic attribute sub-portrait, including video duration, video category, online date, and the like;
an instructor sub-portrait, used for representing basic features of the instructor, such as age, gender and instructor level, and also including some counted features, such as the number of live broadcasts and the number of viewings;
an audience group sub-portrait, including audience age, audience level, region, and the like.
After the basic attribute sub-portrait, the instructor sub-portrait and the audience group sub-portrait corresponding to each piece of video data are known, the video feature vector corresponding to each piece of video data in the video training set can be obtained through the video portrait corresponding to each piece of video data.
For example, one video data in a certain video training subset is recorded as video data A. The video category feature of video data A takes the value 3 (where 3 represents the category of videos related to course C), the video duration feature takes the value 30 (representing that the video duration of video data A is 30 minutes), and the online date feature takes the value 20191201 (representing that the online date of video data A is December 1, 2019). The instructor feature takes the value [31 1 5 30 1000] (representing that the 5 tags included in the instructor sub-portrait corresponding to video data A are age 31, gender male, instructor level 5, 30 live broadcasts and 1000 viewings, where a male gender corresponds to 1 and a female gender corresponds to 2), and the audience group feature takes the value [22 2 4403] (representing that the average audience age of video data A is 22, the average audience level is 2, and the audience region is concentrated in Shenzhen, Guangdong, with 4403 the region code corresponding to Shenzhen, Guangdong). Concatenating these features yields the video feature vector corresponding to video data A.
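A minimal sketch of assembling video data A's feature vector from the example values above; every encoding choice here is an assumption for illustration:

```python
import numpy as np

# Illustrative sketch: concatenate the sub-portrait features of video data A
# into one feature vector. The encodings follow the example above and are assumed.

category    = [3]                     # category 3: videos related to course C
duration    = [30]                    # video duration in minutes
online_date = [20191201]              # 2019-12-01 encoded as an integer
instructor  = [31, 1, 5, 30, 1000]    # age, gender (1 = male), level, live broadcasts, viewings
audience    = [22, 2, 4403]           # average age, average level, region code (Shenzhen)

feature_vector_a = np.array(
    category + duration + online_date + instructor + audience, dtype=float)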
A predictor model training unit 104, configured to perform model training on each video training subset respectively to obtain predictor models corresponding to each video training subset respectively; the prediction sub-model is used for predicting the attenuation rate of the number of the video watching persons according to the video feature vectors corresponding to the video data.
In this embodiment, after the video feature vector and the video watching people number attenuation rate corresponding to each video data in each video training subset are obtained, a predictor model can be trained on each video training subset. In particular, the predictor model may employ a multiple linear regression model.
A current video data grouping unit 105, configured to, if current video data is received, obtain a current video duration corresponding to the current video data, and obtain, according to the current video duration and a video duration interval corresponding to each video training subset, a target video training subset to which the current video data belongs and a target prediction submodel corresponding to the target video training subset.
In this embodiment, as described for the video training set grouping unit 101, the video training set is divided into a plurality of video training subsets by video duration: for example, videos whose duration does not exceed 15 minutes are placed in the first video training subset corresponding to the first video duration type, videos whose duration exceeds 15 minutes but does not exceed 45 minutes in the second video training subset corresponding to the second video duration type, and videos whose duration exceeds 45 minutes in the third video training subset corresponding to the third video duration type. The first video training subset corresponds to a first predictor model, the second video training subset to a second predictor model, and the third video training subset to a third predictor model.
If the obtained current video duration is 30 minutes, it belongs to the second video duration type, so the target video training subset to which the current video data belongs is the second video training subset, and the target predictor model corresponding to the target video training subset is the second predictor model. By determining the corresponding predictor model according to the video duration in advance, a prediction model that predicts the attenuation rate more accurately can be selected, rather than using one and the same prediction model for all videos.
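A hedged sketch of the per-subset training and duration-based routing described above, using scikit-learn's multiple linear regression; the data layout is an assumption:

```python
from sklearn.linear_model import LinearRegression

# Illustrative sketch: train one multiple linear regression predictor per
# video training subset, then route an incoming video to its target submodel
# by video duration. Each subset is assumed to be (feature_matrix, decay_rates).

def train_submodels(subsets):
    return [LinearRegression().fit(X, y) for X, y in subsets]

def route(duration_minutes, thresholds=(15, 45)):
    """Return the index of the target submodel for the given duration."""
    for i, limit in enumerate(thresholds):
        if duration_minutes <= limit:
            return i
    return len(thresholds)

# A 30-minute video belongs to the second group, so its decay rate would be
# predicted by submodels[route(30)].predict(current_features.reshape(1, -1)).
```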
A second attenuation rate obtaining unit 106, configured to perform operation using the current video feature vector corresponding to the current video data as an input of the target prediction sub-model, so as to obtain an attenuation rate of the number of watching people of the current video corresponding to the current video data.
In this embodiment, after the predictor model corresponding to the current video data is determined according to the current video duration, the current video feature vector corresponding to the current video data can be obtained, and the current video feature vector is input to the target predictor model for operation, so as to obtain the attenuation rate of the number of watching people of the corresponding current video.
The estimated sequence obtaining unit 107 is configured to obtain a current video actual watching people number time sequence corresponding to the current video data, obtain an actual watching people number corresponding to the current video data, and perform an operation according to the actual watching people number and a current video watching people number attenuation rate curve corresponding to the current video watching people number attenuation rate to obtain a current video estimated watching people number time sequence corresponding to the current video data.
In this embodiment, suppose the server determines that the current video duration corresponding to the current video data belongs to the second video duration type. After the current video data is obtained, the process of extracting second target duration video data for videos of the second video duration type can be followed: taking the 0th second of the current video data as the starting point, the time interval of the current second target duration video data is 0 to 15 minutes, and dividing it according to the 10-second second interval time gives 90 time sub-intervals. The number of video watching people corresponding to the left endpoint of each of the 90 time sub-intervals is obtained to form a current initial video watching people number time sequence. The maximum value and the minimum value of this sequence are then obtained, and the sequence is normalized according to the min-max normalization method (whose formula is x' = (x − x_min)/(x_max − x_min)) to obtain the current video actual watching people number time sequence corresponding to the current second target duration video data.
In an embodiment, the estimated sequence obtaining unit 107 includes:
the current attenuation curve obtaining unit is used for obtaining a current video watching person number attenuation rate curve according to the current video watching person number attenuation rate;
the current target attenuation rate curve acquisition unit is used for acquiring the target video interception duration corresponding to the target video training subset, and intercepting the corresponding target part of the current video watching people number attenuation rate curve according to the target video interception duration to obtain a current target attenuation rate curve;
the current initial time sequence acquisition unit is used for acquiring the target interval time corresponding to the target video training subset, and taking points in turn from the current target attenuation rate curve, starting at the origin and stepping by the target interval time, to form a current initial time sequence;
and the current video estimated watching person number time sequence acquisition unit is used for multiplying each sequence value in the current initial time sequence by the actual watching person number base number to form the current video estimated watching person number time sequence.
In this embodiment, once the current video watching people number attenuation rate is obtained, the corresponding current video watching people number attenuation rate curve can be derived from it. In the current video watching people number attenuation rate curve, the default left starting point corresponds to 0 on the horizontal axis (the horizontal axis represents time) and 1 on the vertical axis (the vertical axis is a dimensionless normalized value), and the curve is then a gradually decaying decreasing function.
Because the current video data is of the same video duration type as the target video training subset, the target video interception duration and the target interval time corresponding to the target video training subset can be obtained, and points are then taken in turn from the current target attenuation rate curve, starting at the origin and stepping by the target interval time (each point being the ordinate value at the corresponding time), to form a current initial time sequence. Finally, every sequence value in the current initial time sequence is multiplied by the actual watching people number base to form the current video estimated watching people number time sequence. In this way, the estimated number of people watching the current video can be derived from the predicted current video watching people number attenuation rate.
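A minimal sketch of deriving the estimated watching people number time sequence from the attenuation rate curve; the curve accessor and all names are assumptions:

```python
import numpy as np

# Illustrative sketch: sample the normalized attenuation rate curve at the
# target interval time and scale by the actual watching people number base.
# decay_curve(t) is an assumed function with decay_curve(0) == 1.0.

def estimated_viewer_series(decay_curve, base_viewers,
                            capture_seconds=15 * 60, interval_seconds=10):
    times = np.arange(0, capture_seconds, interval_seconds)   # 90 sample points
    normalized = np.array([decay_curve(t) for t in times])    # ordinate values
    return normalized * base_viewers                          # Y(t): estimated viewers
```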
And the trend time sequence obtaining unit 108 is configured to calculate a difference between the current video actual watching people number time sequence and the current video estimated watching people number time sequence to obtain a trend time sequence.
In this embodiment, if the time series of the actual number of people watching the current video is denoted as X (t), the time series of the estimated number of people watching the current video is denoted as Y (t), and the time series of the trend is denoted as Z (t), then:
Z(t)= X(t)- Y(t)
The trend time sequence calculated in this way can be used as a reference sequence for evaluating video quality, and may then be transmitted to the transmitting terminal that sent the current video data.
In one embodiment, the video quality inspection apparatus 100 based on the watching people number sequence further includes:
a comparison result acquisition unit, used for acquiring the actual watching people number attenuation rate corresponding to the current video actual watching people number time sequence, and obtaining a comparison result by calculating the difference between the actual watching people number attenuation rate and the current video watching people number attenuation rate.
In the present embodiment, the actual watching people number attenuation rate corresponding to the current video actual watching people number time sequence can be calculated by referring to the process, in the first attenuation rate obtaining unit 102, of calculating an attenuation rate from a time sequence. The comparison result is then obtained by calculating the difference between the actual watching people number attenuation rate and the current video watching people number attenuation rate.
In one embodiment, the video quality inspection apparatus 100 based on the watching people number sequence further includes:
the first labeling unit is used for labeling the first video grade value of the current video data if the comparison result is greater than 0;
the second labeling unit is used for labeling the second video grade value of the current video data if the comparison result is equal to 0;
and the third labeling unit is used for labeling the third video grade value of the current video data if the comparison result is less than 0.
The cases of the comparison result are as follows:
if the comparison result is less than 0, the video is of relatively high quality among videos of the same type. A small actual attenuation rate means the audience drains away more slowly; in other words, viewers entering the video stay longer than the average for this type of video, which indicates that the video quality is above the average level. A first video grade value can therefore be labeled, for example a first video grade value of 3;
if the comparison result is equal to 0, the video reaches the average level of similar videos, so a second video grade value can be labeled, for example a second video grade value of 2;
if the comparison result is greater than 0, the video is of lower quality compared with similar videos, so a third video grade value can be labeled, for example a third video grade value of 1.
In one embodiment, the video quality inspection apparatus 100 based on the watching people number sequence further includes:
and the interval dividing unit is used for acquiring a positive value interval, a negative value interval and a zero value interval in the trend time sequence.
In this embodiment, a positive value interval in the trend time sequence is a run of consecutive positive sequence values. The time point corresponding to the leftmost sequence value of such a run is taken as the positive value interval start time, and the time point corresponding to the rightmost sequence value as the positive value interval end time; once both are known, the positive value time interval corresponding to the positive value interval is obtained. By the same process, the negative value intervals and negative value time intervals, as well as the zero value intervals and zero value time intervals, in the trend time sequence can be obtained.
In a positive value time interval the video audience declines slowly or hardly at all, indicating a high-quality video segment. In a zero value time interval the audience declines at the expected average rate, indicating a medium-quality video segment. In a negative value time interval the audience declines at an accelerated rate, indicating a low-quality video segment.
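A minimal sketch of extracting these intervals as maximal sign runs of the trend sequence; an exact zero test is used here for simplicity, and a tolerance could be substituted:

```python
# Illustrative sketch: split the trend sequence Z(t) into maximal runs of
# positive, zero and negative values; each run maps to a time interval.

def sign_intervals(trend, interval_seconds=10):
    sign = lambda v: (v > 0) - (v < 0)         # +1, 0 or -1
    runs, start = [], 0
    for i in range(1, len(trend) + 1):
        if i == len(trend) or sign(trend[i]) != sign(trend[start]):
            runs.append((sign(trend[start]),   # +1 positive / 0 zero / -1 negative
                         start * interval_seconds,
                         i * interval_seconds))
            start = i
    return runs

print(sign_intervals([0.2, 0.1, 0.0, -0.3]))   # [(1, 0, 20), (0, 20, 30), (-1, 30, 40)]
```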
In one embodiment, the video quality inspection apparatus 100 based on the watching people number sequence further includes:
and the video key point marking unit is used for marking the video key time nodes of the positive value time interval corresponding to the positive value interval, the zero value time interval corresponding to the zero value interval and the negative value time interval corresponding to the negative value interval of the current video data to obtain the marked video data.
In this embodiment, since the positive value time intervals, negative value time intervals and zero value time intervals are now known, the video key time nodes can be marked over the complete time interval corresponding to the current video data, so that other users can refer to these key time nodes and control the video to fast-forward to the high-quality video segments corresponding to the positive value time intervals. Marking the video key time nodes on the video thus effectively divides the video into segments with reference to the trend time sequence.
In one embodiment, the video quality inspection apparatus 100 based on the watching people number sequence further includes:
the first target video data acquisition unit is used for intercepting and acquiring corresponding first target video data in the current video data according to a positive value time interval corresponding to the positive value interval;
and the voice recognition unit is used for performing voice recognition on the first target video data to obtain a corresponding first target text.
In this embodiment, since the positive value time interval is known at this time, the corresponding high-quality video segment can be intercepted to obtain the first target video data for subsequent speech-to-text conversion.
Voice recognition can be performed on the audio data in the first target video data by a plurality of methods to obtain the first target text; for example, the voice recognition may be performed with an N-Gram language model.
In specific implementation, topic extraction can be performed on the first target text. For example, in actual operation, the text corresponding to about 2 minutes of speech at a normal speech rate contains roughly 300 characters and can be classified as a "long text". For long texts, the text topics can be extracted with an LDA topic model, and a topic can be understood as a "knowledge point" in a training scene.
The knowledge points corresponding to the first target text can be understood as the knowledge points the audience likes, and can be pushed to the user side as recommendation data.
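A hedged sketch of this topic extraction step using scikit-learn's LDA implementation; for Chinese transcripts a suitable tokenizer would be needed, and all parameters here are assumptions:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Illustrative sketch: extract topics ("knowledge points") from the recognized
# transcripts of high-quality video segments with an LDA topic model.

def extract_topics(transcripts, n_topics=3, top_words=5):
    vectorizer = CountVectorizer()
    counts = vectorizer.fit_transform(transcripts)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    lda.fit(counts)
    vocab = vectorizer.get_feature_names_out()
    return [[vocab[j] for j in topic.argsort()[-top_words:][::-1]]
            for topic in lda.components_]
```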
The device uses a machine learning model to predict, from the video features, the average watching people number attenuation rate of an input video, and then compares it with the actual watching behavior of that video to obtain a trend time sequence, which can serve as an evaluation parameter for the overall quality of the video; the evaluation process is simple and the evaluation result is accurate.
The above video quality inspection apparatus based on the watching people number sequence can be implemented in the form of a computer program that can be run on a computer device as shown in fig. 4.
Referring to fig. 4, fig. 4 is a schematic block diagram of a computer device according to an embodiment of the present invention. The computer device 500 is a server, and the server may be an independent server or a server cluster composed of a plurality of servers.
Referring to fig. 4, the computer device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032, when executed, causes the processor 502 to perform the video quality inspection method based on the watching people number sequence.
The processor 502 is used to provide computing and control capabilities that support the operation of the overall computer device 500.
The internal memory 504 provides an environment for running the computer program 5032 in the non-volatile storage medium 503; when the computer program 5032 is executed by the processor 502, the processor 502 can execute the video quality inspection method based on the watching people number sequence.
The network interface 505 is used for network communication, such as providing transmission of data information. It will be appreciated by those skilled in the art that the configuration shown in fig. 4 is a block diagram of only a portion of the configuration associated with aspects of the present invention, and is not intended to limit the computing device 500 to which aspects of the present invention may be applied, as a particular computing device 500 may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
The processor 502 is configured to run the computer program 5032 stored in the memory to implement the video quality inspection method based on the watching people number sequence disclosed in the embodiment of the present invention.
Those skilled in the art will appreciate that the embodiment of a computer device illustrated in fig. 4 does not constitute a limitation on the specific construction of the computer device, and that in other embodiments a computer device may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components. For example, in some embodiments, the computer device may only include a memory and a processor, and in such embodiments, the structures and functions of the memory and the processor are consistent with those of the embodiment shown in fig. 4, and are not described herein again.
It should be understood that, in the embodiment of the present invention, the processor 502 may be a Central Processing Unit (CPU), and the processor 502 may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
In another embodiment of the invention, a computer-readable storage medium is provided. The computer-readable storage medium may be a non-volatile computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the video quality inspection method based on the watching people number sequence disclosed in the embodiment of the invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses, devices and units may refer to the corresponding processes in the foregoing method embodiments, and are not described here again. Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two; the components and steps of the examples have been described above in general terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints of the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division of the units is only a logical division, and there may be other divisions in actual implementation; units having the same function may be grouped into one unit; a plurality of units or components may be combined or integrated into another system; and some features may be omitted or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electrical, mechanical or other form of connection.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in a storage medium. Based on such understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or in whole or in part, can be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive (U-disk), a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disk.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A video quality inspection method based on a number sequence of viewers is characterized by comprising the following steps:
acquiring the video duration corresponding to each video data in a video training set, and dividing the video training set, according to a preset video group number and a preset video duration threshold value set, into video training subsets corresponding to the video group number by video duration;
acquiring a video watching person number time sequence corresponding to each video data in each video training subset, and calculating the video watching person number attenuation rate corresponding to each video watching person number time sequence by a third-order exponential smoothing method;
acquiring a video image corresponding to each video data in each video training subset, and acquiring a video feature vector corresponding to each video data in the video training subset through the video image corresponding to each video data; the video image comprises a video category characteristic, a video duration characteristic, an online date characteristic, a teacher characteristic and an audience group characteristic;
respectively carrying out model training through each video training subset to obtain a prediction submodel respectively corresponding to each video training subset; the prediction sub-model is used for predicting the attenuation rate of the number of people watching the video according to the video feature vector corresponding to the video data;
if current video data are received, obtaining current video time corresponding to the current video data, and obtaining a target video training subset corresponding to the current video data and a target prediction sub-model corresponding to the target video training subset according to the current video time and video time intervals corresponding to the video training subsets respectively;
calculating by taking the current video feature vector corresponding to the current video data as the input of the target prediction submodel to obtain the attenuation rate of the number of watching people of the current video corresponding to the current video data;
acquiring a current video actual watching people number time sequence corresponding to the current video data, acquiring an actual watching people number corresponding to the current video data, and performing operation according to the actual watching people number and a current video watching people number attenuation rate curve corresponding to the current video watching people number attenuation rate to obtain a current video estimated watching people number time sequence corresponding to the current video data; and
and calculating the difference between the time sequence of the actual number of people watching the current video and the time sequence of the estimated number of people watching the current video to obtain a trend time sequence.
2. The video quality inspection method based on the watching people number sequence according to claim 1, further comprising:
acquiring the actual watching person number attenuation rate corresponding to the current video actual watching person number time sequence, and obtaining a comparison result by calculating the difference between the actual watching person number attenuation rate and the current video watching person number attenuation rate;
acquiring a positive value interval, a negative value interval and a zero value interval in the trend time sequence;
and carrying out video key time node marking on a positive value time interval corresponding to the positive value interval, a zero value time interval corresponding to the zero value interval and a negative value time interval corresponding to the negative value interval of the current video data to obtain marked video data.
3. The video quality inspection method based on the watching people number sequence according to claim 2, further comprising:
intercepting and acquiring corresponding first target video data from the current video data according to a positive value time interval corresponding to the positive value interval;
and performing voice recognition on the first target video data to obtain a corresponding first target text.
4. The video quality inspection method based on the watching people number sequence according to claim 1, wherein the obtaining of the video watching people number time sequence corresponding to each video data in each video training subset comprises:
acquiring a video duration type corresponding to the video training subset;
if the video training subset corresponds to the first video duration type, acquiring preset first video intercepting duration and preset first interval time, intercepting first target duration video data corresponding to the first video intercepting duration from each video data in the video training subset, acquiring the number of video viewers at the left endpoint of each time sub-interval after each first target duration video data is divided according to the first interval time, and normalizing to obtain a video watching number time sequence corresponding to each video data in the video training subset;
if the video training subset corresponds to a second video duration type, acquiring preset second video intercepting duration and preset second interval time, intercepting second target duration video data corresponding to the second video intercepting duration from each video data in the video training subset, acquiring the number of video viewers at the left endpoint of each time sub-interval after each second target duration video data is divided according to the second interval time, and normalizing to obtain a video viewer number time sequence corresponding to each video data in the video training subset;
if the video training subset corresponds to a third video duration type, acquiring a preset third video intercepting duration and a preset third interval time, intercepting third target duration video data corresponding to each video data in the video training subset according to the third video intercepting duration, acquiring the number of video viewers at the left end point of each time sub-interval after each third target duration video data is divided according to the third interval time, and normalizing to obtain a video watching number time sequence corresponding to each video data in the video training subset.
5. The video quality inspection method based on the watching people number sequence according to claim 1, wherein the calculating of the video watching people number attenuation rate corresponding to each video watching people number time sequence by a third-order exponential smoothing method comprises:
calculating video watching person number attenuation curves respectively corresponding to the video watching person number time sequences by an accumulation three-order exponential smoothing method;
and obtaining the video watching person number attenuation rate corresponding to each video watching person number attenuation curve by a least square method.
6. The video quality inspection method based on the watching people number sequence according to claim 1, wherein the obtaining of the actual watching people number corresponding to the current video data, and the performing of the operation according to the actual watching people number and the current video watching people number attenuation rate curve corresponding to the current video watching people number attenuation rate to obtain the current video estimated watching people number time sequence corresponding to the current video data, comprises:
obtaining a current video watching person number attenuation rate curve according to the current video watching person number attenuation rate;
acquiring a target video interception time length corresponding to a target video training subset, and intercepting a corresponding target part in a current video watching population attenuation rate curve according to the target video interception time length to obtain a current target attenuation rate curve;
acquiring target interval time corresponding to a target video training subset, taking an original point as a starting point, and sequentially taking points from a current target attenuation rate curve according to the target interval time to form a current initial time sequence;
and multiplying each sequence value in the current initial time sequence by the actual number of people watched to form the current video estimated number of people watched time sequence.
7. The video quality inspection method based on the watching people number sequence according to claim 2, wherein, after the obtaining of the actual watching people number attenuation rate corresponding to the current video actual watching people number time sequence and the obtaining of the comparison result by calculating the difference between the actual watching people number attenuation rate and the current video watching people number attenuation rate, the method further comprises:
if the comparison result is larger than 0, marking a first video grade value of the current video data;
if the comparison result is equal to 0, carrying out second video grade value labeling on the current video data;
and if the comparison result is less than 0, carrying out third video grade value labeling on the current video data.
8. A video quality inspection device based on a watching people number sequence, comprising:
the video training set grouping unit is used for acquiring video time corresponding to each video data in a video training set, and dividing the video training set into video training subsets corresponding to the video grouping number according to the video time according to a preset video grouping number and a preset video time threshold value set;
the first attenuation rate acquisition unit is used for acquiring a video watching person number time sequence corresponding to each video data in each video training subset, and calculating the video watching person number attenuation rate corresponding to each video watching person number time sequence by a third-order exponential smoothing method;
the video feature vector acquisition unit is used for acquiring a video image corresponding to each video data in each video training subset and acquiring a video feature vector corresponding to each video data in the video training subset through the video image corresponding to each video data; the video image comprises a video category characteristic, a video duration characteristic, an online date characteristic, a teacher characteristic and an audience group characteristic;
the predictor model training unit is used for respectively carrying out model training through each video training subset to obtain a predictor model respectively corresponding to each video training subset; the prediction sub-model is used for predicting the attenuation rate of the number of people watching the video according to the video feature vector corresponding to the video data;
the current video data grouping unit is used for acquiring current video time corresponding to the current video data if the current video data is received, and acquiring a target video training subset corresponding to the current video data and a target prediction sub-model corresponding to the target video training subset according to the current video time and video time intervals corresponding to the video training subsets respectively;
the second attenuation rate obtaining unit is used for calculating the current video feature vector corresponding to the current video data as the input of the target prediction sub-model to obtain the attenuation rate of the number of watching people of the current video corresponding to the current video data;
the estimated sequence obtaining unit is used for obtaining a current video actual watching people number time sequence corresponding to the current video data, obtaining an actual watching people number base number corresponding to the current video data, and performing operation according to the actual watching people number base number and a current video watching people number attenuation rate curve corresponding to the current video watching people number attenuation rate to obtain a current video estimated watching people number time sequence corresponding to the current video data; and
and the trend time sequence acquisition unit is used for calculating the difference between the current video actual watching people number time sequence and the current video estimated watching people number time sequence to obtain a trend time sequence.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the video quality inspection method based on the watching people number sequence according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to execute the video quality inspection method based on the watching people number sequence according to any one of claims 1 to 7.
CN202011622844.1A 2020-12-31 2020-12-31 Video quality inspection method and device based on watching people number sequence and computer equipment Active CN112804587B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011622844.1A CN112804587B (en) 2020-12-31 2020-12-31 Video quality inspection method and device based on watching people number sequence and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011622844.1A CN112804587B (en) 2020-12-31 2020-12-31 Video quality inspection method and device based on watching people number sequence and computer equipment

Publications (2)

Publication Number Publication Date
CN112804587A CN112804587A (en) 2021-05-14
CN112804587B true CN112804587B (en) 2022-10-14

Family

ID=75807472

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011622844.1A Active CN112804587B (en) 2020-12-31 2020-12-31 Video quality inspection method and device based on watching people number sequence and computer equipment

Country Status (1)

Country Link
CN (1) CN112804587B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109472487A (en) * 2018-11-02 2019-03-15 深圳壹账通智能科技有限公司 Video quality detecting method, device, computer equipment and storage medium
CN110414324A (en) * 2019-06-17 2019-11-05 深圳壹账通智能科技有限公司 Method, apparatus, computer equipment and the storage medium of video record process monitoring
CN111212279A (en) * 2018-11-21 2020-05-29 华为技术有限公司 Video quality assessment method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8924993B1 (en) * 2010-11-11 2014-12-30 Google Inc. Video content analysis for automatic demographics recognition of users and videos


Also Published As

Publication number Publication date
CN112804587A (en) 2021-05-14


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant