CN109922334B - Video quality identification method and system - Google Patents

Info

Publication number: CN109922334B (application CN201711330513.9A)
Authority: CN (China)
Prior art keywords: target video, quality, video, specified, information
Legal status: Active
Other languages: Chinese (zh)
Other versions: CN109922334A
Inventor: 张深源
Current assignee: Youku Culture Technology Beijing Co ltd
Original assignee: Alibaba China Co Ltd
Application filed by Alibaba China Co Ltd; priority to CN201711330513.9A

Landscapes

  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

The embodiments of the application disclose a method and a system for identifying video quality. The method includes: acquiring playing information of a target video and judging, based on the playing information, whether the target video is a suspected low-quality video; after the target video is determined to be a suspected low-quality video, identifying at least one of the audio information and the picture information of the target video to judge whether a specified quality feature is present in the target video; acquiring a decision threshold associated with the specified quality feature and judging, based on the decision threshold, whether the specified quality feature is an abnormal quality feature; and determining whether the target video is a low-quality video based on the determination results for the specified quality features. The technical scheme provided by the application can evaluate the quality of a video more comprehensively.

Description

Video quality identification method and system
Technical Field
The present application relates to the field of internet technologies, and in particular, to a method and a system for identifying video quality.
Background
With the continuous development of internet technology, more and more video playing platforms have emerged. The quality of the videos on a platform usually determines whether the platform attracts users. Therefore, video playing platforms usually identify video quality in order to remove lower-quality videos.
Currently, video quality is usually identified by evaluating indicators of the video such as blurriness, sharpness, and encoding rate. Specifically, the videos to be evaluated can be grouped, and their blurriness, sharpness, encoding rate, and similar indicators can then be judged through subjective and objective scoring, so that low-quality videos can finally be determined comprehensively.
However, the video-quality problems that affect user experience keep multiplying and are no longer limited to blurriness, sharpness, and encoding rate, so a more comprehensive video quality identification method is needed.
Disclosure of Invention
The embodiments of the application aim to provide a video quality identification method and system that can evaluate the quality of a video more comprehensively.
To achieve the above object, an embodiment of the present application provides a method for identifying video quality, the method including: acquiring playing information of a target video and judging, based on the playing information, whether the target video is a suspected low-quality video; after the target video is determined to be a suspected low-quality video, identifying at least one of the audio information and the picture information of the target video to judge whether a specified quality feature is present in the target video; acquiring a decision threshold associated with the specified quality feature and judging, based on the decision threshold, whether the specified quality feature is an abnormal quality feature; and determining whether the target video is a low-quality video based on the determination results for the specified quality features.
To achieve the above object, an embodiment of the present application further provides a video quality identification system comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the following steps: acquiring playing information of a target video and judging, based on the playing information, whether the target video is a suspected low-quality video; after the target video is determined to be a suspected low-quality video, identifying at least one of the audio information and the picture information of the target video to judge whether a specified quality feature is present in the target video; acquiring a decision threshold associated with the specified quality feature and judging, based on the decision threshold, whether the specified quality feature is an abnormal quality feature; and determining whether the target video is a low-quality video based on the determination results for the specified quality features.
According to the technical scheme provided by the application, when the quality of the target video is identified, whether the target video is a suspected low-quality video can first be judged preliminarily from the playing information of the target video. The playing information may be obtained by the video playing platform by tracking how the target video is played; for example, it may include the number of times the video is played to completion or its average viewing duration. After the target video is determined to be a suspected low-quality video, the audio information and/or the picture information of the target video can be further identified. During identification, the target video can be checked against several factors that commonly cause reduced video quality; these factors serve as the specified quality features. For example, a specified quality feature may be machine audio, advertising information, black screens, or still images in the target video. After a specified quality feature is identified in the target video, different decision thresholds may be applied to different specified quality features. The decision threshold associated with a specified quality feature can be acquired, and whether the feature exceeds the normal tolerance range can be determined against that threshold, thereby deciding whether it is an abnormal quality feature. Finally, combining the determination results for each specified quality feature, it can be determined whether the target video is a low-quality video.
Therefore, the technical scheme provided by the application can detect whether the abnormal quality characteristics exist in the target video from multiple aspects, so that the quality of the target video can be more comprehensively evaluated.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a diagram illustrating steps of a video quality identification method according to an embodiment of the present disclosure;
fig. 2 is a flowchart of a video quality recognition method according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of a video quality recognition system according to an embodiment of the present application.
Detailed Description
To help those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art without inventive work based on the embodiments of the present application shall fall within the scope of protection of the present application.
The application provides a video quality identification method which can be applied to a server of a video playing website. Referring to fig. 1 and 2, the method may include the following steps.
S1: the method comprises the steps of obtaining playing information of a target video, and judging whether the target video is a suspected low-quality video or not based on the playing information.
In this embodiment, the target video may be a video whose quality is to be evaluated. After the target video is uploaded to the video playing website, it can be watched by users. While users watch, the server of the video playing website can record the playing information of the target video. The playing information may include the number of times the target video is played, the average duration for which it is played, the number of times it is played to completion, the number of times it is liked/forwarded/reported/downloaded, and the like.
In this embodiment, the playing information of the target video may be stored in the server in association with the identifier of the target video. The identifier of the target video may be its numeric encoding in the video playing website. The identifier and the playing information can be stored associatively in key-value form: the identifier serves as the key and the playing information as the value, so the server can read the playing information of the target video from storage by querying its identifier.
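The key-value association described above can be sketched as follows; the identifier format and the field names of the play-information record are illustrative assumptions, not fixed by the patent:

```python
# Minimal sketch of the key-value store: the video identifier is the key,
# its playing information is the value.
play_info_store = {
    "video_000123": {"total_plays": 100, "complete_plays": 40,
                     "avg_play_seconds": 60, "report_count": 2},
}

def get_play_info(video_id):
    """Look up the playing information of a video by its identifier."""
    return play_info_store.get(video_id)

print(get_play_info("video_000123")["total_plays"])  # 100
```

In a production system the store would typically be a database or key-value service rather than an in-process dictionary.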
In this embodiment, after the playing information is obtained, whether the target video is a suspected low-quality video can be preliminarily judged from it. Specifically, users generally do not watch a lower-quality video for long, and such a video is more likely to be reported. In view of this, in one embodiment, whether the target video is a suspected low-quality video may be judged from the ratio, contained in the playing information, of the average playing duration to the total duration of the target video. The average playing duration may be obtained by averaging the playing duration of the target video over all of its plays. For example, if the average playing duration is 1 minute and the total duration is 10 minutes, the ratio is 0.1. A specified duration-ratio threshold for judging whether the calculated ratio lies in the normal range may be set in advance, determined from the ratios observed for normal videos. For example, if the ratio of average playing duration to total duration for normal videos is usually above 0.6, the specified duration-ratio threshold may be set to 0.6 or slightly lower, and it can be adjusted flexibly to the actual situation. After the specified duration-ratio threshold is determined, the calculated ratio may be compared with it; when the ratio for the target video is below the threshold, the target video can be determined to be a suspected low-quality video.
In another embodiment, considering that a low-quality video is also less likely to be watched to completion, whether the target video is a suspected low-quality video may be judged from the ratio, contained in the playing information, of the number of times the target video is watched completely to the total number of times it is watched. Specifically, a specified count-ratio threshold for judging whether this ratio lies in the normal range may likewise be set in advance. When the ratio is below the specified count-ratio threshold, the target video can be determined to be a suspected low-quality video.
In another embodiment, whether the target video is a suspected low-quality video may further be judged from the number of times it is reported. Specifically, when the number of reports against the target video exceeds a specified report-count threshold, the target video may be determined to be a suspected low-quality video. The specified report-count threshold can be obtained in advance by statistics on how often normal videos are reported. For example, if a normal video is reported 10.4 times on average, the specified report-count threshold may be set to 11 or higher according to the actual situation.
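The three screening heuristics above can be sketched as one function; the threshold values (0.6, 0.3, 11) mirror the examples in the text, and the field names of the play-information record are illustrative assumptions:

```python
def is_suspected_low_quality(play_info,
                             duration_ratio_threshold=0.6,
                             complete_ratio_threshold=0.3,
                             report_threshold=11):
    """Step S1 sketch: flag a video as a suspected low-quality video if
    any play-statistics heuristic fires."""
    # Ratio of average watch duration to total video duration.
    duration_ratio = play_info["avg_play_seconds"] / play_info["total_seconds"]
    if duration_ratio < duration_ratio_threshold:
        return True
    # Ratio of complete views to total views.
    complete_ratio = play_info["complete_plays"] / play_info["total_plays"]
    if complete_ratio < complete_ratio_threshold:
        return True
    # Absolute number of user reports.
    if play_info["report_count"] > report_threshold:
        return True
    return False

info = {"avg_play_seconds": 60, "total_seconds": 600,
        "complete_plays": 40, "total_plays": 100, "report_count": 2}
print(is_suspected_low_quality(info))  # duration ratio 0.1 < 0.6 -> True
```

As the text notes, a real system could combine more signals than these three.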
Of course, the above ways of determining a suspected low-quality video from the data in the playing information are only a few examples; in a practical application scenario, the judgment may be made from more data, and the technical solution of the present application is not limited to the cases above.
S3: after the target video is determined to be a suspected low-quality video, at least one of audio information and picture information of the target video is identified to judge whether the target video has specified quality characteristics.
In this embodiment, after the target video is determined to be a suspected low-quality video, its actual content may be further examined. Specifically, the actual content of the target video may comprise audio information and picture information, and at least one of them can be further identified. The purpose of identifying the audio information and/or the picture information is to detect whether it contains content that degrades video quality. In this embodiment, such quality-degrading content is collectively referred to as a specified quality feature. Thus, after the audio information and/or picture information of the target video is identified, whether a specified quality feature is present in the target video can be judged.
In this embodiment, the audio information may include background sounds, character monologues, dialogue, and the like. In low-quality videos, speech is often broadcast mechanically as machine audio; for example, in some videos a pre-prepared script is read aloud by machine-synthesized audio to form the video's audio track. A large amount of such machine sound usually gives the user a poor viewing experience. Therefore, in one embodiment of the present application, machine sound in the audio information can be recognized, with machine sound as the specified quality feature to be identified. Specifically, when the audio information is identified, it may be divided into a plurality of audio frames and the audio feature vector of each frame extracted. The frames may be cut with a time window whose length can be set flexibly: the window slides along the time axis of the audio information, and the audio within the window at each position is treated as one audio frame. After the audio information is divided into frames, the audio feature vector of each frame may be extracted by applying a Fourier transform to the frame's data. The transform yields a series of discrete values, which could be used directly as the vector elements of the audio feature vector. In practice, however, the number of discrete values is large; to reduce the dimensionality of the audio feature vector, the discrete values may be grouped and one energy value computed for each group. The resulting energy values can then form the audio feature vector, greatly reducing its dimensionality.
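The framing and feature-extraction steps above can be sketched as follows. The frame length, hop size, and number of energy bands are illustrative assumptions; the patent leaves the window length and grouping free:

```python
import numpy as np

def audio_feature_vectors(signal, frame_len=1024, hop=512, n_bands=8):
    """Slide a time window over a mono signal; for each audio frame,
    Fourier-transform it and collapse the spectrum into per-band energies."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        spectrum = np.abs(np.fft.rfft(frame))         # discrete spectral values
        bands = np.array_split(spectrum, n_bands)     # group values to cut dimensionality
        energy = [float(np.sum(b ** 2)) for b in bands]  # one energy value per group
        frames.append(energy)
    return np.array(frames)  # feature sequence: one vector per audio frame

t = np.linspace(0, 1, 8000, endpoint=False)
feats = audio_feature_vectors(np.sin(2 * np.pi * 440 * t))
print(feats.shape)  # (14, 8): 14 frames, 8 energy bands each
```

Stacking the per-frame vectors in order, as `audio_feature_vectors` returns them, yields the feature sequence used in the next step.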
In this embodiment, the audio feature vector of each audio frame can be obtained as above, and a feature sequence of the audio information can then be constructed from the extracted vectors. Specifically, the audio feature vectors of the frames may be arranged in order to form the feature sequence, so that each sequence value is actually an audio feature vector. Since the feature sequence is extracted from the audio information, it can characterize that audio; for similar audio information, the similarity between the resulting feature sequences is usually high. In the present embodiment, a speech recognition model for recognizing machine sound may be trained in advance by machine learning. Specifically, a large amount of audio composed of machine sound may be prepared, the feature sequence of each piece extracted, and the sequences fed to the speech recognition model. The model can be regarded as a neural network whose neurons carry their own operational parameters. Passing an input feature sequence through the network yields a predicted probability value, which represents the probability that the input is the feature sequence of machine sound. The predicted result can be compared with the actual result, and the difference used to correct the parameters of the model, so that when a feature sequence corresponding to machine speech is input again, the prediction agrees with the actual result. After training on a large amount of machine-sound audio, the speech recognition model can therefore accurately distinguish whether an input feature sequence belongs to machine sound. Thus, in this embodiment, the constructed feature sequence may be input into the preset speech recognition model to obtain its predicted probability value. When the predicted probability value is greater than or equal to a specified probability threshold, the target video can be judged to have the specified quality feature representing machine sound. The specified probability threshold may be obtained by statistics on the training results of the model: if, in the training phase, the predicted probability values for machine-sound feature sequences are all above 0.8, then 0.8 or a slightly lower value may be used as the specified probability threshold.
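The probability check can be sketched as below. The "model" here is a deliberately toy logistic scorer over the mean feature vector, standing in for the trained neural network the patent describes; the 0.8 threshold follows the example in the text, and the weights are arbitrary illustrative values:

```python
import numpy as np

SPECIFIED_PROBABILITY_THRESHOLD = 0.8  # example value from the text

def predict_machine_sound(feature_sequence, weights, bias):
    """Toy stand-in for the trained speech recognition model: a logistic
    score over the mean feature vector of the sequence."""
    mean_vec = feature_sequence.mean(axis=0)
    score = float(np.dot(mean_vec, weights) + bias)
    return float(1.0 / (1.0 + np.exp(-score)))  # predicted probability

def has_machine_sound(feature_sequence, weights, bias):
    """Judge the machine-sound feature present when the predicted
    probability reaches the specified probability threshold."""
    prob = predict_machine_sound(feature_sequence, weights, bias)
    return prob >= SPECIFIED_PROBABILITY_THRESHOLD

seq = np.ones((4, 3))  # 4 frames, 3-dimensional feature vectors
print(has_machine_sound(seq, np.array([1.0, 1.0, 1.0]), 0.0))  # sigmoid(3) ~ 0.95 -> True
```

A real implementation would replace `predict_machine_sound` with a trained sequence model (e.g. a recurrent or convolutional network) whose parameters were fitted on machine-sound feature sequences.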
In one embodiment, advertising information present in the picture information of the target video may also be identified; for example, advertisement links and promotional text in the picture. In this case, the specified quality feature is advertising information. Specifically, the character information contained in each frame may be recognized by a character recognition method such as Optical Character Recognition (OCR). After the character information is obtained, it may be further analysed in order to distinguish advertising information from normal subtitles or special-effect text. Specifically, the character information may be segmented to obtain at least one word. During segmentation, a preset vocabulary library can be used to recognize the words in the character information, so that the words it contains are identified. In practice, various word segmenters can be used, for example the Friso, Jcseg, or MMSEG4J segmenters. Furthermore, to improve the accuracy of identifying advertising information, the segmenter's lexicon can be built from commonly used advertising vocabulary, so that its output better matches the language habits of advertising copy. In this embodiment, after the words in the character information are obtained, they may be matched against a preset vocabulary library gathering common advertising words. If the ratio of matched words to the total number of segmented words is greater than or equal to a specified ratio threshold, the character information contains a large proportion of advertising vocabulary, and the target video can be judged to have the specified quality feature representing advertising information.
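The matching step, after OCR and word segmentation have produced a word list, can be sketched as follows. The vocabulary entries and the 0.3 ratio threshold are illustrative assumptions:

```python
AD_VOCABULARY = {"coupon", "discount", "buy now", "official store"}  # illustrative entries

def ad_word_ratio(words):
    """Ratio of words matched in the ad vocabulary to total segmented words."""
    if not words:
        return 0.0
    matched = sum(1 for w in words if w in AD_VOCABULARY)
    return matched / len(words)

def has_ad_feature(words, ratio_threshold=0.3):
    """Judge the advertising feature present when the matched-word ratio
    reaches the specified ratio threshold."""
    return ad_word_ratio(words) >= ratio_threshold

words = ["coupon", "discount", "buy now", "movie", "scene"]
print(ad_word_ratio(words))  # 3 of 5 words matched -> 0.6
```

The word list would in practice come from running a segmenter such as Jcseg over the OCR output of each frame.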
In one embodiment, black-screen frames occurring in the target video may also be identified, using the gray value of the picture frame. In the convention used here, a higher average gray value indicates a darker frame and thus a higher probability of a black-screen frame. Therefore, in this embodiment, the average gray value of the current frame in the picture information can be calculated: the gray values of the pixels in the current frame are collected and averaged. When the average gray value is greater than or equal to a specified gray threshold, the target video can be judged to have the specified quality feature representing black-screen frames. The specified gray threshold may be determined from the average gray value of picture frames in normal videos; the method of determination is similar to that set forth in the above embodiments and is not repeated here.
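The black-frame test can be sketched as below. Note the hedge: the text treats a *higher* average gray value as darker (gray value as a darkness measure), so the code follows that convention; under the usual intensity convention (0 = black) the comparison would be inverted. The threshold 240 is an illustrative assumption:

```python
import numpy as np

def is_black_frame(frame, gray_threshold=240):
    """Judge a frame a black-screen frame when its average gray value
    (darkness measure, per the text's convention) reaches the threshold."""
    return float(frame.mean()) >= gray_threshold

dark = np.full((4, 4), 250)  # nearly fully dark under this convention
print(is_black_frame(dark))  # True
```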
In one embodiment, still pictures in the target video may also be identified. Some videos contain long runs of still pictures, which also hurt the viewing experience. Specifically, the similarity between two adjacent frames in the picture information may be calculated; the higher the similarity, the more likely the two frames form a still picture. To calculate the similarity, the feature vector of each frame can be extracted, with the gray value of each pixel as the vector elements. For example, if each frame of the target video contains 1000 pixels, the gray values of those 1000 pixels can be determined and arranged in the pixels' order in the picture to form the feature vector. The similarity between two adjacent frames is then obtained from the distance between their feature vectors in vector space: the closer the distance, the greater the similarity. When the calculated similarity between two adjacent frames is greater than or equal to a specified similarity threshold, the target video can be judged to have the specified quality feature representing still pictures.
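The adjacent-frame comparison can be sketched as follows. The mapping from vector distance to a similarity in (0, 1] and the 0.9 threshold are illustrative assumptions; the patent only requires that closer distance mean greater similarity:

```python
import numpy as np

def frame_similarity(frame_a, frame_b):
    """Similarity of two frames from the distance between their flattened
    gray-value feature vectors (closer distance -> higher similarity)."""
    diff = frame_a.astype(float).ravel() - frame_b.astype(float).ravel()
    dist = float(np.linalg.norm(diff))
    return 1.0 / (1.0 + dist)  # assumed mapping of distance into (0, 1]

def is_still_pair(frame_a, frame_b, similarity_threshold=0.9):
    """Judge two adjacent frames a still pair when their similarity
    reaches the specified similarity threshold."""
    return frame_similarity(frame_a, frame_b) >= similarity_threshold

a = np.zeros((10, 10))
print(is_still_pair(a, a))  # identical frames -> similarity 1.0 -> True
```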
S5: and acquiring a judgment threshold value associated with the specified quality feature, and judging whether the specified quality feature belongs to an abnormal quality feature or not based on the judgment threshold value.
In this embodiment, even after a specified quality feature is identified in the target video, the target video cannot immediately be declared low quality, because a specified quality feature that appears only infrequently does not seriously affect the video's quality. To assess video quality accurately, different decision thresholds may be set for different specified quality features in this embodiment. Each specified quality feature and its associated decision threshold may be stored in the server in advance, and when a certain specified quality feature needs to be judged, the server can read the decision threshold associated with it.
In this embodiment, the decision threshold may be obtained from data analysis of normal videos. It serves as the dividing line between a normal and an abnormal quality feature, so based on the decision threshold it is possible to determine whether the specified quality feature is an abnormal quality feature.
Specifically, for the specified quality feature of machine sound, the associated decision threshold may be an upper limit on the duration for which machine sound appears in the target video. For example, if the decision threshold allows at most 20 seconds of machine sound in a video, 20 seconds is the duration upper limit. When the total duration of the specified quality feature in the target video exceeds this upper limit, the specified quality feature can be determined to be an abnormal quality feature.
In another embodiment, for the specified quality feature of advertising information, the associated decision threshold may be an upper limit on the number of times the feature appears in the target video. For example, if the decision threshold allows advertising information to appear at most 3 times in a video, 3 is the count upper limit. When the total number of occurrences of advertising information in the target video exceeds this upper limit, the specified quality feature can be determined to be an abnormal quality feature.
In another embodiment, for the specified quality feature of black screens, the associated decision threshold may be an upper limit on the number of frames in which the feature appears in the target video. For example, if the decision threshold allows at most 10 black-screen frames in a video, 10 is the frame-count upper limit. The total number of picture frames in the target video whose average gray value is greater than or equal to the specified gray threshold may then be counted, and when this total exceeds the frame-count upper limit, the specified quality feature can be determined to be an abnormal quality feature.
In another embodiment, for the specified quality feature of still pictures, the associated decision threshold may likewise be an upper limit on the number of frames in which the feature appears in the target video. For example, if the decision threshold allows at most 10 still-picture frames in a video, 10 is the frame-count upper limit. Pairs of adjacent frames whose similarity is greater than or equal to the specified similarity threshold may be marked as target pictures, and the total number of frames of target pictures in the picture information counted; when this total exceeds the frame-count upper limit, the specified quality feature is determined to be an abnormal quality feature.
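The per-feature thresholds of step S5 can be sketched as a lookup table; all numeric limits mirror the examples in the text (20 s, 3 occurrences, 10 frames) and are illustrative:

```python
# Per-feature decision thresholds: (unit of measurement, upper limit).
DECISION_THRESHOLDS = {
    "machine_sound": ("max_seconds", 20),  # upper limit on total duration
    "advertisement": ("max_count", 3),     # upper limit on occurrences
    "black_screen": ("max_frames", 10),    # upper limit on frame count
    "still_picture": ("max_frames", 10),
}

def is_abnormal(feature_name, measured_value):
    """A specified quality feature is abnormal when its measurement
    strictly exceeds the associated upper limit."""
    _, limit = DECISION_THRESHOLDS[feature_name]
    return measured_value > limit

print(is_abnormal("machine_sound", 35))  # 35 s of machine audio > 20 s -> True
print(is_abnormal("advertisement", 2))   # 2 occurrences <= 3 -> False
```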
S7: determining whether the target video is a low-quality video based on the determination result for the specified quality feature.
In this embodiment, after each specified quality feature in the target video has been judged, the total number of abnormal quality features present in the target video may be counted. When this total is greater than or equal to a specified number threshold, the target video may be determined to be a low-quality video. The specified number threshold may be adjusted to the actual situation; if strict quality control is desired, it may be set relatively small.
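The final decision of step S7 can be sketched as follows; the number threshold of 2 is an illustrative assumption:

```python
def is_low_quality(abnormal_flags, number_threshold=2):
    """Step S7 sketch: count abnormal quality features; at or above the
    specified number threshold the target video is judged low quality."""
    return sum(abnormal_flags.values()) >= number_threshold

flags = {"machine_sound": True, "advertisement": False,
         "black_screen": True, "still_picture": False}
print(is_low_quality(flags))  # 2 abnormal features >= 2 -> True
```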
In one embodiment of the present application, since a whole video may be quite long, identifying the audio and picture information of the entire video may consume considerable resources and time. In this embodiment, the playing time interval to be identified may instead be determined from the operation behavior data of users watching the target video. The operation behavior data characterize the operations a user performs on the playing page while watching the target video. In practice, the operation behavior data can be collected through event tracking ("buried points"): some of the controls in the pages of the video playing website are bound to program code in advance, and when a control is triggered by the user, the bound code runs automatically and records the current operation. A control here is any page element that can interact with the user, such as a video's play button, the draggable button on the progress bar, or a hyperlink in the page. For example, program code for capturing click behavior may be bound to a video's play button in advance, so that when the button is clicked the code executes automatically, recording the click and the time node at which it occurred. The recorded information can be sent automatically to the server of the video playing website by the terminal device the user is using; the information received by the server then serves as the user's operation behavior data.
In this embodiment, after the operation behavior data is obtained, the playing time interval corresponding to the moment playback of the target video was terminated may be determined from that data. Specifically, when the user closes the target video or jumps from its page to another page, the playing time node at that moment can be recorded through event tracking. A playing time interval can then be formed by selecting a certain duration range centered on that time node. For example, if the user closed the target video when it had played for 3 minutes and 10 seconds, the server may automatically take the interval from 3 minutes to 3 minutes 20 seconds as the interval to be identified. Subsequently, at least one of the audio information and picture information within that interval can be identified in the manner described above, saving considerable computational resources and time.
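The interval construction in this paragraph (a window centered on the termination time node) might be sketched as follows; the ten-second half-window matches the 3:00 to 3:20 example above, but the exact range is a design choice:

```python
# Sketch: build the playing time interval to identify, centered on the
# second at which playback was terminated. The half-window size is an
# illustrative assumption.

def play_interval(stop_second, half_window=10, video_length=None):
    """Return the (start, end) interval in seconds around stop_second,
    clamped to the video's bounds when video_length is known."""
    start = max(0, stop_second - half_window)
    end = stop_second + half_window
    if video_length is not None:
        end = min(end, video_length)
    return start, end

# User closed the video at 3 min 10 s -> identify 3:00 through 3:20.
print(play_interval(190))  # (180, 200)
```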
In another embodiment, time nodes at which quality defects may exist can be obtained from users' comments in the comment area or bullet-screen (barrage) area. Specifically, user comment information for the target video may be acquired; words that characterize the specified quality features may be identified in it; and the time nodes associated with those words may be extracted. Words characterizing a specified quality feature include, for example, "black screen", "noise", and "still", each of which corresponds to a quality defect. For example, from the user comment "Why is the picture stuck at 3 minutes 10 seconds?", the word "stuck" can be recognized, and semantic analysis associates it with the time node "3 minutes 10 seconds", indicating that the target video has a quality defect around that point. A playing time interval containing that time node can then be determined, and at least one of the audio information and picture information within it identified. When determining the interval, a certain duration range may be selected centered on the time node recognized from the comment; for example, the playing time period from 3 minutes to 3 minutes 20 seconds may be used as the interval for identifying the specified quality feature.
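A minimal sketch of mining comments for defect words and their time nodes might look like this; the defect word list follows the examples above, while the time-expression regex is a simplifying assumption (real comments would need fuller semantic analysis):

```python
import re

# Words the description gives as examples of specified quality features,
# plus "stuck" from the sample comment. Illustrative, not exhaustive.
DEFECT_WORDS = ("black screen", "noise", "still", "stuck")

def defect_time_nodes(comment):
    """Return (word, second) pairs for defect words appearing in a
    comment that also contains an 'M minutes S seconds' expression.
    The regex is a simplified stand-in for semantic analysis."""
    match = re.search(r"(\d+)\s*minutes?\s*(?:and\s*)?(\d+)\s*seconds?",
                      comment, re.IGNORECASE)
    node = int(match.group(1)) * 60 + int(match.group(2)) if match else None
    return [(word, node) for word in DEFECT_WORDS
            if node is not None and word in comment.lower()]

print(defect_time_nodes("Why is the picture stuck at 3 minutes and 10 seconds?"))
# [('stuck', 190)]
```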
Referring to fig. 3, the present application further provides a video quality recognition system. The system includes a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the following steps:
S1: acquiring playing information of a target video, and judging whether the target video is a suspected low-quality video or not based on the playing information;
S3: after the target video is determined to be a suspected low-quality video, identifying at least one of audio information and picture information of the target video to judge whether specified quality characteristics exist in the target video;
S5: acquiring a judgment threshold value associated with the specified quality feature, and judging whether the specified quality feature belongs to an abnormal quality feature or not based on the judgment threshold value;
S7: determining whether the target video is a low-quality video based on the determination result for the specified quality feature.
In this embodiment, the playing information includes a ratio of the number of times that the target video is completely viewed to the total number of times that the target video is viewed; accordingly, the computer program, when executed by the processor, further implements the steps of:
and when the ratio of the number of times that the target video is completely watched to the total number of times that the target video is watched is lower than a specified number of times proportional threshold, determining that the target video is a suspected low-quality video.
In this embodiment, the computer program, when executed by the processor, further implements the steps of:
dividing the audio information into a plurality of audio frames, and extracting audio feature vectors of the audio frames;
constructing a feature sequence of the audio information based on the extracted audio feature vector;
inputting the constructed characteristic sequence into a preset voice recognition model to obtain a prediction probability value corresponding to the characteristic sequence;
and when the prediction probability value is greater than or equal to a specified probability threshold, judging that the specified quality characteristics exist in the target video.
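The four steps above (framing, feature extraction, model scoring, threshold comparison) can be illustrated with a toy pipeline; the per-frame energy feature and the stub model stand in for the pretrained speech recognition model, which this sketch does not attempt to reproduce:

```python
# Toy sketch of the audio pipeline: frame the signal, build a feature
# sequence, score it with a model, and compare the predicted probability
# against a threshold. The "model" below is a stand-in stub.

def frame_audio(samples, frame_size=400, hop=160):
    """Split raw samples into overlapping frames."""
    return [samples[i:i + frame_size]
            for i in range(0, len(samples) - frame_size + 1, hop)]

def feature_vector(frame):
    """Toy per-frame feature: mean absolute amplitude (a real system
    would extract something like MFCCs)."""
    return [sum(abs(s) for s in frame) / len(frame)]

def has_specified_feature(samples, model, prob_threshold=0.8):
    """Judge that the specified quality feature exists when the model's
    predicted probability reaches the specified probability threshold."""
    sequence = [feature_vector(f) for f in frame_audio(samples)]
    return model(sequence) >= prob_threshold

# Stub model: flags "machine audio" when frame energies are suspiciously
# flat across the whole sequence. Purely illustrative.
def stub_model(sequence):
    energies = [v[0] for v in sequence]
    spread = max(energies) - min(energies)
    return 0.9 if spread < 0.01 else 0.1

flat = [0.5] * 2000  # constant tone -> flat energy profile
print(has_specified_feature(flat, stub_model))  # True
```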
In this embodiment, the computer program, when executed by the processor, further implements the steps of:
acquiring operation behavior data when a user watches the target video, and determining a corresponding playing time interval when the target video is terminated according to the operation behavior data;
and identifying at least one of audio information and picture information in the playing time interval.
In this embodiment, the memory may include a physical device for storing information; typically, information is digitized and then stored in a medium by electrical, magnetic, or optical means. The memory according to this embodiment may further include: devices that store information using electrical energy, such as RAM and ROM; devices that store information using magnetic energy, such as hard disks, floppy disks, magnetic tapes, magnetic-core memories, magnetic-bubble memories, and USB flash drives; and devices that store information optically, such as CDs and DVDs. Of course, memory may take other forms as well, such as quantum memory or graphene memory.
In this embodiment, the processor may be implemented in any suitable manner. For example, the processor may take the form of, for example, a microprocessor or processor and a computer-readable medium that stores computer-readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so forth.
The specific functions implemented by the memory and the processor of the video quality recognition system provided in the embodiments of the present description can be explained with reference to the foregoing embodiments, and achieve the same technical effects; they are therefore not described again here.
Therefore, according to the technical scheme provided by the application, when the quality of the target video is identified, a preliminary judgment can first be made, based on the playing information of the target video, as to whether it is a suspected low-quality video. The playing information may be obtained by the video playing platform by counting the playing conditions of the target video; for example, it may include the number of times the video was played to completion, or the average viewing duration of the video. After the target video is determined to be a suspected low-quality video, the audio information and/or the picture information of the target video can be further identified. During identification, the target video can be checked against several factors that commonly cause degraded video quality; these factors serve as the specified quality features. For example, a specified quality feature may be machine-generated audio, advertising information, a black screen, or a still picture in the target video. After identifying the presence of a specified quality feature, different judgment thresholds may be provided for different specified quality features: the judgment threshold associated with a given feature is obtained, and whether the feature exceeds the normal tolerance range is determined against that threshold, thereby deciding whether the feature is an abnormal quality feature. Finally, combining the judgment results for each specified quality feature, it can be determined whether the target video is a low-quality video.
Therefore, the technical scheme provided by the application can detect whether the abnormal quality characteristics exist in the target video from multiple aspects, so that the quality of the target video can be more comprehensively evaluated.
In the 1990s, an improvement in a technology could be clearly distinguished as either a hardware improvement (for example, an improvement in a circuit structure such as a diode, a transistor, or a switch) or a software improvement (an improvement in a method flow). With the development of technology, however, many of today's improvements in method flows can be regarded as direct improvements in hardware circuit structures: designers nearly always obtain a corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. It therefore cannot be said that an improvement in a method flow cannot be realized by a hardware entity module. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. A designer integrates a digital system onto a single PLD by programming it, without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of making integrated circuit chips by hand, this programming is now mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development, while the source code to be compiled must be written in a specific programming language called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); at present, VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can easily be obtained merely by briefly programming the method flow in one of the hardware description languages above and programming it into an integrated circuit.
Those skilled in the art also know that, in addition to implementing the video quality recognition system purely as computer-readable program code, the method steps can be logically programmed so that the system implements the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a video quality recognition system can therefore be regarded as a hardware component, and the means included in it for implementing the various functions can be regarded as structures within the hardware component, or even as both software modules for performing the method and structures within the hardware component.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus a necessary general-purpose hardware platform. Based on such understanding, the technical solutions of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present application or in some parts thereof.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, embodiments of the identification system for video quality may be explained with reference to the introduction of embodiments of the method described above.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Although the present application has been described in terms of embodiments, those of ordinary skill in the art will recognize that numerous variations and modifications of the present application are possible without departing from its spirit, and it is intended that the appended claims encompass such variations and modifications.

Claims (18)

1. A method for identifying video quality, the method comprising:
acquiring playing information of a target video, and judging whether the target video is a suspected low-quality video or not based on the playing information; the playing information is information recorded in the playing process of the target video;
after the target video is determined to be a suspected low-quality video, identifying at least one of audio information and picture information of the target video to judge whether specified quality characteristics exist in the target video; wherein the specified quality features represent content that results in a degradation of video quality;
acquiring a judgment threshold value associated with the specified quality feature, and judging whether the specified quality feature belongs to an abnormal quality feature or not based on the judgment threshold value;
determining whether the target video is a low-quality video based on a determination result for the specified quality feature; wherein the determining comprises: counting the total number of abnormal quality features existing in the target video, and judging that the target video is a low-quality video when the counted total number is greater than or equal to a specified number threshold.
2. The method of claim 1, wherein the playback information comprises a ratio of an average playback time duration to a total playback time duration of the target video; accordingly, determining whether the target video is a suspected low-quality video includes:
and when the ratio of the average playing time length to the total time length of the target video is lower than a specified time length ratio threshold value, judging that the target video is a suspected low-quality video.
3. The method of claim 1, wherein the playback information comprises a ratio of a number of times the target video is viewed in its entirety to a total number of times the target video is viewed; accordingly, determining whether the target video is a suspected low-quality video includes:
and when the ratio of the number of times that the target video is completely watched to the total number of times that the target video is watched is lower than a specified number of times proportional threshold, determining that the target video is a suspected low-quality video.
4. The method of claim 1, wherein the playback information includes a number of times the target video was reported; accordingly, determining whether the target video is a suspected low-quality video includes:
and when the reported times of the target video are higher than a specified reported time threshold value, judging that the target video is a suspected low-quality video.
5. The method of claim 1, wherein determining whether the target video has a specified quality characteristic comprises:
dividing the audio information into a plurality of audio frames, and extracting audio feature vectors of the audio frames;
constructing a feature sequence of the audio information based on the extracted audio feature vector;
inputting the constructed characteristic sequence into a preset voice recognition model to obtain a prediction probability value corresponding to the characteristic sequence;
and when the prediction probability value is greater than or equal to a specified probability threshold, judging that the specified quality characteristics exist in the target video.
6. The method of claim 5, wherein the decision threshold associated with the specified quality feature is an upper limit of a duration of time that the specified quality feature appears in the target video; accordingly, determining whether the specified quality feature belongs to an abnormal quality feature comprises:
and when the total duration of the specified quality features appearing in the target video is greater than the duration upper limit value, judging that the specified quality features are abnormal quality features.
7. The method of claim 1, wherein determining whether the target video has a specified quality characteristic comprises:
recognizing character information contained in the picture information, and performing word segmentation on the character information to obtain at least one word corresponding to the character information;
and matching the words obtained by word segmentation against a preset word library, and judging that the specified quality feature exists in the target video if the ratio of the number of matched words to the total number of words obtained by word segmentation is greater than or equal to a specified ratio threshold.
8. The method of claim 7, wherein the decision threshold associated with the specified quality feature is an upper limit of the number of occurrences of the specified quality feature in the target video; accordingly, determining whether the specified quality feature belongs to an abnormal quality feature comprises:
and when the total times of the specified quality features appearing in the target video is greater than the upper limit value of the times, judging that the specified quality features are abnormal quality features.
9. The method of claim 1, wherein determining whether the target video has a specified quality characteristic comprises:
and calculating the average gray value of a current picture frame in the picture information, and judging that the specified quality feature exists in the target video when the average gray value is greater than or equal to a specified gray threshold.
10. The method of claim 9, wherein the decision threshold associated with the specified quality feature is an upper limit on a number of frames in which the specified quality feature appears in the target video; accordingly, determining whether the specified quality feature belongs to an abnormal quality feature comprises:
and counting the total frame number of the picture frames with the average gray value larger than or equal to the specified gray threshold value in the target video, and judging that the specified quality feature is an abnormal quality feature when the counted total frame number is larger than the frame number upper limit value.
11. The method of claim 1, wherein determining whether the target video has a specified quality characteristic comprises:
and calculating the similarity between two adjacent frames of pictures in the picture information, and judging that the specified quality feature exists in the target video when the calculated similarity between the two frames of pictures is greater than or equal to a specified similarity threshold.
12. The method of claim 11, wherein the decision threshold associated with the specified quality feature is an upper limit on a number of frames in which the specified quality feature appears in the target video; accordingly, determining whether the specified quality feature belongs to an abnormal quality feature comprises:
marking two frames of pictures with the similarity greater than or equal to the specified similarity threshold as target pictures, and counting the total frame number of the target pictures in the picture information of the target video;
and when the total frame number is greater than the frame number upper limit value, judging the specified quality characteristic as an abnormal quality characteristic.
13. The method of claim 1, wherein identifying at least one of audio information and picture information of the target video comprises:
acquiring operation behavior data when a user watches the target video, and determining a corresponding playing time interval when the target video is terminated according to the operation behavior data;
and identifying at least one of audio information and picture information in the playing time interval.
14. The method of claim 1, wherein identifying at least one of audio information and picture information of the target video comprises:
acquiring user comment information aiming at the target video, identifying words used for representing the specified quality characteristics from the user comment information, and extracting time nodes associated with the words from the user comment information;
and determining a playing time interval containing the time node, and identifying at least one of audio information and picture information in the playing time interval.
15. A video quality recognition system, the system comprising a memory and a processor, the memory having stored therein a computer program that, when executed by the processor, performs the steps of:
acquiring playing information of a target video, and judging whether the target video is a suspected low-quality video or not based on the playing information; wherein the playing information is information recorded in the playing process of the target video;
After the target video is determined to be a suspected low-quality video, identifying at least one of audio information and picture information of the target video to judge whether specified quality characteristics exist in the target video; wherein the specified quality features represent content that results in a degradation of video quality;
acquiring a judgment threshold value associated with the specified quality feature, and judging whether the specified quality feature belongs to an abnormal quality feature or not based on the judgment threshold value;
determining whether the target video is a low-quality video based on a determination result for the specified quality feature; wherein the determining comprises: counting the total number of abnormal quality features existing in the target video, and judging that the target video is a low-quality video when the counted total number is greater than or equal to a specified number threshold.
16. The system of claim 15, wherein the playback information comprises a ratio of a number of times the target video is viewed in its entirety to a total number of times the target video is viewed; accordingly, the computer program, when executed by the processor, further implements the steps of:
and when the ratio of the number of times that the target video is completely watched to the total number of times that the target video is watched is lower than a specified number of times proportional threshold, determining that the target video is a suspected low-quality video.
17. The system of claim 15, wherein the computer program, when executed by the processor, further performs the steps of:
dividing the audio information into a plurality of audio frames, and extracting audio feature vectors of the audio frames;
constructing a feature sequence of the audio information based on the extracted audio feature vector;
inputting the constructed characteristic sequence into a preset voice recognition model to obtain a prediction probability value corresponding to the characteristic sequence;
and when the prediction probability value is greater than or equal to a specified probability threshold, judging that the specified quality characteristics exist in the target video.
18. The system of claim 15, wherein the computer program, when executed by the processor, further performs the steps of:
acquiring operation behavior data when a user watches the target video, and determining a corresponding playing time interval when the target video is terminated according to the operation behavior data;
and identifying at least one of audio information and picture information in the playing time interval.
CN201711330513.9A 2017-12-13 2017-12-13 Video quality identification method and system Active CN109922334B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711330513.9A CN109922334B (en) 2017-12-13 2017-12-13 Video quality identification method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711330513.9A CN109922334B (en) 2017-12-13 2017-12-13 Video quality identification method and system

Publications (2)

Publication Number Publication Date
CN109922334A CN109922334A (en) 2019-06-21
CN109922334B true CN109922334B (en) 2021-11-19

Family

ID=66959009

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711330513.9A Active CN109922334B (en) 2017-12-13 2017-12-13 Video quality identification method and system

Country Status (1)

Country Link
CN (1) CN109922334B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112565814B (en) * 2019-09-26 2022-06-14 腾讯科技(深圳)有限公司 Video distribution method, device, storage medium and server
CN111274449B (en) * 2020-02-18 2023-08-29 腾讯科技(深圳)有限公司 Video playing method, device, electronic equipment and storage medium
CN111611973B (en) * 2020-06-01 2023-05-12 广州市百果园信息技术有限公司 Target user identification method, device and storage medium
CN113836966A (en) * 2020-06-08 2021-12-24 ***通信有限公司研究院 Video detection method, device, equipment and storage medium
CN112911385B (en) * 2021-01-12 2021-12-07 平安科技(深圳)有限公司 Method, device and equipment for extracting picture to be identified and storage medium
CN113573044B (en) * 2021-01-19 2022-12-09 腾讯科技(深圳)有限公司 Video data processing method and device, computer equipment and readable storage medium
CN114359271B (en) * 2022-03-10 2022-06-03 天津市北海通信技术有限公司 Method and device for detecting image playing quality of train display equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104216879A (en) * 2013-05-29 2014-12-17 酷盛(天津)科技有限公司 Video quality excavation system and method
CN105721899A (en) * 2016-03-21 2016-06-29 传线网络科技(上海)有限公司 Video quality scoring method and system
CN107368573A (en) * 2017-07-14 2017-11-21 北京奇艺世纪科技有限公司 Video quality evaluation method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9325985B2 (en) * 2013-05-28 2016-04-26 Apple Inc. Reference and non-reference video quality evaluation

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104216879A (en) * 2013-05-29 2014-12-17 酷盛(天津)科技有限公司 Video quality excavation system and method
CN105721899A (en) * 2016-03-21 2016-06-29 传线网络科技(上海)有限公司 Video quality scoring method and system
CN107368573A (en) * 2017-07-14 2017-11-21 北京奇艺世纪科技有限公司 Video quality evaluation method and device

Also Published As

Publication number Publication date
CN109922334A (en) 2019-06-21

Similar Documents

Publication Publication Date Title
CN109922334B (en) Video quality identification method and system
Xie et al. Structure analysis of soccer video with domain knowledge and hidden Markov models
US10304458B1 (en) Systems and methods for transcribing videos using speaker identification
US8358837B2 (en) Apparatus and methods for detecting adult videos
CN112753226A (en) Machine learning for identifying and interpreting embedded information card content
BR102016007265A2 (en) MULTIMODAL AND REAL-TIME METHOD FOR SENSITIVE CONTENT FILTERING
US20160337691A1 (en) System and method for detecting streaming of advertisements that occur while streaming a media program
CN109063611B (en) Face recognition result processing method and device based on video semantics
CN111918130A (en) Video cover determining method and device, electronic equipment and storage medium
JP6557592B2 (en) Video scene division apparatus and video scene division program
CN110555136A (en) Video tag generation method and device and computer storage medium
CN110796140B (en) Subtitle detection method and device
Dumont et al. Automatic story segmentation for tv news video using multiple modalities
JP2004520756A (en) Method for segmenting and indexing TV programs using multimedia cues
CN111984821A (en) Method and device for determining dynamic cover of video, storage medium and electronic equipment
CN111061867B (en) Text generation method, equipment, storage medium and device based on quality perception
CN111314732A (en) Method for determining video label, server and storage medium
CN114220057A (en) Video trailer identification method and device, electronic equipment and readable storage medium
Mizutani et al. Commercial detection in heterogeneous video streams using fused multi-modal and temporal features
CN111625661B (en) Audio and video fragment classification method and device
CN108153882A (en) A kind of data processing method and device
Husa et al. Automatic thumbnail selection for soccer videos using machine learning
CN108985244B (en) Television program type identification method and device
Husa et al. HOST-ATS: automatic thumbnail selection with dashboard-controlled ML pipeline and dynamic user survey
KR102504321B1 (en) Apparatus and method for online action detection

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200511

Address after: 310052 room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Applicant after: Alibaba (China) Co.,Ltd.

Address before: 100080 Beijing Haidian District city Haidian street A Sinosteel International Plaza No. 8 block 5 layer D

Applicant before: YOUKU INFORMATION TECHNOLOGY (BEIJING) Co.,Ltd.

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240618

Address after: 101400 Room 201, 9 Fengxiang East Street, Yangsong Town, Huairou District, Beijing

Patentee after: Youku Culture Technology (Beijing) Co.,Ltd.

Country or region after: China

Address before: 310052 room 508, 5th floor, building 4, No. 699 Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Patentee before: Alibaba (China) Co.,Ltd.

Country or region before: China