CN112528071B - Video data ordering method and device, computer equipment and storage medium


Info

Publication number
CN112528071B
CN112528071B (application CN202011192412.1A)
Authority
CN
China
Prior art keywords
video data
target video
similarity
video
sequence
Prior art date
Legal status
Active
Application number
CN202011192412.1A
Other languages
Chinese (zh)
Other versions
CN112528071A
Inventor
陈臆淳
Current Assignee
Bigo Technology Pte Ltd
Original Assignee
Bigo Technology Pte Ltd
Application filed by Bigo Technology Pte Ltd
Priority to CN202011192412.1A
Publication of CN112528071A
Priority to PCT/CN2021/126603 (published as WO2022089467A1)
Application granted
Publication of CN112528071B
Legal status: Active

Classifications

    • G06F16/71 Information retrieval of video data: indexing; data structures therefor; storage structures
    • G06F16/7844 Retrieval of video data characterised by metadata automatically derived from the content, using original textual content or text extracted from visual content or transcript of audio data
    • G06F16/7867 Retrieval of video data characterised by manually generated metadata, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • G06N3/045 Neural networks: combinations of networks
    • G06N3/08 Neural networks: learning methods


Abstract

The embodiment of the invention provides a video data ordering method, a video data ordering device, computer equipment and a storage medium. The method comprises the following steps: obtaining a first video sequence to be sent to a client, the first video sequence having a plurality of target video data; traversing the first video sequence and sequentially calculating the similarity between the current target video data and the adjacent target video data as the video similarity; generating a break-up coefficient for the current target video data with the video similarity as a reference; and sorting the plurality of target video data according to the break-up coefficients to obtain a second video sequence. On the one hand, this realizes automatic break-up, avoids manual break-up and its consumption of a large amount of human resources, and reduces break-up cost; on the other hand, the matching degree between the break-up coefficient and the target video data is ensured, thereby ensuring the accuracy of the break-up coefficient and avoiding the problems of breaking up unrelated target video data or displaying similar target video data in a concentrated manner.

Description

Video data ordering method and device, computer equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of multimedia, in particular to a video data ordering method, a video data ordering device, computer equipment and a storage medium.
Background
Currently, video platforms generally recall video data based on recent hotspots and user preferences. If the characteristics of the video data are similar, video data with repeated content may be recalled. To reduce this repetition, the video data is generally broken up before being sent to the user, i.e., reordered globally so that the various video data are distributed more uniformly.
Before break-up, the category of the video data is labeled using either manually defined or non-manually defined categories; during break-up, video data of different categories are separated as far as possible to avoid concentrated display of video data of the same category.
In the manually defined category mode, category standards are defined manually in advance and categories are manually labeled on a certain proportion of the video data; using this labeled data as supervision, a classification model is trained to label categories on the remaining video data. In the non-manually defined category mode, categories are labeled on the video data automatically by an unsupervised learning clustering algorithm.
However, the manually defined mode requires browsing a large amount of video data to define accurate categories and watching video data to label it, and both operations consume a large amount of time at high cost. The non-manually defined mode produces categories of low accuracy: if the categories are too coarse, unrelated video data may be broken up, and if they are too fine, similar video data may still be displayed in a concentrated manner.
Disclosure of Invention
The embodiment of the invention provides a video data ordering method, a video data ordering device, computer equipment and a storage medium, to address the problem of balancing the cost and the accuracy of breaking up video data.
In a first aspect, an embodiment of the present invention provides a method for ordering video data, including:
acquiring a first video sequence to be sent to a client, wherein the first video sequence is provided with a plurality of target video data;
traversing the first video sequence, and sequentially calculating the similarity between the current target video data and the adjacent target video data as the video similarity;
generating a break-up coefficient for the current target video data with the video similarity as a reference;
and sorting the plurality of target video data according to the break-up coefficients to obtain a second video sequence.
In a second aspect, an embodiment of the present invention further provides a device for sorting video data, including:
a first video sequence acquisition module, configured to acquire a first video sequence to be sent to a client, where the first video sequence has a plurality of target video data;
a video similarity calculation module, configured to traverse the first video sequence and sequentially calculate the similarity between the current target video data and the adjacent target video data as the video similarity;
a break-up coefficient calculation module, configured to generate a break-up coefficient for the current target video data with the video similarity as a reference;
and a second video sequence obtaining module, configured to sort the plurality of target video data according to the break-up coefficients to obtain a second video sequence.
In a third aspect, an embodiment of the present invention further provides a computer apparatus, including:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of ordering video data as described in the first aspect.
In a fourth aspect, embodiments of the present invention further provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method for ordering video data as described in the first aspect.
In this embodiment, a first video sequence to be sent to a client is obtained, the first video sequence having a plurality of target video data. The first video sequence is traversed, and the similarity between the current target video data and the adjacent target video data is calculated in sequence as the video similarity. With the video similarity as a reference, a break-up coefficient is generated for the current target video data, and the plurality of target video data are sorted according to the break-up coefficients to obtain a second video sequence.
Drawings
Fig. 1 is a flowchart of a video data sorting method according to a first embodiment of the present invention;
Fig. 2 is a flowchart illustrating a method for transmitting video data according to the first embodiment of the present invention;
Fig. 3 is a flowchart of a video data sorting method according to a second embodiment of the present invention;
Fig. 4 is a schematic flow chart of a break-up operation according to the second embodiment of the present invention;
Fig. 5A to 5D are exemplary diagrams of a break-up operation according to the second embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a video data sorting device according to a third embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
Example 1
Fig. 1 is a flowchart of a video data sorting method according to an embodiment of the present invention, where the method is applicable to a case of breaking up based on similarity of recalled video data, and the method may be performed by a video data sorting device, and the video data sorting device may be implemented by software and/or hardware, and may be configured in a computer device, for example, a server, a workstation, a personal computer, and so on, and specifically includes the following steps:
step 101, a first video sequence to be sent to a client is acquired.
A video platform stores a large amount of video data, forming a video library. When the video platform receives a request from a client, part of the video data can be extracted from the video library as target video data and sorted to obtain a first video sequence; that is, the first video sequence has a plurality of target video data arranged in a specified order.
It should be noted that the client's request may be actively triggered by a user: for example, the user inputs a keyword at the client and requests the video platform to search for video data related to the keyword, or the user pulls down the list of existing video data to request that the video platform refresh it. The request may also be triggered without user action: for example, the client requests that the video platform push high-quality video data when displaying the homepage, or push related video data before the current video data finishes playing. This embodiment does not limit the form of the request.
In a specific implementation, as shown in FIG. 2, the target video data is extracted from the video library, typically by:
1. Recall
Recalling video data from the video library shrinks the set of candidate video data.
Further, for different business scenarios, different recall policies may be used to recall portions of video data from the video library according to different business requirements (e.g., recall high quality (non-personalized) video data, recall video data that meets user personalized requirements, etc.).
In one example, recall policies include, but are not limited to:
an online recall (recalling video data of host users who are on line, i.e., live programs), a subscription recall (recalling video data of columns the user subscribes to, e.g., a certain game or restaurant), a same-nationality recall (recalling video data from the country to which the user belongs), a same-language recall (recalling video data in the language the user uses), a collaborative filtering recall (recalling video data using a collaborative filtering algorithm), a preference recall (recalling video data matching the user's preferences), and a similar recall (recalling other video data similar to the video data already recalled).
2. Coarse ranking
The number of recalled video data is large, generally reaching the tens of thousands, and the algorithm used for fine ranking can be complex. To improve ranking speed, a coarse ranking stage can be added between recall and fine ranking: a small number of user and video features are loaded into a simple ranking model, such as an LR (Logistic Regression) model or a GBDT (Gradient Boosting Decision Tree) model, the recalled video data are ranked roughly, and the higher-ranked portion is selected. This reduces the number of video data entering fine ranking to the order of thousands or hundreds while preserving a certain accuracy.
It should be noted that, depending on the characteristics of the business scenario, coarse ranking is optional: it may be applied, or the pipeline may skip directly from recall to fine ranking, which is not limited in this embodiment.
3. Fine ranking
A larger number of user and video features are loaded into a more complex ranking model, such as a CNN (Convolutional Neural Network) or an RNN (Recurrent Neural Network), the coarsely ranked video data are ranked precisely, and the higher-ranked portion is selected. This improves ranking accuracy as much as possible and further reduces the number of video data sent to the client, generally to the order of hundreds or tens.
In this embodiment, the video data extracted after fine ranking may be referred to as the first video sequence; after the target video data in the first video sequence are broken up (also referred to as rearranged), their number (e.g., on the order of hundreds or tens) is maintained and they are sent to the client.
Step 102, traversing the first video sequence, and sequentially calculating the similarity between the current target video data and the adjacent target video data to serve as video similarity.
During break-up, the first video sequence is traversed: following the order of the target video data in the first video sequence, each target video data is determined in turn as the current target video data, and the target video data adjacent to the current target video data are determined.
At this point, the overall similarity between the current target video data and each adjacent target video data may be calculated in turn as the video similarity, based on characteristics such as PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity Index).
Step 103, using the video similarity as a reference, generating a break-up coefficient for the current target video data.
The distribution of the video similarity between the current target video data and each adjacent target video data reflects the distribution of target video data of the same category within a certain range. This embodiment therefore adapts to the distribution of the video similarity and generates a suitable break-up coefficient for the current target video data. The break-up coefficient represents the strength of break-up (rearrangement): the higher the break-up coefficient, the larger the probability of being broken up; conversely, the lower the break-up coefficient, the smaller the probability.
Step 104, sorting the plurality of target video data according to the break-up coefficients to obtain a second video sequence.
In this embodiment, the break-up coefficient is used as the basis for breaking up the target video data, and the plurality of target video data in the first video sequence are reordered to obtain the second video sequence.
In general, the target video data in the second video sequence is identical to the target video data in the first video sequence, and the order of the target video data in the second video sequence is different from the order of the target video data in the first video sequence.
In a specific implementation, the break-up coefficients of all the target video data may be evaluated as a whole and all the target video data reordered to obtain the second video sequence; alternatively, the break-up coefficients of parts of the target video data may be evaluated in parallel and the target video data reordered in parallel to obtain the second video sequence, etc., which is not limited in this embodiment.
After the second video sequence is obtained by reordering, video information (e.g., a cover map, a video name, etc.) of each target video data in the second video sequence may be transmitted to the client, which displays the video information of each target video data in the order of each target video data in the second video sequence.
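As a concrete illustration of steps 101 to 104, the following is a minimal self-contained sketch in Python. The similarity function, window size, threshold, and penalty exponent are illustrative assumptions for this sketch rather than values fixed by this embodiment; the second embodiment below refines each step.

```python
import math
from typing import Callable, Dict, List

def break_up(first_sequence: List[str],
             quality: Dict[str, float],
             similarity: Callable[[str, str], float],
             window: int = 4, thresh: float = 0.8, n: float = 2.0) -> List[str]:
    """Steps 102-104: slide a window over the first video sequence, take the
    maximum similarity between the current target video data (last item in
    the window) and its neighbors as the break-up coefficient, penalize
    coefficients above the threshold, and re-sort by adjusted quality."""
    score = {v: 0.0 for v in first_sequence}
    for k in range(window - 1, len(first_sequence)):
        current = first_sequence[k]
        neighbors = first_sequence[k - window + 1:k]
        score[current] = max(similarity(current, v) for v in neighbors)
    adjusted = {
        v: quality[v] - (math.exp(n * score[v]) if score[v] > thresh else 0.0)
        for v in first_sequence
    }
    # Second video sequence: same items, reordered by adjusted quality value.
    return sorted(first_sequence, key=lambda v: adjusted[v], reverse=True)
```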
In this embodiment, a first video sequence to be sent to a client is obtained, the first video sequence having a plurality of target video data. The first video sequence is traversed, and the similarity between the current target video data and the adjacent target video data is calculated in sequence as the video similarity. With the video similarity as a reference, a break-up coefficient is generated for the current target video data, and the plurality of target video data are sorted according to the break-up coefficients to obtain a second video sequence.
Example 2
Fig. 3 is a flowchart of a video data sorting method according to a second embodiment of the present invention, where the method further refines operations for calculating video similarity, calculating break-up coefficients, and applying break-up coefficients based on the foregoing embodiment, and the method specifically includes the following steps:
Step 301, a first video sequence to be sent to a client is acquired.
As shown in fig. 4, a first video sequence having a plurality of target video data is extracted from a video library.
Step 302, extracting at least two video features from target video data.
In this embodiment, as shown in fig. 4, to understand the target video data from multiple angles, at least two video features may be extracted from the target video data, so that the video similarity can be expressed along the dimensions of multiple features.
In one example, the video features include text features. In this example, text information related to the target video data may be extracted, e.g., description information of the target video data, comment information on the target video data, and hashtags (HashTag) added by the author of the target video data.
A word vector model is loaded, e.g., word2vec, GloVe (Global Vectors for Word Representation), ELMo (Embeddings from Language Models), or BERT (Bidirectional Encoder Representations from Transformers).
The text information related to the target video data is input into the word vector model to extract its features, and the vectors of the words in the text information are learned and used as video features (namely, text features).
Taking Skip-Gram in word2vec as an example: given a center word, Skip-Gram predicts its context from a large amount of text information related to the target video data, embedding words into a vector space in which words with similar meanings lie relatively close together. The video feature can then be obtained by averaging the vectors of the words in the text information.
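As a sketch of this step, the following assumes a gensim Word2Vec model trained with Skip-Gram on the platform's text corpus; the corpus shown is a toy placeholder:

```python
import numpy as np
from gensim.models import Word2Vec

# Toy tokenized corpus standing in for descriptions, comments and hashtags.
corpus = [
    ["funny", "cat", "compilation"],
    ["street", "food", "tour"],
]

# sg=1 selects Skip-Gram: predict context words from a given center word.
model = Word2Vec(sentences=corpus, vector_size=64, window=5, sg=1, min_count=1)

def text_feature(tokens: list) -> np.ndarray:
    """Text feature = average of the word vectors in the text information."""
    vectors = [model.wv[t] for t in tokens if t in model.wv]
    if not vectors:
        return np.zeros(model.vector_size, dtype=np.float32)
    return np.mean(vectors, axis=0)

feature = text_feature(["funny", "cat", "compilation"])
```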
In one example, the video features include image features. In the absence of a large amount of annotated data, image features can be extracted using a pre-training model: a deep learning architecture trained in advance on large datasets for a particular task, whose model architecture and tuned weight values are stored so that general features can be learned efficiently.
In this example, a pre-training model such as Xception may be loaded. When applied to learn features of image data, the pre-training model is typically trained with a pre-generated dataset; to use it for feature extraction, the output layer of the pre-training model is removed, leaving a network available for extracting features.
The image data of the target video data is input into the pre-training model to extract the features of the image data, which are learned as video features (namely, image features).
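A sketch of this step with Keras, assuming Xception pretrained on ImageNet and a sampled frame (or cover image) as the image data; the classification head is removed via include_top=False:

```python
import numpy as np
import tensorflow as tf

# Xception without its output layer; global average pooling yields one vector.
backbone = tf.keras.applications.Xception(
    include_top=False, weights="imagenet", pooling="avg"
)

def image_feature(frame: np.ndarray) -> np.ndarray:
    """frame: an HxWx3 uint8 image, e.g. a frame sampled from the video."""
    x = tf.image.resize(frame, (299, 299))            # Xception's input size
    x = tf.keras.applications.xception.preprocess_input(x)
    return backbone(tf.expand_dims(x, 0)).numpy()[0]  # 2048-dim feature

frame = np.random.randint(0, 256, (720, 1280, 3), dtype=np.uint8)  # placeholder
feature = image_feature(frame)
```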
Of course, the above video features are merely examples; when implementing the embodiment of the present invention, other video features may be set according to the actual situation, for example, video attributes (such as length and author), features of objects detected in the target video data, features of faces detected in the target video data, audio features, etc. In addition, those skilled in the art may also use other video features according to actual needs, which is not limited by the embodiments of the present invention.
Step 303, compressing the video features.
If the video features are extracted in advance through deep learning, the accuracy of their expression can be ensured, but such features are high-dimensional; besides occupying a large amount of storage space, they also burden the calculation of similarity between target video data.
Therefore, in this embodiment, as shown in fig. 4, the video features may be compressed, the dimensions of the video features may be reduced, the storage space occupied by the video features may be reduced, the burden on calculating the similarity between the target video data may be reduced, and the speed of calculating the similarity between the target video data may be increased.
In one compression mode, an autoencoder (Auto-Encoder) that has been trained on target video data can be loaded. Training aims to minimize the difference between the input features and the output features, so after the video features are compressed, the accuracy of their expression is preserved, which in turn preserves the accuracy of the similarity between target video data.
The autoencoder is an unsupervised learning model comprising an encoder (Encoder) and a decoder (Decoder).
The video features are input into the encoder, which compresses them into meaningful, low-dimensional vectors.
Once encoding is complete, the compressed vectors are input into the decoder, which reconstructs the video features.
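A minimal Keras sketch of such an autoencoder; the feature and code dimensions are assumptions for illustration:

```python
import tensorflow as tf

FEATURE_DIM = 2048   # e.g. the Xception feature size (assumption)
CODE_DIM = 64        # compressed dimension (assumption)

# Encoder: compress the video feature into a low-dimensional code.
inputs = tf.keras.Input(shape=(FEATURE_DIM,))
code = tf.keras.layers.Dense(256, activation="relu")(inputs)
code = tf.keras.layers.Dense(CODE_DIM, activation="relu")(code)

# Decoder: reconstruct the original video feature from the code.
decoded = tf.keras.layers.Dense(256, activation="relu")(code)
decoded = tf.keras.layers.Dense(FEATURE_DIM)(decoded)

autoencoder = tf.keras.Model(inputs, decoded)
encoder = tf.keras.Model(inputs, code)

# Train to minimize the difference between input and output features.
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(features, features, epochs=10, batch_size=256)

# After training, only the encoder is used to compress video features:
# compressed = encoder.predict(features)
```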
Of course, the above compression method is merely an example; when implementing the embodiment of the present invention, other compression methods may be set according to the actual situation, for example, compressing the video features with models such as PCA (Principal Component Analysis), t-SNE (t-distributed Stochastic Neighbor Embedding), or UMAP (Uniform Manifold Approximation and Projection), which is not limited in this embodiment of the present invention. In addition, those skilled in the art can adopt other compression modes according to actual needs, and the embodiment of the invention is not limited thereto.
In addition, if the video features are already low-dimensional, they need not be compressed, which is not limited in this embodiment.
Generally, as shown in fig. 4, extracting video features from target video data (step 302) and compressing them (step 303) occupy considerable computing resources and time. They can therefore be performed offline and the results stored in a feature library; when the similarity between target video data is calculated online using the video features, the video features of the target video data are fetched from the feature library according to the identifier (e.g., ID) of the target video data.
Of course, some video data has low popularity or is prohibited from being shared and is unlikely to be pushed to clients as target video data, so extracting and compressing video features offline for the full amount of video data would waste computing resources. In that case, video features may instead be extracted from the target video data (step 302) and compressed (step 303) online, when the video data is selected as target video data, which is not limited in this embodiment.
Step 304, sliding a preset window in the first video sequence to sequentially determine the current target video data and the adjacent target video data in the window.
In this embodiment, a window that can slide over the first video sequence may be set; the size of the window may be set according to business requirements, such as 3, 4, 5, or 6, which is not limited in this embodiment.
The position of the current target video data within the window is fixed: for example, when the window size is 3, the current target video data is at position 2; when the window size is 4, at position 3 or 4; when the window size is 5, at position 3, 4, or 5; and so on. The other target video data in the window are the target video data adjacent to the current target video data.
As shown in fig. 5A, it is assumed that the first video sequence is, in order, target video data 1, target video data 2, target video data 3, target video data 4, target video data 5, target video data 6, target video data 7, and target video data 8, wherein the contents between target video data 1, target video data 4, target video data 7, and target video data 8 are similar, the contents between target video data 2 and target video data 3 are similar, and the contents between target video data 5 and target video data 6 are similar.
As shown in fig. 5B, the window of size 4 initially contains target video data 1, target video data 2, target video data 3, and target video data 4; the current target video data is target video data 4, and the adjacent target video data are target video data 1, target video data 2, and target video data 3.
As shown in fig. 5C, in the next sliding, the window includes target video data 2, target video data 3, target video data 4, and target video data 5, the current target video data is target video data 5, and the adjacent target video data is target video data 2, target video data 3, and target video data 4.
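A sketch of the window sliding shown in figs. 5B and 5C, assuming the current target video data sits at the last position of the window:

```python
from typing import Sequence

def sliding_windows(sequence: Sequence, window_size: int = 4):
    """Yield (current, neighbors) pairs as a window of the given size slides
    over the first video sequence; the last item of each window is the
    current target video data, the rest are its adjacent target video data."""
    for k in range(window_size - 1, len(sequence)):
        window = sequence[k - window_size + 1 : k + 1]
        yield window[-1], list(window[:-1])

# e.g. for size 4: (v4, [v1, v2, v3]), then (v5, [v2, v3, v4]), ...
for current, neighbors in sliding_windows(["v1", "v2", "v3", "v4", "v5"]):
    print(current, neighbors)
```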
Step 305, calculating the similarity between each video feature of the current target video data and each video feature of the adjacent target video data as the feature similarity.
In the present embodiment, as shown in fig. 4, for the same kind of video features, such as text features, image features, and the like, the similarity between such video features of the current target video data and such video features of the adjacent target video data can be calculated as feature similarity, respectively.
In one example, the cosine similarity between each video feature of the current target video data and the corresponding video feature of the adjacent target video data is calculated and set as the similarity between those video features, giving the feature similarity.
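A minimal sketch of the cosine similarity between two feature vectors:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two feature vectors, in [-1, 1]."""
    denom = float(np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.dot(a, b)) / denom if denom else 0.0
```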
Of course, the above manner of calculating the feature similarity is merely an example; when implementing the embodiment of the present invention, other measures may be used according to the actual situation, for example, Euclidean distance, Manhattan distance, or Mahalanobis distance, which are not limited in this embodiment of the present invention. In addition, those skilled in the art may adopt other manners of calculating the feature similarity according to actual needs, which is not limited in this embodiment of the present invention.
And 306, fusing the feature similarity into the similarity between the current target video data and the adjacent target video data to obtain the video similarity.
As shown in fig. 4, with respect to the current target video data and the adjacent target video data, feature similarities of video features of all kinds may be referred to, and the similarity between the current target video data and the adjacent target video data may be calculated, thereby obtaining video similarity.
In one embodiment of the present invention, the feature similarity may be fused to the similarity between the current target video data and the adjacent target video data by means of linear fusion, and in this embodiment, step 306 may include the following steps:
Step 3061, weights are configured for the various feature similarities.
In this embodiment, a weight may be set for each video feature, and the weight may represent the importance degree of the video feature for the content relevance, that is, the more important the video feature for the content relevance, the higher the weight.
When calculating the similarity between the current target video data and the adjacent target video data, the weight can be configured to the feature similarity corresponding to the video features of the corresponding category.
To find preferable weights, a validation video library may be used for weight optimization. It consists of first sample video data and second sample video data, with manual labels indicating whether each second sample video data is related to the first sample video data.
In training the weights, first sample video data and second sample video data may be obtained from a verification video library, wherein the second sample video data is associated with a tag that is used to represent a correlation with the first sample video data.
Features of the first sample video data and of the second sample video data are determined respectively as video features, such as text features, image features, and the like.
An objective function is set for evaluating the degree to which a search operation, which applies the various video features and weights to search for second sample video data related to the first sample video data, matches the labels. For example: weights are configured for the various feature similarities, a third product between each feature similarity and its weight is calculated, the sum of the third products is set as the similarity between the first sample video data and the second sample video data, the second sample video data are sorted according to this sample similarity, and the top n second sample video data are selected as those related to the first sample video data; comparing the selection with the labels determines whether the search for second sample video data is accurate.
In one example, MAP (Mean Average Precision) is taken as the objective function; MAP measures whether the system displays related results first. In this example, the objective function is the average, over the queries, of the precision of searching the second sample video data, and that precision is itself an average of second products, where each second product is the precision up to the current second sample video data multiplied by the correlation between the current second sample video data and the first sample video data:

$$\mathrm{MAP}=\frac{1}{Q}\sum_{q=1}^{Q}\mathrm{AvgP}(q),\qquad \mathrm{AvgP}=\frac{1}{R}\sum_{k=1}^{n}P(k)\,\mathrm{rel}(k)$$

where MAP is the objective function, AvgP is the precision of searching the second sample video data for one query, Q is the number of queries (first sample video data), P(k) is the precision up to the k-th second sample video data, rel(k) is the correlation of the k-th second sample video data with the first sample video data, and R is the number of second sample video data correlated with the first sample video data.
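A sketch of the MAP objective as defined above; each inner list holds the manual relevance labels rel(k) of one query's ranked second sample video data:

```python
from typing import List, Sequence

def average_precision(rel: Sequence[int]) -> float:
    """AvgP = (1/R) * sum over k of P(k) * rel(k)."""
    hits, total = 0, 0.0
    for k, r_k in enumerate(rel, start=1):
        if r_k:
            hits += 1
            total += hits / k        # P(k), the precision up to position k
    related = sum(rel)               # R, the number of related items
    return total / related if related else 0.0

def mean_average_precision(all_rel: List[Sequence[int]]) -> float:
    """MAP = average of AvgP over all queries (first sample video data)."""
    return sum(average_precision(r) for r in all_rel) / len(all_rel)

print(mean_average_precision([[1, 0, 1], [0, 1]]))  # e.g. two labeled queries
```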
Of course, the above objective function is merely an example; when implementing the embodiment of the present invention, other objective functions may be set according to the actual situation. For example, for whether the first sample video data and the second sample video data are correlated, PR (Precision-Recall) or ROC AUC (Area Under the Receiver Operating Characteristic Curve) may be used as the objective function; for ranking the second sample video data, nDCG (Normalized Discounted Cumulative Gain) may be used as the objective function; and so on, which is not limited by the embodiment of the present invention. In addition, those skilled in the art may adopt other objective functions according to actual needs, which is not limited in this embodiment of the present invention.
Once the objective function is set, a plurality of values may be assigned to the weights using random search, exhaustive search, gradient descent, etc., and the search operation is performed to compute the result of the objective function at the different values.
The results of the objective function at all values are compared, and the weights take the values at which the result of the objective function is optimal (e.g., the maximum output of the objective function).
In general, weights can be set for various video features offline, and the weights of various video features can be read when video similarity between target video data is calculated online.
Step 3062, calculating a first product between the feature similarity and the weight.
Step 3063, setting the sum value between the first products as the similarity between the current target video data and the adjacent target video data, as the video similarity.
Each feature similarity is multiplied by its corresponding weight, and the first products are summed to form the overall similarity between the current target video data and the adjacent target video data, which is referred to as the video similarity.
Then, in an embodiment, the video similarity is calculated as follows:

$$\mathrm{Sim}_{\mathrm{merge}}=\sum_{i=1}^{m} w_i\cdot \mathrm{Sim}_i$$

where Sim_merge is the video similarity, m is the number of video feature categories, w_i is the weight of the i-th video feature, and Sim_i is the feature similarity between the i-th video features.
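A sketch of this linear fusion, reusing the cosine_similarity helper sketched above; the feature kinds and example weights are assumptions:

```python
def video_similarity(features_a: dict, features_b: dict,
                     weights: dict) -> float:
    """Sim_merge = sum_i w_i * Sim_i over the feature categories."""
    return sum(
        w * cosine_similarity(features_a[kind], features_b[kind])
        for kind, w in weights.items()
    )

# e.g. weights tuned offline as in step 3061 (illustrative values):
weights = {"text": 0.6, "image": 0.4}
```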
Typically, as shown in fig. 4, the number of video data in the video library reaches the millions or even tens of millions, while the first video sequence contains only hundreds or tens. Computing the video similarity between each video data and all other video data would occupy a large amount of storage space for similarities that are rarely used; therefore the similarity between video features is calculated online for the target video data (step 305) and the feature similarities are fused into the video similarity (step 306).
Of course, when there is less video data in the video library, the video similarity is used frequently, or storage space is sufficient, the similarity between video features may instead be calculated offline for the video data (step 305) and the feature similarities fused into the video similarity (step 306), with the required video similarity queried online when the break-up coefficient is calculated, which is not limited in this embodiment.
Step 307, comparing the video similarity.
Step 308, selecting the video similarity with the largest value as the scattering coefficient corresponding to the current target video data.
In this embodiment, as shown in fig. 4, for a given window, the video similarities in the window may be compared, and the video similarity with the largest value may be screened out, which is used as the break-up coefficient corresponding to the current target video data.
Assuming that the window size is 4 and the position of the current target video data is the 4th bit, the break-up coefficient may be expressed as follows:

$$\mathrm{Score}_k=\max\big(\mathrm{Sim}(v_k,v_{k-1}),\ \mathrm{Sim}(v_k,v_{k-2}),\ \mathrm{Sim}(v_k,v_{k-3})\big)$$

where v_k is the current target video data, v_{k-1}, v_{k-2}, v_{k-3} are the adjacent target video data, Score_k is the break-up coefficient, max() takes the maximum value, and Sim() denotes the video similarity.
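Combining the window sliding and fused similarity sketched above, the break-up coefficient of each current target video data can be computed as follows (video items are assumed hashable, e.g. IDs):

```python
def break_up_coefficients(sequence, similarity, window_size: int = 4) -> dict:
    """Score_k = max video similarity between the current target video data
    and the adjacent target video data inside its window."""
    scores = {}
    for current, neighbors in sliding_windows(sequence, window_size):
        scores[current] = max(similarity(current, v) for v in neighbors)
    return scores
```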
Of course, besides taking the maximum value, the break-up coefficient corresponding to the current target video data may be calculated in other ways, for example, taking the average of the video similarities as the break-up coefficient, or linearly fusing the video similarities to obtain the break-up coefficient, and so on.
Step 309, querying quality values of the respective target video data.
When target video data is selected, a quality value may be calculated for it in a manner that represents the quality of the target video data; the quality value is used to determine the order of the target video data in the first video sequence.
In general, the quality value is positively correlated with the order of ordering, i.e., the higher the quality value, the earlier the order of ordering the target video data in the first video sequence, and conversely, the lower the quality value, the later the order of ordering the target video data in the first video sequence.
Step 310, adjusting quality values of the respective target video data using the break-up coefficients.
In the present embodiment, the quality value of each target video data is adjusted with the break-up coefficient as a reference, thereby adjusting the ordering of each target video data.
In one way of adjusting the quality value, it may be determined whether the break-up coefficient corresponding to the target video data is less than or equal to a preset threshold; if yes, maintaining the quality value of the target video data; if not, the quality value of the target video data is reduced.
In one example of reducing the quality value, a penalty coefficient is generated with the natural number e as the base and n times the break-up coefficient as the exponent (n is a positive number, e.g., 2), and the penalty coefficient is subtracted from the quality value of the target video data to give the new quality value. In this example, the adjustment of the quality value of the target video data is expressed as follows:

$$\mathit{recScore}_k=\begin{cases}\mathit{recScore}_k, & \mathit{score}_k\le \mathit{thresh}\\ \mathit{recScore}_k-e^{\,n\cdot \mathit{score}_k}, & \mathit{score}_k>\mathit{thresh}\end{cases}$$

where recScore_k is the quality value of the k-th target video data, score_k is the break-up coefficient of the k-th target video data, and thresh is the threshold.
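A sketch of this adjustment; thresh and n are illustrative values, not fixed by the embodiment:

```python
import math

def adjust_quality(rec_score: float, score: float,
                   thresh: float = 0.8, n: float = 2.0) -> float:
    """Subtract the penalty coefficient e^(n*score) from the quality value
    when the break-up coefficient exceeds the threshold; otherwise keep it."""
    if score <= thresh:
        return rec_score
    return rec_score - math.exp(n * score)
```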
Of course, the foregoing manner of adjusting the quality value is merely an example; when implementing the embodiment of the present invention, other manners may be set according to the actual situation. For example: if the break-up coefficient corresponding to the target video data is less than or equal to a first threshold, the quality value is increased; if it is greater than the first threshold and less than or equal to a second threshold, the quality value is maintained; and if it is greater than the second threshold, the quality value is reduced; and so on, which is not limited in the embodiment of the present invention. In addition, those skilled in the art may adopt other ways of adjusting the quality value according to actual needs, which is not limited in the embodiments of the present invention.
Step 311, if the adjustment is completed, the target video data is ordered according to the quality value, and a second video sequence is obtained.
As shown in fig. 4, when the adjustment of the quality values of the respective target video data is completed, the target video data may be ordered according to the quality values, and a second video sequence is obtained.
In general, the sorting is performed in descending order: the higher the adjusted quality value, the earlier the target video data is placed in the second video sequence; conversely, the lower the adjusted quality value, the later the target video data is placed in the second video sequence.
For example, as shown in fig. 5D: in the first video sequence, target video data 2 and target video data 3 are adjacent and similar in content, so the break-up coefficient of target video data 3 is high and its quality value is lowered below that of target video data 4; in the second video sequence, target video data 3 is therefore moved after target video data 4. Likewise, target video data 5 and target video data 6 are adjacent and similar in content, so the break-up coefficient of target video data 6 is high and its quality value is lowered below that of target video data 7; target video data 6 is therefore moved after target video data 7 in the second video sequence.
Of course, sorting the target video data by break-up coefficient in this way is merely an example; when implementing the embodiment of the present invention, other ways may be set according to the actual situation. For example, if the break-up coefficient exceeds a sorting threshold, the target video data may be divided into the same column so as to be broken up by columns (i.e., target video data are taken from different columns in turn during sorting); or a given window may slide over the first video sequence and, if the break-up coefficient of the current target video data exceeds a window threshold, the current target video data is moved out of the window; and so on, which is not limited in the embodiment of the present invention. In addition, those skilled in the art may adopt other ways of sorting the target video data by break-up coefficient according to actual needs, which is not limited in this embodiment of the present invention.
In this embodiment, the quality value of each target video data is queried (the quality value determines the order of the target video data in the first video sequence), the quality values are adjusted using the break-up coefficients, and, once adjustment is complete, the target video data are sorted by quality value to obtain the second video sequence. On the one hand, sorting by quality value lets the existing sorting program be reused, reducing system modification and development cost; on the other hand, all target video data are sorted in one pass, simplifying the break-up operation and reducing the consumption of computing resources.
It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the embodiments of the invention.
Example 3
Fig. 6 is a block diagram of a video data sorting apparatus according to a third embodiment of the present invention, which may specifically include the following modules:
A first video sequence acquisition module 601, configured to acquire a first video sequence to be sent to a client, where the first video sequence has a plurality of target video data;
The video similarity calculation module 602 is configured to traverse the first video sequence, and sequentially calculate a similarity between the current target video data and the adjacent target video data, as a video similarity;
a break-up coefficient calculation module 603, configured to generate a break-up coefficient for the current target video data with the video similarity as a reference;
A second video sequence obtaining module 604, configured to sort the plurality of target video data according to the break-up coefficients to obtain a second video sequence.
In one embodiment of the present invention, the video similarity calculation module 602 includes:
a video feature extraction sub-module for extracting at least two video features from the target video data;
A window sliding sub-module, configured to slide a preset window in the first video sequence, so as to determine, in sequence, the current target video data and the adjacent target video data in the window;
A feature similarity calculation submodule for calculating similarity between each video feature of the current target video data and each video feature of the adjacent target video data as feature similarity;
And the feature similarity fusion sub-module is used for fusing the feature similarity into the similarity between the current target video data and the adjacent target video data to obtain video similarity.
In one embodiment of the present invention, the video feature extraction submodule includes:
The word vector model loading unit is used for loading the word vector model;
A text feature extraction unit, configured to input text information related to the target video data into the word vector model to extract features of the text information as video features;
the pre-training model loading unit is used for loading the pre-training model;
And the image feature extraction unit is used for inputting the image data of the target video data into the pre-training model to extract the features of the image data as video features.
In one embodiment of the present invention, the video similarity calculation module 602 further includes:
and the feature compression sub-module is used for compressing the video features.
In one embodiment of the present invention, the feature compression submodule includes:
an autoencoder loading unit for loading an autoencoder, wherein the autoencoder comprises an encoder and a decoder;
a feature encoding unit for inputting the video features into the encoder for encoding;
And the feature decoding unit is used for inputting the video features into the decoder for decoding if the encoding is completed.
In one embodiment of the present invention, the feature similarity calculation submodule includes:
a cosine value calculation unit configured to calculate a cosine value between each of the video features of the current target video data and each of the video features of the adjacent target video data;
And the cosine value setting unit is used for setting the cosine value as the similarity between the video feature of the current target video data and the video feature of the adjacent target video data, and obtaining feature similarity.
In one embodiment of the present invention, the feature similarity fusion submodule includes:
a weight configuration unit for configuring weights for the feature similarities;
A weight product unit, configured to calculate a first product between the feature similarity and the weight respectively;
And a video similarity setting unit configured to set a sum value between the first products as a similarity between the current target video data and the adjacent target video data as a video similarity.
In one embodiment of the present invention, the weight configuration unit includes:
a sample video acquisition subunit configured to acquire first sample video data and second sample video data associated with a tag for representing a correlation with the first sample video data;
A sample feature determination subunit configured to determine features of the first sample video data and the second sample video data, respectively, as video features;
An objective function setting subunit configured to set an objective function for evaluating a degree to which a search operation is performed for searching the second sample video data related to the first sample video data, to which various of the video features and weights are applied, to match the tag;
A weight assignment subunit, configured to assign a plurality of numerical values to the weights;
and the weight determining subunit is used for determining that the weight applies the numerical value when the result of the objective function is optimal.
In one example of the invention, the objective function is an average of the accuracy of searching the second sample video data;
The accuracy of searching the second sample video data is an average of a second product representing the accuracy of searching the current second sample video data multiplied by the correlation between the current second sample video data and the first sample video data.
In one embodiment of the present invention, the break-up coefficient calculating module 603 includes:
the video similarity comparison sub-module is used for comparing the video similarity;
and the break-up coefficient setting sub-module is used for selecting the video similarity with the largest value as the break-up coefficient corresponding to the current target video data.
In one embodiment of the present invention, the second video sequence obtaining module 604 includes:
A quality value querying sub-module, configured to query quality values of each of the target video data, where the quality values are used to determine an order of the target video data in the first video sequence;
a quality value adjustment sub-module, configured to adjust the quality values of the respective target video data using the break-up coefficients;
And the video data ordering sub-module is used for ordering the target video data according to the quality value if the adjustment is completed, so as to obtain a second video sequence.
In one embodiment of the present invention, the quality value adjustment submodule includes:
The break-up coefficient comparison unit is used for judging whether the break-up coefficient corresponding to the target video data is less than or equal to a preset threshold; if yes, the quality value maintaining unit is called, and if not, the quality value reducing unit is called;
A quality value maintaining unit for maintaining the quality value of the target video data;
and a quality value reducing unit for reducing the quality value of the target video data.
In one embodiment of the present invention, the quality value reducing unit includes:
A penalty coefficient generation subunit, configured to generate a penalty coefficient with the natural number e as the base and n times the break-up coefficient as the exponent;
A penalty coefficient subtracting subunit, configured to subtract the penalty coefficient from the quality value of the target video data.
The video data sorting device provided by the embodiment of the invention can execute the video data sorting method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example 4
Fig. 7 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention. Fig. 7 illustrates a block diagram of an exemplary computer device 12 suitable for use in implementing embodiments of the present invention. The computer device 12 shown in fig. 7 is only an example and should not be construed as limiting the functionality and scope of use of embodiments of the invention.
As shown in fig. 7, the computer device 12 is in the form of a general purpose computing device. Components of computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, a bus 18 that connects the various system components, including the system memory 28 and the processing units 16.
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Computer device 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in Fig. 7, commonly referred to as a "hard disk drive"). Although not shown in Fig. 7, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.
The computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), one or more devices that enable a user to interact with the computer device 12, and/or any devices (e.g., network card, modem, etc.) that enable the computer device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Moreover, computer device 12 may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through network adapter 20. As shown, network adapter 20 communicates with other modules of computer device 12 via bus 18. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with computer device 12, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, implementing the video data sorting method provided by the embodiment of the present invention.
Example V
The fifth embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements each process of the video data sorting method described above and achieves the same technical effects; to avoid repetition, the details are not repeated here.
The computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements, and substitutions can be made without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to them and may be embodied in many other equivalent forms without departing from its spirit or scope, which is set forth in the following claims.

Claims (14)

1. A method of ordering video data, comprising:
acquiring a first video sequence to be sent to a client, wherein the first video sequence has a plurality of target video data;
traversing the first video sequence, and sequentially calculating the similarity between the current target video data and the adjacent target video data as a video similarity;
generating a scattering coefficient for the current target video data by taking the video similarity as a reference;
sorting the plurality of target video data according to the scattering coefficients to obtain a second video sequence;
wherein the target video data in the second video sequence is the same as the target video data in the first video sequence;
wherein the traversing the first video sequence and sequentially calculating the similarity between the current target video data and the adjacent target video data as the video similarity comprises:
extracting at least two video features from the target video data;
sliding a preset window in the first video sequence to sequentially determine the current target video data and the adjacent target video data in the window;
calculating the similarity between each video feature of the current target video data and each video feature of the adjacent target video data as a feature similarity;
fusing the feature similarities into the similarity between the current target video data and the adjacent target video data to obtain the video similarity;
and wherein the generating a scattering coefficient for the current target video data by taking the video similarity as a reference comprises:
comparing the video similarities;
and selecting the video similarity with the largest value as the scattering coefficient corresponding to the current target video data.
2. The method of claim 1, wherein the extracting at least two video features from the target video data comprises:
loading a word vector model;
inputting text information related to the target video data into the word vector model to extract features of the text information as video features;
loading a pre-training model;
and inputting the image data of the target video data into the pre-training model to extract features of the image data as video features.
3. The method of claim 1, further comprising, after the features of the target video data are determined as the video features:
compressing the video features.
4. The method of claim 3, wherein the compressing the video features comprises:
loading an autoencoder, the autoencoder comprising an encoder and a decoder;
inputting the video features into the encoder for encoding;
and when the encoding is finished, inputting the video features into the decoder for decoding.
5. The method of claim 1, wherein the calculating the similarity between each video feature of the current target video data and each video feature of the adjacent target video data as a feature similarity comprises:
calculating a cosine value between each video feature of the current target video data and each video feature of the adjacent target video data;
and setting the cosine value as the similarity between the video feature of the current target video data and the video feature of the adjacent target video data, to obtain the feature similarity.
6. The method of claim 1, wherein the fusing the feature similarities into the similarity between the current target video data and the adjacent target video data to obtain the video similarity comprises:
configuring a weight for each of the feature similarities;
calculating a first product between each feature similarity and its weight;
and setting the sum of the first products as the similarity between the current target video data and the adjacent target video data, to obtain the video similarity.
7. The method of claim 6, wherein the configuring a weight for each of the feature similarities comprises:
acquiring first sample video data and second sample video data, the second sample video data being associated with a tag representing its correlation with the first sample video data;
determining the features of the first sample video data and of the second sample video data respectively as video features;
setting an objective function for evaluating the degree to which a search operation, performed by applying the various video features and the weights to search for the second sample video data related to the first sample video data, matches the tag;
assigning a plurality of candidate values to the weights;
and determining, as the weights, the values applied when the result of the objective function is optimal.
8. The method of claim 7, wherein:
the objective function is the average of the accuracies of searching the second sample video data;
and the accuracy of searching the second sample video data is an average of second products, where each second product is the precision with which the current second sample video data is retrieved multiplied by the correlation between the current second sample video data and the first sample video data.
9. The method according to any one of claims 1-8, wherein the sorting the plurality of target video data according to the scattering coefficients to obtain a second video sequence comprises:
querying the quality value of each of the target video data, the quality value being used to determine the order of the target video data in the first video sequence;
adjusting the quality value of each target video data by using the scattering coefficient;
and when the adjustment is completed, sorting the target video data according to the quality values to obtain the second video sequence.
10. The method of claim 9, wherein the adjusting the quality value of each target video data by using the scattering coefficient comprises:
judging whether the scattering coefficient corresponding to the target video data is smaller than or equal to a preset threshold value;
if so, maintaining the quality value of the target video data;
and if not, reducing the quality value of the target video data.
11. The method of claim 10, wherein the reducing the quality value of the target video data comprises:
generating a penalty coefficient by taking the natural constant e as the base and n times the scattering coefficient as the exponent;
and subtracting the penalty coefficient from the quality value of the target video data.
12. A video data sorting apparatus, comprising:
a first video sequence acquisition module, configured to acquire a first video sequence to be sent to a client, wherein the first video sequence has a plurality of target video data;
a video similarity calculation module, configured to traverse the first video sequence and sequentially calculate the similarity between the current target video data and the adjacent target video data as a video similarity;
a scattering coefficient calculation module, configured to generate a scattering coefficient for the current target video data by taking the video similarity as a reference;
and a second video sequence obtaining module, configured to sort the plurality of target video data according to the scattering coefficients to obtain a second video sequence;
wherein the target video data in the second video sequence is the same as the target video data in the first video sequence;
wherein the video similarity calculation module comprises:
a video feature extraction sub-module, configured to extract at least two video features from the target video data;
a window sliding sub-module, configured to slide a preset window in the first video sequence, so as to sequentially determine the current target video data and the adjacent target video data in the window;
a feature similarity calculation sub-module, configured to calculate the similarity between each video feature of the current target video data and each video feature of the adjacent target video data as a feature similarity;
and a feature similarity fusion sub-module, configured to fuse the feature similarities into the similarity between the current target video data and the adjacent target video data to obtain the video similarity;
and wherein the scattering coefficient calculation module comprises:
a video similarity comparison sub-module, configured to compare the video similarities;
and a scattering coefficient setting sub-module, configured to select the video similarity with the largest value as the scattering coefficient corresponding to the current target video data.
13. A computer device, the computer device comprising:
one or more processors;
and a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the video data ordering method of any one of claims 1-11.
14. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the video data ordering method according to any one of claims 1-11.
CN202011192412.1A 2020-10-30 2020-10-30 Video data ordering method and device, computer equipment and storage medium Active CN112528071B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011192412.1A CN112528071B (en) 2020-10-30 2020-10-30 Video data ordering method and device, computer equipment and storage medium
PCT/CN2021/126603 WO2022089467A1 (en) 2020-10-30 2021-10-27 Video data sorting method and apparatus, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011192412.1A CN112528071B (en) 2020-10-30 2020-10-30 Video data ordering method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112528071A (en) 2021-03-19
CN112528071B (en) 2024-07-23

Family

Family ID: 74979293

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011192412.1A Active CN112528071B (en) 2020-10-30 2020-10-30 Video data ordering method and device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112528071B (en)
WO (1) WO2022089467A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528071B (en) * 2020-10-30 2024-07-23 百果园技术(新加坡)有限公司 Video data ordering method and device, computer equipment and storage medium
CN114329063B (en) * 2021-10-29 2024-06-11 腾讯科技(深圳)有限公司 Video clip detection method, device and equipment
CN114297438B (en) * 2021-12-10 2024-07-09 北京达佳互联信息技术有限公司 Method and device for searching object and electronic equipment
CN114579806B (en) * 2022-04-27 2022-08-09 阿里巴巴(中国)有限公司 Video detection method, storage medium and processor
CN117235686B (en) * 2023-10-30 2024-01-30 杭州海康威视数字技术股份有限公司 Data protection method, device and equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108537134A (en) * 2018-03-16 2018-09-14 北京交通大学 A kind of video semanteme scene cut and mask method
CN109558514A (en) * 2019-01-08 2019-04-02 青岛聚看云科技有限公司 Video recommendation method, its device, information processing equipment and storage medium
CN110069663A (en) * 2019-04-29 2019-07-30 厦门美图之家科技有限公司 Video recommendation method and device
CN111708950A (en) * 2020-06-22 2020-09-25 腾讯科技(深圳)有限公司 Content recommendation method and device and electronic equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109299327A (en) * 2018-11-16 2019-02-01 广州市百果园信息技术有限公司 Video recommendation method, device, equipment and storage medium
CN109710805B (en) * 2018-12-13 2022-03-04 百度在线网络技术(北京)有限公司 Video interaction method and device based on interest cluster
CN111050194B (en) * 2019-12-02 2022-05-17 北京奇艺世纪科技有限公司 Video sequence processing method, video sequence processing device, electronic equipment and computer readable storage medium
CN112528071B (en) * 2020-10-30 2024-07-23 百果园技术(新加坡)有限公司 Video data ordering method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
WO2022089467A1 (en) 2022-05-05
CN112528071A (en) 2021-03-19

Similar Documents

Publication Publication Date Title
CN112528071B (en) Video data ordering method and device, computer equipment and storage medium
CN110309427B (en) Object recommendation method and device and storage medium
CN109063163B (en) Music recommendation method, device, terminal equipment and medium
CN111324769B (en) Training method of video information processing model, video information processing method and device
CN106897428B (en) Text classification feature extraction method and text classification method and device
Cheung et al. Efficient video similarity measurement with video signature
CN110019732B (en) Intelligent question answering method and related device
CN110909182B (en) Multimedia resource searching method, device, computer equipment and storage medium
CN111753060A (en) Information retrieval method, device, equipment and computer readable storage medium
CN112889042A (en) Identification and application of hyper-parameters in machine learning
US9767409B1 (en) Latent feature based tag routing
CN111985228B (en) Text keyword extraction method, text keyword extraction device, computer equipment and storage medium
CN111078847A (en) Power consumer intention identification method and device, computer equipment and storage medium
CN111090771B (en) Song searching method, device and computer storage medium
CN111557000B (en) Accuracy Determination for Media
CN112749326A (en) Information processing method, information processing device, computer equipment and storage medium
CN110334356A (en) Article matter method for determination of amount, article screening technique and corresponding device
CN113961666B (en) Keyword recognition method, apparatus, device, medium, and computer program product
CN111241839B (en) Entity identification method, entity identification device, computer readable storage medium and computer equipment
CN113987161A (en) Text sorting method and device
US20240265198A1 (en) Reply content processing method and interaction method for interactive content of media content
CN115712780A (en) Information pushing method and device based on cloud computing and big data
CN117217277A (en) Pre-training method, device, equipment, storage medium and product of language model
CN113190696A (en) Training method of user screening model, user pushing method and related devices
CN113220974B (en) Click rate prediction model training and search recall method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant