CN110868615B - Video processing method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN110868615B
Authority
CN
China
Prior art keywords
decoding
decoding mode
video
mode
video decoding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911165040.0A
Other languages
Chinese (zh)
Other versions
CN110868615A (en)
Inventor
庄钟鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201911165040.0A priority Critical patent/CN110868615B/en
Publication of CN110868615A publication Critical patent/CN110868615A/en
Application granted granted Critical
Publication of CN110868615B publication Critical patent/CN110868615B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23: Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234: Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343: Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402: Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The embodiment of the application discloses a video processing method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring a video decoding mode acquisition request, where the video decoding mode acquisition request comprises video decoding parameters; calling a first video decoding discrimination model to determine a first decoding mode corresponding to the video decoding parameters; calling a second video decoding discrimination model to determine a second decoding mode corresponding to the video decoding parameters; and determining, according to the first decoding mode and the second decoding mode, a target decoding mode matching the video decoding mode acquisition request. With this method and apparatus, the accuracy with which a terminal decodes encoded data can be improved.

Description

Video processing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a video processing method and apparatus, and a related device.
Background
With the rapid development of network and multimedia technology, multimedia applications have become some of the most common application software on terminals. Multimedia data is characterized by a large data volume and high redundancy, so transmitting it requires substantial bandwidth. To reduce the data volume and redundancy, multimedia data is encoded and compressed before transmission, and the receiving terminal decodes the encoded data after receiving it to recover the multimedia data.
The decoding modes for encoded data include a hardware decoding mode and a software decoding mode. In the hardware decoding mode, a dedicated hardware processor is called to decode; the hardware processor can run the decoding program efficiently and reduces the load on the general-purpose CPU, but it can only decode multimedia data of lower definition.
In the software decoding mode, the decoding program is implemented with ordinary CPU instructions and runs on the general-purpose CPU; it can decode multimedia data of higher definition, but the CPU load is high and the overall power consumption of the system is higher.
At present, a terminal selects the decoding mode for video data at random, that is, it randomly chooses either the hardware decoding mode or the software decoding mode. Because of the instability of random selection, this increases the decoding error rate.
Disclosure of Invention
The embodiment of the application provides a video processing method, a video processing apparatus and related equipment, which can improve the accuracy with which a terminal decodes encoded data.
An embodiment of the present application provides a video processing method, including:
acquiring a video decoding mode acquisition request; the video decoding mode acquisition request comprises video decoding parameters;
calling a first video decoding discrimination model to determine a first decoding mode corresponding to the video decoding parameters;
calling a second video decoding discrimination model to determine a second decoding mode corresponding to the video decoding parameters;
and determining a target decoding mode matched with the video decoding mode acquisition request according to the first decoding mode and the second decoding mode.
Wherein the first decoding mode comprises a hardware decoding mode or a non-hardware decoding mode; the second decoding mode comprises a software decoding mode or a non-software decoding mode;
the determining a target decoding mode matching the video decoding mode acquisition request according to the first decoding mode and the second decoding mode includes:
when the first decoding mode is a hardware decoding mode and the second decoding mode is a non-software decoding mode, determining a target decoding mode matched with the video decoding mode acquisition request as the hardware decoding mode;
when the first decoding mode is a non-hardware decoding mode and the second decoding mode is a software decoding mode, determining a target decoding mode matched with the video decoding mode acquisition request as the software decoding mode;
when the first decoding mode is a non-hardware decoding mode and the second decoding mode is a non-software decoding mode, determining that a target decoding mode matching the video decoding mode acquisition request is a non-supported decoding mode.
The first video decoding discrimination model comprises a first feature extractor and a first discriminator;
the calling a first video decoding discrimination model to determine a first decoding mode corresponding to the video decoding parameters includes:
normalizing the video decoding parameters to an input vector;
extracting first hidden feature information of the input vector based on the first feature extractor;
identifying, based on the first discriminator, a first matching probability that the first hidden feature information corresponds to the hardware decoding mode and a second matching probability that the first hidden feature information corresponds to the non-hardware decoding mode;
determining the first decoding mode according to the first matching probability and the second matching probability.
Wherein said determining the first decoding mode according to the first and second match probabilities comprises:
determining the first decoding mode as the hardware decoding mode when the first matching probability is greater than the second matching probability;
determining the first decoding mode as the non-hardware decoding mode when the first match probability is less than or equal to the second match probability.
The first video decoding discrimination model and the second video decoding discrimination model belong to video decoding discrimination models;
the method further comprises the following steps:
acquiring a first video decoding parameter;
Calling a first sample model to determine a first predictive decoding mode corresponding to the first video decoding parameter;
calling a second sample model to determine a second predictive decoding mode corresponding to the first video decoding parameter; the first sample model and the second sample model belong to a sample video decoding discrimination model;
determining a predictive decoding mode from the first predictive decoding mode and the second predictive decoding mode;
acquiring first decoding feedback information aiming at the predictive decoding mode, and determining a discrimination error according to the first decoding feedback information;
and training the sample video decoding discrimination model according to the discrimination error to obtain the video decoding discrimination model.
Wherein the first predictive decoding mode comprises a hardware decoding mode or a non-hardware decoding mode; the second predictive decoding mode comprises a software decoding mode or a non-software decoding mode;
the determining a predictive decoding mode from the first predictive decoding mode and the second predictive decoding mode includes:
determining the predictive decoding mode to be the hardware decoding mode if the first predictive decoding mode is the hardware decoding mode and the second predictive decoding mode is the non-software decoding mode;
then, the training the sample video decoding discrimination model according to the discrimination error to obtain the video decoding discrimination model includes:
and training the first sample model according to the discrimination error to obtain the first video decoding discrimination model.
Wherein, still include:
determining the predictive decoding mode to be the software decoding mode if the first predictive decoding mode is the non-hardware decoding mode and the second predictive decoding mode is the software decoding mode;
then, the training the sample video decoding discrimination model according to the discrimination error to obtain the video decoding discrimination model includes:
and training the second sample model according to the discrimination error to obtain the second video decoding discrimination model.
Wherein, still include:
acquiring a second video decoding parameter set, and acquiring a second decoding feedback information set for the second video decoding parameter set; second decoding feedback information in the second decoding feedback information set comprises the hardware decoding mode or the non-hardware decoding mode;
acquiring a third video decoding parameter set, and acquiring a third decoding feedback information set for the third video decoding parameter set; third decoding feedback information in the third decoding feedback information set comprises the software decoding mode or the non-software decoding mode; the difference between the number of parameters in the second video decoding parameter set and the number of parameters in the third video decoding parameter set is less than a difference threshold;
training a first original model according to the second video decoding parameter set and the second decoding feedback information set to obtain the first sample model;
and training a second original model according to the third video decoding parameter set and the third decoding feedback information set to obtain the second sample model.
Another aspect of the embodiments of the present application provides a video processing method, including:
acquiring video coding data;
acquiring video decoding parameters associated with the video coding data;
packaging the video decoding parameters into a video decoding mode acquisition request, sending the video decoding mode acquisition request to a server, and instructing the server to determine a target decoding mode matched with the video decoding parameters based on a first video decoding discrimination model and a second video decoding discrimination model;
receiving a target decoding mode matched with the video decoding parameters returned by the server;
and decoding the video coded data according to the target decoding mode to generate a video to be played.
Wherein, still include:
acquiring decoding feedback information of the video coded data;
sending the video decoding parameters and the decoding feedback information to the server; the video decoding parameters and the decoding feedback information are used for updating the first video decoding discrimination model and the second video decoding discrimination model.
Another aspect of the embodiments of the present application provides a video processing apparatus, including:
the first acquisition module is used for acquiring a video decoding mode acquisition request; the video decoding mode acquisition request comprises video decoding parameters;
the first calling module is used for calling a first video decoding discrimination model to determine a first decoding mode corresponding to the video decoding parameters;
the second calling module is used for calling a second video decoding discrimination model to determine a second decoding mode corresponding to the video decoding parameters;
and the determining module is used for determining a target decoding mode matched with the video decoding mode acquisition request according to the first decoding mode and the second decoding mode.
Wherein the first decoding mode comprises a hardware decoding mode or a non-hardware decoding mode; the second decoding mode comprises a software decoding mode or a non-software decoding mode;
the determining module includes:
a first determining unit configured to determine, when the first decoding mode is a hardware decoding mode and the second decoding mode is a non-software decoding mode, a target decoding mode that matches the video decoding mode acquisition request as the hardware decoding mode;
a second determining unit configured to determine, when the first decoding mode is a non-hardware decoding mode and the second decoding mode is a software decoding mode, a target decoding mode that matches the video decoding mode acquisition request as the software decoding mode;
the second determining unit is further configured to determine that a target decoding mode matching the video decoding mode acquisition request is a non-supported decoding mode when the first decoding mode is a non-hardware decoding mode and the second decoding mode is a non-software decoding mode.
The first video decoding discrimination model comprises a first feature extractor and a first discriminator;
the first calling module comprises:
a combination unit, configured to normalize the video decoding parameters into an input vector, extract first hidden feature information of the input vector based on the first feature extractor, and identify, based on the first discriminator, a first matching probability that the first hidden feature information corresponds to the hardware decoding mode and a second matching probability that the first hidden feature information corresponds to the non-hardware decoding mode;
a third determining unit configured to determine the first decoding mode according to the first matching probability and the second matching probability.
The third determining unit is specifically configured to determine that the first decoding mode is the hardware decoding mode when the first matching probability is greater than the second matching probability, and determine that the first decoding mode is the non-hardware decoding mode when the first matching probability is less than or equal to the second matching probability.
The first video decoding discrimination model and the second video decoding discrimination model belong to video decoding discrimination models;
the device further comprises:
a second obtaining module, configured to acquire a first video decoding parameter;
The first calling module is further configured to call a first sample model to determine a first predictive decoding mode corresponding to the first video decoding parameter;
the second calling module is further configured to call a second sample model to determine a second predictive decoding mode corresponding to the first video decoding parameter; the first sample model and the second sample model belong to a sample video decoding discrimination model;
a prediction module for determining a predictive decoding mode from the first predictive decoding mode and the second predictive decoding mode;
the second obtaining module is further configured to obtain first decoding feedback information for the predictive decoding mode, and determine a discrimination error according to the first decoding feedback information;
and the training module is used for training the sample video decoding discrimination model according to the discrimination error to obtain the video decoding discrimination model.
Wherein the first predictive decoding mode comprises a hardware decoding mode or a non-hardware decoding mode; the second predictive decoding mode comprises a software decoding mode or a non-software decoding mode;
the prediction module comprises:
a first prediction unit configured to determine that the predictive decoding mode is the hardware decoding mode if the first predictive decoding mode is the hardware decoding mode and the second predictive decoding mode is the non-software decoding mode;
the training module includes:
and the first training unit is used for training the first sample model according to the discrimination error to obtain the first video decoding discrimination model.
Wherein, still include:
a second prediction unit configured to determine that the predictive decoding mode is the software decoding mode if the first predictive decoding mode is the non-hardware decoding mode and the second predictive decoding mode is the software decoding mode;
the training module includes:
and the second training unit is used for training the second sample model according to the discrimination error to obtain the second video decoding discrimination model.
Wherein, still include:
a third obtaining module, configured to obtain a second video decoding parameter set, and obtain a second decoding feedback information set for the second video decoding parameter set; second decoding feedback information in the second decoding feedback set comprises the hardware decoding mode or the non-hardware decoding mode;
the third obtaining module is further configured to acquire a third video decoding parameter set, and acquire a third decoding feedback information set for the third video decoding parameter set; third decoding feedback information in the third decoding feedback information set comprises the software decoding mode or the non-software decoding mode; the difference between the number of parameters in the second video decoding parameter set and the number of parameters in the third video decoding parameter set is less than a difference threshold;
the third obtaining module is further configured to train a first original model according to the second video decoding parameter set and the second decoding feedback information set, so as to obtain the first sample model;
the third obtaining module is further configured to train a second original model according to the third video decoding parameter set and the third decoding feedback information set, so as to obtain the second sample model.
Another aspect of the embodiments of the present application provides a video processing apparatus, including:
the fourth acquisition module is used for acquiring video coded data and acquiring video decoding parameters related to the video coded data;
the first sending module is used for packaging the video decoding parameters into a video decoding mode obtaining request, sending the video decoding mode obtaining request to a server, and indicating the server to determine a target decoding mode matched with the video decoding parameters based on a first video decoding discrimination model and a second video decoding discrimination model;
the receiving module is used for receiving a target decoding mode which is returned by the server and matched with the video decoding parameters;
and the decoding module is used for decoding the video coding data according to the target decoding mode to generate a video to be played.
Wherein, still include:
the second sending module is used for acquiring decoding feedback information of the video coded data and sending the video decoding parameters and the decoding feedback information to the server; the video decoding parameters and the decoding feedback information are used for updating the first video decoding discrimination model and the second video decoding discrimination model.
Another aspect of the embodiments of the present application provides a video processing system, including: a server and a client;
the client is used for acquiring video coding data and video decoding parameters associated with the video coding data;
the client is further used for packaging the video decoding parameters into a video decoding mode acquisition request and sending the video decoding mode acquisition request to the server;
the server is used for calling a first video decoding discrimination model to determine a first decoding mode corresponding to the video decoding parameters;
the server is also used for calling a second video decoding discrimination model to determine a second decoding mode corresponding to the video decoding parameters;
the server is further used for determining a target decoding mode matched with the video decoding mode acquisition request according to the first decoding mode and the second decoding mode, and sending the target decoding mode to the client;
and the client is also used for decoding the video coding data according to the target decoding mode to generate a video to be played.
In another aspect, an embodiment of the present application provides an electronic device, which includes a memory and a processor, where the memory stores a computer program, and the computer program, when executed by the processor, causes the processor to execute the method in the foregoing embodiments.
Another aspect of the embodiments of the present application provides a computer storage medium storing a computer program, where the computer program includes program instructions, and the program instructions, when executed by a processor, perform the method in the above embodiments.
In this application, a first video decoding discrimination model is called to determine a first decoding mode matching the video decoding parameters, a second video decoding discrimination model is called to determine a second decoding mode matching the video decoding parameters, and a target decoding mode is determined from the first decoding mode and the second decoding mode. In this way, two prediction models predict the decoding mode matching the video decoding parameters of the video currently to be decoded. Compared with a randomly selected decoding mode, the target decoding mode so determined is consistent with the video decoding parameters, which avoids the uncertainty of random selection and improves the accuracy with which the terminal decodes video data.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a system architecture diagram of a video processing system according to an embodiment of the present application;
fig. 2a-2b are schematic diagrams of a video processing scene provided by an embodiment of the present application;
fig. 3 is a schematic flowchart of a video processing method according to an embodiment of the present application;
fig. 4 is a schematic flowchart of another video processing method provided in the embodiment of the present application;
fig. 5 is an interaction diagram of a video processing method according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application;
fig. 7a is a schematic structural diagram of another video processing apparatus according to an embodiment of the present application;
fig. 7b is a schematic structural diagram of a video processing system according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of another computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person skilled in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. The basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technology mainly comprises computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
The scheme provided by the embodiments of the application belongs to machine learning (ML), a branch of artificial intelligence.
Machine learning is a multi-field interdiscipline involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specifically studies how a computer can simulate or realize human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence.
In the present application, the decoding mode that best matches the video decoding parameters is predicted based on the video decoding discrimination models; the specific technical means involve machine learning techniques such as artificial neural networks, logistic regression, and deep learning.
Fig. 1 is a system architecture diagram of video processing according to an embodiment of the present application. The application involves a server 10d and a terminal device cluster, and the terminal device cluster may include: terminal device 10a, terminal device 10b, and terminal device 10c.
Taking the terminal device 10a as an example, when a video application in the terminal device 10a needs to play a video, it obtains the video encoded data of the video to be played, together with video parameters and device parameters, and combines the video parameters and the device parameters into video decoding parameters. The terminal device 10a sends the combined video decoding parameters to the server 10d, the server 10d determines a decoding mode matching the video decoding parameters based on the trained first prediction model and the trained second prediction model, and the server 10d issues the predicted decoding mode to the terminal device 10a. The terminal device 10a decodes the video encoded data according to the decoding mode issued by the server 10d to obtain the video to be played.
Determining the decoding mode matching the video decoding parameters based on the trained first prediction model and the trained second prediction model may also be performed by the terminal device 10a itself.
The terminal devices 10a, 10b, 10c, etc. shown in fig. 1 may include a mobile phone, a tablet computer, a notebook computer, a palm computer, a Mobile Internet Device (MID), a wearable device (e.g., a smart watch, a smart bracelet, etc.), etc.
Fig. 2a to fig. 2b are described in detail below by taking how the terminal device 10a and the server 10d determine the decoding mode of the video to be played currently as an example:
please refer to fig. 2a-2b, which are schematic diagrams of a video processing scene according to an embodiment of the present application. When a user needs to play a video a in the terminal device 10a, the terminal device 10a pulls the video encoded data of the video a from the video server, where the video encoded data carries video parameters, and the video parameters include: a video coding format (e.g., the video coding format is H.264, i.e., the video a is encoded using the H.264 coding format) and a definition (e.g., the definition is 1920 × 1080).
The terminal device 10a obtains device parameters, which may include: a chip model (e.g., the chip model is MT6755), a CPU core number (e.g., a 4-core processor), and a system version number (e.g., the system version number is 5.1).
The terminal device 10a may send the device parameters (including the chip model "MT 6755", the CPU core number "4", and the system version number "5.1") and the video parameters (including the video coding format "h.264" and the definition "1920 × 1080") and the combination as the decoding parameters 20a to the server 10d, so that the server 10d feeds back the decoding mode matching with the decoding parameters 20 a.
After receiving the decoding parameters 20a, the server 10d normalizes and vectorizes them:
The chip model "MT6755" is converted into the value 1, where the chip model "MT6755" corresponds to the value 1, the chip model "MT6756" corresponds to the value 2, and so on.
The server 10d converts the CPU core number "4" into the value 4, the system version number "5.1" into the value 5.1, and the video coding format "H.264" into the value 1, where the video coding format "H.264" corresponds to the value 1, the video coding format "HEVC" corresponds to the value 2, and so on.
The server 10d converts the definition "1920 × 1080" into the value 4, where the definition "1920 × 1080" corresponds to the value 4, the definition "1280 × 720" corresponds to the value 3, and so on.
In this normalization manner, the server 10d converts the decoding parameters 20a into the input vector 20b: [1, 4, 5.1, 1, 4].
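To make this step concrete, the following is a minimal Python sketch of the normalization and vectorization just described. The lookup tables and field order are illustrative assumptions, not values prescribed by the embodiment, which only requires that each parameter map to a standard numeric field and that the fields be combined into a vector in a preset order.

```python
# Illustrative normalization/vectorization of decoding parameters.
# The lookup tables and field order below are assumptions for demonstration.

CHIP_MODELS = {"MT6755": 1, "MT6756": 2}        # chip model -> numeric field
CODING_FORMATS = {"H.264": 1, "HEVC": 2}        # coding format -> numeric field
DEFINITIONS = {"1280x720": 3, "1920x1080": 4}   # definition -> numeric field

def normalize(params: dict) -> list:
    """Convert raw decoding parameters into the model input vector."""
    return [
        CHIP_MODELS[params["chip_model"]],
        float(params["cpu_cores"]),
        float(params["system_version"]),
        CODING_FORMATS[params["coding_format"]],
        DEFINITIONS[params["definition"]],
    ]

decoding_params_20a = {
    "chip_model": "MT6755",
    "cpu_cores": 4,
    "system_version": "5.1",
    "coding_format": "H.264",
    "definition": "1920x1080",
}
print(normalize(decoding_params_20a))  # [1, 4.0, 5.1, 1, 4] -- the input vector 20b
```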
The server 10d acquires the trained first prediction model 20c and second prediction model 20d. The first prediction model 20c and the second prediction model 20d are models with a prediction function, which may be a BP (Back Propagation) neural network model, a convolutional neural network model, or one of various regression models (e.g., a linear regression model or a logistic regression model). The first prediction model 20c and the second prediction model 20d may have the same model structure or different model structures.
The first prediction model 20c may output a recommendation coefficient for hardware decoding and a recommendation coefficient for non-hardware decoding, and the second prediction model 20d may output a recommendation coefficient for software decoding and a recommendation coefficient for non-software decoding; a higher recommendation coefficient indicates a higher decoding success rate with the corresponding decoding mode.
The server 10d inputs the input vector 20b [1, 4, 5.1, 1, 4] into the first prediction model 20c, and the first prediction model 20c outputs the corresponding first prediction vector 20e. As can be seen from the first prediction vector 20e, the recommendation coefficient of the first prediction model 20c for hardware decoding is 0.8 and the recommendation coefficient for non-hardware decoding is 0.2; this can also be understood as the first prediction model 20c predicting that hardware decoding succeeds with probability 0.8 and fails with probability 0.2.
The server 10d inputs the input vector 20b [1, 4, 5.1, 1, 4] into the second prediction model 20d, and the second prediction model 20d outputs the corresponding second prediction vector 20f. As can be seen from the second prediction vector 20f, the recommendation coefficient of the second prediction model 20d for software decoding is 0.4 and the recommendation coefficient for non-software decoding is 0.6; this can also be understood as the second prediction model 20d predicting that software decoding succeeds with probability 0.4 and fails with probability 0.6.
As can be seen from the first prediction vector 20e, the first prediction model 20c recommends using a hardware decoding mode; as can be seen from the second prediction vector 20f, the second prediction model 20d recommends using a non-software decoding mode. The server 10d can therefore determine that the decoding mode matching the decoding parameters 20a is the hardware decoding mode.
As shown in fig. 2b, the server 10d may return the determined decoding mode (i.e., the hardware decoding mode) to the terminal device 10a, and the terminal device 10a may subsequently decode the video encoded data of the video a in the hardware decoding mode to play the video a.
A specific procedure for determining a first decoding mode (a hardware decoding mode recommended by the first prediction model in the above embodiment) corresponding to a video decoding parameter (the decoding parameter 20a in the above embodiment) based on the first video decoding discrimination model (the first prediction model 20c in the above embodiment), and determining a second decoding mode (a non-software decoding mode recommended by the second prediction model in the above embodiment) corresponding to the video decoding parameter based on the second video decoding discrimination model (the second prediction model 20d in the above embodiment) may refer to the embodiments corresponding to fig. 3 to 5 described below.
Fig. 3 is a schematic flow chart of a video processing method according to an embodiment of the present application, which illustrates how to respond to a video decoding mode acquisition request of a client from a server side. As shown in fig. 3, the video processing method may include the steps of:
step S101, acquiring a video decoding mode acquisition request; the video decoding mode acquisition request includes video decoding parameters.
Specifically, a server (e.g., the server 10d in the embodiment corresponding to fig. 2a-2b) receives a video decoding mode acquisition request sent by a client, where the video decoding mode acquisition request includes video decoding parameters (e.g., the decoding parameters 20a in the embodiment corresponding to fig. 2a-2b).
The video decoding parameters may include video parameters (such as the video parameters in the corresponding embodiments of fig. 2 a-2 b: video coding format "h.264" and definition "1920 × 1080") and device parameters (such as the device parameters in the corresponding embodiments of fig. 2 a-2 b: chip model "MT 6755", CPU core number "4" and system version number "5.1").
The video parameter refers to an attribute parameter of the video encoded data corresponding to the video decoding mode acquisition request, for example, a video encoding format, a video resolution, a video frame rate, and/or a video encoding rate.
The device parameter is a device attribute parameter of the terminal device where the client is located, for example, a terminal brand, a terminal model, a main chip model, a CPU core number, a CPU main frequency, a system version number, a system iteration number (i.e., Build number), a system SD storage amount, and/or a system memory amount.
The system in the application can be an Android operating system.
Step S102, a first video decoding discrimination model is called to determine a first decoding mode corresponding to the video decoding parameters.
Specifically, the server may determine a decoding mode (referred to as a first decoding mode) corresponding to the video decoding parameter by calling a first video decoding discrimination model (such as the first prediction model 20c in the corresponding embodiment of fig. 2 a-2 b described above).
The first decoding mode may include a hardware decoding mode or a non-hardware decoding mode, where the hardware decoding mode refers to invoking a dedicated hardware processor (e.g., GPU) to decode, and the hardware processor can efficiently run a decoding program, thereby reducing the burden of a conventional CPU.
The non-hardware decoding mode indicates a decoding mode other than the hardware decoding mode, and may be a software decoding mode or a non-supported decoding mode.
The software decoding mode implements the decoding program with ordinary CPU instructions, and the decoding program runs on the general-purpose CPU.
A non-supported decoding mode means that the current video encoded data can be decoded neither in the hardware decoding mode nor in the software decoding mode. The reason may be that the performance of the terminal device is insufficient, that the video data was encoded with a special encoding mode, or that the display requirements of the video data are too high for an ordinary terminal device to meet.
The first video decoding discrimination model is a model with a prediction function obtained through AI algorithm training, and the first video decoding discrimination model may be a BP neural network model, a convolutional neural network model, or various regression models (e.g., a linear regression model, a logistic regression model), and the like.
The following is a detailed description of how the first decoding mode is determined:
the server normalizes and vectorizes the video decoding parameters to obtain an input vector (such as the input vector 20b [1, 4, 5.1, 1, 4] in the embodiment corresponding to fig. 2a-2b above).
The normalization refers to converting video decoding parameters into standard numerical fields, and the vectorization refers to combining the normalized numerical fields into vectors according to a preset sequence.
For example, the video decoding parameters include: the video coding format "H.264" and the chip model "ACD01". Suppose the standard numeric field corresponding to the video coding format "H.264" is 1; the standard numeric field corresponding to the video coding format "MPEG1" is 2; the standard numeric field corresponding to the video coding format "MPEG3" is 3; the standard numeric field corresponding to the chip model "AFG40" is 1; and the standard numeric field corresponding to the chip model "ACD01" is 2.
Then, the video encoding format "h.264" in the video encoding parameter may be converted into a numerical field 1, the chip model "ACD 01" may be converted into a numerical field 2, and the 2 numerical fields are combined into an input vector according to a preset sequence of the video encoding format before and the chip model after: [1,2].
Of course, the rules for converting video decoding parameters into standard numeric fields are defined in advance, and the same rules are used, whether during model training or model use, to convert the video decoding parameters transmitted by the client into standardized vectors, where standardization means the vectors have a consistent dimension.
The server calls a trained first video decoding discriminant model, wherein the first video decoding discriminant model comprises a feature extractor (called a first feature extractor) and a discriminant (called a first discriminant), and the feature extractor is used for extracting hidden features of the input vector.
The deep-level hidden features of the input vector (referred to as first hidden feature information) are extracted based on a first feature extractor, and the matching probability between the extracted first hidden feature information and the hardware decoding mode (referred to as first matching probability, such as the recommended coefficient for hardware decoding 0.8 in the corresponding embodiment of fig. 2 a-2 b) and the non-hardware decoding mode (referred to as second matching probability, such as the recommended coefficient for non-hardware decoding 0.2 in the corresponding embodiment of fig. 2 a-2 b) is identified based on a first discriminator.
Since the hardware decoding mode and the non-hardware decoding mode are complementary (one or the other), the sum of the first matching probability and the second matching probability equals 1.
If the first matching probability is greater than the second matching probability, the server may determine that the first decoding mode is a hardware decoding mode;
if the first match probability is less than or equal to the second match probability, the server may determine that the first decoding mode is a non-hardware decoding mode.
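As a minimal sketch of the decision rule just described, the snippet below assumes a toy feature extractor and a two-way discriminator with softmax outputs (so the two matching probabilities sum to 1). The layer sizes, activations, and random weights are placeholders; the embodiment allows any model with a prediction function (a BP neural network, a convolutional network, or a regression model).

```python
import numpy as np

# Toy forward pass of the first video decoding discrimination model:
# a feature extractor followed by a two-way discriminator.
rng = np.random.default_rng(0)
W_feat = rng.normal(size=(8, 5))   # first feature extractor: 5 inputs -> 8 hidden
W_disc = rng.normal(size=(2, 8))   # first discriminator: 8 hidden -> 2 classes

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def first_decoding_mode(input_vector):
    hidden = np.tanh(W_feat @ input_vector)      # first hidden feature information
    p_hw, p_non_hw = softmax(W_disc @ hidden)    # first / second matching probability
    return "hardware" if p_hw > p_non_hw else "non-hardware"

print(first_decoding_mode(np.array([1, 4, 5.1, 1, 4])))
```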
Step S103, calling a second video decoding discrimination model to determine a second decoding mode corresponding to the video decoding parameters.
Specifically, similar to step S102, the server may determine a decoding mode (referred to as a second decoding mode) corresponding to the video decoding parameter by calling a second video decoding discrimination model (such as the second prediction model 20d in the corresponding embodiment of fig. 2 a-2 b).
The second decoding mode may include a software decoding mode or a non-software decoding mode, and the software decoding mode is to implement the decoding program by using a conventional CPU instruction, and the decoding program runs on the conventional CPU.
The non-software decoding mode indicates a decoding mode other than the software decoding mode, and may be a hardware decoding mode or a non-supported decoding mode.
The hardware decoding mode refers to calling a special hardware processor (such as a GPU) for decoding, and the hardware processor can run a decoding program efficiently, so that the burden of a conventional CPU is reduced.
A non-supported decoding mode means that the current video encoded data can be decoded neither in the hardware decoding mode nor in the software decoding mode. The reason may be that the performance of the terminal device is insufficient, that the video data was encoded with a special encoding mode, or that the display requirements of the video data are too high for an ordinary terminal device to meet.
The second video decoding discrimination model is also a model with a prediction function obtained through AI algorithm training; it may be a BP neural network model, a convolutional neural network model, or one of various regression models (e.g., a linear regression model or a logistic regression model).
The model structures of the first video decoding discrimination model and the second video decoding discrimination model may be the same or different. The order of determining the first decoding mode based on the first discrimination model and the second decoding mode based on the second discrimination model is not limited; the two determinations may also be performed simultaneously.
How to determine the second decoding mode is described in detail below:
the server normalizes and vectorizes the video decoding parameters to obtain the input vector, and the specific process of obtaining the input vector may refer to step S102.
The server calls a trained second video decoding discriminant model, wherein the second video decoding discriminant model comprises a feature extractor (called a second feature extractor) and a discriminant (called a second discriminant), and the feature extractor is used for extracting hidden features of the input vector.
The deep-level hidden features of the input vector are extracted based on the second feature extractor (referred to as second hidden feature information), and the matching probability between the extracted second hidden feature information and the software decoding mode (referred to as third matching probability, such as the recommended coefficient for software decoding 0.4 in the corresponding embodiment of fig. 2 a-2 b described above) and the non-software decoding mode (referred to as fourth matching probability, such as the recommended coefficient for non-software decoding 0.6 in the corresponding embodiment of fig. 2 a-2 b described above) is identified based on the second discriminator.
Since the software decoding mode and the non-software decoding mode are complementary (one or the other), the sum of the third matching probability and the fourth matching probability equals 1.
If the third matching probability is greater than the fourth matching probability, the server may determine that the second decoding mode is a software decoding mode;
if the third match probability is less than or equal to the fourth match probability, the server may determine that the second decoding mode is a non-software decoding mode.
And step S104, determining a target decoding mode matched with the video decoding mode acquisition request according to the first decoding mode and the second decoding mode.
Specifically, from step S102 it can be determined that the first decoding mode is either a hardware decoding mode or a non-hardware decoding mode; from step S103 it can be determined that the second decoding mode is either a software decoding mode or a non-software decoding mode.
If the first decoding mode is a hardware decoding mode and the second decoding mode is a non-software decoding mode, the server may determine that a target decoding mode matched with the video decoding mode acquisition request is the hardware decoding mode;
if the first decoding mode is a non-hardware decoding mode and the second decoding mode is a software decoding mode, the server may determine that a target decoding mode matching the video decoding mode acquisition request is the software decoding mode;
if the first decoding mode is a non-hardware decoding mode and the second decoding mode is a non-software decoding mode, the server may determine that the target decoding mode matched with the video decoding mode obtaining request is a non-supported decoding mode;
if the first decoding mode is a hardware decoding mode and the second decoding mode is a software decoding mode, the server may determine that the target decoding mode matching the video decoding mode acquisition request is an arbitrary decoding mode.
An arbitrary decoding mode may be either the hardware decoding mode or the software decoding mode, selected at random.
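The four branches above amount to a small decision table; a sketch under assumed string labels (the labels themselves are illustrative, not from the embodiment) might look as follows.

```python
import random

# Combine the two binary predictions (step S104) into the target decoding mode.
def target_decoding_mode(first_mode: str, second_mode: str) -> str:
    if first_mode == "hardware" and second_mode == "non-software":
        return "hardware"      # only the hardware path is recommended
    if first_mode == "non-hardware" and second_mode == "software":
        return "software"      # only the software path is recommended
    if first_mode == "non-hardware" and second_mode == "non-software":
        return "unsupported"   # neither model recommends a decoding mode
    # both models recommend their own mode: either will do, pick at random
    return random.choice(["hardware", "software"])

print(target_decoding_mode("hardware", "non-software"))  # -> hardware
```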
Subsequently, the server may send the determined target decoding mode to the client, so as to respond to a video decoding mode acquisition request of the client.
If the target decoding mode received by the client is the hardware decoding mode, the client can decode the video coded data by adopting the hardware decoding mode;
if the target decoding mode received by the client is a software decoding mode, the client can decode the video coded data by adopting the software decoding mode;
if the target decoding mode received by the client is any decoding mode, the client can randomly adopt a hardware decoding mode or a software decoding mode to decode the video coded data;
if the target decoding mode received by the client is the non-supported decoding mode, the client may display a prompt message informing the user that the current terminal device cannot decode the current video encoded data; or the client may request other video encoded data from the video server again, for example the same video encoded with a different encoding method, or the same video encoded at a lower video resolution.
Optionally, besides using two 2-class models (i.e., the first video decoding discrimination model and the second video decoding discrimination model) to determine the target decoding mode, a single 3-class model may be used. The 3-class model outputs the probability that the video decoding parameters match the hardware decoding mode (called a first auxiliary probability), the probability that they match the software decoding mode (called a second auxiliary probability), and the probability that they match the non-supported decoding mode (called a third auxiliary probability); the sum of these 3 probabilities need not equal 1. When determining the target decoding mode, a probability threshold may be set, and the decoding mode whose auxiliary probability exceeds the probability threshold is taken as the target decoding mode, as sketched below.
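A minimal sketch of this 3-class alternative, assuming an illustrative threshold of 0.5 and breaking ties by the highest auxiliary probability (a detail the text leaves open):

```python
# One 3-class model emits an auxiliary probability per decoding mode;
# the three values need not sum to 1. Threshold value is an assumption.
def pick_mode_3class(aux_probs: dict, threshold: float = 0.5) -> str:
    eligible = {mode: p for mode, p in aux_probs.items() if p > threshold}
    if not eligible:
        return "unsupported"
    return max(eligible, key=eligible.get)  # most confident eligible mode

print(pick_mode_3class({"hardware": 0.8, "software": 0.3, "unsupported": 0.1}))
```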
Compared with one 3-class model, adopting two 2-class models has two advantages. First, during model training the convergence rate of a 2-class model is far faster than that of a 3-class model, and repeated experiments have shown that the training time spent on the two 2-class models is less than the training time spent on one 3-class model. Therefore, to improve the model training speed, this application adopts the two 2-class models.
Second, model training requires real labels to determine the classification error used to reversely adjust the model parameters. In this application, the real labels come from feedback information obtained when the client attempts to decode with different decoding modes. If the client attempts the hardware decoding mode and decoding fails, this does not indicate whether the software decoding mode would succeed; to obtain a 3-class label, the client would also have to attempt the software decoding mode to determine whether the truly matching mode is the software decoding mode or the non-supported decoding mode, which makes obtaining real labels cumbersome and time-consuming. With the two 2-class models (each attempt either succeeds or fails), obtaining the real label is simple, which is better suited to the video decoding application scenario.
The above describes how the first video decoding discrimination model and the second video decoding discrimination model are used. The discrimination models are obtained by training on a large amount of sample data; the training process is described below:
when the model is trained, the first video decoding discrimination model and the second video decoding discrimination model participate in training together, the server obtains video decoding parameters (called as first video decoding parameters) which are sent by the client and used for model training, and calls the first sample model to determine a decoding mode (called as a first prediction decoding mode) corresponding to the first video decoding parameters. The first predictive decoding mode includes a hardware decoding mode or a non-hardware decoding mode.
The server invokes a second sample model to determine a decoding mode (referred to as a second predictive decoding mode) corresponding to the first video decoding parameters; the second predictive decoding mode comprises a software decoding mode or a non-software decoding mode. The first sample model and the second sample model both belong to the sample video decoding discrimination model.
The process of determining the first and second predictive decoding modes is similar to that described above.
The server determines a predictive decoding mode according to the first predictive decoding mode and the second predictive decoding mode and issues the predictive decoding mode to the client, and the client decodes according to it. After decoding, the client feeds back to the server feedback information for the predictive decoding mode (referred to as first decoding feedback information), which identifies whether the client decoded successfully with the predictive decoding mode issued by the server. A discrimination error can be determined according to the first decoding feedback information (naturally, if decoding succeeded the discrimination error is small, and if decoding failed the discrimination error is large), and the server trains the sample video decoding discrimination model according to the discrimination error.
Because the sample video decoding discrimination model comprises the first sample model and the second sample model, training the sample video decoding discrimination model according to the discrimination error is divided into training the first sample model and the second sample model:
if the first predictive decoding mode is a hardware decoding mode and the second predictive decoding mode is a non-software decoding mode, the server may determine that the predictive decoding mode is a hardware decoding mode. On the premise, the first decoding feedback information fed back by the server from the client includes a hardware mode or a non-hardware mode, wherein the hardware mode in the first decoding feedback information is used for identifying that the client successfully decodes by using the hardware decoding mode, and the non-hardware mode in the first decoding feedback information is used for identifying that the client fails to use the hardware decoding mode. And determining a discrimination error according to the first decoding feedback information, and training the first sample model according to the discrimination error.
If the first predictive decoding mode is the non-hardware decoding mode and the second predictive decoding mode is the software decoding mode, the server may determine that the predictive decoding mode is the software decoding mode. The server receives the first decoding feedback information fed back by the client for the software decoding mode; in this case it indicates a software mode or a non-software mode: the software mode identifies that the client decoded successfully with the software decoding mode, and the non-software mode identifies that the client failed to decode with the software decoding mode. A discrimination error is determined from the first decoding feedback information, and the second sample model is trained according to the discrimination error.
The sample video decoding discrimination model is iteratively trained in the above manner. When the sample video decoding discrimination model (i.e., the first sample model and the second sample model) meets the convergence condition, it can be used as the video decoding discrimination model: the first sample model is determined as the first video decoding discrimination model, and the second sample model is determined as the second video decoding discrimination model.
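For illustration only, a minimal sketch of one such training step follows, assuming both sample models are small PyTorch binary classifiers; the layer sizes and every name used (first_sample_model, decode_succeeded, and so on) are assumptions of this sketch, not something the patent specifies.

```python
import torch
import torch.nn as nn

# Hypothetical binary classifiers: each maps a normalized parameter vector
# to logits over two classes.
first_sample_model = nn.Sequential(   # hardware vs non-hardware
    nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
second_sample_model = nn.Sequential(  # software vs non-software
    nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()
opt1 = torch.optim.Adam(first_sample_model.parameters())
opt2 = torch.optim.Adam(second_sample_model.parameters())

def training_step(x, decode_succeeded):
    """One iteration: predict a mode, let the client try it, learn from feedback.

    x: normalized first video decoding parameters, shape (1, 16)
    decode_succeeded: callable simulating the client's decode attempt
    """
    p_hw = torch.softmax(first_sample_model(x), dim=1)[0, 0]   # P(hardware)
    p_sw = torch.softmax(second_sample_model(x), dim=1)[0, 0]  # P(software)

    if p_hw > 0.5:                                 # predictive mode: hardware
        label = torch.tensor([0 if decode_succeeded("hardware") else 1])
        loss = loss_fn(first_sample_model(x), label)   # discrimination error
        opt1.zero_grad(); loss.backward(); opt1.step()
    elif p_sw > 0.5:                               # predictive mode: software
        label = torch.tensor([0 if decode_succeeded("software") else 1])
        loss = loss_fn(second_sample_model(x), label)
        opt2.zero_grad(); loss.backward(); opt2.step()
    # otherwise neither mode is predicted; nothing to learn this round
```

A real deployment would presumably batch the reported feedback rather than stepping the optimizer once per request.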
It should be noted that when model training is performed for the first time, the model parameters in the sample video decoding discrimination model may be randomly initialized. This randomization may cause the predictive decoding modes output by the sample video decoding discrimination model to all be hardware decoding modes (or all software decoding modes); the clients would then all decode with the hardware decoding mode (or all with the software decoding mode), and the sample video decoding discrimination model could not learn anything about the software decoding mode (or about the hardware decoding mode).
Therefore, to keep the numbers of samples used to train the first sample model and the second sample model balanced, in the initial stage of model training different decoding modes are randomly selected and issued to clients, so that the clients attempt different decoding modes and the corresponding feedback information is obtained to train the sample video decoding discrimination model. Once there have been enough training iterations, the output of the sample video decoding discrimination model is used to drive the clients' decoding attempts, and the sample video decoding discrimination model is trained further to obtain the video decoding discrimination model. The specific process is as follows:
the server acquires a plurality of video decoding parameters sent by clients for model training (referred to as a second video decoding parameter set), and the feedback information sent by the clients for each second video decoding parameter (referred to as second decoding feedback information; collectively, a second decoding feedback information set). Each piece of second decoding feedback information indicates a hardware decoding mode or a non-hardware decoding mode: the hardware decoding mode identifies that the client decoded successfully with the hardware decoding mode, and the non-hardware decoding mode identifies that the client failed to decode with the hardware decoding mode.
That is, the second decoding feedback information set records whether clients' attempts to decode with the hardware decoding mode succeeded; based on it, the model can learn about the hardware decoding mode.
The server acquires a plurality of video decoding parameters sent by clients for model training (referred to as a third video decoding parameter set), and the feedback information sent by the clients for each third video decoding parameter (referred to as third decoding feedback information; collectively, a third decoding feedback information set). Each piece of third decoding feedback information indicates a software decoding mode or a non-software decoding mode: the software decoding mode identifies that the client decoded successfully with the software decoding mode, and the non-software decoding mode identifies that the client failed to decode with the software decoding mode.
That is, the third decoding feedback information set records whether clients' attempts to decode with the software decoding mode succeeded; based on it, the model can learn about the software decoding mode.
The size of the second video decoding parameter set is kept close to that of the third video decoding parameter set, so that the sample data subsequently used to train the first sample model and the second sample model is balanced and no sample imbalance arises.
The server trains a first original model according to the second video decoding parameter set and the second decoding feedback information set to obtain the first sample model.
The server trains a second original model according to the third video decoding parameter set and the third decoding feedback information set to obtain the second sample model.
The specific process of training the first and second original models is similar to the process of training the first and second sample models described above.
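A minimal sketch of this cold-start collection phase follows, assuming an in-memory store and a hypothetical exploration threshold (the value 10,000 is illustrative, not from the patent):

```python
import random

EXPLORATION_ROUNDS = 10_000   # hypothetical threshold for "enough training times"

hardware_samples = []   # (video_decoding_params, succeeded) from hardware attempts
software_samples = []   # (video_decoding_params, succeeded) from software attempts

def choose_mode_for_client(params, rounds_done, model_predict):
    """Cold start: issue random modes so both sample sets grow evenly;
    afterwards trust the sample video decoding discrimination model."""
    if rounds_done < EXPLORATION_ROUNDS:
        if len(hardware_samples) == len(software_samples):
            return random.choice(["hardware", "software"])
        # steer toward the smaller set so the two parameter sets stay balanced
        return "hardware" if len(hardware_samples) < len(software_samples) else "software"
    return model_predict(params)   # e.g. combined output of the two sample models

def record_feedback(params, mode, succeeded):
    """Store client feedback: whether decoding with `mode` succeeded."""
    target = hardware_samples if mode == "hardware" else software_samples
    target.append((params, succeeded))   # second / third parameter and feedback sets
```

Balancing the two stores directly keeps the difference between the sizes of the second and third video decoding parameter sets below any reasonable threshold.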
In this application, a first decoding mode matching the video decoding parameters is determined by calling the first video decoding discrimination model, a second decoding mode matching the video decoding parameters is determined by calling the second video decoding discrimination model, and a target decoding mode is determined from the first decoding mode and the second decoding mode. In this way, the decoding mode matching the video decoding parameters of the current video to be decoded is predicted by two prediction models; compared with a randomly selected decoding mode, the target decoding mode determined in this way is consistent with the video decoding parameters, which avoids the uncertainty of random selection and improves the accuracy with which the terminal decodes the video data.
Fig. 4 is a schematic flow chart of another video processing method according to an embodiment of the present application, which illustrates how to send a video decoding acquisition request to a server from a client side to acquire a decoding mode matched with video encoded data to be played. As shown in fig. 4, the video processing method may include the steps of:
in step S201, video encoded data is acquired.
Specifically, the client obtains the video encoding data of a video to be played. Video encoding data is the result of converting a video (i.e., a sequence of image frames) into another data representation through an encoding algorithm, in order to reduce the space the video occupies during transmission or storage; the encoding algorithm may include H.265, MPEG, or JVT, etc.
Step S202, obtaining video decoding parameters associated with the video coding data.
Specifically, the client obtains video decoding parameters of the video coded data, and the video decoding parameters may include video parameters and device parameters.
The video parameter refers to an attribute parameter of the video encoding data, such as a video encoding format, a video resolution, a video frame rate, and/or a video encoding rate.
The device parameters are device attribute parameters of the terminal device where the client is located, for example, terminal brand, terminal model, main chip model, number of CPU cores, CPU clock frequency, system version number, system build number (i.e., Build number), system SD storage capacity, and/or system memory capacity.
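As a concrete illustration, such parameters might be normalized into an input vector as sketched below; the field names, bucket counts, and scaling constants are assumptions of this sketch, since the patent does not prescribe a particular encoding.

```python
import zlib

def build_input_vector(video_params: dict, device_params: dict) -> list:
    """Normalize video and device parameters into a fixed-length vector."""
    def cat(value, buckets=1000):
        # stable hash of a categorical value into [0, 1)
        return (zlib.crc32(str(value).encode()) % buckets) / buckets

    return [
        cat(video_params.get("codec")),                # video encoding format
        video_params.get("width", 0) / 4096,           # video resolution
        video_params.get("height", 0) / 4096,
        video_params.get("fps", 0) / 120,              # video frame rate
        video_params.get("bitrate", 0) / 50_000_000,   # video encoding rate (bps)
        cat(device_params.get("brand")),               # terminal brand
        cat(device_params.get("model")),               # terminal model
        cat(device_params.get("chipset")),             # main chip model
        device_params.get("cpu_cores", 0) / 16,        # number of CPU cores
        device_params.get("cpu_mhz", 0) / 4000,        # CPU clock frequency
        cat(device_params.get("os_version")),          # system version number
        device_params.get("ram_mb", 0) / 16384,        # system memory capacity
    ]
```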
Step S203, packaging the video decoding parameters into a video decoding mode obtaining request, sending the video decoding mode obtaining request to a server, and instructing the server to determine a target decoding mode matched with the video decoding parameters based on a video decoding discrimination model.
Specifically, the client packages the acquired video decoding parameters into a video decoding mode acquisition request and sends it to the server, so that the server determines the target decoding mode matching the video decoding parameters based on the trained first video decoding discrimination model and second video decoding discrimination model; the specific process by which the server determines the target decoding mode may refer to steps S101 to S104 in the embodiment corresponding to fig. 3.
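One plausible shape for such a request is sketched below; the endpoint URL, the JSON wire format, and the field names are hypothetical, as the patent does not fix a transport.

```python
import json
import urllib.request

def fetch_target_decoding_mode(video_params: dict, device_params: dict,
                               url: str = "https://example.com/decoding-mode") -> str:
    """Package the video decoding parameters into a request and ask the server."""
    body = json.dumps({
        "video_decoding_parameters": {"video": video_params, "device": device_params}
    }).encode("utf-8")
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:            # server runs both models
        return json.load(resp)["target_decoding_mode"]   # e.g. "hardware"
```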
And step S204, receiving a target decoding mode matched with the video decoding parameters returned by the server.
And step S205, decoding the video coding data according to the target decoding mode to generate a video to be played.
Specifically, if the target decoding mode received by the client is a hardware decoding mode, the client may decode the video encoded data by using the hardware decoding mode to obtain a video to be played, and may play the video to be played;
if the target decoding mode received by the client is the software decoding mode, the client can decode the video coding data by adopting the software decoding mode to obtain a video to be played, and the video to be played can be played;
if the target decoding mode received by the client is any decoding mode, the client can randomly adopt a hardware decoding mode or a software decoding mode to decode video coding data to obtain a video to be played, and the video to be played can be played;
if the target decoding mode received by the client is the non-supported decoding mode, the client may display a prompt message telling the user that the current terminal device cannot decode the video encoded data; alternatively, the client may request other video data from the video server again, for example video data of the same video encoded with a different encoding scheme, or video encoding data of the same video at a lower video resolution. A sketch of the four cases follows.
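A minimal sketch of this client-side branching, with hypothetical decoder and fallback helpers standing in for the player's real facilities:

```python
import random

# Hypothetical stand-ins for the player's real decoders and fallbacks;
# the patent does not specify these interfaces.
def decode_with_hardware(data: bytes) -> bytes: ...
def decode_with_software(data: bytes) -> bytes: ...
def show_prompt(message: str) -> None: print(message)
def request_alternative_stream(**kwargs) -> bytes: ...

def handle_target_mode(mode: str, encoded_data: bytes):
    if mode == "hardware":
        return decode_with_hardware(encoded_data)
    if mode == "software":
        return decode_with_software(encoded_data)
    if mode == "any":
        # either decoder is expected to work; pick one at random
        return random.choice([decode_with_hardware, decode_with_software])(encoded_data)
    # "unsupported": prompt the user, or re-request the same video in another
    # encoding scheme or at a lower resolution
    show_prompt("The current device cannot decode this video.")
    return request_alternative_stream(codec="alternative", max_height=480)
```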
Optionally, the client may subsequently obtain decoding feedback information for the video encoded data, indicating whether the client decoded the video encoded data successfully with the target decoding mode. The client may send the video decoding parameters together with the decoding feedback information to the server, so that the server updates the first video decoding discrimination model and the second video decoding discrimination model accordingly.
The above two embodiments describe the specific processes of the video processing method from the server side and the client side, respectively, and the following describes the specific processes of the video processing method through the interaction between the client and the server.
Please refer to fig. 5, which is an interaction diagram of a video processing method according to an embodiment of the present application. The data reporting server, the deep learning server, and the decoding mode query server in the following embodiment all belong to the servers of the present application. The video processing includes the following steps:
step S301, the client acquires sample equipment parameters and sample video parameters for model training, and sends the sample equipment parameters and the sample video parameters to a data reporting server.
Step S302, the data reporting server sends the sample equipment parameters and the sample video parameters to the deep learning server.
Step S303, the deep learning server trains the model based on the sample device parameters and the sample video parameters to obtain a first video decoding discrimination model and a second video decoding discrimination model.
Specifically, the deep learning server trains a sample video decoding discrimination model based on the sample device parameters and the sample video parameters to obtain a video decoding discrimination model, wherein the video decoding discrimination model comprises a first video decoding discrimination model and a second video decoding discrimination model.
The specific process of the deep learning server training the sample video decoding discriminant model can be referred to step S104 in the corresponding embodiment of fig. 3.
Step S304, the client side obtains the parameters of the equipment to be inquired and the parameters of the video to be inquired, and sends the parameters of the equipment to be inquired and the parameters of the video to be inquired to the decoding mode inquiry server.
Step S305, the decoding mode query server transmits the parameters of the device to be queried and the parameters of the video to be queried to the deep learning server.
Step S306, the deep learning server determines a decoding mode matched with the parameters of the equipment to be inquired and the parameters of the video to be inquired based on the first video decoding discrimination model and the second video decoding discrimination model.
In step S307, the deep learning server sends the determined decoding mode to the decoding mode query server.
Step S308, the decoding mode query server issues the decoding mode to the client.
Step S309, the client decodes the video coding data according to the decoding mode, acquires feedback information whether the decoding is successful, and sends the feedback information to the data reporting server.
And step S310, the data reporting server forwards the feedback information to the deep learning server.
In step S311, the deep learning server updates the video decoding discrimination model based on the feedback information.
In the above way, the decoding mode matching the video decoding parameters of the current video to be decoded is predicted by two prediction models; compared with a randomly selected decoding mode, the target decoding mode determined in this way is consistent with the video decoding parameters, which avoids the uncertainty of random selection and improves the accuracy with which the terminal decodes the video data.
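The routing among the three servers can be summarized as below; the class and method names are hypothetical stand-ins, since the patent specifies the servers' roles but not their interfaces.

```python
# Hypothetical stand-ins for the three servers of fig. 5; the patent specifies
# their roles (steps S301-S311) but not their interfaces.
class DeepLearningServer:
    def predict(self, params):            # S306: run both discrimination models
        return "hardware"                 # placeholder result
    def update(self, params, feedback):   # S311: refine the models with feedback
        pass

class DecodingModeQueryServer:
    def __init__(self, dl): self.dl = dl
    def query(self, params):              # S305-S307: forward query, relay result
        return self.dl.predict(params)

class DataReportingServer:
    def __init__(self, dl): self.dl = dl
    def report(self, params, feedback):   # S310: forward feedback
        self.dl.update(params, feedback)

dl = DeepLearningServer()
mode = DecodingModeQueryServer(dl).query({"device": {}, "video": {}})  # S304-S308
# ... client decodes with `mode` (S309) ...
DataReportingServer(dl).report({"device": {}, "video": {}}, True)      # S309-S311
```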
Further, please refer to fig. 6, which is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application. As shown in fig. 6, the video processing apparatus 1 may be applied to the server in the corresponding embodiments of fig. 3 to 5, and the video processing apparatus 1 may include: a first obtaining module 11, a first calling module 12, a second calling module 13 and a determining module 14.
A first obtaining module 11, configured to obtain a video decoding mode obtaining request; the video decoding mode acquisition request comprises video decoding parameters;
a first calling module 12, configured to call a first video decoding discrimination model to determine a first decoding mode corresponding to the video decoding parameter;
a second calling module 13, configured to call a second video decoding discrimination model to determine a second decoding mode corresponding to the video decoding parameter;
a determining module 14, configured to determine, according to the first decoding mode and the second decoding mode, a target decoding mode matching the video decoding mode obtaining request.
For specific functional implementation manners of the first obtaining module 11, the first calling module 12, the second calling module 13, and the determining module 14, reference may be made to steps S101 to S104 in the embodiment corresponding to fig. 3, which is not described herein again.
Referring to fig. 6, the first decoding mode includes a hardware decoding mode or a non-hardware decoding mode; the second decoding mode comprises a software decoding mode or a non-software decoding mode;
the determination module 14 may include: a first determination unit 141 and a second determination unit 142.
A first determining unit 141, configured to determine, when the first decoding mode is a hardware decoding mode and the second decoding mode is a non-software decoding mode, that a target decoding mode matching the video decoding mode acquisition request is the hardware decoding mode;
a second determining unit 142, configured to determine, when the first decoding mode is a non-hardware decoding mode and the second decoding mode is a software decoding mode, that a target decoding mode matching the video decoding mode obtaining request is the software decoding mode;
the second determining unit 142 is further configured to determine that the target decoding mode matching the video decoding mode obtaining request is a non-hardware decoding mode when the first decoding mode is a non-hardware decoding mode and the second decoding mode is a non-software decoding mode.
For specific functional implementation manners of the first determining unit 141 and the second determining unit 142, reference may be made to step S104 in the embodiment corresponding to fig. 3, which is not described herein again.
Referring to fig. 6, the first calling module 12 may include: a combining unit 121 and a third determining unit 122.
A combining unit 121, configured to normalize the video decoding parameters into an input vector, extract first hidden feature information of the input vector based on the first feature extractor, and identify, based on the first discriminator, a first matching probability that the first hidden feature information corresponds to the hardware decoding mode and a second matching probability that the first hidden feature information corresponds to the non-hardware decoding mode;
a third determining unit 122, configured to determine the first decoding mode according to the first matching probability and the second matching probability;
the third determining unit 122 is specifically configured to determine that the first decoding mode is the hardware decoding mode when the first matching probability is greater than the second matching probability, and determine that the first decoding mode is the non-hardware decoding mode when the first matching probability is less than or equal to the second matching probability.
For specific functional implementation manners of the combining unit 121 and the third determining unit 122, reference may be made to step S102 in the embodiment corresponding to fig. 3, which is not described herein again.
Referring to fig. 6, the video processing apparatus 1 may include: the system comprises a first acquisition module 11, a first calling module 12, a second calling module 13 and a determination module 14; the method can also comprise the following steps: a second acquisition module 15, a prediction module 16 and a training module 17.
A second obtaining module 15, configured to obtain the first video decoding parameters;
The first invoking module 12 is further configured to invoke a first sample model to determine a first predictive decoding mode corresponding to the first video decoding parameter;
the second calling module 13 is further configured to call a second sample model to determine a second predictive decoding mode corresponding to the first video decoding parameter; the first sample model and the second sample model belong to a sample video decoding discrimination model;
a prediction module 16 for determining a predictive decoding mode according to the first predictive decoding mode and the second predictive decoding mode;
the second obtaining module 15 is further configured to obtain first decoding feedback information for the predictive decoding mode, and determine a decision error according to the first decoding feedback information;
and the training module 17 is configured to train the sample video decoding discrimination model according to the discrimination error to obtain the video decoding discrimination model.
The specific processes of the second obtaining module 15, the predicting module 16 and the training module 17 may refer to step S104 in the embodiment corresponding to fig. 3, and are not described herein again.
Referring to fig. 6, the first predictive decoding mode includes a hardware decoding mode or a non-hardware decoding mode; the second predictive decoding mode comprises a software decoding mode or a non-software decoding mode;
the prediction module 16 may include: a first prediction unit 161;
a first prediction unit 161, configured to determine that the predictive decoding mode is the hardware decoding mode if the first predictive decoding mode is the hardware decoding mode and the second predictive decoding mode is the non-software decoding mode;
the training module 17 may comprise: a first training unit 171.
The first training unit 171 is configured to train the first sample model according to the discriminant error to obtain the first video decoding discriminant model.
The prediction module 16 may further include: a second prediction unit 162;
a second prediction unit 162, configured to determine that the predictive decoding mode is the software decoding mode if the first predictive decoding mode is the non-hardware decoding mode and the second predictive decoding mode is the software decoding mode;
the training module 17 may comprise: a second training unit 172.
And a second training unit 172, configured to train the second sample model according to the discrimination error, so as to obtain the second video decoding discrimination model.
The specific processes of the first prediction unit 161, the first training unit 171, the second prediction unit 162, and the second training unit 172 may refer to step S104 in the embodiment corresponding to fig. 3, and are not described herein again.
Referring to fig. 6, the video processing apparatus 1 may include: a first obtaining module 11, a first calling module 12, a second calling module 13, a determining module 14, a second obtaining module 15, a predicting module 16 and a training module 17; the method can also comprise the following steps: a third acquisition module 18.
A third obtaining module 18, configured to obtain a second video decoding parameter set, and obtain a second decoding feedback information set for the second video decoding parameter set; second decoding feedback information in the second decoding feedback information set comprises the hardware decoding mode or the non-hardware decoding mode;
the third obtaining module 18 is further configured to obtain a third video decoding parameter set, and obtain a third decoding feedback information set for the third video decoding parameter set; third decoding feedback information in the third decoding feedback information set comprises the software decoding mode or the non-software decoding mode; the difference between the size of the second video decoding parameter set and the size of the third video decoding parameter set is less than a difference threshold;
the third obtaining module 18 is further configured to train a first original model according to the second video decoding parameter set and the second decoding feedback information set, so as to obtain the first sample model;
the third obtaining module 18 is further configured to train a second original model according to the third video decoding parameter set and the third decoding feedback information set, so as to obtain the second sample model.
The specific process of the third obtaining module 18 may refer to step S104 in the embodiment corresponding to fig. 3, which is not described herein again.
Further, please refer to fig. 7a, which is a schematic structural diagram of another video processing apparatus according to an embodiment of the present application. As shown in fig. 7a, the video processing apparatus 2 may be applied to the client in the corresponding embodiments of fig. 3 to fig. 5, and the video processing apparatus 2 may include: a fourth obtaining module 21, a first sending module 22, a receiving module 23 and a decoding module 24.
A fourth obtaining module 21, configured to obtain video coded data, and obtain video decoding parameters associated with the video coded data;
the first sending module 22 is configured to package the video decoding parameters into a video decoding mode obtaining request, send the video decoding mode obtaining request to a server, and instruct the server to determine, based on a first video decoding discrimination model and a second video decoding discrimination model, a target decoding mode matching the video decoding parameters;
a receiving module 23, configured to receive a target decoding mode that is returned by the server and matches the video decoding parameter;
and the decoding module 24 is configured to decode the video encoded data according to the target decoding mode, and generate a video to be played.
For specific processes of the fourth obtaining module 21, the first sending module 22, the receiving module 23, and the decoding module 24, reference may be made to steps S201 to S204 in the embodiment corresponding to fig. 4, which is not described herein again.
Referring to fig. 7a, the video processing apparatus 2 may include: a fourth obtaining module 21, a first sending module 22, a receiving module 23 and a decoding module 24; the method can also comprise the following steps: a second sending module 25.
A second sending module 25, configured to obtain decoding feedback information of the video encoded data, and send the video decoding parameter and the decoding feedback information to the server; the video decoding parameters and the decoding feedback information are used for updating the first video decoding discrimination model and the second video decoding discrimination model.
The specific process of the second sending module 25 may refer to step S204 in the embodiment corresponding to fig. 4, which is not described herein again.
Fig. 7b is a schematic structural diagram of a video processing system according to an embodiment of the present invention. The video processing system 3 may include: server 100a and client 100c, and client 100c and server 100a establish a connection over network 100 b.
A client 100c for acquiring video encoding data and video decoding parameters associated with the video encoding data;
specifically, the client 100c obtains the video encoding data of a video to be played. Video encoding data is the result of converting a video (i.e., a sequence of image frames) into another data representation through an encoding algorithm, in order to reduce the space the video occupies during transmission or storage; the encoding algorithm may include H.265, MPEG, or JVT, etc.
The client 100c obtains video decoding parameters of the video encoding data, and the video decoding parameters may include video parameters and device parameters.
The video parameter refers to an attribute parameter of the video encoding data, such as a video encoding format, a video resolution, a video frame rate, and/or a video encoding rate.
The device parameters are device attribute parameters of the terminal device where the client 100c is located, such as terminal brand, terminal model, main chip model, number of CPU cores, CPU clock frequency, system version number, system build number (i.e., Build number), system SD storage capacity, and/or system memory capacity.
The client 100c is further configured to package the video decoding parameters into a video decoding mode acquisition request, and send the video decoding mode acquisition request to the server 100a through the network 100b;
the server 100a is configured to invoke a first video decoding discrimination model to determine a first decoding mode corresponding to the video decoding parameter;
specifically, the server 100a may determine a decoding mode (referred to as a first decoding mode) corresponding to the video decoding parameter by calling a first video decoding discrimination model.
Wherein the first decoding mode may include a hardware decoding mode or a non-hardware decoding mode.
The following is a detailed description of how the first decoding mode is determined:
the server 100a normalizes and vectorizes the video decoding parameters to obtain an input vector. The server 100a calls the trained first video decoding discrimination model, which includes a feature extractor (referred to as the first feature extractor) and a discriminator (referred to as the first discriminator); the feature extractor is used to extract hidden features of the input vector.
Deep hidden features of the input vector (referred to as first hidden feature information) are extracted by the first feature extractor, and the first discriminator identifies the matching probability between the extracted first hidden feature information and the hardware decoding mode (referred to as the first matching probability) and the matching probability between it and the non-hardware decoding mode (referred to as the second matching probability).
If the first matching probability is greater than the second matching probability, the server 100a may determine that the first decoding mode is a hardware decoding mode;
if the first match probability is less than or equal to the second match probability, the server 100a may determine that the first decoding mode is a non-hardware decoding mode.
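A minimal sketch of such a model follows, assuming a small PyTorch network; the second video decoding discrimination model described next has the same shape with software/non-software outputs. Layer sizes and names are illustrative assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn

class FirstDiscriminationModel(nn.Module):
    """Feature extractor followed by a two-way discriminator
    (hardware vs non-hardware); layer sizes are illustrative."""
    def __init__(self, in_dim=12, hidden=32):
        super().__init__()
        self.feature_extractor = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.discriminator = nn.Linear(hidden, 2)

    def forward(self, x):
        h = self.feature_extractor(x)            # first hidden feature information
        return torch.softmax(self.discriminator(h), dim=1)

model = FirstDiscriminationModel()
x = torch.rand(1, 12)                            # normalized input vector
p_first, p_second = model(x)[0]                  # first / second matching probability
first_decoding_mode = "hardware" if p_first > p_second else "non-hardware"
```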
The server 100a is further configured to invoke a second video decoding discrimination model to determine a second decoding mode corresponding to the video decoding parameter;
specifically, the server 100a may determine a decoding mode (referred to as a second decoding mode) corresponding to the video decoding parameter by calling a second video decoding discrimination model.
How to determine the second decoding mode is described in detail below:
the server 100a normalizes and vectorizes the video decoding parameters to obtain an input vector.
The server 100a calls the trained second video decoding discrimination model, which includes a feature extractor (referred to as the second feature extractor) and a discriminator (referred to as the second discriminator); the feature extractor is used to extract hidden features of the input vector.
Deep hidden features of the input vector (referred to as second hidden feature information) are extracted by the second feature extractor, and the second discriminator identifies the matching probability between the extracted second hidden feature information and the software decoding mode (referred to as the third matching probability) and the matching probability between it and the non-software decoding mode (referred to as the fourth matching probability).
If the third matching probability is greater than the fourth matching probability, the server 100a may determine that the second decoding mode is a software decoding mode;
if the third match probability is less than or equal to the fourth match probability, the server 100a may determine that the second decoding mode is a non-software decoding mode.
The server 100a is further configured to determine a target decoding mode matched with the video decoding mode obtaining request according to the first decoding mode and the second decoding mode, and send the target decoding mode to the client 100 c;
specifically, if the first decoding mode is a hardware decoding mode and the second decoding mode is a non-software decoding mode, the server 100a may determine that the target decoding mode matched with the video decoding mode obtaining request is the hardware decoding mode;
if the first decoding mode is a non-hardware decoding mode and the second decoding mode is a software decoding mode, the server 100a may determine that the target decoding mode matching the video decoding mode acquisition request is the software decoding mode;
if the first decoding mode is a non-hardware decoding mode and the second decoding mode is a non-software decoding mode, the server 100a may determine that the target decoding mode matching the video decoding mode acquisition request is a non-supported decoding mode;
if the first decoding mode is a hardware decoding mode and the second decoding mode is a software decoding mode, the server 100a may determine that the target decoding mode matching the video decoding mode acquisition request is an arbitrary decoding mode.
Subsequently, the server 100a may send the determined target decoding mode to the client 100c in response to the video decoding mode acquisition request of the client 100 c.
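The four combinations can be expressed as a small lookup table, sketched below; the string labels are hypothetical, and the (hardware, software) case is read as the arbitrary decoding mode handled by the client as described next.

```python
def target_decoding_mode(first_mode: str, second_mode: str) -> str:
    """Combine the two 2-class outputs into the target decoding mode."""
    table = {
        ("hardware", "non-software"): "hardware",         # only hardware works
        ("non-hardware", "software"): "software",         # only software works
        ("non-hardware", "non-software"): "unsupported",  # neither works
        ("hardware", "software"): "any",                  # either is expected to work
    }
    return table[(first_mode, second_mode)]
```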
The client 100c is further configured to decode the video encoded data according to the target decoding mode, and generate a video to be played.
Specifically, if the target decoding mode received by the client 100c is the hardware decoding mode, the client 100c may decode the video encoded data by using the hardware decoding mode to obtain a video to be played, and may play the video to be played;
if the target decoding mode received by the client 100c is the software decoding mode, the client 100c may decode the video encoded data by using the software decoding mode to obtain a video to be played, and may play the video to be played;
if the target decoding mode received by the client 100c is any decoding mode, the client 100c may randomly decode the video encoded data by using a hardware decoding mode or a software decoding mode to obtain a video to be played, and may play the video to be played;
if the target decoding mode received by the client 100c is the non-supported decoding mode, the client 100c may display a prompt message telling the user that the current terminal device cannot decode the video encoded data; alternatively, the client 100c may request other video data from the video server again, for example video data of the same video encoded with a different encoding scheme, or video encoding data of the same video at a lower video resolution.
In this application, the decoding mode matching the video decoding parameters of the video encoding data currently to be decoded is predicted by two prediction models; compared with a randomly selected decoding mode, the target decoding mode determined in this way is consistent with the video decoding parameters, which avoids the uncertainty of random selection and improves the accuracy with which the terminal decodes the video data.
Further, please refer to fig. 8, which is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The server in the embodiments corresponding to fig. 3 to fig. 5 may be an electronic device 1000. As shown in fig. 8, the electronic device 1000 may include: a user interface 1002, a processor 1004, an encoder 1006, and a memory 1008. The signal receiver 1016 is used to receive or transmit data via the cellular interface 1010 and the WIFI interface 1012. The encoder 1006 encodes the received data into a computer-processed data format. The memory 1008 stores a computer program, by means of which the processor 1004 is arranged to perform the steps of any of the method embodiments described above. The memory 1008 may include volatile memory (e.g., dynamic random access memory DRAM) and may also include non-volatile memory (e.g., one-time programmable read-only memory OTPROM). In some examples, the memory 1008 may further include memory located remotely from the processor 1004, which may be connected to the electronic device 1000 via a network. The user interface 1002 may include: a keyboard 1018 and a display 1020.
In the electronic device 1000 shown in fig. 8, the processor 1004 may be configured to call the computer program stored in the memory 1008 to implement:
acquiring a video decoding mode acquisition request; the video decoding mode acquisition request comprises video decoding parameters;
calling a first video decoding discrimination model to determine a first decoding mode corresponding to the video decoding parameters;
calling a second video decoding discrimination model to determine a second decoding mode corresponding to the video decoding parameters;
and determining a target decoding mode matched with the video decoding mode acquisition request according to the first decoding mode and the second decoding mode.
In one embodiment, the first decoding mode comprises a hardware decoding mode or a non-hardware decoding mode; the second decoding mode comprises a software decoding mode or a non-software decoding mode;
when the processor 1004 determines the target decoding mode matching the video decoding mode acquisition request according to the first decoding mode and the second decoding mode, specifically, the following steps are performed:
when the first decoding mode is a hardware decoding mode and the second decoding mode is a non-software decoding mode, determining a target decoding mode matched with the video decoding mode acquisition request as the hardware decoding mode;
when the first decoding mode is a non-hardware decoding mode and the second decoding mode is a software decoding mode, determining a target decoding mode matched with the video decoding mode acquisition request as the software decoding mode;
when the first decoding mode is a non-hardware decoding mode and the second decoding mode is a non-software decoding mode, determining that a target decoding mode matching the video decoding mode acquisition request is a non-supported decoding mode.
In one embodiment, the first video decoding discrimination model includes a first feature extractor and a first discriminator;
when the processor 1004 executes the first video decoding discrimination model to determine the first decoding mode corresponding to the video decoding parameter, the following steps are specifically executed:
normalizing the video decoding parameters to an input vector;
extracting first hidden feature information of the input vector based on the first feature extractor;
identifying a first matching probability that the first hidden feature information corresponds to the hardware decoding mode and a second matching probability that the first hidden feature information corresponds to the non-hardware decoding mode based on the first discriminator;
determining the first decoding mode according to the first matching probability and the second matching probability.
In one embodiment, when the processor 1004 determines the first decoding mode according to the first matching probability and the second matching probability, the following steps are specifically performed:
determining the first decoding mode as the hardware decoding mode when the first matching probability is greater than the second matching probability;
determining the first decoding mode as the non-hardware decoding mode when the first match probability is less than or equal to the second match probability.
In one embodiment, the first video decoding discrimination model and the second video decoding discrimination model belong to a video decoding discrimination model;
the processor 1004 also performs the following steps:
obtaining a first video decoding parameter
Calling a first sample model to determine a first predictive decoding mode corresponding to the first video decoding parameter;
calling a second sample model to determine a second predictive decoding mode corresponding to the first video decoding parameter; the first sample model and the second sample model belong to a sample video decoding discrimination model;
determining a predictive decoding mode from the first predictive decoding mode and the second predictive decoding mode;
acquiring first decoding feedback information aiming at the predictive decoding mode, and determining a discrimination error according to the first decoding feedback information;
and training the sample video decoding discrimination model according to the discrimination error to obtain the video decoding discrimination model.
In one embodiment, the first predictive decoding mode comprises a hardware decoding mode or a non-hardware decoding mode; the second predictive decoding mode comprises a software decoding mode or a non-software decoding mode;
when the processor 1004 determines the predictive decoding mode according to the first predictive decoding mode and the second predictive decoding mode, specifically, the following steps are performed:
determining the predictive decoding mode to be the hardware decoding mode if the first predictive decoding mode is the hardware decoding mode and the second predictive decoding mode is the non-software decoding mode;
then, the training the sample video decoding discrimination model according to the discrimination error to obtain the video decoding discrimination model includes:
and training the first sample model according to the discrimination error to obtain the first video decoding discrimination model.
In one embodiment, the processor 1004 further performs the following steps:
determining the predictive decoding mode to be the software decoding mode if the first predictive decoding mode is the non-hardware decoding mode and the second predictive decoding mode is the software decoding mode;
then, the training the sample video decoding discrimination model according to the discrimination error to obtain the video decoding discrimination model includes:
and training the second sample model according to the discrimination error to obtain the second video decoding discrimination model.
In one embodiment, the processor 1004 further performs the following steps:
acquiring a second video decoding parameter set, and acquiring a second decoding feedback information set for the second video decoding parameter set; second decoding feedback information in the second decoding feedback information set comprises the hardware decoding mode or the non-hardware decoding mode;
acquiring a third video decoding parameter set, and acquiring a third decoding feedback information set for the third video decoding parameter set; third decoding feedback information in the third decoding feedback information set comprises the software decoding mode or the non-software decoding mode; the difference between the size of the second video decoding parameter set and the size of the third video decoding parameter set is less than a difference threshold;
training a first original model according to the second video decoding parameter set and the second decoding feedback information set to obtain the first sample model;
and training a second original model according to the third video decoding parameter set and the third decoding feedback information set to obtain the second sample model.
It should be understood that the electronic device 1000 described in the embodiment of the present invention may perform the description of the video processing method in the embodiment corresponding to fig. 3 to fig. 5, and may also perform the description of the video processing apparatus 1 in the embodiment corresponding to fig. 6, which is not described herein again. In addition, the beneficial effects of the same method are not described in detail.
Further, here, it is to be noted that: an embodiment of the present invention further provides a computer storage medium, where the computer storage medium stores the aforementioned computer program executed by the video processing apparatus 1, and the computer program includes program instructions, and when the processor executes the program instructions, the description of the video processing method in the embodiment corresponding to fig. 3 to fig. 5 can be performed, so that details are not repeated here. In addition, the beneficial effects of the same method are not described in detail. For technical details not disclosed in the embodiments of the computer storage medium to which the present invention relates, reference is made to the description of the method embodiments of the present invention.
Further, please refer to fig. 9, which is a schematic structural diagram of another computer device according to an embodiment of the present invention. The client in the embodiments corresponding to fig. 3 to fig. 5 may be a computer device 2000. As shown in fig. 9, the computer device 2000 may include: a user interface 2002, a processor 2004, an encoder 2006, and a memory 2008. The signal receiver 2016 is configured to receive or transmit data via the cellular interface 2010 and the WIFI interface 2012. The encoder 2006 encodes the received data into a computer-processed data format. The memory 2008 stores a computer program, and the processor 2004 is arranged to execute the steps of any of the method embodiments described above by means of the computer program. The memory 2008 may include volatile memory (e.g., dynamic random access memory DRAM) and may also include non-volatile memory (e.g., one-time programmable read-only memory OTPROM). In some examples, the memory 2008 may further include memory located remotely from the processor 2004, which may be connected to the computer device 2000 via a network. The user interface 2002 may include: a keyboard 2018 and a display 2020.
In the computer device 2000 shown in fig. 9, the processor 2004 may be configured to call the computer program stored in the memory 2008 to implement:
acquiring video coding data;
acquiring video decoding parameters associated with the video coding data;
packaging the video decoding parameters into a video decoding mode acquisition request, sending the video decoding mode acquisition request to a server, and instructing the server to determine a target decoding mode matched with the video decoding parameters based on a first video decoding discrimination model and a second video decoding discrimination model;
receiving a target decoding mode matched with the video decoding parameters returned by the server;
and decoding the video coded data according to the target decoding mode to generate a video to be played.
In one embodiment, the processor 2004 further performs the steps of:
acquiring decoding feedback information of the video coded data;
sending the video decoding parameters and the decoding feedback information to the server; the video decoding parameters and the decoding feedback information are used for updating the first video decoding discrimination model and the second video decoding discrimination model.
It should be understood that the computer device 2000 described in the embodiment of the present invention may perform the description of the video processing method in the embodiment corresponding to fig. 3-5, and may also perform the description of the video processing apparatus 2 in the embodiment corresponding to fig. 7a, which is not described herein again. In addition, the beneficial effects of the same method are not described in detail.
Further, here, it is to be noted that: an embodiment of the present invention further provides a computer storage medium, and the computer storage medium stores the aforementioned computer program executed by the video processing apparatus 2, and the computer program includes program instructions, and when the processor executes the program instructions, the description of the video processing method in the embodiment corresponding to fig. 3 to 5 can be performed, so that details are not repeated here. In addition, the beneficial effects of the same method are not described in detail. For technical details not disclosed in the embodiments of the computer storage medium to which the present invention relates, reference is made to the description of the method embodiments of the present invention.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above disclosure is merely a preferred embodiment of the present invention and certainly cannot be taken to limit the scope of rights of the invention; equivalent variations made according to the appended claims therefore still fall within the scope covered by the invention.

Claims (14)

1. A video processing method, comprising:
acquiring a video decoding mode acquisition request; the video decoding mode acquisition request comprises video decoding parameters;
calling a first video decoding discrimination model to determine a first decoding mode corresponding to the video decoding parameters;
calling a second video decoding discrimination model to determine a second decoding mode corresponding to the video decoding parameters;
when the first decoding mode is a hardware decoding mode and the second decoding mode is a non-software decoding mode, determining a target decoding mode matched with the video decoding mode acquisition request as the hardware decoding mode;
when the first decoding mode is a non-hardware decoding mode and the second decoding mode is a software decoding mode, determining a target decoding mode matched with the video decoding mode acquisition request as the software decoding mode;
when the first decoding mode is a non-hardware decoding mode and the second decoding mode is a non-software decoding mode, determining that a target decoding mode matching the video decoding mode acquisition request is a non-supported decoding mode.
2. The method of claim 1, wherein the first video decoding discrimination model comprises a first feature extractor and a first discriminator;
the calling a first video decoding discrimination model to determine a first decoding mode corresponding to the video decoding parameters includes:
normalizing the video decoding parameters to an input vector;
extracting first hidden feature information of the input vector based on the first feature extractor;
identifying a first matching probability corresponding to the first hidden feature information and the hardware decoding mode and a second matching probability corresponding to the first hidden feature information and the non-hardware decoding mode based on the first discriminator;
determining the first decoding mode according to the first matching probability and the second matching probability.
3. The method of claim 2, wherein determining the first decoding mode based on the first and second match probabilities comprises:
determining the first decoding mode as the hardware decoding mode when the first matching probability is greater than the second matching probability;
determining the first decoding mode as the non-hardware decoding mode when the first match probability is less than or equal to the second match probability.
4. The method of claim 1, wherein the first video decoding discrimination model and the second video decoding discrimination model belong to a video decoding discrimination model;
the method further comprises the following steps:
acquiring a first video decoding parameter;
calling a first sample model to determine a first predictive decoding mode corresponding to the first video decoding parameter;
calling a second sample model to determine a second predictive decoding mode corresponding to the first video decoding parameter; the first sample model and the second sample model belong to a sample video decoding discrimination model;
determining a predictive decoding mode from the first predictive decoding mode and the second predictive decoding mode;
acquiring first decoding feedback information aiming at the predictive decoding mode, and determining a discrimination error according to the first decoding feedback information;
and training the sample video decoding discrimination model according to the discrimination error to obtain the video decoding discrimination model.
5. The method of claim 4, wherein the first predictive decoding mode comprises a hardware decoding mode or a non-hardware decoding mode; the second predictive decoding mode comprises a software decoding mode or a non-software decoding mode;
the determining a predictive decoding mode from the first predictive decoding mode and the second predictive decoding mode includes:
determining the predictive decoding mode to be the hardware decoding mode if the first predictive decoding mode is the hardware decoding mode and the second predictive decoding mode is the non-software decoding mode;
then, the training the sample video decoding discrimination model according to the discrimination error to obtain the video decoding discrimination model includes:
and training the first sample model according to the discrimination error to obtain the first video decoding discrimination model.
6. The method of claim 5, further comprising:
determining the predictive decoding mode to be the software decoding mode if the first predictive decoding mode is the non-hardware decoding mode and the second predictive decoding mode is the software decoding mode;
then, the training the sample video decoding discrimination model according to the discrimination error to obtain the video decoding discrimination model includes:
and training the second sample model according to the discrimination error to obtain the second video decoding discrimination model.
7. The method of claim 4, further comprising:
acquiring a second video decoding parameter set, and acquiring a second decoding feedback information set for the second video decoding parameter set; second decoding feedback information in the second decoding feedback information set comprises the hardware decoding mode or the non-hardware decoding mode;
acquiring a third video decoding parameter set, and acquiring a third decoding feedback information set for the third video decoding parameter set; third decoding feedback information in the third decoding feedback information set comprises the software decoding mode or the non-software decoding mode; the difference between the size of the second video decoding parameter set and the size of the third video decoding parameter set is less than a difference threshold;
training a first original model according to the second video decoding parameter set and the second decoding feedback information set to obtain the first sample model;
and training a second original model according to the third video decoding parameter set and the third decoding feedback information set to obtain the second sample model.
8. A video processing method, comprising:
acquiring video coding data;
acquiring video decoding parameters associated with the video coding data;
packaging the video decoding parameters into a video decoding mode acquisition request, sending the video decoding mode acquisition request to a server, and instructing the server to determine a target decoding mode matched with the video decoding parameters based on a first video decoding discrimination model and a second video decoding discrimination model;
when the server determines that a first decoding mode corresponding to the video decoding parameters is a hardware decoding mode based on the first video decoding discrimination model and determines that a second decoding mode corresponding to the video decoding parameters is a non-software decoding mode based on the second video decoding discrimination model, determining that a target decoding mode matched with the video decoding mode acquisition request is the hardware decoding mode;
when the server determines that a first decoding mode corresponding to the video decoding parameters is a non-hardware decoding mode based on the first video decoding discrimination model and determines that a second decoding mode corresponding to the video decoding parameters is a software decoding mode based on the second video decoding discrimination model, determining that a target decoding mode matched with the video decoding mode acquisition request is the software decoding mode;
when the server determines that a first decoding mode corresponding to the video decoding parameters is a non-hardware decoding mode based on the first video decoding discrimination model and determines that a second decoding mode corresponding to the video decoding parameters is a non-software decoding mode based on the second video decoding discrimination model, determining that a target decoding mode matched with the video decoding mode acquisition request is a non-supported decoding mode;
receiving a target decoding mode matched with the video decoding parameters returned by the server;
and decoding the video coded data according to the target decoding mode to generate a video to be played.
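A minimal sketch of the client side of claim 8, assuming a JSON-over-HTTP exchange; the endpoint, the field names, and the two decoder callables are hypothetical, not specified by the patent.

import json
import urllib.request

def request_target_mode(decoding_params, server_url):
    """Package the video decoding parameters into a mode acquisition
    request and return the server's target decoding mode."""
    req = urllib.request.Request(
        server_url,
        data=json.dumps({"video_decoding_params": decoding_params}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["target_decoding_mode"]

def decode_for_playback(encoded_data, decoding_params, server_url,
                        hardware_decode, software_decode):
    """Decode per the server's decision; the two decoder callables are
    supplied by the caller (e.g. a platform-codec wrapper and a CPU
    decoder wrapper)."""
    mode = request_target_mode(decoding_params, server_url)
    if mode == "hardware":
        return hardware_decode(encoded_data)
    if mode == "software":
        return software_decode(encoded_data)
    raise RuntimeError("server reports these parameters are not supported")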
9. The method of claim 8, further comprising:
acquiring decoding feedback information of the video coded data;
and sending the video decoding parameters and the decoding feedback information to the server, wherein the video decoding parameters and the decoding feedback information are used for updating the first video decoding discrimination model and the second video decoding discrimination model.
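A matching sketch for claim 9's feedback path, under the same JSON-over-HTTP assumption; prefixing a failed mode with "non-" mirrors the feedback labels of claim 7, but the wire format here is invented.

import json
import urllib.request

def report_decoding_feedback(decoding_params, attempted_mode, succeeded,
                             feedback_url):
    """Tell the server whether the chosen mode actually decoded the
    video, so both discrimination models can be updated."""
    feedback = attempted_mode if succeeded else "non-" + attempted_mode
    req = urllib.request.Request(
        feedback_url,
        data=json.dumps({
            "video_decoding_params": decoding_params,
            "decoding_feedback": feedback,
        }).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req):
        pass  # the response body is not needed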
10. A video processing apparatus, comprising:
the first acquisition module is used for acquiring a video decoding mode acquisition request; the video decoding mode acquisition request comprises video decoding parameters;
the first calling module is used for calling a first video decoding discrimination model to determine a first decoding mode corresponding to the video decoding parameters;
the second calling module is used for calling a second video decoding discrimination model to determine a second decoding mode corresponding to the video decoding parameters;
a determining module, configured to determine a target decoding mode matching the video decoding mode acquisition request according to the first decoding mode and the second decoding mode, including: when the first decoding mode is a hardware decoding mode and the second decoding mode is a non-software decoding mode, determining the target decoding mode matched with the video decoding mode acquisition request to be the hardware decoding mode;
when the first decoding mode is a non-hardware decoding mode and the second decoding mode is a software decoding mode, determining the target decoding mode matched with the video decoding mode acquisition request to be the software decoding mode;
when the first decoding mode is a non-hardware decoding mode and the second decoding mode is a non-software decoding mode, determining the target decoding mode matched with the video decoding mode acquisition request to be a non-supported decoding mode.
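The determining module's three branches reduce to a small decision table; a sketch follows, with the caveat that the fourth combination (hardware and software predicted simultaneously) is not defined by the claims, so it is surfaced as an error rather than guessed at.

def target_decoding_mode(first_mode, second_mode):
    """Combine the two model outputs exactly as the three claimed
    branches do."""
    if first_mode == "hardware" and second_mode == "non-software":
        return "hardware"
    if first_mode == "non-hardware" and second_mode == "software":
        return "software"
    if first_mode == "non-hardware" and second_mode == "non-software":
        return "non-supported"
    raise ValueError(
        f"combination not covered by the claims: {first_mode}/{second_mode}")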
11. A video processing apparatus, comprising:
the fourth acquisition module is used for acquiring video coded data and acquiring video decoding parameters associated with the video coded data;
the first sending module is used for packaging the video decoding parameters into a video decoding mode acquisition request, sending the video decoding mode acquisition request to a server, and instructing the server to determine a target decoding mode matched with the video decoding parameters based on a first video decoding discrimination model and a second video decoding discrimination model;
wherein when the server determines, based on the first video decoding discrimination model, that a first decoding mode corresponding to the video decoding parameters is a hardware decoding mode, and determines, based on the second video decoding discrimination model, that a second decoding mode corresponding to the video decoding parameters is a non-software decoding mode, the target decoding mode matched with the video decoding mode acquisition request is the hardware decoding mode;
when the server determines, based on the first video decoding discrimination model, that the first decoding mode corresponding to the video decoding parameters is a non-hardware decoding mode, and determines, based on the second video decoding discrimination model, that the second decoding mode corresponding to the video decoding parameters is a software decoding mode, the target decoding mode matched with the video decoding mode acquisition request is the software decoding mode;
when the server determines, based on the first video decoding discrimination model, that the first decoding mode corresponding to the video decoding parameters is a non-hardware decoding mode, and determines, based on the second video decoding discrimination model, that the second decoding mode corresponding to the video decoding parameters is a non-software decoding mode, the target decoding mode matched with the video decoding mode acquisition request is a non-supported decoding mode;
the receiving module is used for receiving a target decoding mode which is returned by the server and matched with the video decoding parameters;
and the decoding module is used for decoding the video coded data according to the target decoding mode to generate a video to be played.
12. A video processing system comprising a server and a client;
the client is used for acquiring video coded data and video decoding parameters associated with the video coded data;
the client is further used for packaging the video decoding parameters into a video decoding mode acquisition request and sending the video decoding mode acquisition request to the server;
the server is used for calling a first video decoding discrimination model to determine a first decoding mode corresponding to the video decoding parameters;
the server is also used for calling a second video decoding discrimination model to determine a second decoding mode corresponding to the video decoding parameters;
the server is further configured to determine a target decoding mode matched with the video decoding mode acquisition request according to the first decoding mode and the second decoding mode, and send the target decoding mode to the client, wherein when the first decoding mode is a hardware decoding mode and the second decoding mode is a non-software decoding mode, it is determined that the target decoding mode matched with the video decoding mode acquisition request is the hardware decoding mode; when the first decoding mode is a non-hardware decoding mode and the second decoding mode is a software decoding mode, it is determined that the target decoding mode matched with the video decoding mode acquisition request is the software decoding mode; and when the first decoding mode is a non-hardware decoding mode and the second decoding mode is a non-software decoding mode, it is determined that the target decoding mode matched with the video decoding mode acquisition request is a non-supported decoding mode;
and the client is further used for decoding the video coded data according to the target decoding mode to generate a video to be played.
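Wiring the earlier sketches together gives an in-process picture of the claim-12 system (no real network hop); the 1/0-to-label mapping follows the toy encoding assumed above, and the example parameter vector is made up.

def handle_mode_request(first_model, second_model, params_vec):
    """Server half: run both toy models, then the decision table."""
    first = "hardware" if first_model.predict(params_vec) == 1 else "non-hardware"
    second = "software" if second_model.predict(params_vec) == 1 else "non-software"
    return target_decoding_mode(first, second)

# Example usage (hypothetical feature vector, e.g. fps, height, codec id):
# first_model, second_model = train_sample_models(...)
# mode = handle_mode_request(first_model, second_model, [30.0, 1080.0, 1.0])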
13. An electronic device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1-9.
14. A computer storage medium, characterized in that the computer storage medium stores a computer program comprising program instructions which, when executed by a processor, perform the method according to any one of claims 1-9.
CN201911165040.0A 2019-11-25 2019-11-25 Video processing method and device, electronic equipment and storage medium Active CN110868615B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911165040.0A CN110868615B (en) 2019-11-25 2019-11-25 Video processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911165040.0A CN110868615B (en) 2019-11-25 2019-11-25 Video processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110868615A CN110868615A (en) 2020-03-06
CN110868615B true CN110868615B (en) 2021-05-28

Family

ID=69656326

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911165040.0A Active CN110868615B (en) 2019-11-25 2019-11-25 Video processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110868615B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106210726A * 2016-08-08 2016-12-07 成都佳发安泰科技股份有限公司 Method for adaptively decoding video data according to the utilization rates of the CPU and the GPU
CN106792066A * 2016-12-20 2017-05-31 暴风集团股份有限公司 Method and system for optimized video decoding and playback
CN110139104A (en) * 2018-02-09 2019-08-16 腾讯科技(深圳)有限公司 Video encoding/decoding method, device, computer equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9179144B2 (en) * 2012-11-28 2015-11-03 Cisco Technology, Inc. Fast switching hybrid video decoder
CN105992056B * 2015-01-30 2019-10-22 腾讯科技(深圳)有限公司 Video decoding method and apparatus
CN107172432A * 2017-03-23 2017-09-15 杰发科技(合肥)有限公司 Video processing method, apparatus and terminal

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Comparative research and implementation of Android-based video software/hardware decoding and rendering; Li Weifeng (李炜锋); 《电视技术》 (Video Engineering); 2014-05-02; full text *

Also Published As

Publication number Publication date
CN110868615A (en) 2020-03-06

Similar Documents

Publication Publication Date Title
CN112800247B (en) Semantic encoding/decoding method, equipment and communication system based on knowledge graph sharing
WO2018150083A1 (en) A method and technical equipment for video processing
US11893761B2 (en) Image processing apparatus and method
EP3885966B1 (en) Method and device for generating natural language description information
CN113963359B (en) Text recognition model training method, text recognition device and electronic equipment
CN113705245A (en) Semantic communication method, device, system, computer equipment and storage medium
CN114418121A (en) Model training method, object processing method and device, electronic device and medium
CN115115914B (en) Information identification method, apparatus and computer readable storage medium
CN114419527B (en) Data processing method, equipment and computer readable storage medium
CN111930984A (en) Image retrieval method, device, server, client and medium
CN110868615B (en) Video processing method and device, electronic equipment and storage medium
CN113409803A (en) Voice signal processing method, device, storage medium and equipment
CN116958738A (en) Training method and device of picture recognition model, storage medium and electronic equipment
CN113963358B (en) Text recognition model training method, text recognition device and electronic equipment
CN112052916B (en) Data processing method and device based on neural network and readable storage medium
CN112149426B (en) Reading task processing method and related equipment
EP3683733A1 (en) A method, an apparatus and a computer program product for neural networks
CN111552871A (en) Information pushing method and device based on application use record and related equipment
CN116580716B (en) Audio encoding method, device, storage medium and computer equipment
CN113627567B (en) Picture processing method, text processing method, related device and storage medium
CN116913278B (en) Voice processing method, device, equipment and storage medium
CN113011555B (en) Data processing method, device, equipment and storage medium
CN113810493A (en) Translation method, system, device and storage medium
CN117579889A (en) Image generation method, device, electronic equipment and storage medium
CN117808517A (en) User intention prediction method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40022281; Country of ref document: HK)
GR01 Patent grant