CN116453023B

CN116453023B - Video abstraction system, method, electronic equipment and medium for 5G rich media information

Info

Publication number: CN116453023B
Application number: CN202310437286.9A
Authority: CN
Inventors: 沈浩; 黄海量; 吴东进; 韩松乔; 吴优
Original assignee: Shanghai Zhixun Information Technology Co ltd
Current assignee: Shanghai Zhixun Information Technology Co ltd
Priority date: 2023-04-23
Filing date: 2023-04-23
Publication date: 2024-01-26
Anticipated expiration: 2043-04-23
Also published as: CN116453023A

Abstract

An embodiment of the invention discloses a video abstraction system, method, electronic equipment and medium for 5G rich media information. The video abstraction method for 5G rich media information comprises the following steps: acquiring a video message information set Y in a 5G rich media message, and constructing a training set based on the video message information set Y; constructing a video abstract model, wherein the video abstract model comprises a time decoder, a perceptron and a Transformer module which are sequentially connected; training the video abstract model through the training set to obtain a trained video abstract model; and inputting the video to be identified into the trained video abstract model to obtain a video sampling picture set y' of the video to be identified. The method solves the prior-art problems that video content is difficult to identify, identification takes a long time, and existing approaches are not suitable for a high-concurrency short message sending scene.

Description

Video abstraction system, method, electronic equipment and medium for 5G rich media information

Technical Field

The invention relates to the technical field of computers, in particular to a video abstraction system, a video abstraction method, electronic equipment and a video abstraction medium for 5G rich media information.

Background

The 5G rich media message is a great leap of the communication capability of the short message industry, and compared with the traditional text short message, the 5G rich media message has more supported media formats and richer expression forms, can send rich media information such as long text, pictures, voice, video and the like, and also comprises the user interaction and feedback capability such as public numbers, applets and the like, so that the application scene, the content quality and the application range of the 5G rich media message are greatly improved.

A 5G rich media message comprises a plurality of text message information sets X (x₁, x₂, ...), a plurality of video message information sets Y (y₁, y₂, ...) and a plurality of picture message information sets Z (z₁, z₂, ...). The video message information carries a large amount of video content, but this content is difficult to identify and takes a long time to identify, so existing identification approaches are not suitable for a high-concurrency short message sending scene.

Therefore, a method for identifying a video picture summary of a 5G rich media message suitable for a high concurrency short message transmission scene is needed.

Disclosure of Invention

The embodiment of the invention aims to provide a video abstraction system, a method, electronic equipment and a medium for 5G rich media information, which are used for solving the problems that in the prior art, the video content identification difficulty is high, the identification time is long and the video abstraction system is not suitable for a high-concurrency short message sending scene.

In order to achieve the above objective, an embodiment of the present invention provides a method for video abstraction of 5G rich media information, which specifically includes:

acquiring a video message information set Y in a 5G rich media message, and constructing a training set based on the video message information set Y;

constructing a video abstract model, wherein the video abstract model comprises a time decoder, a perceptron and a Transformer module which are sequentially connected;

training the video abstract model through the training set to obtain a trained video abstract model;

and inputting the video to be identified into the trained video abstract model to obtain a video sampling picture set y' of the video to be identified.

Based on the technical scheme, the invention can also be improved as follows:

Further, the constructing of a video summary model, where the video summary model includes a time decoder, a perceptron and a Transformer module connected in sequence, includes:

performing time sequence processing on the video message information set Y based on the time decoder;

generating a corresponding segmentation sequence based on a two-layer perceptron;

performing vectorization analysis on each segmented sequence through the Transformer module to obtain a sequence feature set R of each segmented sequence, calculating the tolerance rate between every two sequence features in the sequence feature set R, and obtaining the segmented sequence set with the maximum tolerance rate based on the tolerance rate;

and randomly extracting n pictures from each segmented sequence in the segmented sequence set to form a video sampling picture set y' of the video to be identified.

Further, the video abstraction method of the 5G rich media information further comprises the following steps:

acquiring a picture message in a 5G rich media message, and constructing a picture message information set Z based on the picture message and the video sampling picture set y';

constructing a feature extraction model and a bad picture classification model;

performing feature extraction on the picture message information set Z based on the feature extraction model to obtain a picture depth feature set z;

and sequentially inputting the pictures in the picture depth feature set z into the bad picture classification model to judge whether all the pictures in the picture depth feature set z are all compliant.

constructing a voice-to-text model;

and converting the video message information set Y into a video text set y through the voice-to-text model.

acquiring a text message in the 5G rich media message;

constructing a text message information set X based on the text message and the video text set y;

constructing a sensitive word variant recognition model;

and sequentially inputting the text messages in the text message information set X into the sensitive word variant recognition model to judge whether all the text messages in the text message information set X are all compliant.

and when all the text messages in the text message information set X are in compliance and all the pictures in the picture depth feature set z are in compliance, judging that the 5G rich media message can be normally sent.
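The final sending decision described above is a conjunction over the text check and the picture check. A minimal sketch, assuming the two trained models are wrapped as boolean predicates; the function name `message_can_be_sent` and the predicate parameters are hypothetical and do not appear in the original text:

```python
from typing import Callable, Iterable, Sequence


def message_can_be_sent(
    texts: Iterable[str],
    picture_features: Iterable[Sequence[float]],
    text_is_compliant: Callable[[str], bool],
    picture_is_compliant: Callable[[Sequence[float]], bool],
) -> bool:
    """Return True only if every text in X and every picture feature in z is compliant.

    text_is_compliant stands in for the sensitive word variant recognition model,
    picture_is_compliant for the bad picture classification model (both assumed).
    """
    texts_ok = all(text_is_compliant(t) for t in texts)
    pictures_ok = all(picture_is_compliant(f) for f in picture_features)
    return texts_ok and pictures_ok
```

A single non-compliant text or picture is enough to block the message, which matches the "all compliant" wording of the step.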

Further, the training the video abstract model through the training set to obtain a trained video abstract model includes:

dividing the video message information set Y into a training set, a testing set and a verification set;

training the video summary model based on the training set;

performing performance verification on the video abstract model based on the verification set, and storing the video abstract model meeting performance conditions;

and evaluating the identification effect of the video abstract model based on the test set.

A video summarization system for 5G rich media information, comprising:

the acquisition module is used for acquiring a video message information set Y in the 5G rich media message and constructing a training set based on the video message information set Y;

the construction module is used for constructing a video abstraction model, wherein the video abstraction model comprises a time decoder, a perceptron and a Transformer module which are connected in sequence;

the training module is used for training the video abstract model through the training set to obtain a trained video abstract model;

An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method when the computer program is executed.

A non-transitory computer readable medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method.

The embodiment of the invention has the following advantages:

According to the video abstraction method of the 5G rich media information, a video message information set Y in the 5G rich media message is obtained, and a training set is constructed based on the video message information set Y; a video abstract model is constructed, wherein the video abstract model comprises a time decoder, a perceptron and a Transformer module which are sequentially connected; the video abstract model is trained through the training set to obtain a trained video abstract model; and the video to be identified is input into the trained video abstract model to obtain a video sampling picture set y' of the video to be identified. This solves the prior-art problems that video content is difficult to identify, identification takes a long time, and existing approaches are not suitable for a high-concurrency short message sending scene.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It will be apparent to those of ordinary skill in the art that the drawings in the following description are exemplary only and that other implementations can be obtained from the extensions of the drawings provided without inventive effort.

The structures, proportions, sizes, etc. shown in the present specification are shown only for the purposes of illustration and description, and are not intended to limit the scope of the invention, which is defined by the claims, so that any structural modifications, changes in proportions, or adjustments of sizes, which do not affect the efficacy or the achievement of the present invention, should fall within the ambit of the technical disclosure.

FIG. 1 is a flow chart of a method for video summarization of 5G rich media information according to the present invention;

FIG. 2 is a first architecture diagram of a video summarization system for 5G rich media information according to the present invention;

FIG. 3 is a second architecture diagram of the video summarization system of the present invention for 5G rich media information;

FIG. 4 is a flowchart of the method for generating the abstract of the 5G rich media video picture;

FIG. 5 is a schematic diagram of an entity structure of an electronic device according to the present invention.

Wherein the reference numerals are as follows:

the system comprises an acquisition module 10, a construction module 20, a training module 30, a video abstraction model 40, a voice-to-text model 50, a sensitive word variant recognition model 60, a feature extraction model 70, a bad picture classification model 80, an electronic device 90, a processor 901, a memory 902 and a bus 903.

Detailed Description

Other advantages and effects of the present invention will become apparent to those skilled in the art from the following detailed description. The embodiments described are some, but not all, of the embodiments of the invention. All other embodiments that can be obtained by those skilled in the art based on the embodiments of the invention without inventive effort are intended to be within the scope of the invention.

Examples

Fig. 1 is a flowchart of an embodiment of a video summarization method for 5G rich media information according to the present invention, and as shown in fig. 1, the video summarization method for 5G rich media information according to the embodiment of the present invention includes the following steps:

S101, acquiring a video message information set Y in a 5G rich media message, and constructing a training set based on the video message information set Y;

Specifically, a 5G rich media message is obtained. Because the maximum capacity of a single 5G rich media message is 3M, one 5G rich media message can contain a plurality of text messages, a plurality of pictures and a plurality of video/audio segments. A 5G rich media message may be denoted as T_xyz; T_xyz may comprise a plurality of text message information sets X (x₁, x₂, ...), a plurality of video message information sets Y (y₁, y₂, ...) and a plurality of picture message information sets Z (z₁, z₂, ...).

The video message information set Y contains video content and audio content, and the security compliance check needs to check the video and audio information at the same time. The video message information set Y (y₁, y₂, ...) is therefore converted into a video text set y and a video sampling picture set y', and a training set is constructed based on the video message information set Y.
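The message structure T_xyz described above can be pictured as a simple container holding the three sets. A minimal sketch; the class name `RichMediaMessage`, its field names and the `payload_size` helper are hypothetical illustrations, not part of the original disclosure:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RichMediaMessage:
    """Hypothetical container for one 5G rich media message T_xyz."""

    texts: List[str] = field(default_factory=list)       # X = (x1, x2, ...)
    videos: List[bytes] = field(default_factory=list)    # Y = (y1, y2, ...)
    pictures: List[bytes] = field(default_factory=list)  # Z = (z1, z2, ...)

    def payload_size(self) -> int:
        """Total payload size in bytes, with text measured as UTF-8."""
        return (
            sum(len(t.encode("utf-8")) for t in self.texts)
            + sum(len(v) for v in self.videos)
            + sum(len(p) for p in self.pictures)
        )
```

A size helper of this kind could be compared against the per-message capacity limit mentioned in the text before any compliance check is run.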

S102, constructing a video abstraction model, wherein the video abstraction model comprises a time decoder, a perceptron and a Transformer module which are connected in sequence;

specifically, timing processing is performed on the videos in the video message information set Y based on the time decoder;

n pictures are randomly extracted from each segmented sequence in the segmented sequence set to form a video sampling picture set y' of the video to be identified.
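The sampling step, taking n random pictures from every segmented sequence, can be sketched as follows. This is an illustration only: the function name `sample_frames`, the `seed` parameter, and the rule that a segment shorter than n contributes all of its frames are assumptions, and the selection of the segmented sequence set itself (by tolerance rate) is not modelled here.

```python
import random
from typing import List, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")


def sample_frames(
    segments: Sequence[Sequence[Frame]],
    n: int,
    seed: Optional[int] = None,
) -> List[Frame]:
    """Randomly draw up to n frames, without replacement, from each segment."""
    rng = random.Random(seed)  # local generator so sampling is reproducible
    sampled: List[Frame] = []
    for segment in segments:
        k = min(n, len(segment))  # assumption: short segments give all frames
        sampled.extend(rng.sample(list(segment), k))
    return sampled
```

Sampling per segment rather than over the whole video keeps every segment represented in y', whatever its length.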

S103, training the video abstract model through a training set to obtain a trained video abstract model;

specifically, the video message information set Y is divided into a training set, a testing set and a verification set;

training the video summary model 40 based on the training set;

performing performance verification on the video summary model 40 based on the verification set, and storing the video summary model 40 meeting performance conditions;

the recognition effect of the video summary model 40 is evaluated based on the test set.
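The three-way division of the video message information set Y can be sketched as below. The text does not state split ratios, so the 80/10/10 defaults, the shuffling and the function name `split_dataset` are assumptions made for illustration:

```python
import random
from typing import List, Sequence, Tuple, TypeVar

Item = TypeVar("Item")


def split_dataset(
    items: Sequence[Item],
    train: float = 0.8,   # assumed ratio, not given in the source
    val: float = 0.1,     # assumed ratio, not given in the source
    seed: int = 0,
) -> Tuple[List[Item], List[Item], List[Item]]:
    """Shuffle items and split them into training, verification and test sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    training_set = shuffled[:n_train]
    verification_set = shuffled[n_train:n_train + n_val]
    test_set = shuffled[n_train + n_val:]  # remainder goes to the test set
    return training_set, verification_set, test_set
```

The verification set is used to decide which model checkpoint to store, while the test set is held back for the final evaluation of the recognition effect.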

S104, inputting the video to be identified into the trained video abstract model to obtain a video sampling picture set y' of the video to be identified.

The video abstraction method of the 5G rich media information further comprises the following steps:

constructing a voice-to-text model 50;

converting the video message information set Y into a video text set y by the voice-to-text model 50;

preferably, the voice-to-text model 50 is a CTC model, and a maximum entropy function is introduced on the basis of the CTC model to improve the original CTC loss function; the improved CTC model is trained through the training set to obtain a trained improved CTC model; and the video message information set Y is converted into the video text set y through the trained improved CTC model.

The original CTC loss function is improved through formula 1;

In formula 1, the loss function of the improved CTC model is expressed in terms of the original CTC loss function, where α is the coefficient of the maximum conditional entropy regularization and H(p(π|l, X)) is the entropy of the feasible paths for the given input sequence and target sequence.

H(p(π|l, X)) is solved by formula 2;

where p(π|l, X) represents the conditional probability of a feasible path π given the 5G speech information X and the true output l;

log p(π|X) represents the logarithm of the conditional probability of a feasible path π given the 5G speech information X; the remaining term represents the sum of the conditional probabilities of all outputs for the 5G speech information X, regardless of whether the true output l is given.
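Formulas 1 and 2 themselves are not reproduced in the text. Assuming the improvement follows the usual maximum conditional entropy regularization of CTC, a form consistent with the symbol definitions above would be the following sketch. The symbol names L_improved and L_CTC, the path set B^{-1}(l) (all feasible paths that collapse to l) and the sign convention are assumptions:

```latex
% Formula 1 (assumed form): entropy-regularized CTC loss
L_{\mathrm{improved}} = L_{\mathrm{CTC}} - \alpha \, H\bigl(p(\pi \mid l, X)\bigr)

% Formula 2 (assumed form): entropy over the feasible paths of l
H\bigl(p(\pi \mid l, X)\bigr)
  = -\sum_{\pi \in \mathcal{B}^{-1}(l)} p(\pi \mid l, X) \log p(\pi \mid l, X)
  = -\frac{1}{p(l \mid X)} \sum_{\pi \in \mathcal{B}^{-1}(l)} p(\pi \mid X) \log p(\pi \mid X)
    + \log p(l \mid X),
\qquad
p(l \mid X) = \sum_{\pi \in \mathcal{B}^{-1}(l)} p(\pi \mid X)
```

The second equality in formula 2 follows from p(π|l, X) = p(π|X) / p(l|X) for every feasible path π. Subtracting the entropy term rewards spreading probability over several feasible alignments instead of concentrating it on a single path.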

The loss function used in the invention may also be selected from L1Loss, MSELoss, CrossEntropyLoss and the like, with no great difference in the final effect of the improved CTC model.

acquiring a text message in the 5G rich media message; constructing a text message information set X based on the text message and the video text set y;

constructing a sensitive word variant recognition model 60; preferably, the sensitive word variant recognition model 60 is a Text CNN model, and bad short-text recognition based on the Text CNN model is already used in short message text review.

The sensitive word variant recognition model 60 used in the invention can be replaced by models such as CRNN, LSTM+CTC and the like besides the Text CNN model, and the recognition effect is not greatly different.

Firstly, the 5G rich media message to be processed needs to undergo preprocessing such as digit character normalization, English character normalization, traditional-to-simplified Chinese conversion, special-meaning symbol processing, symbol noise removal, unified continuous digital payment representation, character string segmentation and the like.
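Part of this preprocessing can be sketched with the Python standard library. The example below covers only digit and English character normalization and symbol noise removal; traditional-to-simplified conversion and the other listed steps would need additional resources and are not shown. The function name `normalize_text` and the exact rules are assumptions:

```python
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Normalize a short message before sensitive word recognition (sketch)."""
    # NFKC folds full-width digits and letters (e.g. "１２３ＡＢＣ") to ASCII.
    text = unicodedata.normalize("NFKC", text)
    # English character normalization: compare case-insensitively.
    text = text.lower()
    # Symbol noise removal: drop everything that is not a letter, digit,
    # CJK character or whitespace (underscore is removed explicitly).
    text = re.sub(r"[^\w\s]|_", "", text)
    # Collapse runs of whitespace left behind by the removal.
    return re.sub(r"\s+", " ", text).strip()
```

For instance, a variant such as "加★微☆信" is reduced to "加微信", so that decorative symbols inserted between characters no longer hide the underlying word.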

Secondly, the short text is vectorized through word2vec, the text vector is convolved and expanded in high dimensions in the convolution layer, the sensitive vocabulary vectors are activated by the pooling layer and the fully connected layer, and the hit probability of the sensitive vocabulary is calculated through a SoftMax function. The SoftMax function expression chosen here is as follows:

where x represents a word vector.
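The expression itself is not reproduced in the text. Assuming the standard SoftMax definition, softmax(x)_i = exp(x_i) / Σ_j exp(x_j), the hit probabilities can be computed as in the following sketch; the function name `softmax` and the max-subtraction trick for numerical stability are choices made here, not details from the source:

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Standard SoftMax: exp(x_i) / sum_j exp(x_j), computed stably."""
    m = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

The outputs are non-negative and sum to one, so the largest entry can be read as the probability that the text hits the corresponding sensitive class.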

Finally, the text messages in the text message information set X are sequentially input into the sensitive word variant recognition model 60 to determine whether all the text messages in the text message information set X are compliant. If a text message is judged to be non-compliant, the process switches to manual judgment or early warning. If the text messages are judged to be compliant, the subsequent judging process is entered.

Constructing a bad picture classification model 80 and a feature extraction model 70;

extracting features of the picture message information set Z to obtain a picture depth feature set z; preferably, the original image feature extraction methods used in the invention are LBP, HOG and SIFT; other similar feature extraction algorithms can be substituted, and the substitution does not greatly affect the final effect of the bad picture classification model 80.
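Of the feature extraction methods named above, LBP (local binary patterns) is the simplest to illustrate. The following is a minimal sketch of the basic 3x3 LBP histogram on a grayscale image given as nested lists; the function name `lbp_histogram`, the neighbour ordering and the "greater than or equal" comparison are conventions chosen here, and the source does not say which LBP variant is used:

```python
from typing import List, Sequence

# Clockwise 8-neighbour offsets (row, col), starting at the top-left pixel.
_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(image: Sequence[Sequence[int]]) -> List[int]:
    """Return the 256-bin histogram of basic 3x3 LBP codes of interior pixels."""
    hist = [0] * 256
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(_OFFSETS):
                # Set the bit when the neighbour is at least as bright as the centre.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            hist[code] += 1
    return hist
```

The resulting histogram is a fixed-length texture descriptor that could serve as one entry of the picture feature set passed to a classifier.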

The pictures in the picture depth feature set z are sequentially input into the bad picture classification model 80 to judge whether all the pictures in the picture depth feature set z are compliant. If a picture, or any characteristic information in the picture, is judged to be non-compliant, the picture is judged to be non-compliant. If the picture and all characteristic information in the picture are judged to be compliant, the picture is judged to be compliant.

The video abstraction method of the 5G rich media information obtains a video message information set Y in the 5G rich media message, and constructs a training set based on the video message information set Y; constructs a video abstraction model 40, wherein the video abstraction model 40 comprises a time decoder, a perceptron and a Transformer module which are connected in sequence; trains the video abstract model 40 through the training set to obtain a trained video abstract model 40; and inputs the video to be identified into the trained video abstraction model 40 to obtain a video sampling picture set y' of the video to be identified. The method solves the prior-art problems that video content is difficult to identify, identification takes a long time, and existing approaches are not suitable for a high-concurrency short message sending scene.

FIGS. 2-3 are architecture diagrams of a video summarization system for 5G rich media information according to embodiments of the present invention; as shown in FIGS. 2-3, the video summary system of 5G rich media information provided by the embodiment of the invention includes the following modules:

the acquisition module 10 is configured to acquire a video message information set Y in the 5G rich media message, and construct a training set based on the video message information set Y;

the building module 20 is configured to build a video summary model 40, where the video summary model 40 includes a time decoder, a perceptron, and a Transformer module that are sequentially connected;

performing time sequence processing on videos in the video message information set Y based on the time decoder;

performing vectorization analysis on each segmented sequence through the Transformer module to obtain a sequence feature set R of each segmented sequence, calculating the tolerance rate between every two sequence features in the sequence feature set R, and obtaining the segmented sequence set with the maximum tolerance rate based on the tolerance rate;

The training module 30 is configured to train the video abstract model 40 through the training set to obtain a trained video abstract model 40;

inputting the video to be identified into the trained video abstraction model 40 to obtain a video sampling picture set y' of the video to be identified.

The acquisition module 10 is further configured to:

acquiring a text message in the 5G rich media message;

the feature extraction model 70 is used for performing feature extraction on the picture message information set Z to obtain a picture depth feature set z;

the bad picture classification model 80 is used for sequentially receiving the pictures in the picture depth feature set z to judge whether all the pictures in the picture depth feature set z are compliant;

the voice-to-text model 50 is used for converting the video message information set Y into a video text set y;

the sensitive word variant recognition model 60 is used for sequentially receiving the text messages in the text message information set X to judge whether all the text messages in the text message information set X are compliant;

According to the video abstraction system of the 5G rich media information, a video message information set Y in the 5G rich media message is acquired through the acquisition module 10, and a training set is constructed based on the video message information set Y; a video abstraction model 40 is constructed by the construction module 20, wherein the video abstraction model 40 comprises a time decoder, a perceptron and a Transformer module which are connected in sequence; the video abstract model 40 is trained by the training module 30 through the training set to obtain a trained video abstract model 40; and the video to be identified is input into the trained video abstraction model 40 to obtain a video sampling picture set y' of the video to be identified. The system solves the prior-art problems that video content is difficult to identify, identification takes a long time, and existing approaches are not suitable for a high-concurrency short message sending scene.

FIG. 5 is a schematic diagram of an entity structure of an electronic device according to an embodiment of the present invention. As shown in FIG. 5, an electronic device 90 includes: a processor 901, a memory 902 and a bus 903;

the processor 901 and the memory 902 complete communication with each other through the bus 903;

the processor 901 is configured to call program instructions in the memory 902 to perform the methods provided in the above method embodiments, for example, including: acquiring a video message information set Y in a 5G rich media message, and constructing a training set based on the video message information set Y; constructing a video abstraction model 40, wherein the video abstraction model 40 comprises a time decoder, a perceptron and a Transformer module which are connected in sequence; training the video abstract model 40 through the training set to obtain a trained video abstract model 40; and inputting the video to be identified into the trained video abstraction model 40 to obtain a video sampling picture set y' of the video to be identified.

The present embodiment provides a non-transitory computer readable medium storing computer instructions that cause a computer to perform the methods provided by the above-described method embodiments, for example, including: acquiring a video message information set Y in a 5G rich media message, and constructing a training set based on the video message information set Y; constructing a video abstraction model 40, wherein the video abstraction model 40 comprises a time decoder, a perceptron and a Transformer module which are connected in sequence; training the video abstract model 40 through the training set to obtain a trained video abstract model 40; and inputting the video to be identified into the trained video abstraction model 40 to obtain a video sampling picture set y' of the video to be identified.

Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware associated with program instructions, where the foregoing program may be stored in a computer readable medium, and when executed, the program performs steps including the above method embodiments; and the aforementioned medium includes: various media that can store program code, such as ROM, RAM, magnetic or optical disks.

The apparatus embodiments described above are merely illustrative, wherein elements illustrated as separate elements may or may not be physically separate, and elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.

From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable medium such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the embodiments or the methods of some parts of the embodiments.

While the invention has been described in detail in the foregoing general description and specific examples, it will be apparent to those skilled in the art that modifications and improvements can be made thereto. Accordingly, such modifications or improvements may be made without departing from the spirit of the invention and are intended to be within the scope of the invention as claimed.

Claims

1. The video abstraction method for the 5G rich media information is characterized by comprising the following steps:

randomly extracting n pictures from each segmented sequence in the segmented sequence set to form a video sampling picture set y' of the video to be identified;

2. The method for video summarization of 5G rich media information according to claim 1, wherein the method for video summarization of 5G rich media information further comprises:

constructing a feature extraction model and a bad picture classification model;

3. The method for video summarization of 5G rich media information according to claim 1, wherein the method for video summarization of 5G rich media information further comprises:

constructing a voice-to-text model;

4. The method for video summarization of 5G rich media information according to claim 3, wherein the method for video summarization of 5G rich media information further comprises:

acquiring a text message in the 5G rich media message;

constructing a sensitive word variant recognition model;

5. The method for video summarization of 5G rich media information according to claim 4, wherein the method for video summarization of 5G rich media information further comprises:

6. The method for video summarization of 5G rich media information according to claim 1, wherein the training the video summarization model by the training set to obtain a trained video summarization model comprises:

training the video summary model based on the training set;

performing performance verification on the video abstract model based on the verification set, and storing the video abstract model meeting performance conditions;

7. A video summarization system for 5G rich media information, comprising:

the video summary model is used for:

8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 6 when the computer program is executed.

9. A non-transitory computer readable medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any of claims 1 to 6.