CN116453023B - Video abstraction system, method, electronic equipment and medium for 5G rich media information - Google Patents


Info

Publication number
CN116453023B
Authority
CN
China
Prior art keywords
video
model
rich media
message
sequence
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310437286.9A
Other languages
Chinese (zh)
Other versions
CN116453023A (en)
Inventor
沈浩
黄海量
吴东进
韩松乔
吴优
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Zhixun Information Technology Co ltd
Original Assignee
Shanghai Zhixun Information Technology Co ltd
Application filed by Shanghai Zhixun Information Technology Co ltd
Priority to CN202310437286.9A
Publication of CN116453023A
Application granted
Publication of CN116453023B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/49 Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/776 Validation; Performance evaluation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/778 Active pattern-learning, e.g. online learning of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V20/47 Detecting features for summarising video content

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the invention discloses a video abstraction system, method, electronic device and medium for 5G rich media messages. The video abstraction method for 5G rich media messages comprises the following steps: acquiring a video message information set Y from a 5G rich media message and constructing a training set based on Y; constructing a video abstract model comprising a time decoder, a perceptron and a Transformer module connected in sequence; training the video abstract model on the training set to obtain a trained video abstract model; and inputting a video to be identified into the trained video abstract model to obtain a video sampling picture set y' of that video. The method addresses three problems in the prior art: video content is difficult to identify, identification takes a long time, and existing approaches are unsuitable for high-concurrency short message sending scenarios.

Description

Video abstraction system, method, electronic equipment and medium for 5G rich media information
Technical Field
The invention relates to the technical field of computers, and in particular to a video abstraction system, method, electronic device and medium for 5G rich media messages.
Background
The 5G rich media message is a major leap in the communication capability of the short message industry. Compared with the traditional text short message, it supports more media formats and richer forms of expression: it can carry long text, pictures, voice, video and other rich media content, and it also provides user interaction and feedback capabilities such as official accounts and mini-programs. As a result, the application scenarios, content quality and reach of 5G rich media messages are greatly improved.
A 5G rich media message comprises a text message information set X = (x_1, x_2, ...), a video message information set Y = (y_1, y_2, ...) and a picture message information set Z = (z_1, z_2, ...). The video messages carry a large amount of content, but that content is difficult to identify and identification takes a long time, which makes video review unsuitable for high-concurrency short message sending scenarios.
Therefore, a method for generating video picture summaries of 5G rich media messages that is suitable for high-concurrency short message sending scenarios is needed.
Disclosure of Invention
An embodiment of the invention aims to provide a video abstraction system, method, electronic device and medium for 5G rich media messages, so as to solve the problems in the prior art that video content is difficult to identify, identification takes a long time, and existing approaches are unsuitable for high-concurrency short message sending scenarios.
In order to achieve the above objective, an embodiment of the present invention provides a method for video abstraction of 5G rich media information, which specifically includes:
acquiring a video message information set Y in a 5G rich media message, and constructing a training set based on the video message information set Y;
constructing a video abstract model, wherein the video abstract model comprises a time decoder, a perceptron and a Transformer module connected in sequence;
training the video abstract model through the training set to obtain a trained video abstract model;
and inputting the video to be identified into the trained video abstract model to obtain a video sampling picture set y' of the video to be identified.
Based on the technical scheme, the invention can also be improved as follows:
Further, constructing the video abstract model, which comprises a time decoder, a perceptron and a Transformer module connected in sequence, includes:
performing time sequence processing on the video message information set Y based on the time decoder;
generating corresponding segmentation sequences using a two-layer perceptron;
performing vectorization analysis on each segmentation sequence through the Transformer module to obtain a sequence feature set R containing the feature of each segmentation sequence, calculating the tolerance rate between every two sequence features in R, and obtaining, based on the tolerance rate, the segmentation sequence set with the maximum tolerance rate;
randomly extracting n pictures from each segmentation sequence in that segmentation sequence set to form the video sampling picture set y' of the video to be identified.
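The segmentation step can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the time decoder has already produced one feature vector per frame, and that the two-layer perceptron outputs a per-frame boundary probability that starts a new segmentation sequence when it exceeds a threshold. The function names, the ReLU/sigmoid choice, the 0.5 threshold and the random weights are all hypothetical.

```python
import numpy as np


def two_layer_perceptron(x, w1, b1, w2, b2):
    """Hypothetical boundary scorer: ReLU hidden layer, sigmoid output per frame."""
    hidden = np.maximum(0.0, x @ w1 + b1)   # (T, H)
    logits = hidden @ w2 + b2               # (T,)
    return 1.0 / (1.0 + np.exp(-logits))    # boundary probability of each frame


def split_into_sequences(boundary_prob, threshold=0.5):
    """Cut frame indices [0, T) into segmentation sequences at predicted boundaries."""
    T = len(boundary_prob)
    starts = [0] + [t for t in range(1, T) if boundary_prob[t] > threshold]
    ends = starts[1:] + [T]
    return [list(range(s, e)) for s, e in zip(starts, ends)]


# Usage with random stand-in features and weights (illustration only).
rng = np.random.default_rng(0)
frames = rng.normal(size=(12, 16))          # 12 frames, 16-dim features (assumed)
w1, b1 = rng.normal(size=(16, 8)), np.zeros(8)
w2, b2 = rng.normal(size=8), 0.0
probs = two_layer_perceptron(frames, w1, b1, w2, b2)
segments = split_into_sequences(probs)
```

Every frame lands in exactly one segmentation sequence, and frame 0 always opens the first one.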
Further, the video abstraction method of the 5G rich media information further comprises the following steps:
acquiring a picture message in a 5G rich media message, and constructing a picture message information set Z based on the picture message and the video sampling picture set y';
constructing a feature extraction model and a bad picture classification model;
performing feature extraction on the picture message information set Z based on the feature extraction model to obtain a picture depth feature set z;
and sequentially inputting the pictures in the picture depth feature set z into the bad picture classification model to judge whether all the pictures in z are compliant.
Further, the video abstraction method of the 5G rich media information further comprises the following steps:
constructing a voice-to-text model;
and converting the video message information set Y into a video text set y through the voice-to-text model.
Further, the video abstraction method of the 5G rich media information further comprises the following steps:
acquiring a text message in the 5G rich media message;
constructing a text message information set X based on the text message and the video text set y;
constructing a sensitive word variant recognition model;
and sequentially inputting the text messages in the text message information set X into the sensitive word variant recognition model to judge whether all the text messages in X are compliant.
Further, the video abstraction method of the 5G rich media information further comprises the following steps:
and when all the text messages in the text message information set X are in compliance and all the pictures in the picture depth feature set z are in compliance, judging that the 5G rich media message can be normally sent.
Further, the training the video abstract model through the training set to obtain a trained video abstract model includes:
dividing the video message information set Y into a training set, a testing set and a verification set;
training the video summary model based on the training set;
performing performance verification on the video abstract model based on the verification set, and storing an improved CTC model meeting performance conditions;
and evaluating the identification effect of the video abstract model based on the test set.
A video summarization system for 5G rich media information, comprising:
the acquisition module is used for acquiring a video message information set Y in the 5G rich media message and constructing a training set based on the video message information set Y;
the construction module is used for constructing a video abstraction model, wherein the video abstraction model comprises a time decoder, a perceptron and a Transformer module connected in sequence;
the training module is used for training the video abstract model through the training set to obtain a trained video abstract model;
and inputting the video to be identified into the trained video abstract model to obtain a video sampling picture set y' of the video to be identified.
An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method when the computer program is executed.
A non-transitory computer readable medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method.
The embodiment of the invention has the following advantages:
According to the video abstraction method for 5G rich media messages, a video message information set Y is acquired from the 5G rich media message and a training set is constructed based on Y; a video abstract model comprising a time decoder, a perceptron and a Transformer module connected in sequence is constructed; the model is trained on the training set to obtain a trained video abstract model; and the video to be identified is input into the trained model to obtain its video sampling picture set y'. This solves the problems in the prior art that video content is difficult to identify, identification takes a long time, and existing approaches are unsuitable for high-concurrency short message sending scenarios.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It will be apparent to those of ordinary skill in the art that the drawings in the following description are exemplary only and that other implementations can be obtained from the extensions of the drawings provided without inventive effort.
The structures, proportions, sizes, etc. shown in the present specification are shown only for the purposes of illustration and description, and are not intended to limit the scope of the invention, which is defined by the claims, so that any structural modifications, changes in proportions, or adjustments of sizes, which do not affect the efficacy or the achievement of the present invention, should fall within the ambit of the technical disclosure.
FIG. 1 is a flow chart of a method for video summarization of 5G rich media information according to the present invention;
FIG. 2 is a first architecture diagram of a video summarization system for 5G rich media information according to the present invention;
FIG. 3 is a second architecture diagram of the video summarization system of the present invention for 5G rich media information;
FIG. 4 is a flowchart of the method for generating the abstract of the 5G rich media video picture;
FIG. 5 is a schematic diagram of the physical structure of an electronic device according to the present invention.
Wherein the reference numerals are as follows:
the system comprises an acquisition module 10, a construction module 20, a training module 30, a video abstraction model 40, a voice-to-text model 50, a sensitive word variant recognition model 60, a feature extraction model 70, a bad picture classification model 80, an electronic device 90, a processor 901, a memory 902 and a bus 903.
Detailed Description
Other advantages and benefits of the present invention will become apparent to those skilled in the art from the following detailed description, which describes some, but not all, specific embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Examples
Fig. 1 is a flowchart of an embodiment of a video summarization method for 5G rich media information according to the present invention, and as shown in fig. 1, the video summarization method for 5G rich media information according to the embodiment of the present invention includes the following steps:
S101, acquiring a video message information set Y in a 5G rich media message, and constructing a training set based on the video message information set Y;
Specifically, a 5G rich media message is obtained. Because the maximum capacity of a single 5G rich media message is 3 MB, one message can contain several text messages, several pictures and several video/audio segments. A 5G rich media message may be denoted T_xyz; T may comprise a text message information set X = (x_1, x_2, ...), a video message information set Y = (y_1, y_2, ...) and a picture message information set Z = (z_1, z_2, ...).
The video message information set Y contains both video content and audio content, and both must be checked during the security compliance check. Therefore, the video message information set Y = (y_1, y_2, ...) is converted into a video text set y and a video sampling picture set y', and a training set is constructed based on Y.
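As an illustration of the message structure T_xyz, the following sketch groups the three information sets and checks the size limit. The class name, field types and byte-level size accounting are hypothetical, and reading the 3 MB capacity as 3 × 1024 × 1024 bytes is an assumption.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: the 3 MB capacity is interpreted as 3 MiB.
MAX_MESSAGE_BYTES = 3 * 1024 * 1024


@dataclass
class RichMediaMessage:
    """Hypothetical container for one 5G rich media message T_xyz."""
    texts: List[str] = field(default_factory=list)       # X = (x_1, x_2, ...)
    videos: List[bytes] = field(default_factory=list)    # Y = (y_1, y_2, ...)
    pictures: List[bytes] = field(default_factory=list)  # Z = (z_1, z_2, ...)

    def size_bytes(self) -> int:
        """Total payload size: UTF-8 text bytes plus raw media bytes."""
        text_bytes = sum(len(t.encode("utf-8")) for t in self.texts)
        media_bytes = sum(len(v) for v in self.videos) + sum(len(p) for p in self.pictures)
        return text_bytes + media_bytes

    def within_limit(self) -> bool:
        return self.size_bytes() <= MAX_MESSAGE_BYTES


msg = RichMediaMessage(texts=["hello"], videos=[b"\x00" * 10], pictures=[b"\x01" * 5])
```

Here `msg.size_bytes()` is 20 (5 text bytes plus 15 media bytes), well within the limit.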
S102, constructing a video abstraction model, wherein the video abstraction model comprises a time decoder, a perceptron and a Transformer module connected in sequence;
specifically, time sequence processing is performed on the videos in the video message information set Y based on the time decoder;
corresponding segmentation sequences are generated using a two-layer perceptron;
vectorization analysis is performed on each segmentation sequence through the Transformer module to obtain a sequence feature set R containing the feature of each segmentation sequence; the tolerance rate between every two sequence features in R is calculated, and the segmentation sequence set with the maximum tolerance rate is obtained based on the tolerance rate;
n pictures are randomly extracted from each segmentation sequence in that set to form the video sampling picture set y' of the video to be identified.
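The tolerance-rate selection and the random sampling of n pictures can be sketched as below. The document does not define the tolerance rate, so this sketch assumes it is the cosine distance between two sequence features, scores each candidate segmentation by the mean pairwise tolerance of its feature set R, and keeps the highest-scoring candidate. The candidate representation and all function names are hypothetical.

```python
import itertools
import random

import numpy as np


def tolerance_rate(r_a, r_b):
    """Assumed definition: cosine distance between two sequence features."""
    cos = float(np.dot(r_a, r_b) / (np.linalg.norm(r_a) * np.linalg.norm(r_b)))
    return 1.0 - cos


def mean_pairwise_tolerance(features):
    """Average tolerance rate over every pair of features in a set R."""
    pairs = list(itertools.combinations(features, 2))
    if not pairs:
        return 0.0
    return sum(tolerance_rate(a, b) for a, b in pairs) / len(pairs)


def select_and_sample(candidates, n, seed=None):
    """candidates: list of (segments, features); segments[i] is a list of frame
    indices and features[i] is its sequence feature. Returns sampled frame indices (y')."""
    segments, _ = max(candidates, key=lambda c: mean_pairwise_tolerance(c[1]))
    rng = random.Random(seed)
    sampled = []
    for seg in segments:
        # draw up to n distinct frames from each segmentation sequence
        sampled.extend(sorted(rng.sample(seg, min(n, len(seg)))))
    return sampled


flat = ([[0, 1, 2], [3, 4, 5]], [np.array([1.0, 0.0]), np.array([1.0, 0.0])])
varied = ([[0, 1], [2, 3], [4, 5]],
          [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([-1.0, 0.0])])
y_prime = select_and_sample([flat, varied], n=1, seed=7)
```

The `varied` candidate wins (mean tolerance 4/3 against 0), so `y_prime` holds one frame from each of its three sequences.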
S103, training the video abstract model through a training set to obtain a trained video abstract model;
specifically, the video message information set Y is divided into a training set, a testing set and a verification set;
training the video summary model 40 based on the training set;
performing performance verification on the video summary model 40 based on the verification set, and storing an improved CTC model meeting performance conditions;
the recognition effect of the video summary model 40 is evaluated based on the test set.
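A minimal sketch of the three-way split follows. The document does not give the proportions; the 8:1:1 ratio, the fixed shuffle seed and the function name are assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(items: Sequence[T], train: float = 0.8, val: float = 0.1,
                  seed: int = 42) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle, then return (training set, verification set, test set)."""
    if not (0.0 < train < 1.0 and 0.0 <= val < 1.0 and train + val <= 1.0):
        raise ValueError("split ratios must be positive and sum to at most 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # reproducible shuffle
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])     # remainder becomes the test set


train_set, val_set, test_set = split_dataset(range(100))
```

Using a seeded `random.Random` instance keeps the split reproducible across runs without touching the global random state.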
S104, inputting the video to be identified into the trained video abstract model to obtain a video sampling picture set y' of the video to be identified.
The video abstraction method of the 5G rich media information further comprises the following steps:
constructing a voice-to-text model 50;
converting the video message information set Y into a video text set y by the voice-to-text model 50;
preferably, the voice-to-text model 50 is a CTC (Connectionist Temporal Classification) model, and a maximum conditional entropy regularization term is introduced into the original CTC loss function to improve it; the improved CTC model is trained on the training set, and the trained improved CTC model converts the video message information set Y into the video text set y.
The original CTC loss function is improved through Equation 1:
$$\mathcal{L}' = \mathcal{L}_{\mathrm{CTC}} - \alpha\, H\big(p(\pi \mid l, X)\big) \qquad (1)$$
where $\mathcal{L}'$ is the loss function of the improved CTC model, $\mathcal{L}_{\mathrm{CTC}}$ is the original CTC loss function, $\alpha$ is the coefficient of the maximum conditional entropy regularization, and $H(p(\pi \mid l, X))$ is the entropy of the feasible paths for the given input sequence and target sequence.
$H(p(\pi \mid l, X))$ is solved through Equation 2:
$$H\big(p(\pi \mid l, X)\big) = -\sum_{\pi \in \mathcal{B}^{-1}(l)} p(\pi \mid l, X)\,\log p(\pi \mid X) + \log p(l \mid X) \qquad (2)$$
where $\mathcal{B}^{-1}(l)$ is the set of feasible paths that collapse to the true output $l$; $p(\pi \mid l, X)$ is the conditional probability of a feasible path $\pi$ given the 5G speech information $X$ and the true output $l$; $\log p(\pi \mid X)$ is the logarithm of the conditional probability of the feasible path $\pi$ given $X$; and $p(l \mid X) = \sum_{\pi \in \mathcal{B}^{-1}(l)} p(\pi \mid X)$ is the sum of the conditional probabilities of all feasible paths, i.e. the probability of the true output $l$ given $X$.
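To make the improved CTC loss concrete, the sketch below enumerates every feasible path for a toy input and computes the original CTC loss, the path entropy both by definition and in the expanded form (which must agree), and the entropy-regularized loss. Brute-force enumeration is only practical for a few frames; a real implementation would obtain these quantities with the CTC forward-backward recursion. Blank index 0, α = 0.1, the function names and the toy probabilities are assumptions.

```python
import itertools
import math

BLANK = 0  # assumption: symbol 0 is the CTC blank


def collapse(path):
    """CTC mapping B: merge consecutive repeats, then remove blanks."""
    out, prev = [], None
    for s in path:
        if s != prev and s != BLANK:
            out.append(s)
        prev = s
    return out


def ctc_terms(frame_probs, label):
    """Return (CTC loss, entropy by definition, entropy in expanded form)."""
    T, V = len(frame_probs), len(frame_probs[0])
    feasible = [p for p in itertools.product(range(V), repeat=T)
                if collapse(p) == list(label)]                  # B^{-1}(l)
    p_pi = [math.prod(frame_probs[t][s] for t, s in enumerate(p)) for p in feasible]
    p_l = sum(p_pi)                                             # p(l|X)
    cond = [q / p_l for q in p_pi]                              # p(pi|l,X)
    h_def = -sum(c * math.log(c) for c in cond)
    h_expanded = -sum(c * math.log(q) for c, q in zip(cond, p_pi)) + math.log(p_l)
    return -math.log(p_l), h_def, h_expanded


def improved_ctc_loss(frame_probs, label, alpha=0.1):
    """Subtract the alpha-weighted path entropy from the original CTC loss."""
    loss_ctc, entropy, _ = ctc_terms(frame_probs, label)
    return loss_ctc - alpha * entropy


probs = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.3, 0.3, 0.4]]  # 3 frames, vocab {blank, 1, 2}
loss_ctc, h_def, h_expanded = ctc_terms(probs, [1])
loss_new = improved_ctc_loss(probs, [1])
```

Because the entropy term enters with a minus sign, minimizing the improved loss pushes the path distribution toward higher entropy, which is the intent of maximum conditional entropy regularization.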
The base loss function used in the invention may be L1Loss, MSELoss, CrossEntropyLoss or a similar loss; the choice makes little difference to the final performance of the improved CTC model.
Acquiring a picture message in a 5G rich media message, and constructing a picture message information set Z based on the picture message and the video sampling picture set y';
acquiring a text message in the 5G rich media message; constructing a text message information set X based on the text message and the video text set y;
constructing a sensitive word variant recognition model 60; preferably, the sensitive word variant recognition model 60 is a TextCNN model, and TextCNN-based recognition of harmful short texts is already used in short message text review.
Besides the TextCNN model, the sensitive word variant recognition model 60 used in the invention can be replaced by models such as CRNN or LSTM+CTC, with little difference in recognition performance.
First, the 5G rich media message to be processed is preprocessed, for example by digit normalization, English letter normalization, conversion of traditional Chinese characters to simplified Chinese, processing of symbols with special meanings, removal of symbol noise, unified representation of consecutive digits, and string segmentation.
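Part of this preprocessing can be sketched with the Python standard library. The sketch covers digit and English-letter normalization (Unicode NFKC maps full-width characters to their ASCII forms), lower-casing, symbol-noise removal and character-level segmentation. The character whitelist and replacing every run of two or more digits with "#" are hypothetical choices; traditional-to-simplified conversion needs a conversion table (for example OpenCC) and is omitted.

```python
import re
import unicodedata
from typing import List

# Assumption: keep only ASCII digits, lower-case Latin letters and CJK ideographs.
NOISE = re.compile(r"[^0-9a-z\u4e00-\u9fff]+")
# Hypothetical unified form for consecutive digits: any run of 2+ digits becomes "#".
DIGIT_RUN = re.compile(r"[0-9]{2,}")


def preprocess(text: str) -> List[str]:
    s = unicodedata.normalize("NFKC", text)  # full-width digits/letters -> ASCII
    s = s.lower()                            # English letter normalization
    s = NOISE.sub("", s)                     # strip symbol noise such as "*" or "!"
    s = DIGIT_RUN.sub("#", s)                # unify consecutive digits
    return list(s)                           # character-level segmentation
```

For example, `preprocess("Ｖ信：１２３ab!")` returns `['v', '信', '#', 'a', 'b']`, and a symbol inserted between characters to evade matching, as in `"微*信"`, is removed.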
Second, the short text is vectorized with word2vec; the text vectors are convolved and expanded in high-dimensional space by the convolutional layer; the sensitive-vocabulary vectors are activated through the pooling layer and the fully connected layer; and the hit probability of each sensitive word is calculated with the SoftMax function, chosen here as:
$$\mathrm{SoftMax}(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}$$
where x represents a word vector and $x_i$ is its i-th component.
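The TextCNN forward pass described above can be sketched in NumPy. This is an illustrative sketch under stated assumptions, not the model of the invention: the word2vec embeddings are stand-in random vectors, and the filter widths (2 and 3), the number of kernels, ReLU activation, max-over-time pooling and the two output classes (compliant / sensitive) are hypothetical choices. The softmax subtracts the maximum logit first, which leaves the probabilities unchanged but avoids overflow.

```python
import numpy as np


def softmax(z):
    z = z - np.max(z)               # stability shift; probabilities unchanged
    e = np.exp(z)
    return e / e.sum()


def textcnn_forward(embeddings, filters, w_fc, b_fc):
    """embeddings: (L, D) word vectors of one short text.
    filters: {width: (K, width, D) kernels}. Returns class probabilities."""
    L = embeddings.shape[0]
    pooled = []
    for width, kernels in filters.items():
        # every window of `width` consecutive word vectors: (L - width + 1, width, D)
        windows = np.stack([embeddings[i:i + width] for i in range(L - width + 1)])
        maps = np.maximum(0.0, np.einsum("nwd,kwd->nk", windows, kernels))  # conv + ReLU
        pooled.append(maps.max(axis=0))                                      # max-over-time
    features = np.concatenate(pooled)            # input to the fully connected layer
    return softmax(features @ w_fc + b_fc)


rng = np.random.default_rng(0)
emb = rng.normal(size=(10, 8))                   # 10 tokens, 8-dim vectors (stand-in)
filters = {2: rng.normal(size=(4, 2, 8)), 3: rng.normal(size=(4, 3, 8))}
w_fc, b_fc = rng.normal(size=(8, 2)), np.zeros(2)
p = textcnn_forward(emb, filters, w_fc, b_fc)    # p[1]: hypothetical "sensitive" probability
```

The sketch requires the text to have at least as many tokens as the widest filter.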
Finally, the text messages in the text message information set X are input in sequence into the sensitive word variant recognition model 60 to judge whether all of them are compliant. If a text message is judged non-compliant, it is passed to manual review or an early warning is raised; if it is judged compliant, it proceeds to the subsequent judgment process.
Constructing a bad picture classification model 80 and a feature extraction model 70;
performing feature extraction on the picture message information set Z to obtain a picture depth feature set z; preferably, the original image features are extracted with LBP, HOG or SIFT; other similar feature extraction algorithms can be substituted without greatly affecting the performance of the final bad picture classification model 80.
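Of the feature extractors named above, LBP is the simplest to show. The sketch computes the basic 3×3 LBP code for each interior pixel of a grayscale image and returns a normalized 256-bin histogram, one common way to turn LBP codes into a feature vector. The neighbour ordering and the histogram representation are assumptions, and the vector is not claimed to be the picture depth feature of the invention.

```python
import numpy as np

# Clockwise neighbour offsets starting top-left (assumed ordering; bit i = offset i).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(gray):
    """Basic LBP: threshold 8 neighbours against the centre pixel, histogram the codes."""
    g = np.asarray(gray, dtype=np.int32)
    h, w = g.shape
    center = g[1:h - 1, 1:w - 1]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(OFFSETS):
        neighbour = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(np.int32) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()   # normalized 256-dim feature vector
```

A flat image yields code 255 everywhere (every neighbour equals the centre), while an isolated bright pixel yields code 0 at that position.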
The pictures in the picture depth feature set z are input in sequence into the bad picture classification model 80 to judge whether all of them are compliant. If a picture, or any piece of feature information in it, is judged non-compliant, the picture is judged non-compliant; only if the picture and all of its feature information are judged compliant is the picture judged compliant.
And when all the text messages in the text message information set X are in compliance and all the pictures in the picture depth feature set z are in compliance, judging that the 5G rich media message can be normally sent.
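The decision rules above can be sketched as follows. The classifier callables stand in for the sensitive word variant recognition model 60 and the bad picture classification model 80; their signatures, the toy stand-ins and the returned labels "send" and "manual_review" are hypothetical.

```python
from typing import Callable, Iterable, Sequence


def picture_compliant(feature_verdicts: Iterable[bool]) -> bool:
    """A picture is compliant only if every piece of its feature information is."""
    return all(feature_verdicts)


def review_message(texts: Sequence[str], pictures: Sequence[object],
                   text_ok: Callable[[str], bool],
                   picture_ok: Callable[[object], bool]) -> str:
    """Send only when every text in X and every picture in z is compliant."""
    if all(text_ok(x) for x in texts) and all(picture_ok(z) for z in pictures):
        return "send"
    return "manual_review"   # hypothetical route for non-compliant content / early warning


def text_ok(t):                  # toy stand-in for model 60
    return "banned" not in t


def picture_ok(p):               # toy stand-in for model 80: p is a list of feature verdicts
    return picture_compliant(p)
```

For example, `review_message(["hello"], [[True, True]], text_ok, picture_ok)` returns `"send"`, whereas a single non-compliant feature verdict in any picture routes the whole message to manual review.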
The video abstraction method for 5G rich media messages acquires a video message information set Y from the 5G rich media message and constructs a training set based on Y; constructs a video abstraction model 40 comprising a time decoder, a perceptron and a Transformer module connected in sequence; trains the video abstract model 40 on the training set to obtain a trained video abstract model 40; and inputs the video to be identified into the trained video abstraction model 40 to obtain its video sampling picture set y'. The method solves the problems in the prior art that video content is difficult to identify, identification takes a long time, and existing approaches are unsuitable for high-concurrency short message sending scenarios.
FIGS. 2-3 are architecture diagrams of a video summarization system for 5G rich media information according to embodiments of the present invention; as shown in FIGS. 2-3, the video summarization system for 5G rich media information provided by the embodiment of the invention includes the following:
the acquisition module 10 is configured to acquire a video message information set Y in the 5G rich media message, and construct a training set based on the video message information set Y;
the construction module 20 is configured to construct a video summary model 40, where the video summary model 40 includes a time decoder, a perceptron and a Transformer module connected in sequence;
time sequence processing is performed on the videos in the video message information set Y based on the time decoder;
corresponding segmentation sequences are generated using a two-layer perceptron;
vectorization analysis is performed on each segmentation sequence through the Transformer module to obtain a sequence feature set R containing the feature of each segmentation sequence; the tolerance rate between every two sequence features in R is calculated, and the segmentation sequence set with the maximum tolerance rate is obtained based on the tolerance rate;
n pictures are randomly extracted from each segmentation sequence in that set to form the video sampling picture set y' of the video to be identified.
The training module 30 is configured to train the video abstract model 40 through the training set to obtain a trained video abstract model 40;
inputting the video to be identified into the trained video abstraction model 40 to obtain a video sampling picture set y' of the video to be identified.
The acquisition module 10 is further configured to:
acquiring a picture message in a 5G rich media message, and constructing a picture message information set Z based on the picture message and the video sampling picture set y';
acquiring a text message in the 5G rich media message;
constructing a text message information set X based on the text message and the video text set y;
the feature extraction model 70 is used for performing feature extraction on the picture message information set Z to obtain the picture depth feature set z;
the bad picture classification model 80 receives the pictures in the picture depth feature set z in sequence and judges whether all of them are compliant;
the voice-to-text model 50 converts the video message information set Y into the video text set y;
the sensitive word variant recognition model 60 receives the text messages in the text message information set X in sequence and judges whether all of them are compliant;
and when all the text messages in the text message information set X are in compliance and all the pictures in the picture depth feature set z are in compliance, judging that the 5G rich media message can be normally sent.
According to the video abstraction system for 5G rich media messages, the acquisition module 10 acquires a video message information set Y from the 5G rich media message and constructs a training set based on Y; the construction module 20 constructs a video abstraction model 40 comprising a time decoder, a perceptron and a Transformer module connected in sequence; the training module 30 trains the video abstract model 40 on the training set to obtain a trained video abstract model 40; and the video to be identified is input into the trained video abstraction model 40 to obtain its video sampling picture set y'. The video abstraction system thus solves the problems in the prior art that video content is difficult to identify, identification takes a long time, and existing approaches are unsuitable for high-concurrency short message sending scenarios.
FIG. 5 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention. As shown in FIG. 5, the electronic device 90 includes a processor 901, a memory 902 and a bus 903;
the processor 901 and the memory 902 communicate with each other through the bus 903;
the processor 901 is configured to call program instructions in the memory 902 to perform the methods provided in the above method embodiments, for example: acquiring a video message information set Y in a 5G rich media message, and constructing a training set based on Y; constructing a video abstraction model 40, wherein the video abstraction model 40 comprises a time decoder, a perceptron and a Transformer module connected in sequence; training the video abstract model 40 through the training set to obtain a trained video abstract model 40; and inputting the video to be identified into the trained video abstraction model 40 to obtain a video sampling picture set y' of the video to be identified.
The present embodiment provides a non-transitory computer readable medium storing computer instructions that cause a computer to perform the methods provided by the above method embodiments, for example: acquiring a video message information set Y in a 5G rich media message, and constructing a training set based on Y; constructing a video abstraction model 40, wherein the video abstraction model 40 comprises a time decoder, a perceptron and a Transformer module connected in sequence; training the video abstract model 40 through the training set to obtain a trained video abstract model 40; and inputting the video to be identified into the trained video abstraction model 40 to obtain a video sampling picture set y' of the video to be identified.
Those of ordinary skill in the art will appreciate that all or some of the steps of the method embodiments above may be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable medium and, when executed, performs the steps of the method embodiments above; the medium includes any medium that can store program code, such as ROM, RAM, magnetic disks or optical disks.
The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment, which those of ordinary skill in the art can understand and implement without creative effort.
From the description of the embodiments above, those skilled in the art will clearly understand that the embodiments may be implemented by software plus a necessary general-purpose hardware platform, or, of course, by hardware. On this understanding, the technical solution above, or the part of it that contributes over the prior art, may be embodied as a software product stored in a computer-readable medium such as ROM/RAM, a magnetic disk or an optical disk, including instructions that cause a computer device (a personal computer, a server, a network device, etc.) to execute the methods of the embodiments or of parts of the embodiments.
Although the invention has been described in detail above through a general description and specific examples, it will be apparent to those skilled in the art that modifications and improvements can be made to it. Such modifications or improvements, made without departing from the spirit of the invention, fall within the scope of the invention as claimed.

Claims (9)

1. A video summarization method for 5G rich media messages, characterized by comprising the following steps:
acquiring a video message information set Y in a 5G rich media message, and constructing a training set based on the video message information set Y;
constructing a video summarization model, wherein the video summarization model comprises a temporal decoder, a perceptron and a Transformer module connected in sequence;
performing temporal (time-sequence) processing on the videos in the video message information set Y with the temporal decoder;
generating the corresponding segmentation sequences with the two-layer perceptron;
performing vectorized analysis on each segmentation sequence with the Transformer module to obtain a sequence feature set R of the segmentation sequences, calculating the tolerance rate between every two sequence features in R, and obtaining, based on the tolerance rates, the set of segmentation sequences with the maximum tolerance rate;
randomly extracting n pictures from each segmentation sequence in that set to form the video sampling picture set y' of the video to be identified;
training the video abstract model through the training set to obtain a trained video abstract model;
and inputting the video to be identified into the trained video abstract model to obtain a video sampling picture set y' of the video to be identified.
2. The video summarization method for 5G rich media messages according to claim 1, further comprising:
acquiring a picture message in a 5G rich media message, and constructing a picture message information set Z based on the picture message and the video sampling picture set y';
constructing a feature extraction model and a bad picture classification model;
performing feature extraction on the picture message information set Z with the feature extraction model to obtain a picture depth feature set z;
and inputting the pictures in the picture depth feature set z into the bad picture classification model in turn to judge whether all the pictures in z are compliant.
3. The video summarization method for 5G rich media messages according to claim 1, further comprising:
constructing a voice-to-text model;
and converting the video message information set Y into a video text set y through the voice-to-text model.
4. The video summarization method for 5G rich media messages according to claim 3, further comprising:
acquiring a text message in the 5G rich media message;
constructing a text message information set X based on the text message and the video text set y;
constructing a sensitive word variant recognition model;
and inputting the text messages in the text message information set X into the sensitive word variant recognition model in turn to judge whether all the text messages in X are compliant.
5. The video summarization method for 5G rich media messages according to claim 4, further comprising:
when all the text messages in the text message information set X are compliant and all the pictures in the picture depth feature set z are compliant, determining that the 5G rich media message can be sent normally.
6. The video summarization method for 5G rich media messages according to claim 1, wherein training the video summarization model on the training set to obtain a trained video summarization model comprises:
dividing the video message information set Y into a training set, a test set and a validation set;
training the video summarization model on the training set;
validating the performance of the video summarization model on the validation set, and saving the video summarization model that meets the performance condition;
and evaluating the recognition performance of the video summarization model on the test set.
7. A video summarization system for 5G rich media messages, comprising:
the acquisition module, configured to acquire a video message information set Y in the 5G rich media message and construct a training set based on the video message information set Y;
the construction module, configured to construct a video summarization model, wherein the video summarization model comprises a temporal decoder, a perceptron and a Transformer module connected in sequence;
the video summarization model being configured for:
performing temporal (time-sequence) processing on the videos in the video message information set Y with the temporal decoder;
generating the corresponding segmentation sequences with the two-layer perceptron;
performing vectorized analysis on each segmentation sequence with the Transformer module to obtain a sequence feature set R of the segmentation sequences, calculating the tolerance rate between every two sequence features in R, and obtaining, based on the tolerance rates, the set of segmentation sequences with the maximum tolerance rate;
randomly extracting n pictures from each segmentation sequence in that set to form the video sampling picture set y' of the video to be identified;
the training module, configured to train the video summarization model on the training set to obtain a trained video summarization model;
and inputting the video to be identified into the trained video abstract model to obtain a video sampling picture set y' of the video to be identified.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 6 when the computer program is executed.
9. A non-transitory computer readable medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any of claims 1 to 6.
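Claims 2 to 5 together describe a sending gate: the pictures (including the sampled video pictures y') and the texts (including the video text y) must all pass their checks before the message is released. A minimal sketch of that control flow follows. The feature extraction model, bad picture classifier and sensitive word variant recognizer are not specified in the text, so they are passed in as plain callables; `RichMediaMessage`, `can_send` and the toy stand-ins are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RichMediaMessage:
    """Hypothetical container for one 5G rich media message."""
    texts: List[str] = field(default_factory=list)         # text messages
    pictures: List[str] = field(default_factory=list)      # picture messages
    video_text: List[str] = field(default_factory=list)    # y: voice-to-text of the video
    video_frames: List[str] = field(default_factory=list)  # y': sampled video pictures


def can_send(
    msg: RichMediaMessage,
    extract_features: Callable[[str], object],
    is_bad_picture: Callable[[object], bool],
    is_sensitive_text: Callable[[str], bool],
) -> bool:
    # Claim 4: text message information set X = text messages + video text y.
    text_set_x = msg.texts + msg.video_text
    # Claim 2: picture message information set Z = picture messages + y',
    # mapped to the picture depth feature set z by the feature extractor.
    depth_features_z = [extract_features(p) for p in msg.pictures + msg.video_frames]
    # Claim 5: the message may be sent only if every text and picture is compliant.
    texts_ok = all(not is_sensitive_text(t) for t in text_set_x)
    pictures_ok = all(not is_bad_picture(f) for f in depth_features_z)
    return texts_ok and pictures_ok


if __name__ == "__main__":
    # Toy stand-ins for the real models (assumptions, not the patented models).
    def bad_picture(feat: object) -> bool:
        return "nsfw" in str(feat)

    def sensitive(text: str) -> bool:
        return "forbidden" in text.lower()

    ok_msg = RichMediaMessage(texts=["Hello"], pictures=["cat.png"],
                              video_text=["welcome"], video_frames=["frame_01.jpg"])
    bad_msg = RichMediaMessage(texts=["Hello"], video_text=["a FORBIDDEN word"])
    print(can_send(ok_msg, str.lower, bad_picture, sensitive))   # True
    print(can_send(bad_msg, str.lower, bad_picture, sensitive))  # False
```

Because the video-derived content is merged into the same sets X and Z as ordinary texts and pictures, a single compliance pass covers both, which is the point of producing y and y' in the first place.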

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310437286.9A CN116453023B (en) 2023-04-23 2023-04-23 Video abstraction system, method, electronic equipment and medium for 5G rich media information

Publications (2)

Publication Number Publication Date
CN116453023A 2023-07-18
CN116453023B 2024-01-26

Family

ID: 87125277

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310437286.9A Active CN116453023B (en) 2023-04-23 2023-04-23 Video abstraction system, method, electronic equipment and medium for 5G rich media information

Country Status (1)

Country Link
CN (1) CN116453023B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113887419A (en) * 2021-09-30 2022-01-04 四川大学 Human behavior identification method and system based on video temporal-spatial information extraction
CN113901200A (en) * 2021-09-28 2022-01-07 特赞(上海)信息科技有限公司 Text summarization method and device based on topic model and storage medium
WO2022104967A1 (en) * 2020-11-19 2022-05-27 深圳大学 Pre-training language model-based summarization generation method
CN114547370A (en) * 2022-02-15 2022-05-27 北京大学 Video abstract extraction method and system
CN114598933A (en) * 2022-03-16 2022-06-07 平安科技(深圳)有限公司 Video content processing method, system, terminal and storage medium
CN115953645A (en) * 2022-12-15 2023-04-11 百度在线网络技术(北京)有限公司 Model training method and device, electronic equipment and storage medium

Non-Patent Citations (2)

Title
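"Video summary generation model based on an improved bidirectional long short-term memory network"; Wu Guangli et al.; Journal of Computer Applications; Vol. 41, No. 7; full text *
"Video summary generation based on a space-time Transformer network"; Li Qun et al.; Journal of Software; Vol. 33, No. 9; full text *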

Also Published As

Publication number Publication date
CN116453023A (en) 2023-07-18

Similar Documents

Publication Publication Date Title
US11776530B2 (en) Speech model personalization via ambient context harvesting
CN106683680B (en) Speaker recognition method and device, computer equipment and computer readable medium
CN108875807B (en) Image description method based on multiple attention and multiple scales
CN112465008B (en) Voice and visual relevance enhancement method based on self-supervision course learning
WO2021114840A1 (en) Scoring method and apparatus based on semantic analysis, terminal device, and storage medium
CN111046133A (en) Question-answering method, question-answering equipment, storage medium and device based on atlas knowledge base
CN112233698B (en) Character emotion recognition method, device, terminal equipment and storage medium
CN112818861B (en) Emotion classification method and system based on multi-mode context semantic features
CN113254654B (en) Model training method, text recognition method, device, equipment and medium
CN112528637B (en) Text processing model training method, device, computer equipment and storage medium
CN107480144A (en) Possess the image natural language description generation method and device across language learning ability
CN112632244A (en) Man-machine conversation optimization method and device, computer equipment and storage medium
CN112967713A (en) Audio-visual voice recognition method, device, equipment and storage medium based on multi-modal fusion
CN111738169A (en) Handwriting formula recognition method based on end-to-end network model
CN112669215A (en) Training text image generation model, text image generation method and device
CN116524931A (en) System, method, electronic equipment and medium for converting voice of 5G rich media message into text
CN113128284A (en) Multi-mode emotion recognition method and device
CN116453023B (en) Video abstraction system, method, electronic equipment and medium for 5G rich media information
CN111914068A (en) Method for extracting knowledge points of test questions
CN115019137A (en) Method and device for predicting multi-scale double-flow attention video language event
CN114170997A (en) Pronunciation skill detection method, pronunciation skill detection device, storage medium and electronic equipment
CN114443889A (en) Audio acquisition method and device, electronic equipment and storage medium
CN114333786A (en) Speech emotion recognition method and related device, electronic equipment and storage medium
CN113870896A (en) Motion sound false judgment method and device based on time-frequency graph and convolutional neural network
CN113593525A (en) Method, device and storage medium for training accent classification model and accent classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant