CN113673376A - Bullet screen generation method and device, computer equipment and storage medium - Google Patents

Bullet screen generation method and device, computer equipment and storage medium

Info

Publication number
CN113673376A
CN113673376A
Authority
CN
China
Prior art keywords
barrage
vector
module
image frame
bullet screen
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110888205.8A
Other languages
Chinese (zh)
Other versions
CN113673376B (en)
Inventor
翁力雳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202110888205.8A
Publication of CN113673376A
Application granted
Publication of CN113673376B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application relates to a bullet screen (barrage) generation method, a bullet screen generation device, computer equipment and a storage medium, wherein the method comprises the following steps: acquiring a current image frame and a context barrage corresponding to the current moment, the context barrage comprising an upper barrage and/or a lower barrage, where the upper barrage is a barrage in the image frame corresponding to the moment immediately before the current moment and the lower barrage is a barrage in the image frame corresponding to the moment immediately after it; and generating a new barrage for the current image frame according to the current image frame and the context barrage. The application can enrich the barrages in a video, increasing the numbers of likes, comments and reports from users, that is, raising the interaction rate between users and barrages, and thereby improving the viewing experience.

Description

Bullet screen generation method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of bullet screens, and in particular, to a bullet screen generation method and apparatus, a computer device, and a storage medium.
Background
Users can express their thoughts, emotions and the like by sending barrages in real time while watching, so the barrage has gradually become part of users' viewing habits, and a good barrage environment can considerably improve the viewing experience. However, some videos on video websites carry few barrages, the richness of their barrages is low, and they offer few recommendable barrage moments, so interactive behaviors such as liking, commenting on and reporting barrages are correspondingly rare; the barrages of such videos therefore need to be supplemented.
Disclosure of Invention
In order to solve, or at least partially solve, the above technical problem, the present application provides a bullet screen generation method, device, computer equipment and storage medium.
In a first aspect, the present application provides a bullet screen generating method, including:
acquiring a current image frame and a context barrage corresponding to the current moment; the context barrage comprises an upper barrage and/or a lower barrage, the upper barrage is a barrage in an image frame corresponding to the previous moment of the current moment, and the lower barrage is a barrage in an image frame corresponding to the next moment of the current moment;
and generating a new barrage of the current image frame according to the current image frame and the context barrage.
In a second aspect, the present application provides a bullet screen generating device, including:
the information acquisition module is used for acquiring a current image frame and a context barrage corresponding to the current moment; the context barrage comprises an upper barrage and/or a lower barrage, the upper barrage is a barrage in an image frame corresponding to the previous moment of the current moment, and the lower barrage is a barrage in an image frame corresponding to the next moment of the current moment;
and the new bullet screen generation module is used for generating a new bullet screen of the current image frame according to the current image frame and the context bullet screen.
In a third aspect, the present application provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above-described method.
According to the method and device of the present application, the current image frame and the context barrage corresponding to the current moment are obtained first, a new barrage is then generated from the current image frame and the context barrage, and the barrages of the current image frame are supplemented with the new barrage. When the new barrage is generated, not only the current image frame but also the context barrage is considered, i.e., the complex dependency between barrages and images is taken into account, so the generated barrage fits the actual viewing scene. The method provided by the application can enrich the barrages in a video, increasing the numbers of likes, comments and reports from users, that is, the interaction rate between users and barrages, and thereby improving the viewing experience.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
Fig. 1a is a schematic flowchart of a bullet screen generation method provided in an embodiment of the present application;
Fig. 1b is a schematic diagram of an image frame supplemented with new barrages according to an embodiment of the present application;
Fig. 1c is a schematic structural diagram of a barrage generation model provided in an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a bullet screen generating device according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In a first aspect, a bullet screen generating method provided in an embodiment of the present application is shown in fig. 1a, and the method includes the following steps:
s110, obtaining a current image frame and a context barrage corresponding to the current moment;
the context barrage comprises an upper barrage and/or a lower barrage, the upper barrage is a barrage in an image frame corresponding to the previous moment of the current moment, and the lower barrage is a barrage in an image frame corresponding to the next moment of the current moment;
for example, in the unit of time of second, the upper bullet screen is the bullet screen in the image frame corresponding to the second immediately before the current time, and the lower bullet screen is the bullet screen in the image frame corresponding to the second immediately after the current time.
In a specific implementation, one or more barrages can be taken from the image frame corresponding to the previous moment as the upper barrage; if several are selected, they are connected to form one long upper barrage. Likewise, one or more barrages are taken from the image frame corresponding to the next moment as the lower barrage; if several are selected, they are connected to form one long lower barrage. Alternatively, one barrage can simply be selected at random from the image frame corresponding to the previous moment as the upper barrage and one at random from the image frame corresponding to the next moment as the lower barrage, which is simpler and more convenient, as sketched below.
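The selection logic just described can be illustrated with a short sketch. Python is used purely for illustration; the data structure `barrages_by_second` and the function name are assumptions for this sketch, not part of the application:

```python
import random

def get_context_barrage(barrages_by_second, t, k=2, random_pick=False):
    """Build the upper and lower context barrages for the frame at second t.

    barrages_by_second: hypothetical dict mapping a second index to the
    list of barrage strings sent during that second.
    """
    prev_list = barrages_by_second.get(t - 1, [])
    next_list = barrages_by_second.get(t + 1, [])
    if random_pick:
        # Simpler variant: randomly select one barrage on each side.
        upper = random.choice(prev_list) if prev_list else ""
        lower = random.choice(next_list) if next_list else ""
    else:
        # Take up to k barrages and connect them into one long barrage.
        upper = " ".join(prev_list[:k])
        lower = " ".join(next_list[:k])
    return upper, lower
```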
S120, generating a new barrage of the current image frame according to the current image frame and the context barrage.
It will be appreciated that the new barrage does not depend entirely on the content of the current image frame; it is also related to the barrages before and after it. For example, viewers often see "high-energy warning" barrages that announce an upcoming thrilling scene, spoiler barrages that tell the user about a later plot point in advance, or barrages reviewing a plot point that has already occurred. Therefore, when generating the new barrage, not only the current image frame but also the context barrage is taken into account, that is, the complex dependency between barrages and images is considered.
In a specific implementation, step S120 may specifically include: inputting the current image frame and the context barrage into a preset barrage generation model to obtain the new barrage of the current image frame. That is, the new barrage is generated by a barrage generation model.
The barrage generation model is trained in advance, and its specific structure may take many forms; one of them is introduced below, as shown in fig. 1c:
the bullet screen generating model 200 includes an image encoder 210, a bullet screen encoder 220 and a bullet screen decoder 230 connected in sequence, wherein: the input of the image encoder 210 is the current image frame, and is used for encoding the current image frame to obtain first encoding information; the input of the bullet screen encoder 220 includes the context bullet screen and the first encoding information, and is configured to encode the context bullet screen and the first encoding information to obtain second encoding information; the input of the bullet screen decoder 230 includes the first encoded information and the second encoded information, and is configured to decode the first encoded information and the second encoded information to obtain the new bullet screen.
It can be understood that the image encoder 210 encodes the current image frame to obtain the first encoding information, i.e. the first encoding information is the encoding information of the current image frame. The bullet screen encoder 220 encodes the context bullet screen and the first encoded information to obtain second encoded information, which includes both the related information of the context bullet screen and the related information of the current image frame. The bullet screen decoder 230 decodes the first encoded information and the second encoded information to obtain a new bullet screen. The bullet screen generation model 200 is an encoder-decoder architecture.
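As an orientation aid, the data flow between the three components of fig. 1c can be sketched as follows. This is a minimal PyTorch sketch; the class names and tensor interfaces are assumptions made for illustration, not an implementation disclosed by the application:

```python
import torch.nn as nn

class BarrageGenerationModel(nn.Module):
    """Encoder-decoder sketch of the model 200 (assumed interfaces)."""

    def __init__(self, image_encoder, barrage_encoder, barrage_decoder):
        super().__init__()
        self.image_encoder = image_encoder      # current image frame -> first encoding
        self.barrage_encoder = barrage_encoder  # context barrage + first encoding -> second encoding
        self.barrage_decoder = barrage_decoder  # both encodings -> words of the new barrage

    def forward(self, image_frame, context_tokens, decoder_tokens):
        first_enc = self.image_encoder(image_frame)                   # third vector
        second_enc = self.barrage_encoder(context_tokens, first_enc)  # seventh vector
        return self.barrage_decoder(decoder_tokens, first_enc, second_enc)
```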
In particular implementations, the image encoder 210 may have a variety of structures, one of which is described below:
The image encoder 210 includes a convolutional network module 211, a first multi-head attention module 212 and a first feed-forward module 213 connected in sequence. The convolutional network module 211 extracts the image features of the current image frame to obtain a first vector; the first multi-head attention module 212 weights the first vector using a multi-head attention mechanism to obtain a second vector; and the first feed-forward module 213 performs a dimension transformation on the second vector to obtain a third vector of a preset dimension, the third vector being the first encoding information.
The convolutional network module 211, i.e., a CNN, performs feature extraction on the current image frame to obtain the first vector, which is in effect a feature vector of the current image frame.
The first multi-head attention module 212 maps the input first vector into different subspaces, so that the first vector can be understood from different angles; that is, the first vector is processed with several attention functions, which together constitute the multi-head attention mechanism. Each attention function produces one output vector; the several output vectors are then mapped into a single vector, the second vector, which is thus image feature information carrying attention information. Adopting a multi-head attention mechanism helps capture richer information.
The weighting consists in assigning a weight value to each element of the first vector. Initially all elements of the first vector carry the same weight, but different elements matter differently to the vector as a whole, so different weight values are set for different elements. One attention function weights the first vector to produce one output vector, several attention functions produce several output vectors, and these output vectors are then mapped into a single vector; this whole procedure is the multi-head attention mechanism. Different attention functions focus on different aspects of the first vector and therefore set different weight values for its elements: one may focus on the title content in the image while another focuses on the dialogue content, and so on. Elements corresponding to attended information are given larger weight values, while elements corresponding to unattended information are given smaller ones.
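The weighting described here is standard multi-head attention; a minimal sketch with PyTorch's built-in module (the dimensions are illustrative assumptions):

```python
import torch
import torch.nn as nn

d_model, n_heads = 512, 8
attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=n_heads,
                             batch_first=True)

# First vector: e.g. a 7x7 CNN feature map flattened to 49 elements.
x = torch.randn(1, 49, d_model)
# Each head is one "attention function": it weights the elements of x in its
# own subspace; the per-head outputs are then mapped back into one vector.
second_vector, attn_weights = attn(query=x, key=x, value=x)
print(second_vector.shape)  # torch.Size([1, 49, 512])
```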
The first feed-forward module 213 performs the dimension conversion of the second vector. Different image frames yield first vectors, and hence second vectors, of different dimensions, so the first feed-forward module 213 converts second vectors of differing dimensions into vectors of one common dimension, the third vector, to facilitate subsequent operations. Dimension here refers to the length of the vector, i.e., the number of its elements: different image frames may produce first and second vectors of different lengths, while the length of the third vector is fixed, so the purpose of the first feed-forward module 213 is to convert second vectors of varying length into third vectors of fixed length.
Thus, with an image encoder 210 of the above structure, the first encoding information can be obtained in vector form.
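Combining the three sub-modules, an image encoder of this structure might be sketched as follows. The backbone network, the dimensions, and the pooling used to reach a fixed-length third vector are all assumptions; the application fixes only the module order (convolutional network, multi-head attention, feed-forward):

```python
import torch
import torch.nn as nn
import torchvision.models as models

class ImageEncoder(nn.Module):
    def __init__(self, d_model=512, n_heads=8, out_dim=512):
        super().__init__()
        # Convolutional network module 211: a ResNet trunk as an assumed backbone.
        resnet = models.resnet18(weights=None)
        self.cnn = nn.Sequential(*list(resnet.children())[:-2])  # keep the spatial map
        self.proj = nn.Conv2d(512, d_model, kernel_size=1)
        # First multi-head attention module 212.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # First feed-forward module 213: maps to the preset dimension.
        self.ffn = nn.Sequential(nn.Linear(d_model, out_dim), nn.ReLU(),
                                 nn.Linear(out_dim, out_dim))

    def forward(self, frame):                       # frame: (B, 3, H, W)
        feat = self.proj(self.cnn(frame))           # (B, d_model, h, w)
        first = feat.flatten(2).transpose(1, 2)     # first vector: (B, h*w, d_model)
        second, _ = self.attn(first, first, first)  # second vector
        second = second.mean(dim=1)                 # pool to a fixed length (assumption)
        return self.ffn(second)                     # third vector: (B, out_dim)
```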
In particular implementations, the barrage encoder 220 can likewise have a variety of structures, one of which is described below:
The barrage encoder 220 comprises a first vector-quantization characterization module 221, a second multi-head attention module 222, a third multi-head attention module 223 and a second feed-forward module 224 connected in sequence. The first vector-quantization characterization module 221 converts the context barrage into a corresponding fourth vector, the fourth vector characterizing the meaning of the context barrage; the second multi-head attention module 222 weights the fourth vector using a multi-head attention mechanism to obtain a fifth vector; the third multi-head attention module 223 weights the vector formed by connecting the third vector and the fifth vector, again using a multi-head attention mechanism, to obtain a sixth vector; and the second feed-forward module 224 performs a dimension transformation on the sixth vector to obtain a seventh vector of a preset dimension, the seventh vector being the second encoding information.
The first vector-quantization characterization module 221 is an embedding module: it converts each word of the context barrage into a corresponding vector and connects these vectors to form the fourth vector. Since the vector of each word characterizes that word's meaning, the fourth vector characterizes the meaning of the context barrage. If a single upper barrage is selected as the context barrage, each of its words is converted into a vector and the word vectors are spliced into the fourth vector, which then represents the meaning of the upper barrage; the same holds for a lower barrage. If one upper barrage and one lower barrage are selected, the two are first spliced into one barrage, each word of the spliced barrage is converted into a vector, and the word vectors are spliced into the fourth vector, which then represents the meaning of both the upper and the lower barrage.
For a detailed description of the second multi-head attention module 222, refer to the first multi-head attention module 212. The third vector is obtained through the image encoder 210 and the fifth vector through the second multi-head attention module 222 of the barrage encoder 220; the third and fifth vectors are spliced into one long vector, which is input into the third multi-head attention module 223, whose multi-head attention weighting yields the sixth vector. The two vectors are spliced precisely so that the barrage encoder 220 can use both the content of the image frame and the content of the barrage text.
The second feed-forward module 224, like the first feed-forward module 213, converts vector dimensions: barrages have different lengths, so the sixth vectors obtained after the second multi-head attention module 222 and the third multi-head attention module 223 also have different dimensions, and the module converts sixth vectors of differing dimensions into seventh vectors of one common, fixed dimension to facilitate subsequent operations.
Thus, with a barrage encoder 220 of the above structure, the second encoding information can be obtained in vector form.
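Under the same assumptions as the image-encoder sketch (a shared dimension `d_model`, mean-pooling to reach a fixed length), a barrage encoder of this structure might look as follows. How the third and fifth vectors are spliced is not fixed by the application; concatenation along the sequence axis is used here as one possible reading:

```python
import torch
import torch.nn as nn

class BarrageEncoder(nn.Module):
    def __init__(self, vocab_size, d_model=512, n_heads=8, out_dim=512):
        super().__init__()
        # First vector-quantization characterization module 221 (embedding).
        self.embed = nn.Embedding(vocab_size, d_model)
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)   # module 222
        self.joint_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)  # module 223
        # Second feed-forward module 224.
        self.ffn = nn.Sequential(nn.Linear(d_model, out_dim), nn.ReLU(),
                                 nn.Linear(out_dim, out_dim))

    def forward(self, context_tokens, third_vector):
        # context_tokens: (B, L) word ids of the (spliced) context barrage.
        fourth = self.embed(context_tokens)                # fourth vector
        fifth, _ = self.self_attn(fourth, fourth, fourth)  # fifth vector
        # Splice the third vector (image content) onto the fifth vector.
        joint = torch.cat([third_vector.unsqueeze(1), fifth], dim=1)
        sixth, _ = self.joint_attn(joint, joint, joint)    # sixth vector
        sixth = sixth.mean(dim=1)                          # fixed length (assumption)
        return self.ffn(sixth)                             # seventh vector
```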
In a specific implementation, the barrage decoder 230 may likewise have a variety of structures, one of which is described below:
The barrage decoder 230 comprises a second vector-quantization characterization module 231, a first masked multi-head attention module 232, a fourth multi-head attention module 233, a fifth multi-head attention module 234 and a third feed-forward module 235 connected in sequence. The second vector-quantization characterization module 231 converts its input text into a corresponding eighth vector characterizing the meaning of that text; its first input text is empty, and its N-th input text consists of the words output by the barrage decoder 230 in the first through (N-1)-th passes, N being a positive integer greater than 1. The first masked multi-head attention module 232 performs masking and weighting on the eighth vector to obtain a ninth vector; the fourth multi-head attention module 233 weights the vector formed by connecting the ninth vector and the third vector, using a multi-head attention mechanism, to obtain a tenth vector; the fifth multi-head attention module 234 weights the vector formed by connecting the tenth vector and the seventh vector, again using a multi-head attention mechanism, to obtain an eleventh vector; and the third feed-forward module 235 performs a dimension transformation on the eleventh vector to obtain a twelfth vector of a preset dimension, the twelfth vector corresponding to one word of the new barrage.
The second vector-quantization characterization module 231 is again an embedding module: it converts each word of its input text into a corresponding vector and concatenates these vectors into the eighth vector. In the first decoding pass of the barrage decoder 230, the input text is empty, and the module converts this empty input text into an eighth vector. The first pass yields the first word; that word becomes the second input text and, after processing by the remaining modules, yields the second word; the first and second words together then form the third input text, and so on, until the barrage decoder 230 outputs a period. The empty input text is in effect a special word whose meaning is "beginning of sentence": every word, including this empty one, corresponds to a vector, which can be looked up in a mapping database of words and vectors (i.e., a dictionary).
Viewed as a whole, then, the barrage decoder 230 runs an autoregressive process: it predicts the new barrage word by word, and each prediction it outputs is fed back as part of its next input to take part in predicting the following word, as sketched below.
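This word-by-word procedure can be written as a simple greedy generation loop. The interface, the begin-of-sentence id `bos_id` and the period id `period_id` are hypothetical names for illustration:

```python
import torch

def generate_barrage(model, frame, context_tokens, bos_id, period_id, max_len=30):
    """Greedy autoregressive decoding sketch (assumed interface)."""
    first_enc = model.image_encoder(frame)
    second_enc = model.barrage_encoder(context_tokens, first_enc)
    tokens = [bos_id]  # the "empty" input text, i.e. a begin-of-sentence word
    for _ in range(max_len):
        inp = torch.tensor([tokens])
        logits = model.barrage_decoder(inp, first_enc, second_enc)
        next_id = int(logits[0, -1].argmax())  # predict the next word
        tokens.append(next_id)
        if next_id == period_id:               # stop once a period is output
            break
    return tokens[1:]
```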
It is understood that the ability of the second vector-quantization characterization module 231 to convert an empty input text into a meaningful eighth vector, and the ability of the barrage decoder 230 to output a period, are both results learned by the barrage generation model 200 during training.
The first masked multi-head attention module 232 not only sets a corresponding weight for each element of the vector but also performs masking. Masking is applied so that the prediction of the i-th word relies only on the outputs before the i-th word and never touches information after the i-th word, i.e., future information. As in machine translation, decoding is a sequential process: when decoding the vector corresponding to the i-th word, only the decoding results of the (i-1)-th word and earlier are visible. The mask enforces exactly this, enabling word-by-word prediction, as sketched below.
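The mask in question is the usual causal (look-ahead) mask; a short sketch of how it can be built and passed to an attention module:

```python
import torch

L = 5  # current length of the decoder input
# True above the diagonal means position j is hidden when predicting position i,
# so the i-th word attends only to words 1..i and never to future words.
causal_mask = torch.triu(torch.ones(L, L, dtype=torch.bool), diagonal=1)
# Applied as, e.g.: out, _ = attn(x, x, x, attn_mask=causal_mask)
```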
Here, the third vector is obtained through the image encoder 210, and the ninth vector through the first masked multi-head attention module 232 of the barrage decoder 230; the third and ninth vectors are connected into one long vector, which is input into the fourth multi-head attention module 233 to obtain the tenth vector. The seventh vector is obtained through the barrage encoder 220, and the tenth vector through the fourth multi-head attention module 233; the tenth and seventh vectors are connected into one long vector, which is input into the fifth multi-head attention module 234 to obtain the eleventh vector. This matters because the new barrage is meant to describe the image frame: if it were not generated on the basis of the image frame, it could stray far off topic. Moreover, the vectors produced by the encoders carry the meanings of the image and of the text, and the corresponding barrage can be learned from them.
The third feed-forward module 235 likewise converts eleventh vectors of different dimensions into vectors of one common dimension, the twelfth vector, which corresponds to one word: for example, feeding the twelfth vector into the output layer (a softmax) of the barrage decoder 230 yields the corresponding word.
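A sketch of a decoder with this structure, under the same illustrative assumptions as the encoder sketches (concatenation as the "connecting" operation, dimensions shared across modules). The application does not spell out these details, so this is one possible reading rather than the disclosed implementation:

```python
import torch
import torch.nn as nn

class BarrageDecoder(nn.Module):
    def __init__(self, vocab_size, d_model=512, n_heads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)  # module 231 (embedding)
        self.masked_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)  # module 232
        self.attn_image = nn.MultiheadAttention(d_model, n_heads, batch_first=True)   # module 233
        self.attn_text = nn.MultiheadAttention(d_model, n_heads, batch_first=True)    # module 234
        self.ffn = nn.Linear(d_model, d_model)          # third feed-forward module 235
        self.out = nn.Linear(d_model, vocab_size)       # output layer feeding softmax

    def forward(self, tokens, third_vector, seventh_vector):
        eighth = self.embed(tokens)                     # eighth vector: (B, L, d_model)
        L = tokens.size(1)
        mask = torch.triu(torch.ones(L, L, dtype=torch.bool,
                                     device=tokens.device), diagonal=1)
        ninth, _ = self.masked_attn(eighth, eighth, eighth, attn_mask=mask)
        # Connect the ninth vector with the third vector (image information).
        joint1 = torch.cat([third_vector.unsqueeze(1), ninth], dim=1)
        tenth, _ = self.attn_image(joint1, joint1, joint1)
        # Connect the tenth vector with the seventh vector (context-barrage information).
        joint2 = torch.cat([seventh_vector.unsqueeze(1), tenth], dim=1)
        eleventh, _ = self.attn_text(joint2, joint2, joint2)
        twelfth = self.ffn(eleventh[:, -L:])            # keep the word positions
        return self.out(twelfth)                        # logits; softmax yields the word
```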
In a specific implementation, the image frames and context barrages of hot videos may be used as training samples to train the barrage generation model 200, and the trained barrage generation model 200 is then used to supplement the barrages of cold videos. A hot video is one with a large play count and many barrages; a cold video is one with a small play count and few barrages.
It is understood that the feed-forward modules (i.e., the first feed-forward module 213, the second feed-forward module 224 and the third feed-forward module 235) all work on the same principle: using a fully-connected network whose last layer has N hidden nodes, a vector of any dimension M can be converted into an N-dimensional vector. Of course, the values of M and N may differ between the feed-forward modules.
The barrage generation model 200 is trained on these training samples; after a certain number of iterations the loss function converges, indicating that training of the barrage generation model 200 is complete.
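A minimal teacher-forcing training step under the usual cross-entropy setup; the application states only that training uses hot-video samples and stops when the loss converges, so the loss and optimizer choices here are assumptions:

```python
import torch.nn as nn

def train_step(model, optimizer, frame, context_tokens, target_tokens):
    """One gradient step; target_tokens are real barrages from hot videos."""
    # Teacher forcing: the decoder sees the target shifted right by one word.
    logits = model(frame, context_tokens, target_tokens[:, :-1])
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        target_tokens[:, 1:].reshape(-1))
    optimizer.zero_grad()
    loss.backward()  # backpropagation through decoder and both encoders
    optimizer.step()
    return loss.item()
```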
For example, based on an image frame and a context barrage along the lines of "The bamboo forest is really beautiful; it rains all the time in this drama, yet the clothes seem to be waterproof, don't you think", the new barrage generated is "The clothes are not wet at all".
Ten cold movies were selected on a video website; before the experiment, each carried on the order of 2000 to 4000 barrages. About 500 to 700 barrages were generated for each movie using the barrage generation model. The barrage-sending interface at the front end of the video website was then called to simulate real online viewers sending barrages: the first batch of generated barrages was sent starting on March 22, 2019, one barrage every 1.5 minutes on average, and finished on March 28, 2019; the second batch was sent from April 3, 2019 and finished on April 8, 2019. The final display at the front end of the video website is shown in fig. 1b, where the circled barrages are the new barrages sent in simulation of real users. Comparing the periods before and after the new barrages were sent, the barrage interaction rate rose markedly, and the gap between the average interaction rate of these ten movies and the average interaction rate of the movie channel as a whole narrowed significantly.
According to the barrage generation method described above, the current image frame and the context barrage corresponding to the current moment are obtained first, a new barrage is generated based on the current image frame and the context barrage, and the barrages of the current image frame are then supplemented with the new barrage. Because generation considers not only the current image frame but also the context barrage, i.e., the complex dependency between barrages and images, the generated new barrage fits the actual viewing scene. The method provided by the application can enrich the barrages in a video, increasing the numbers of likes, comments and reports from users, that is, the interaction rate between users and barrages, and thereby improving the viewing experience. It is particularly suitable for supplementing the barrages of long-tail (i.e., cold) videos.
In a second aspect, the present application provides a bullet screen generating apparatus; as shown in fig. 2, the apparatus 100 includes the following modules:
the information obtaining module 110 is configured to obtain a current image frame and a context barrage corresponding to a current time; the context barrage comprises an upper barrage and/or a lower barrage, the upper barrage is a barrage in an image frame corresponding to the previous moment of the current moment, and the lower barrage is a barrage in an image frame corresponding to the next moment of the current moment;
a new barrage generating module 120, configured to generate a new barrage for the current image frame according to the current image frame and the context barrage.
It can be understood that, for explanations, examples and beneficial effects of the bullet screen generating device provided in the embodiments of the present application, reference may be made to the corresponding parts of the first aspect; details are not repeated here.
In a third aspect, an embodiment of the present application provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the method provided in the first aspect when executing the computer program.
FIG. 3 is a diagram illustrating an internal structure of a computer device in one embodiment. As shown in fig. 3, the computer apparatus includes a processor, a memory, a network interface, an input device, a display screen, and the like, which are connected through a system bus. Wherein the memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement the bullet screen generating method. The internal memory may also store a computer program, and the computer program, when executed by the processor, may cause the processor to execute the bullet screen generating method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 3 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, the bullet screen generating apparatus provided in the present application may be implemented in the form of a computer program, and the computer program may be run on a computer device as shown in fig. 3. The memory of the computer device may store the various program modules constituting the bullet screen generating apparatus, such as the information acquisition module 110 and the new bullet screen generation module 120 shown in fig. 2. The computer program constituted by these program modules causes the processor to execute the steps of the bullet screen generating method of the embodiments of the present application described in this specification.
For example, the computer device shown in fig. 3 may perform, by the information obtaining module 110 in the bullet screen generating apparatus shown in fig. 2, obtaining the current image frame and the context bullet screen corresponding to the current time; the context barrage comprises an upper barrage and/or a lower barrage, the upper barrage is the barrage in the image frame corresponding to the last moment of the current moment, and the lower barrage is the barrage in the image frame corresponding to the next moment of the current moment. The computer device may execute, by the new barrage generation module 120, generating a new barrage for the current image frame from the current image frame and the context barrage.
It is understood that, for the computer device provided in the embodiments of the present application, for explanation, examples, and beneficial effects, reference may be made to corresponding parts in the first aspect, and details are not described here.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the method provided in the first aspect.
It is to be understood that, for the explanation, examples, and beneficial effects of the computer-readable storage medium provided in the embodiments of the present application, reference may be made to corresponding parts in the first aspect, and details are not described here.
It is to be appreciated that any reference to memory, storage, database or other medium used in the embodiments provided herein can include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), SyncLink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present invention, which enable those skilled in the art to understand or practice the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. A bullet screen generation method is characterized by comprising the following steps:
acquiring a current image frame and a context barrage corresponding to the current moment; the context barrage comprises an upper barrage and/or a lower barrage, the upper barrage is a barrage in an image frame corresponding to the previous moment of the current moment, and the lower barrage is a barrage in an image frame corresponding to the next moment of the current moment;
and generating a new barrage of the current image frame according to the current image frame and the context barrage.
2. The method of claim 1, wherein said generating a new barrage of the current image frame from the current image frame and the context barrage comprises:
and inputting the current image frame and the context barrage into a preset barrage generation model to obtain the new barrage.
3. The method of claim 2, wherein the bullet screen generating model comprises:
the image encoder is used for encoding the current image frame to obtain first encoding information;
the barrage encoder, the input of which comprises the context barrage and the first encoding information, and which is used for encoding the context barrage and the first encoding information to obtain second encoding information;
and the barrage decoder, the input of which comprises the first encoding information and the second encoding information, and which is used for decoding the first encoding information and the second encoding information to obtain the new barrage.
4. The method of claim 3, wherein the image encoder comprises a convolutional network module, a first multi-head attention module and a first feed-forward module which are connected in sequence, wherein:
the convolution network module is used for extracting the image characteristics of the current image frame to obtain a first vector;
the first multi-head attention module is used for weighting the first vector by adopting a multi-head attention mechanism to obtain a second vector;
the first feedforward module is used for carrying out dimension transformation on the second vector to obtain a third vector with a preset dimension, and the third vector is the first coding information.
5. The method of claim 4, wherein the bullet screen encoder comprises a first vector quantization characterization module, a second multi-head attention module, a third multi-head attention module and a second feed-forward module which are connected in sequence; wherein:
the first vector quantization characterization module is used for converting the context barrage into a corresponding fourth vector, and the fourth vector is used for characterizing the meaning of the context barrage;
the second multi-head attention module is used for weighting the fourth vector by adopting a multi-head attention mechanism to obtain a fifth vector;
the third multi-head attention module is used for weighting a vector formed by connecting the third vector and the fifth vector by adopting a multi-head attention mechanism to obtain a sixth vector;
the second feedforward module is configured to perform dimension transformation on the sixth vector to obtain a seventh vector with a preset dimension, where the seventh vector is the second encoding information.
6. The method of claim 5, wherein the bullet screen decoder comprises a second vector quantization characterization module, a first mask multi-head attention module, a fourth multi-head attention module, a fifth multi-head attention module and a third feed-forward module which are connected in sequence; wherein:
the second vector quantization characterization module is used for converting the input text of the second vector quantization characterization module into a corresponding eighth vector; the eighth vector is used for representing the meaning of the input text, the first input text of the second vector quantization characterization module is empty, and the N-th input text of the second vector quantization characterization module is composed of the words output by the bullet screen decoder for the first time to the (N-1)-th time; N is a positive integer greater than 1;
the first mask multi-head attention module is used for performing mask processing and weighting processing on the eighth vector to obtain a ninth vector;
the fourth multi-head attention module is used for weighting a vector formed by connecting the ninth vector and the third vector by using a multi-head attention mechanism to obtain a tenth vector;
the fifth multi-head attention module is configured to perform weighting processing on a vector formed by connecting the tenth vector and the seventh vector by using a multi-head attention mechanism to obtain an eleventh vector;
the third feedforward module is used for carrying out dimension transformation on the eleventh vector to obtain a twelfth vector with a preset dimension, and the twelfth vector corresponds to one word in the new bullet screen.
7. A bullet screen generating device, comprising:
the information acquisition module is used for acquiring a current image frame and a context barrage corresponding to the current moment; the context barrage comprises an upper barrage and/or a lower barrage, the upper barrage is a barrage in an image frame corresponding to the previous moment of the current moment, and the lower barrage is a barrage in an image frame corresponding to the next moment of the current moment;
and the new bullet screen generation module is used for generating a new bullet screen of the current image frame according to the current image frame and the context bullet screen.
8. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 6 are implemented when the computer program is executed by the processor.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.
CN202110888205.8A 2021-08-03 2021-08-03 Barrage generation method, barrage generation device, computer equipment and storage medium Active CN113673376B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110888205.8A CN113673376B (en) 2021-08-03 2021-08-03 Barrage generation method, barrage generation device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110888205.8A CN113673376B (en) 2021-08-03 2021-08-03 Barrage generation method, barrage generation device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113673376A true CN113673376A (en) 2021-11-19
CN113673376B CN113673376B (en) 2023-09-01

Family

ID=78541335

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110888205.8A Active CN113673376B (en) 2021-08-03 2021-08-03 Barrage generation method, barrage generation device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113673376B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150293592A1 (en) * 2014-04-15 2015-10-15 Samsung Electronics Co., Ltd. Haptic information management method and electronic device supporting the same
CN106250006A (en) * 2015-09-25 2016-12-21 北京智谷睿拓技术服务有限公司 Exchange method based on barrage, interactive device and subscriber equipment
CN107480123A (en) * 2017-06-28 2017-12-15 武汉斗鱼网络科技有限公司 A kind of recognition methods, device and the computer equipment of rubbish barrage
CN107484036A (en) * 2017-09-07 2017-12-15 深圳市迅雷网络技术有限公司 A kind of barrage display methods and device
CN107566914A (en) * 2017-10-23 2018-01-09 咪咕动漫有限公司 A kind of display control method of barrage, electronic equipment and storage medium
KR20180032514A (en) * 2016-09-22 2018-03-30 민상규 Foldable virtual reality device
CN111163359A (en) * 2019-12-31 2020-05-15 腾讯科技(深圳)有限公司 Bullet screen generation method and device and computer readable storage medium
CN111541910A (en) * 2020-04-21 2020-08-14 华中科技大学 Video barrage comment automatic generation method and system based on deep learning
CN111614986A (en) * 2020-04-03 2020-09-01 威比网络科技(上海)有限公司 Bullet screen generation method, system, equipment and storage medium based on online education
CN111836111A (en) * 2019-04-17 2020-10-27 微软技术许可有限责任公司 Technique for generating barrage
WO2020215988A1 (en) * 2019-04-22 2020-10-29 腾讯科技(深圳)有限公司 Video caption generation method, device and apparatus, and storage medium
CN112016573A (en) * 2020-10-16 2020-12-01 北京世纪好未来教育科技有限公司 Bullet screen generation method and device, electronic equipment and computer storage medium
CN112084841A (en) * 2020-07-27 2020-12-15 齐鲁工业大学 Cross-modal image multi-style subtitle generation method and system
CN112995748A (en) * 2021-01-26 2021-06-18 浙江香侬慧语科技有限责任公司 Multi-mode-based automatic bullet screen generation method and system, storage medium and equipment


Also Published As

Publication number Publication date
CN113673376B (en) 2023-09-01

Similar Documents

Publication Publication Date Title
Wu et al. Nüwa: Visual synthesis pre-training for neural visual world creation
CN107391646B (en) Semantic information extraction method and device for video image
CN110705284B (en) Poetry generation quality optimization method and system based on neural network generation model
CN110475129B (en) Video processing method, medium, and server
CN110929587B (en) Bidirectional reconstruction network video description method based on hierarchical attention mechanism
EP3885966B1 (en) Method and device for generating natural language description information
JP7431833B2 (en) Language sequence labeling methods, devices, programs and computing equipment
CN110234018B (en) Multimedia content description generation method, training method, device, equipment and medium
CN109919221B (en) Image description method based on bidirectional double-attention machine
CN112800757A (en) Keyword generation method, device, equipment and medium
CN111563160B (en) Text automatic summarization method, device, medium and equipment based on global semantics
CN114339450B (en) Video comment generation method, system, device and storage medium
CN112560456A (en) Generation type abstract generation method and system based on improved neural network
CN113609284A (en) Method and device for automatically generating text abstract fused with multivariate semantics
CN113240115A (en) Training method for generating face change image model and related device
CN116206314A (en) Model training method, formula identification method, device, medium and equipment
CN113420179B (en) Semantic reconstruction video description method based on time sequence Gaussian mixture hole convolution
CN114706973A (en) Extraction type text abstract generation method and device, computer equipment and storage medium
CN114676332A (en) Network API recommendation method facing developers
CN113673376A (en) Bullet screen generation method and device, computer equipment and storage medium
CN116958343A (en) Facial animation generation method, device, equipment, medium and program product
CN116956953A (en) Translation model training method, device, equipment, medium and program product
CN116208824A (en) Title generation method, computer device, storage medium, and computer program product
CN115525782A (en) Video abstract generation method of self-adaptive graph structure
CN113420869B (en) Translation method based on omnidirectional attention and related equipment thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant