CN113691838A - Audio bullet screen processing method and device, electronic equipment and storage medium

Info

Publication number: CN113691838A
Application number: CN202110973984.1A
Authority: CN
Inventors: 池源
Original assignee: Beijing Shareit Information Technology Co Ltd
Current assignee: Beijing Shareit Information Technology Co Ltd
Priority date: 2021-08-24
Filing date: 2021-08-24
Publication date: 2021-11-23

Abstract

Embodiments of the disclosure relate to an audio barrage (bullet screen) processing method and device, electronic equipment and a storage medium. The audio barrage processing method applied to a server may comprise the following steps: acquiring barrage information of an audio barrage; determining, according to the video content of a target video, a video clip that allows insertion of an audio barrage and clip information of the video clip; and distributing the audio barrage, the barrage information and the clip information to a client, wherein the barrage information and the clip information are used by the client to insert the audio barrage into the video clip that allows audio barrage insertion.

Description

Audio bullet screen processing method and device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of information technologies, and in particular, to an audio bullet screen processing method and apparatus, an electronic device, and a storage medium.

Background

An audio barrage is a barrage in audio format, which produces sound when played. If an audio barrage is played while a video with sound is playing, the two sounds are superimposed, and the superimposed sound may interfere with the audio content of the video. Therefore, barrages are generally presented as text or small pictures rather than as audio.

If audio barrages are played during video playback, the playing effect of the video itself may be degraded. Therefore, how to allow audio barrages to be inserted without degrading the playing effect of the video itself is a problem that remains to be solved in the related art.

Disclosure of Invention

The embodiment of the disclosure provides an audio barrage processing method and device, electronic equipment and a storage medium.

A first aspect of the embodiments of the present disclosure provides an audio barrage processing method, which is applied in a server, and the method includes: acquiring bullet screen information of the audio bullet screen; determining a video clip allowing to insert an audio barrage and clip information of the video clip according to the video content of the target video; and distributing the audio barrage, the barrage information and the fragment information to a client, wherein the barrage information and the fragment information are used for the client to insert the audio barrage into the video fragment allowing the audio barrage to be inserted.

Based on the above scheme, the method further comprises:

determining the playing weight of the audio bullet screen according to the relevance between the bullet screen content of the audio bullet screen and the video content of the target video; and the playing weight is used for determining the audio barrage inserted into the video clip.

Based on the above scheme, the bullet screen information includes: the bullet screen duration and the playing weight of the audio bullet screen; the fragment information includes: playing time information of the video clip;

and the bullet screen information and the fragment information are used for the client to determine the audio bullet screen of the video fragment which is allowed to be inserted into the audio bullet screen.

Based on the above scheme, the method further comprises:

determining a target bullet screen of a video clip allowing to be inserted into the audio bullet screen from the audio bullet screen according to the playing weight;

the distributing the audio barrage, the barrage information and the fragment information to the client comprises:

and distributing the target bullet screen, bullet screen information of the target bullet screen and the fragment information to the client.

Based on the above scheme, the determining, according to the play weight, a target bullet screen of a video clip that is allowed to be inserted into an audio bullet screen from the audio bullet screen includes:

selecting an alternative barrage to be inserted according to the playing weight;

when the bullet screen duration of the alternative bullet screen is less than or equal to the remaining duration of the video clip, determining the alternative bullet screen as a target bullet screen of the video clip, wherein the remaining duration is: the segment duration of the video segment minus the bullet screen durations of the target bullet screens already determined.

Based on the above scheme, the method further comprises:

when the remaining duration of a video segment allowing audio bullet screen insertion is less than the bullet screen duration of each remaining audio bullet screen to be inserted, discarding the remaining alternative bullet screens to be inserted into the video segment.

Based on the above scheme, the bullet screen information further includes: bullet screen generation time; the method further comprises the following steps:

determining a target segment from the video segments allowing audio bullet screen insertion according to the bullet screen generation time of the nth audio bullet screen, wherein the target segment is: the video segment allowing audio bullet screen insertion whose start time is later than, and closest to, the bullet screen generation time of the nth audio bullet screen.

Based on the above scheme, the determining the play weight of the audio bullet screen according to the relevance between the bullet screen content of the audio bullet screen and the video content of the target video includes:

acquiring a bullet screen text of the audio bullet screen and a video text of the target video;

determining bullet screen keywords according to the bullet screen text, and determining video keywords according to the video text;

and determining the playing weight of the audio bullet screen according to the matching degree of the bullet screen keywords and the video keywords.

Based on the scheme, the video keywords have word weights, wherein the word weights of the video keywords are positively correlated with the occurrence times of the video keywords in the video text;

determining the playing weight of the audio bullet screen according to the matching degree of the bullet screen keywords and the video keywords, wherein the determining step comprises the following steps:

matching the bullet screen keywords with the video keywords to determine the video keywords meeting the similar meaning conditions with the bullet screen keywords;

and determining the playing weight according to the word weights of the video keywords meeting similar meaning conditions with the bullet screen keywords.

Based on the above scheme, the determining, according to the video content of the target video, the video segment and the segment information that are allowed to be inserted into the audio barrage includes:

determining the voice silence segment as a first type segment allowing to be inserted into an audio barrage according to the video content of the target video; wherein the first type of segment allows audio barrage insertion of arbitrary content;

determining segment information of the first-class segment according to the voice silence duration, wherein the segment information comprises: a start position and an end position of the first type of segment in the target video; or the starting position and the duration of the first type of segment in the target video.

determining a video segment with preset content as a second type of segment according to the video content of the target video; wherein the second type of segment allows insertion of an audio barrage whose barrage content is associated with the preset content.

A second aspect of the embodiments of the present disclosure provides an audio barrage processing method, which is applied in a client, and the method includes:

receiving an audio barrage, barrage information and fragment information when a target video is played;

determining, according to the clip information, a video clip of the target video being played that allows insertion of the audio barrage;

and according to the bullet screen information, inserting the audio bullet screen in the video clip allowing the insertion of the audio bullet screen.

Based on the above scheme, the inserting the audio bullet screen in the video clip allowing the insertion of the audio bullet screen according to the bullet screen information comprises:

determining an alternative barrage to be inserted into the video clip according to the playing weight;

when the bullet screen duration of the alternative bullet screen is less than or equal to the remaining duration of the video clip, determining that the alternative bullet screen is a target bullet screen; the remaining duration is: the segment duration of the video segment minus the total bullet screen duration of the target bullet screens already inserted;

and inserting the target bullet screen when the video clip is played.

Based on the above scheme, the method further comprises:

Based on the above scheme, the receiving the audio barrage, the barrage information and the fragment information when the target video is played includes:

when the target video is played, receiving a target bullet screen, bullet screen information of the target bullet screen and the fragment information, wherein the target bullet screen is an audio bullet screen that has been determined to be inserted into the video segment.

A third aspect of the embodiments of the present disclosure provides an audio bullet screen processing apparatus, which is applied in a server, and the apparatus includes:

the acquisition module is used for acquiring the bullet screen information of the audio bullet screen;

the first determining module is used for determining a video clip allowing to be inserted into the audio barrage and clip information of the video clip according to the video content of the target video;

and the distribution module is used for distributing the audio barrage, the barrage information and the segment information to a client, wherein the barrage information and the segment information are used for the client to insert the audio barrage into the video segment allowing the audio barrage to be inserted.

Based on the above scheme, the apparatus further comprises:

the second determining module is used for determining the playing weight of the audio bullet screen according to the relevance between the bullet screen content of the audio bullet screen and the video content of the target video; and the playing weight is used for determining the audio barrage inserted into the video clip.

Based on the above scheme, the apparatus further comprises:

the third determining module is used for determining a target bullet screen of a video clip which is allowed to be inserted into the audio bullet screen from the audio bullet screen according to the playing weight;

the distribution module is specifically configured to distribute the target barrage, the barrage information of the target barrage, and the fragment information to the client.

Based on the above scheme, the third determining module is specifically configured to select an alternative barrage to be inserted according to the play weight;

Based on the above scheme, the apparatus further comprises:

the first discarding module is used for discarding the remaining alternative barrages to be inserted into a video clip when the remaining duration of the video clip allowing audio barrage insertion is less than the barrage duration of each remaining audio barrage to be inserted.

Based on the above scheme, the bullet screen information further includes: bullet screen generation time; the device further comprises:

a fourth determining module, configured to determine a target segment from the video segments allowing audio bullet screen insertion according to the bullet screen generation time of an nth audio bullet screen, where the target segment is: the video segment allowing audio bullet screen insertion whose start time is later than, and closest to, the bullet screen generation time of the nth audio bullet screen.

Based on the above scheme, the second determining module is specifically configured to obtain a bullet screen text of the audio bullet screen and a video text of the target video; determine bullet screen keywords according to the bullet screen text and video keywords according to the video text; and determine the playing weight of the audio bullet screen according to the matching degree of the bullet screen keywords and the video keywords.

the second determining module is specifically configured to match the barrage keyword with the video keyword, and determine the video keyword satisfying a similar meaning condition with the barrage keyword; and determine the playing weight according to the word weights of the video keywords meeting similar meaning conditions with the bullet screen keywords.

Based on the above scheme, the first determining module is specifically configured to determine, according to the video content of the target video, a vocal silence segment as a first type segment allowing insertion of an audio barrage; wherein the first type of segment allows audio barrage insertion of arbitrary content; determining segment information of the first-class segment according to the voice silence duration, wherein the segment information comprises: a start position and an end position of the first type of segment in the target video; or the starting position and the duration of the first type of segment in the target video.

Based on the above scheme, the first determining module is specifically configured to determine a video segment with preset content as a second type of segment according to the video content of the target video; wherein the second type of segment allows insertion of an audio barrage whose barrage content is associated with the preset content.

A fourth aspect of the embodiments of the present disclosure provides an audio barrage processing apparatus, which is applied to a client, where the apparatus includes:

the receiving module is used for receiving the audio barrage, the barrage information and the fragment information when the target video is played;

a fifth determining module, configured to determine, according to the segment information, a video segment of the target video being played that allows insertion of the audio barrage;

and the playing module is used for inserting the audio bullet screen into the video clip allowing the audio bullet screen to be inserted according to the bullet screen information.

the playing module is specifically configured to determine, according to the playing weight, an alternative barrage to be inserted into the video clip; when the bullet screen duration of the alternative bullet screen is less than or equal to the remaining duration of the video clip, determine that the alternative bullet screen is a target bullet screen, where the remaining duration is the segment duration of the video segment minus the total bullet screen duration of the target bullet screens already inserted; and insert the target bullet screen when the video clip is played.

Based on the above scheme, the apparatus further comprises:

the second discarding module is used for discarding the remaining alternative barrages to be inserted into a video clip when the remaining duration of the video clip allowing audio barrage insertion is less than the barrage duration of each remaining audio barrage to be inserted.

a sixth determining module, configured to determine a target segment from the video segments allowing audio bullet screen insertion according to the bullet screen generation time of the nth audio bullet screen, where the target segment is: the video segment allowing audio bullet screen insertion whose start time is later than, and closest to, the bullet screen generation time of the nth audio bullet screen.

Based on the above scheme, the receiving module is specifically configured to receive a target bullet screen, bullet screen information of the target bullet screen, and the clip information when the target video is played, where the target bullet screen is an audio bullet screen that has been determined to be inserted into the video segment.

A fifth aspect of an embodiment of the present disclosure provides a mobile terminal, including:

a memory for storing processor-executable instructions;

a processor coupled to the memory;

wherein the processor is configured to execute the audio barrage processing method according to any of the technical solutions of the first aspect or the second aspect.

A sixth aspect of the embodiments of the present disclosure provides a non-transitory computer-readable storage medium, where computer-executable instructions are stored in the computer-readable storage medium, and when executed by a processor, the computer-executable instructions implement the audio barrage processing method provided in any of the foregoing first aspect or second aspect.

The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:

Embodiments of the disclosure allow audio barrages to be inserted during video playback, and insert them only into video clips that allow audio barrage insertion. This reduces the interference with video playback that would be caused by inserting audio barrages at arbitrary positions in the target video being played, so that audio barrages can be inserted with little interference to the video.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.

Fig. 1 is a flowchart illustrating an audio bullet screen processing method according to an embodiment of the present disclosure.

Fig. 2 is a flowchart illustrating an audio bullet screen processing method according to an embodiment of the present disclosure.

Fig. 3 is a flowchart illustrating an audio bullet screen processing method according to an embodiment of the present disclosure.

Fig. 4 is a schematic flowchart illustrating bullet screen weight determination according to an embodiment of the present disclosure.

Fig. 5 is a schematic flow chart illustrating an audio bullet screen process according to an embodiment of the present disclosure.

Fig. 6 is a flowchart illustrating an audio bullet screen processing method according to an embodiment of the present disclosure.

Fig. 7 is a schematic structural diagram of an audio bullet screen processing device according to an embodiment of the present disclosure.

Fig. 8 is a schematic structural diagram of an audio bullet screen processing device according to an embodiment of the present disclosure.

Fig. 9 is a schematic structural diagram of an electronic device shown in an embodiment of the present disclosure.

Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.

As shown in fig. 1, an embodiment of the present disclosure provides an audio barrage processing method, which is applied in a server, and the method includes:

S110: acquiring bullet screen information of the audio bullet screen;

S120: determining, according to the video content of the target video, a video clip allowing insertion of an audio barrage and clip information of the video clip;

S130: distributing the audio barrage, the barrage information and the fragment information to a client, wherein the barrage information and the fragment information are used for the client to insert the audio barrage into the video fragment allowing the audio barrage to be inserted.

The audio barrage is a barrage in an audio file format. The bullet screen information includes any descriptive information about the audio barrage, for example:

the bullet screen duration, that is, the playing duration, of the audio bullet screen;

the generation time of the audio barrage;

the content theme of the audio barrage.

In some embodiments, the bullet screen information may further include: bullet screen identification, etc.

S120 may include: determining, according to the video content of the target video, a video clip allowing the audio barrage to be inserted, where the video clip may have any duration. In some embodiments, to improve the effect of inserting the audio barrage, the duration of a video clip allowing audio barrage insertion is not less than the playing duration of one shot.

The video segments may be divided into voiced segments and/or unvoiced segments according to the audio content in the video content.

The video segments can be divided into special sound segments and silence segments according to the audio content in the video content. For example, special sound segments include, but are not limited to: drinking sounds, sounds of fireworks, etc. A silent segment is a segment without sound.
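As a minimal sketch of how silent segments might be located, assume that a voice-activity detector has already produced one boolean flag per audio frame; the disclosure does not specify the detection method. The function name `find_silent_segments`, the frame length and the minimum-duration threshold are hypothetical and serve only as illustration.

```python
from typing import List, Sequence, Tuple


def find_silent_segments(
    voiced: Sequence[bool],
    frame_s: float,
    min_duration_s: float,
) -> List[Tuple[float, float]]:
    """Return (start_s, duration_s) for each run of unvoiced frames
    that lasts at least min_duration_s seconds."""
    segments: List[Tuple[float, float]] = []
    run_start = None  # index of the first frame of the current silent run
    # A trailing True sentinel closes a silent run that reaches the end.
    for i, is_voiced in enumerate(list(voiced) + [True]):
        if not is_voiced and run_start is None:
            run_start = i
        elif is_voiced and run_start is not None:
            duration = (i - run_start) * frame_s
            if duration >= min_duration_s:
                segments.append((run_start * frame_s, duration))
            run_start = None
    return segments
```

Each returned pair corresponds to the "starting position and duration" form of segment information; runs shorter than the threshold are ignored so that very short pauses are not treated as insertable segments.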

Video clips can be divided into the following according to the image content in the video content: a particular picture segment. For example, with regard to a ball game type video, it may include: shot pieces or shot pictures, etc.

In some embodiments, the video clips allowing audio barrage insertion may be: the video clip with small negative influence on the playing effect of the video after the audio barrage is inserted, or the video clip capable of improving the playing effect of the video after the audio barrage is inserted.

The fragment information may indicate at least one of:

start and end position information of the video clip;

the starting position information and the segment duration information of the video segment;

type information of the video clip.
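The bullet screen information and fragment information described above could be represented, for example, as two small records. This is a sketch only: the class and field names are hypothetical, and treating the generation time as an offset on the video timeline is an assumption made so that it can be compared with segment start positions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BarrageInfo:
    """Descriptive information for one audio barrage (hypothetical fields)."""
    barrage_id: str               # bullet screen identification
    duration_s: float             # bullet screen (playing) duration, seconds
    generated_at_s: float         # generation time, assumed to be a video offset
    topic: Optional[str] = None   # content theme, if known


@dataclass(frozen=True)
class SegmentInfo:
    """Information for one video segment that allows barrage insertion."""
    start_s: float                # start position in the target video
    duration_s: float             # segment duration
    kind: str = "silence"         # type information (hypothetical labels)

    @property
    def end_s(self) -> float:
        # (start, end) and (start, duration) carry the same information.
        return self.start_s + self.duration_s

    @classmethod
    def from_start_end(cls, start_s: float, end_s: float,
                       kind: str = "silence") -> "SegmentInfo":
        if end_s < start_s:
            raise ValueError("end position must not precede start position")
        return cls(start_s=start_s, duration_s=end_s - start_s, kind=kind)
```

Storing the start position and duration, and deriving the end position, keeps the two alternative forms of fragment information interchangeable.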

In the embodiment of the disclosure, after receiving the audio barrage sent by a client, the server distributes the audio barrage, together with the barrage information and the segment information, to clients, so that a client can play the audio barrage while playing the video.

In the embodiment of the disclosure, in order to reduce the interference of inserting the audio barrage into the video playing, a video clip suitable for inserting the audio barrage is determined according to the video content, so that the audio barrage can be inserted, and the interference on the video playing effect is reduced as much as possible.

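In some embodiments, the method further comprises: determining the playing weight of the audio bullet screen according to the relevance between the bullet screen content of the audio bullet screen and the video content of the target video, where the playing weight is used for determining the audio barrage inserted into the video clip.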

In the embodiment of the present disclosure, the relevance between the audio barrage content and the video content is further determined according to the barrage content of the audio barrage, so as to determine the playing weight of the audio barrage.

For example, the voice of the audio barrage is converted into text, the text of the playing content related to the target video is obtained, and the relevance is determined by the meaning of the text expression.

In one embodiment, the higher the relevance, the greater the play weight; that is, the play weight is positively correlated with the relevance.

In some embodiments, the higher the play weight of an audio barrage, the higher its priority for insertion into the corresponding video clip.

Therefore, by introducing the play weight, on the one hand the relevance between the target video and the audio barrages inserted into its video segments can be ensured, and on the other hand the audio barrages to be inserted can be screened.

In other embodiments, the bullet screen information comprises: the bullet screen duration and the playing weight of the audio bullet screen; the fragment information includes: playing time information of the video clip;

In some embodiments, to reduce the load on the server, the task of selecting the audio bullet screens to be inserted into the target video for playing may be performed by the client itself. In this case, the server may calculate the playing weight of each audio barrage, carry it in the barrage information, and distribute it to the client.

Therefore, the client selects the actually played audio bullet screen according to the bullet screen information and the fragment information.

The playing time information of the video clip can be used for the client to determine the video clip allowing to be inserted into the audio barrage and determine the duration of the video clip.

In some embodiments, the bullet screen information of the audio bullet screen further comprises generation time information of the audio barrage. The generation time information indicates the generation time of the audio bullet screen. In this way, according to the generation time, a video segment whose interval from the generation time is within a preset time range is selected as an alternative segment for inserting the audio bullet screen, or one or more video segments after the generation time are selected as alternative segments for inserting the audio bullet screen.

Finally, whether the audio barrage is actually inserted into an alternative segment, or into the target segment selected from the alternative segments, for playing is determined according to the playing weight and the barrage duration of the audio barrage.

In some embodiments, as shown in fig. 2, the method further comprises:

S121: determining a target bullet screen of a video clip allowing to be inserted into the audio bullet screen from the audio bullet screen according to the playing weight;

the S130 includes: and distributing the target bullet screen, bullet screen information of the target bullet screen and the fragment information to the client.

In the embodiment of the present disclosure, the server may also select, according to the play weight, the target barrage to be inserted into the corresponding video segment. The selected target barrage, its barrage information and the fragment information are then distributed to the client. In this case, the bullet screen information need not include the playback weight of the audio bullet screen. The bullet screen information in this case includes but is not limited to: the bullet screen identification of the audio bullet screen, the bullet screen duration, the segment identification of the video segment into which the audio bullet screen is to be inserted, and the like.

In the embodiment of the present disclosure, if the server screens the target barrages to be inserted into the video segments according to the play weight, the clients do not each need to perform this processing, and non-target barrages that will not be played no longer need to be distributed, which reduces both the network transmission load and the calculation load of the client.

In some embodiments, the S121 may include: selecting an alternative barrage to be inserted according to the playing weight;

Illustratively, the audio barrages are sorted by playing weight from large to small, and the one or more top-ranked audio barrages are selected as alternative barrages.

Further, according to the bullet screen duration, an alternative bullet screen whose bullet screen duration is less than or equal to the remaining duration of the video segment is determined as a target bullet screen to be inserted into the video segment.

By determining the target barrage in this way, on the one hand the audio barrages most relevant to the video content of the target video are selected for insertion according to the playing weight; on the other hand, the bullet screen duration is checked against the duration of the video segment that allows insertion, which reduces the problems caused by inserting audio barrages at arbitrary positions throughout the target video.
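The selection described above can be sketched as a greedy pass over the alternative barrages in descending order of playing weight. The names `Barrage` and `select_targets` are hypothetical; skipping a barrage that does not fit, and then trying shorter ones, is one possible reading of the scheme rather than the only one.

```python
from typing import List, NamedTuple


class Barrage(NamedTuple):
    barrage_id: str
    duration_s: float   # bullet screen duration, seconds
    weight: float       # playing weight


def select_targets(candidates: List[Barrage],
                   segment_duration_s: float) -> List[Barrage]:
    """Pick target barrages for one video segment, highest weight first."""
    remaining = segment_duration_s  # segment duration minus chosen durations
    targets: List[Barrage] = []
    for b in sorted(candidates, key=lambda b: b.weight, reverse=True):
        if b.duration_s <= remaining:
            targets.append(b)
            remaining -= b.duration_s
        # A barrage longer than the remaining duration is not inserted
        # into this segment.
    return targets
```

For a 6-second segment and candidates with durations 3, 4, 2 and 1 seconds in descending weight order, the sketch selects the first, third and fourth candidates; the 4-second candidate is not inserted because only 3 seconds remain when it is considered.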

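In some embodiments, the method further comprises: when the remaining duration of a video segment allowing audio bullet screen insertion is less than the bullet screen duration of each remaining audio bullet screen to be inserted, discarding the remaining alternative bullet screens to be inserted into the video segment.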

In this way, the audio barrage can be prohibited from being inserted into other parts of the target video, so that the interference on the target video playing is reduced.

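In one embodiment, the bullet screen information further comprises: bullet screen generation time; the method further comprises: determining a target segment from the video segments allowing audio bullet screen insertion according to the bullet screen generation time of the nth audio bullet screen, where the target segment is the video segment allowing audio bullet screen insertion whose start time is later than, and closest to, the bullet screen generation time of the nth audio bullet screen.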

By adopting the mode, the audio barrage can be inserted nearby in the time domain, so that the audio barrage can be played in time.
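A minimal sketch of this nearest-in-time selection follows, assuming that segments are given as (start, duration) pairs in seconds and that the generation time is expressed on the same video timeline. The function name `pick_target_segment` is hypothetical.

```python
from typing import List, Optional, Tuple


def pick_target_segment(
    segments: List[Tuple[float, float]],
    generated_at_s: float,
) -> Optional[Tuple[float, float]]:
    """Return the segment whose start time is later than the generation
    time and closest to it, or None if no such segment exists."""
    later = [seg for seg in segments if seg[0] > generated_at_s]
    return min(later, key=lambda seg: seg[0]) if later else None
```

With segments starting at 5, 12 and 30 seconds, a barrage generated at 6 seconds is assigned to the segment starting at 12 seconds; a barrage generated after the last segment has started gets no target segment.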

In some embodiments, the S121 may include:

The audio barrage may include: the audio of a human voice. The human voice may be the voice of a human speaking, the voice of a human reading, and/or the voice of a song. No matter which voice audio is, the bullet screen text of the audio bullet screen can be determined through audio recognition.

The video text may include at least one of:

description information of the target video, for example, any information associated with the video content in the title, content summary, annotation information and/or rating information of the target video. The annotation information may include tags; for example, a movie has classification tags assigned according to its genre and content, such as action film or art film. The content summary may include the main content of the movie. The annotation information may further include information contained in the credits list, etc.;

video text obtained by recognizing the audio data contained in the target video;

and video text obtained by recognizing the images contained in the target video.

In the embodiment of the disclosure, based on the keyword extraction algorithm, the barrage keywords of the barrage text and the video keywords of the video text can be respectively extracted.

Finally, according to the matching of the two keywords, the relevance between the audio barrage and the target video can be known, and the playing weight is determined based on the relevance.

In some embodiments, the video text may be the video text of the video segment allowing the audio barrage to be inserted, in which case the matching degree is the matching degree between the barrage keywords and the video keywords of that video segment.

In some embodiments, the video keywords have word weights, wherein the word weights of the video keywords are positively correlated with the number of times the video keywords appear in the video text;

There may be multiple video keywords, and different video keywords have word weights set according to how closely each is associated with the target video and/or video segment. Illustratively, a higher word weight indicates that the corresponding video keyword reflects, or is associated with, the video content of the target video to a higher degree.
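A word weight that is positively correlated with the number of occurrences can be sketched as follows. This is only an illustration under stated assumptions: the video text is assumed to be already segmented into tokens, the function name `word_weights` is hypothetical, and normalizing by the total keyword count is one possible choice rather than the weighting used by the embodiment (a later example mentions a vector space model instead).

```python
from collections import Counter


def word_weights(tokens, keywords):
    """Return {keyword: weight}; a keyword that occurs more often gets a larger weight.

    tokens   -- segmented words of the video text (assumed input)
    keywords -- video keywords extracted from that text (assumed input)
    """
    keywords = set(keywords)
    counts = Counter(t for t in tokens if t in keywords)
    total = sum(counts.values())
    if total == 0:
        # No keyword occurs in the text: every weight is zero.
        return {k: 0.0 for k in keywords}
    # Share of all keyword occurrences that belongs to each keyword.
    return {k: counts[k] / total for k in keywords}
```

With this normalization the weights of all keywords sum to 1, which keeps play weights computed for different videos on a comparable scale.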

The same audio bullet screen may also contain a plurality of bullet screen keywords.

The bullet screen keywords are matched against the video keywords; if a bullet screen keyword and a video keyword satisfy the similar-meaning condition, the match is determined to be successful.

The video keywords satisfying the similar-meaning condition with respect to the bullet screen keywords may include at least one of:

video keywords that are the same as the bullet screen keywords;

video keywords that differ from the bullet screen keywords but are synonyms of them.

The play weight can be calculated by adopting the following functional relationship:

$$\mathrm{Weight} = \sum_{n} \mathrm{keyword}_n \cdot w_n$$

wherein Weight is the play weight and $\mathrm{keyword}_n$ is the value taken for the nth video keyword: if the nth video keyword and a barrage keyword satisfy the similar-meaning condition, $\mathrm{keyword}_n$ is 1; otherwise $\mathrm{keyword}_n$ is 0. $w_n$ is the word weight of the nth video keyword.
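The functional relationship above can be sketched in a few lines. The function name `play_weight`, the dictionary of word weights, and the `synonyms` lookup table are assumptions made for illustration; how synonyms are actually identified is not specified here.

```python
def play_weight(barrage_keywords, video_keyword_weights, synonyms=None):
    """Sum the word weights w_n of the video keywords matched by the barrage.

    barrage_keywords      -- keywords extracted from the barrage text
    video_keyword_weights -- {video keyword: word weight w_n}
    synonyms              -- optional {video keyword: set of synonyms} (hypothetical table)
    """
    synonyms = synonyms or {}
    barrage = set(barrage_keywords)
    weight = 0.0
    for video_kw, w_n in video_keyword_weights.items():
        # Similar-meaning condition: same word, or a listed synonym.
        similar = {video_kw} | set(synonyms.get(video_kw, ()))
        keyword_n = 1 if barrage & similar else 0
        weight += keyword_n * w_n
    return weight


# Usage: "score" matches "goal" through the synonym table, "team" matches directly.
weights = {"goal": 0.5, "team": 0.3, "coach": 0.2}
w = play_weight(["score", "team"], weights, synonyms={"goal": {"score"}})
```

Each video keyword contributes at most once, regardless of how many barrage keywords match it, which follows the 0/1 definition of keyword_n.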

In some embodiments, the determining, according to the video content of the target video, the video segment and the segment information that are allowed to be inserted into the audio bullet screen includes:

In some embodiments, the S120 may include:

In some embodiments, the predetermined content may be content of the video clip.

For example, a goal segment of a ball game may be a second type of segment, allowing insertion of audio barrages containing applause or drumbeats. For example, a singing segment may also be a second type of segment, allowing insertion of cheering audio barrages.

As shown in fig. 4, an embodiment of the present disclosure provides an audio barrage processing method, which is applied in a client, and the method includes:

S210: receiving an audio barrage, barrage information and fragment information when a target video is played;

S220: determining, according to the clip information, a video clip of the target video being played into which the audio barrage is allowed to be inserted;

S230: inserting, according to the bullet screen information, the audio bullet screen into the video clip that allows insertion of the audio bullet screen.

In the embodiment of the disclosure, when a client plays a target video, an audio bullet screen, bullet screen information and segment information are received, a video segment allowing the audio bullet screen to be inserted is determined according to the segment information, and then the audio bullet screen is inserted into the video segment allowing the audio bullet screen to be inserted according to the bullet screen information.

In some embodiments, the barrage information comprises: the bullet screen duration and the playing weight of the audio bullet screen; the fragment information includes: playing time information of the video clip;

S230 may include:

S231: determining, according to the playing weight, an alternative barrage to be inserted into the video clip;

S232: when the bullet screen duration of the alternative barrage is less than or equal to the remaining duration of the video clip, determining that the alternative barrage is a target barrage, where the remaining duration is the segment duration of the video clip minus the total bullet screen duration of the target barrages already inserted;

S233: inserting the target barrage when the video clip is played.

In the embodiment of the disclosure, the client determines, according to the play weight, the alternative barrages to be inserted into the video clip. When the bullet screen duration of an alternative barrage is less than or equal to the remaining duration of the video clip, that alternative barrage is determined to be a target barrage to be played in the video clip, so that the target barrage is inserted into a video clip that allows audio barrage insertion.

In some embodiments, the method further comprises:

If the remaining duration of a video segment that allows audio barrage insertion is less than or equal to the bullet screen duration of any audio barrage, the other alternative barrages prepared for insertion into that video segment are discarded. This prevents audio barrages from being inserted at positions outside the video segment that allows insertion, where they would interfere with the playing of the target video.

In some embodiments, the bullet screen information further comprises: bullet screen generation time; the method further comprises the following steps:

In the embodiment of the disclosure, the audio bullet screen is inserted into the corresponding video clip that is nearby in the time domain, so that the audio barrage finally determined for playing can be played as far as possible.

In one embodiment, S210 may include: when the target video is played, receiving a target bullet screen, bullet screen information of the target bullet screen and the fragment information, where the target bullet screen is an audio bullet screen that has been determined for insertion into the video segment.

In some embodiments, which audio barrages are specifically inserted is determined by the server itself, so that the client can directly receive the target barrage to be played and the barrage information of the target barrage, and finally insert the target barrage when the target video is played according to the received barrage information and the segment information.

To address the problem that bullet screen content converted into speech interferes with the original sound of the original video, the embodiment of the present disclosure provides a solution for intelligently playing bullet screen content as sound, which may include: analyzing the sound of the original video to identify the important segments of the original sound, and inserting and playing the electronic voice converted from the bullet screen content in segments that are non-important or contain no human voice.

The audio barrage processing method provided by the embodiment of the disclosure can comprise the following steps:

S310: Generating the audio bullet screen of the video. The audio barrage is produced by using an application programming interface (API). For example, existing text-to-speech (TTS) technology can be used to output an audio barrage with the corresponding tone, which is stored on the server together with its bullet screen information. The audio bullet screen information comprises: [audio barrage (Danmaku Audio), audio barrage duration (Danmaku Duration), barrage text (Danmaku Text), and barrage insertion timestamp (Danmaku Time)]. That is, the audio bullet screen may be generated from a bullet screen text by TTS technology, or it may be an audio bullet screen input by a user during playback.

S320: Extracting the text information of the original video. The audio information is separated from the original video and speech recognition is performed on it; finally, a content list of the original video is output, in the format [start timestamp (Start Time, ST), end timestamp (End Time, ET), text content (Video Text, VT)].

S330: Performing word segmentation and statistics. A word segmentation algorithm is used to segment the recognized bullet screen text information and the recognized text of the original video, and key contents and keywords are extracted from each.

S340: Calculating the weights of the keywords of the original video. Based on the keywords of the original video extracted in S330, a vector space model is used to calculate the weight W corresponding to each keyword T. Then, in combination with the information from S320, an insertable time period list (Insert TimeList) of the original video is calculated, in the format [start timestamp (Start Time), end timestamp (End Time), next text keyword list (KeyWords)].
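One way to derive an insertable time period list from the content list of S320 is to take the gaps between recognized speech segments. The sketch below assumes exactly that: an insertable period is a stretch with no recognized speech. The function name `insertable_periods` and the `min_gap` threshold are hypothetical, and the keyword list that the format above attaches to each period is omitted for brevity.

```python
def insertable_periods(speech_segments, video_duration, min_gap=1.0):
    """Return [(start, end), ...] gaps between speech segments.

    speech_segments -- list of (start_time, end_time, video_text) from speech recognition
    video_duration  -- total length of the video in seconds
    min_gap         -- shortest gap worth keeping (illustrative threshold)
    """
    periods = []
    cursor = 0.0  # end of the speech seen so far
    for start, end, _text in sorted(speech_segments, key=lambda s: s[0]):
        if start - cursor >= min_gap:
            periods.append((cursor, start))
        # Overlapping segments must not move the cursor backwards.
        cursor = max(cursor, end)
    if video_duration - cursor >= min_gap:
        periods.append((cursor, video_duration))
    return periods
```

Sorting by start time and tracking the furthest end seen so far makes the result independent of the order and overlap of the recognized segments.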

S350: Calculating the play weight of the bullet screen. Based on the bullet screen keywords extracted in S330 and the keyword information of the original video, the play weight of the bullet screen is calculated, giving [audio barrage (Danmaku Audio), audio barrage duration (Danmaku Duration), barrage text (Danmaku Content), barrage insertion timestamp (Danmaku Time), and barrage play weight (Danmaku Weight)].

S360: When the client plays the video content, the calculated audio barrages are played in synchronization with the video according to an insertion algorithm, which inserts the audio barrages into time periods of the video such as those carrying non-important information. The insertion algorithm is based on the play duration and the play weight of the bullet screen.

With this approach, audio barrages are inserted during video playback with a good user experience: the speech generated from the barrage content does not disturb the key sounds of the original video, so the user's viewing experience of the original video is not affected. The scheme is flexible and provides adaptive voice insertion.

S410: for example, using audio synthesis software or software development tool (SDK), the audio barrage of the corresponding tone is output and stored in the server in combination with the barrage information. The audio barrage basic information comprises: [ Audio barrage (Danmaku Audio), Audio barrage Duration (Danmaku Duration), barrage text (Danmaku Content), and barrage insertion timestamp (Danmaku Time) ].

S420: extracting the sound of the original Video, performing speech recognition, and finally outputting a list with a format of [ Start Time (Start Time), End Time (End Time) and Text content (Video Text) ].

S430: Extracting key contents and keywords from the recognized bullet screen text information and from the speech recognition result of the original video, respectively.

S440: Based on the keywords of the original video extracted in S430, a vector space model is adopted to calculate the corresponding weight W of each keyword T. Then, in combination with the information of S420, an insertable time list (Insert TimeList) of the original video is calculated, in the format [start timestamp (Start Time), end timestamp (End Time), and next text keyword list (KeyWords)].

S450: based on the keywords of the barrage extracted in S430, the play Weight of the barrage is calculated by combining the information of the keywords of the original video, and [ Audio barrage (DA), Audio barrage Duration (DD), barrage text (DC), barrage insertion timestamp (Danmaku Time, DT), barrage insertion Time point (DIT), and barrage play Weight (Danmaku Weight, DW) ] are obtained.

S460: When the client plays the video content, the calculated audio barrages are played in synchronization with the video according to an insertion algorithm, which inserts the audio barrages into time periods of the video such as those carrying non-important information. The insertion algorithm sorts the voice contents by weight in descending order and preferentially plays the voice content with a higher weight.

The specific insertion algorithm is as follows:

when the video is played to an insertable timestamp, the list of bullet screens to be inserted after the current timestamp is obtained, and the bullet screens are sorted by play weight in descending order;

when the total duration RD (rest duration) of the remaining insertable time period ≥ the bullet screen play duration DD (danmaku duration) of the bullet screen with Max (play weight) in the bullet screen list, that bullet screen is played;

when the total duration RD of the remaining insertable time period is less than the DD of the bullet screen with Max (play weight) in the bullet screen list, the next bullet screen is judged: if the DD of the next bullet screen is less than the total duration RD of the remaining insertable time period, that bullet screen is played; otherwise, the following bullet screen is judged in turn, until the end of the list is reached or the remaining insertable duration is 0.
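The insertion algorithm above amounts to a greedy selection by descending play weight. The following sketch is one possible reading, not a definitive implementation: the `Danmaku` record and the function `select_for_period` are hypothetical names, and the comparison uses "less than or equal to", as in S232, so that a bullet screen exactly filling the remaining time is still played.

```python
from dataclasses import dataclass


@dataclass
class Danmaku:
    text: str        # barrage text (DC)
    duration: float  # play duration DD, in seconds
    weight: float    # play weight DW


def select_for_period(candidates, period_duration):
    """Pick the bullet screens to play in one insertable period.

    Returns (chosen bullet screens in play order, remaining duration RD).
    """
    remaining = period_duration  # RD
    chosen = []
    # Highest play weight first.
    for d in sorted(candidates, key=lambda d: d.weight, reverse=True):
        if remaining <= 0:
            break  # the insertable period is used up
        if d.duration <= remaining:
            chosen.append(d)
            remaining -= d.duration
        # Otherwise skip this one and judge the next bullet screen.
    return chosen, remaining
```

A bullet screen that does not fit is skipped rather than ending the loop, so a shorter, lower-weight bullet screen can still use the time left over.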

As shown in fig. 5, an embodiment of the present disclosure provides an audio bullet screen processing method, which may include:

inputting audio of playing video and inputting bullet screen content, wherein the bullet screen content can comprise bullet screen content of an audio bullet screen;

the voice recognition API recognizes the Video to obtain a content Time table [ Start Time (Start Time), End Time (End Time) and Text content (Video Text) ] of the original Video;

the voice device API identifies Audio barrage information to obtain [ Audio barrage (Danmaku Audio, DA), Audio barrage Duration (Danmaku Duration, DD), barrage text (Danmaku Content, DC), barrage insertion timestamp (Danmaku Time, DT), and barrage insertion Time point (DIT) ];

obtaining keywords and a keyword weight table TW [ (T1, W1) (T2, W2) … … (Tn, Wn) ] for the text identified by the original video by the word segmentation algorithm API; tm is a keyword, and Wm is the word weight of Tm;

the word segmentation algorithm API obtains word segmentation information of the bullet screen according to the bullet screen audio bullet screen information;

the insertable time period is calculated and a time list (TimeList) [ T1, T2, … …, Tn ] is obtained.

And calculating the bullet screen playing weight W and the bullet screen playing time DD.

As shown in fig. 6, an embodiment of the present disclosure provides an audio bullet screen processing method, which may include:

inputting a playing video;

the server acquires the insertable time period information and the barrage information, where the insertable time period information is time information indicating when insertion of an audio bullet screen is allowed;

judging whether the audio barrage can be inserted;

if so, playing the audio barrage;

if not, the audio barrage is not played.

As shown in fig. 7, an embodiment of the present disclosure provides an audio bullet screen processing apparatus, which is applied in a server, and the apparatus includes:
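The judgment step in this flow can be sketched as a simple interval check. The criterion used here is an assumption: a barrage can be inserted only if, starting at the current playback position, it finishes before the insertable period ends. The function name `can_insert` is hypothetical.

```python
def can_insert(position, barrage_duration, insertable_periods):
    """Return True if a barrage started at `position` fits inside one insertable period.

    position           -- current playback position in seconds
    barrage_duration   -- duration of the audio barrage in seconds
    insertable_periods -- list of (start, end) periods that allow insertion
    """
    for start, end in insertable_periods:
        if start <= position and position + barrage_duration <= end:
            return True
    return False
```

For example, with insertable periods `[(0.0, 2.0), (6.0, 10.0)]`, a 3-second barrage can start at 6.5 s but not at 8.0 s, because it would run past the end of the period.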

an obtaining module 510, configured to obtain bullet screen information of an audio bullet screen;

a first determining module 520, configured to determine, according to video content of a target video, a video segment allowing insertion of an audio barrage and segment information of the video segment;

a distributing module 530, configured to distribute the audio barrage, the barrage information, and the segment information to a client, where the barrage information and the segment information are used for the client to insert an audio barrage in a video segment where the audio barrage is allowed to be inserted.

In some embodiments, the obtaining module 510, the first determining module 520, and the distributing module 530 may be program modules; after being executed by the processor, the program module can realize acquisition of the barrage information, determination of the video clip and the clip information and distribution of the audio barrage, the barrage information and the clip information.

In other embodiments, the obtaining module 510, the first determining module 520, and the distributing module 530 may be combined software-hardware modules; the combined software-hardware modules include, but are not limited to, various programmable arrays; the programmable arrays include, but are not limited to, field programmable arrays and/or complex programmable arrays.

In still other embodiments, the obtaining module 510, the first determining module 520, and the distributing module 530 may be purely hardware modules; the pure hardware modules include, but are not limited to: an application specific integrated circuit.

In some embodiments, the apparatus further comprises:

the distributing module 530 is specifically configured to distribute the target barrage, the barrage information of the target barrage, and the segment information to the client.

In some embodiments, the third determining module is specifically configured to select an alternative barrage to be inserted according to the play weight;

In some embodiments, the apparatus further comprises:

In some embodiments, the bullet screen information further comprises: bullet screen generation time; the device further comprises:

In some embodiments, the second determining module is specifically configured to obtain a bullet screen text of the audio bullet screen and a video text of the target video; determining bullet screen keywords and video keywords of the video text according to the bullet screen text; and determining the playing weight of the audio bullet screen according to the matching degree of the bullet screen keywords and the video keywords.

In some embodiments, the first determining module 520 is specifically configured to determine, according to the video content of the target video, a voice silence segment as a first type segment allowing insertion of an audio barrage; wherein the first type of segment allows audio barrage insertion of arbitrary content; determining segment information of the first-class segment according to the voice silence duration, wherein the segment information comprises: a start position and an end position of the first type of segment in the target video; or the starting position and the duration of the first type of segment in the target video.
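The two forms of segment information named above (start and end position, or start position and duration) carry the same information and can be converted into one another. A minimal sketch, in which the class name `SegmentInfo` and its members are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentInfo:
    start: float  # start position in the target video, in seconds
    end: float    # end position in the target video, in seconds

    @classmethod
    def from_start_and_duration(cls, start, duration):
        """Build the segment from the (start position, duration) form."""
        return cls(start, start + duration)

    @property
    def duration(self):
        return self.end - self.start

    def contains(self, t):
        """True if playback position t lies inside the segment (end exclusive)."""
        return self.start <= t < self.end
```

Storing one canonical form internally lets a client accept either representation from the server without duplicating the insertion logic.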

In some embodiments, the first determining module 520 is specifically configured to determine, according to the video content of the target video, a video segment with preset content as a second type of segment; the second type of segment allows insertion of audio barrages whose barrage content is associated with the preset content.

As shown in fig. 8, an embodiment of the present disclosure provides an audio bullet screen processing apparatus, which is applied in a client, and the apparatus includes:

the receiving module 610 is configured to receive an audio barrage, barrage information, and clip information when a target video is played;

a fifth determining module 620, configured to determine, according to the segment information, a video segment of the target video being played into which the audio barrage is allowed to be inserted;

and the playing module 630 is configured to play the audio bullet screen in the video segment allowing the audio bullet screen to be inserted according to the bullet screen information.

In some embodiments, the receiving module 610, the fifth determining module 620, and the playing module 630 may be program modules; after being executed by the processor, the program modules can realize the receiving of the audio barrage, the barrage information and the segment information, the determination of the video segment allowing insertion of the audio barrage, and the playing of the audio barrage in the video segment allowing insertion of the audio barrage.

In other embodiments, the receiving module 610, the fifth determining module 620 and the playing module 630 may be combined software-hardware modules; the combined software-hardware modules include, but are not limited to, various programmable arrays; the programmable arrays include, but are not limited to, field programmable arrays and/or complex programmable arrays.

In still other embodiments, the receiving module 610, the fifth determining module 620, and the playing module 630 may be hardware-only modules; the pure hardware modules include, but are not limited to: an application specific integrated circuit.

the playing module 630 is specifically configured to determine, according to the playing weight, an alternative barrage to be inserted into the video clip; when the bullet screen duration of the alternative barrage is less than or equal to the remaining duration of the video clip, determine that the alternative barrage is a target barrage, where the remaining duration is the segment duration of the video clip minus the total bullet screen duration of the target barrages already inserted; and insert the target barrage when the video clip is played.

In some embodiments, the apparatus further comprises:

In some embodiments, the receiving module 610 is specifically configured to receive a target bullet screen, bullet screen information of the target bullet screen, and the clip information when the target video is played, where the target bullet screen is an audio bullet screen that has been determined for insertion into the video segment.

An embodiment of the present disclosure provides an electronic device, including:

a memory for storing processor-executable instructions;

a processor connected to the memory;

wherein the processor is configured to execute the audio barrage processing method provided by any of the preceding technical solutions by executing the computer-executable instructions stored on the memory.

The memory may include various types of storage media, which are non-transitory computer storage media capable of retaining the information stored thereon after power has been removed from the mobile terminal.

The processor may be connected to the memory via a bus or the like for reading an executable program stored on the memory, e.g. at least one of the methods as shown in any of fig. 1 to 6.

The electronic device may be a server or a terminal device where a client is located.

Fig. 9 is a block diagram illustrating an electronic device 800 in accordance with an example embodiment. For example, the electronic device 800 may be a mobile phone, a mobile computer, or the like.

Referring to fig. 9, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.

The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

The power supply component 806 provides power to the various components of the electronic device 800. Power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for electronic device 800.

The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 800 is in an operating state, such as a shooting state or a video state. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational state, such as a call state, a recording state, and a voice recognition state. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect the open/closed status of the device 800 and the relative positioning of components, such as the display and keypad of the device 800. The sensor assembly 814 may also detect a change in position of the device 800 or a component of the device 800, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and a change in temperature of the device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as Wi-Fi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, communications component 816 further includes a Near Field Communications (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.

In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the electronic device 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

The embodiment of the present disclosure provides a non-transitory computer-readable storage medium, wherein when instructions in the storage medium are executed by a processor of a mobile terminal, the mobile terminal is enabled to execute the audio barrage processing method provided in any of the foregoing embodiments, and at least one of the methods shown in any of fig. 1 to 6 can be executed.

As shown in fig. 10, an embodiment of the present disclosure illustrates a structure of an electronic device. For example, electronic device 900 may be provided as a network device.

Referring to fig. 10, electronic device 900 includes a processing component 922, which further includes at least one processor, and memory resources, represented by memory 932, for storing instructions, such as application programs, that are executable by processing component 922. The application programs stored in memory 932 may include one or more modules that each correspond to a set of instructions. Furthermore, the processing component 922 is configured to execute instructions to perform any of the methods described above as applied to the electronic device, for example, the methods shown in fig. 2-6.

The electronic device 900 may also include a power component 926 configured to perform power management of the electronic device 900, a wired or wireless network interface 950 configured to connect the electronic device 900 to a network, and an input/output (I/O) interface 958. The electronic device 900 may operate based on an operating system stored in the memory 932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims

1. An audio bullet screen processing method, which is applied to a server, the method comprising:

acquiring bullet screen information of the audio bullet screen;

determining, according to the video content of a target video, a video clip into which an audio bullet screen is allowed to be inserted, and clip information of the video clip;

and distributing the audio bullet screen, the bullet screen information and the clip information to a client, wherein the bullet screen information and the clip information are used by the client to insert the audio bullet screen into the video clip into which an audio bullet screen is allowed to be inserted.

2. The method of claim 1, further comprising:

3. The method of claim 2, wherein the barrage information comprises: the bullet screen duration and the playing weight of the audio bullet screen; the fragment information includes: playing time information of the video clip;

4. The method of claim 2, further comprising:

5. The method of claim 4,

wherein the determining, according to the playing weight and from among the audio bullet screens, a target bullet screen for the video clip into which an audio bullet screen is allowed to be inserted comprises:

6. The method of claim 5, further comprising:

7. The method of claim 4, wherein the barrage information further comprises: bullet screen generation time; the method further comprises the following steps:

8. The method of claim 2, wherein the determining the play weight of the audio bullet screen according to the relevance between the bullet screen content of the audio bullet screen and the video content of the target video comprises:

9. The method of claim 8, wherein the video keywords have word weights, and wherein the word weights of the video keywords are positively correlated with the number of occurrences of the video keywords in the video text;

10. The method according to any one of claims 1 to 9, wherein the determining of the video segments and the segment information that allow the audio barrage to be inserted according to the video content of the target video comprises:

11. The method according to any one of claims 1 to 9, wherein the determining of the video segments and the segment information that allow the audio barrage to be inserted according to the video content of the target video comprises:

12. An audio bullet screen processing method, which is applied to a client, the method comprising:

13. The method of claim 12, wherein the barrage information comprises: the bullet screen duration and the playing weight of the audio bullet screen; the fragment information includes: playing time information of the video clip;

and inserting the target bullet screen when the video clip is played.

14. The method of claim 13, further comprising:

15. The method of claim 13 or 14, wherein the barrage information further comprises: bullet screen generation time; the method further comprises the following steps:

16. The method of claim 12, wherein receiving the bullet screen insertion information, the audio bullet screen and the clip information when the target video is played comprises:

17. An audio bullet screen processing device, which is applied in a server, the device comprising:

18. The apparatus of claim 17, further comprising:

19. The apparatus of claim 18, wherein the barrage information comprises: the bullet screen duration and the playing weight of the audio bullet screen; the fragment information includes: playing time information of the video clip;

20. The apparatus of claim 18, further comprising:

21. The apparatus according to claim 20, wherein the third determining module is specifically configured to select an alternative bullet screen to be inserted according to the play weight;

22. The apparatus of claim 21, further comprising:

23. The apparatus of claim 20, wherein the barrage information further comprises: bullet screen generation time; the device further comprises:

24. The apparatus according to claim 18, wherein the second determining module is specifically configured to: obtain a bullet screen text of the audio bullet screen and a video text of the target video; determine bullet screen keywords according to the bullet screen text, and determine video keywords according to the video text; and determine the playing weight of the audio bullet screen according to the matching degree between the bullet screen keywords and the video keywords.
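By way of a non-limiting illustration, the keyword matching recited above might be sketched as follows. The function names, the regular-expression keyword extractor, and the choice of normalized occurrence counts as word weights are all assumptions made for this example; the only property taken from the claims is that a video keyword's word weight grows with the number of times it appears in the video text, and that the playing weight reflects how well the bullet screen keywords match the video keywords.

```python
import re
from collections import Counter


def extract_keywords(text):
    """Hypothetical extractor: lowercase alphabetic words of three or more letters."""
    return re.findall(r"[a-z]{3,}", text.lower())


def video_word_weights(video_text):
    """Word weight of each video keyword, proportional to its occurrence count."""
    counts = Counter(extract_keywords(video_text))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


def play_weight(bullet_text, video_text):
    """Assumed matching degree: sum of word weights of matched video keywords."""
    weights = video_word_weights(video_text)
    bullet_keywords = set(extract_keywords(bullet_text))
    return sum(weights.get(word, 0.0) for word in bullet_keywords)
```

With this scoring, a bullet screen whose text repeats a frequently occurring video keyword receives a higher playing weight than one that shares no keyword with the video text, which receives a weight of zero.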

25. The apparatus of claim 24, wherein the video keywords have word weights, and wherein the word weights of the video keywords are positively correlated to the number of times the video keywords appear in the video text;

26. The apparatus according to any one of claims 17 to 25, wherein the first determining module is specifically configured to: determine, according to the video content of the target video, a vocal silence segment as a first-type segment into which an audio bullet screen is allowed to be inserted, wherein the first-type segment allows insertion of an audio bullet screen of arbitrary content; and determine segment information of the first-type segment according to the vocal silence duration, wherein the segment information comprises: a start position and an end position of the first-type segment in the target video; or a start position and a duration of the first-type segment in the target video.
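As a non-limiting sketch, vocal silence segments and their segment information might be derived from speech intervals as shown below. The input format (a list of `(start, end)` speech intervals in seconds, for example from some voice activity detector), the function name, and the `min_silence_s` threshold are assumptions made for illustration and are not specified by the claims. Each result carries the start position together with both the end position and the duration, so either form of segment information can be read from it.

```python
def vocal_silence_segments(speech_intervals, video_duration_s, min_silence_s=2.0):
    """Return gaps between speech intervals that last at least min_silence_s.

    speech_intervals: iterable of (start_s, end_s) pairs where a voice is present.
    min_silence_s: hypothetical minimum silence length worth reporting.
    """
    segments = []
    cursor = 0.0  # end of the latest speech seen so far
    for start, end in sorted(speech_intervals):
        gap = start - cursor
        if gap >= min_silence_s:
            segments.append({"start_s": cursor, "end_s": start, "duration_s": gap})
        cursor = max(cursor, end)  # tolerate overlapping intervals
    tail = video_duration_s - cursor
    if tail >= min_silence_s:
        segments.append(
            {"start_s": cursor, "end_s": video_duration_s, "duration_s": tail}
        )
    return segments
```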

27. The apparatus according to any one of claims 17 to 25, wherein the first determining module is specifically configured to determine, according to the video content of the target video, a video segment with preset content as a second-type segment, wherein the second-type segment allows insertion of an audio bullet screen whose bullet screen content is associated with the preset content.

28. An audio bullet screen processing device, which is applied in a client, the device comprising:

29. The apparatus of claim 28, wherein the barrage information comprises: the bullet screen duration and the playing weight of the audio bullet screen; the fragment information includes: playing time information of the video clip;

30. The apparatus of claim 29, further comprising:

31. The apparatus of claim 29 or 30, wherein the barrage information further comprises: bullet screen generation time; the device further comprises:

32. The apparatus of claim 28, wherein the receiving module is specifically configured to receive, when the target video is played, a target bullet screen, bullet screen information of the target bullet screen, and the clip information, wherein the target bullet screen is an audio bullet screen that has been determined to be inserted into the video clip.

33. An electronic device, comprising:

a memory for storing processor-executable instructions;

a processor coupled to the memory;

wherein the processor is configured to perform the audio bullet screen processing method as provided in any one of claims 1 to 11.

34. A non-transitory computer-readable storage medium having stored therein computer-executable instructions that, when executed by a processor, implement the audio bullet screen processing method provided in any one of claims 1 to 11.