CN115119039A - Video playing system, method and device, electronic equipment and storage medium - Google Patents

Video playing system, method and device, electronic equipment and storage medium

Info

Publication number
CN115119039A
Authority
CN
China
Prior art keywords
segment
speech
video
playing
target
Prior art date
Legal status
Pending
Application number
CN202210764069.6A
Other languages
Chinese (zh)
Inventor
周亚桥 (Zhou Yaqiao)
Current Assignee
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority claimed from CN202210764069.6A
Publication of CN115119039A
Legal status: Pending

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/488Data services, e.g. news ticker
    • H04N21/4884Data services, e.g. news ticker for displaying subtitles
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Embodiments of the disclosure relate to a video playing system, method, and apparatus, an electronic device, and a storage medium. The system includes a terminal and a server in communication connection with each other. The terminal is configured to send a target play request to the server. The server is configured to: determine the target video that the target play request indicates to play; acquire the start playback time and the line segment corresponding to a video clip of the target video, where the start playback time corresponding to the video clip is the moment at which the video clip starts playing within the target video; and return the start playback time and the line segment to the terminal. The terminal is further configured to: present at least part of the text in the line segment; and, if a play operation corresponding to that partial text is detected, start playing the target video from the start playback time. This scheme can improve the efficiency of generating video clips.

Description

Video playing system, method and device, electronic equipment and storage medium
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular to a video playing system, method, and apparatus, an electronic device, and a storage medium.
Background
Currently, as the pace of life accelerates, short videos are becoming more and more popular. A substantial portion of short videos are clips of classic segments taken from long videos; that is, a video can be viewed as a concatenation of a series of end-to-end video clips. In the prior art, video division mainly relies on a video producer manually dividing the video and writing the corresponding presentation text (such as titles for the video clips).
However, obtaining video clips by dividing a video in this way has a high authoring cost and low efficiency.
Disclosure of Invention
In view of the above, in order to solve some or all of the above technical problems, embodiments of the present disclosure provide a video playing system, method, apparatus, electronic device and storage medium.
In a first aspect, an embodiment of the present disclosure provides a video playing system, where the system includes a terminal and a server, and the terminal is in communication connection with the server, where:
the terminal is configured to: send a target play request to the server;
the server is configured to: determine the target video that the target play request indicates to play; acquire the start playback time and the line segment corresponding to a video clip of the target video, where the start playback time corresponding to the video clip is the moment at which the video clip starts playing within the target video; and return the start playback time and the line segment to the terminal;
the terminal is further configured to: present at least part of the text in the line segment; and, if a play operation corresponding to the partial text is detected, start playing the target video from the start playback time.
Optionally, in the system according to any embodiment of the present disclosure, the server is further configured to:
divide the lines of the target video to obtain a first line segment set;
determine the bullet-screen comment count and the drag-to-play count corresponding to each line segment in the first line segment set;
generate a second line segment set based on the first line segment set, the bullet-screen comment counts, and the drag-to-play counts, where the number of line segments in the second line segment set is smaller than the number of line segments in the first line segment set;
and determine the start playback time corresponding to each video clip that includes a line segment in the second line segment set.
Optionally, in the system according to any embodiment of the present disclosure, the first line segment set is a line segment sequence, and the number of line segments in the second line segment set is a target number; and
the server is specifically configured to:
calculate, for each line segment in the line segment sequence, the weighted sum of its bullet-screen comment count and its drag-to-play count;
select the target number of results from the calculated results in descending order;
determine the set of line segments corresponding to the selected results as a candidate segment set;
if the candidate segment set includes adjacent candidate segments, perform the following determination step:
determine the adjacent candidate segments as one new candidate segment;
select the result with the largest value from among the calculated but not yet selected results;
determine the line segment corresponding to that result as another new candidate segment;
replace the adjacent candidate segments with the two newly determined candidate segments, thereby updating the candidate segment set;
if the updated candidate segment set still includes adjacent candidate segments, perform the determination step again on the updated candidate segment set;
if the candidate segment set includes no adjacent candidate segments, determine that candidate segment set as the second line segment set;
here, adjacent candidate segments are two line segments that are adjacent in the line segment sequence.
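The selection-and-merge procedure above can be sketched in Python. This is a minimal illustration, not the patented implementation: the score weights are chosen upstream, and the representation of candidates as index ranges and the function name are assumptions; the disclosure only fixes the flow (rank by weighted sum, take the top results, then repeatedly replace any two adjacent candidates with their merged segment plus the next-best unselected segment).

```python
def pick_highlights(scores, k):
    """Select k line segments by score, merging adjacent picks.

    scores[i] is the weighted sum of the bullet-screen comment count and
    the drag-to-play count of line segment i. Returns candidates as
    (start, end) index ranges over the original line segment sequence.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    pool = iter(ranked[k:])                  # not-yet-selected results, best first
    cands = [(i, i) for i in ranked[:k]]     # each candidate covers one segment
    while True:
        cands.sort()
        # find two candidates adjacent in the original sequence
        pair = next((j for j in range(len(cands) - 1)
                     if cands[j][1] + 1 == cands[j + 1][0]), None)
        if pair is None:
            return cands                     # no adjacent candidates: done
        merged = (cands[pair][0], cands[pair + 1][1])
        cands[pair:pair + 2] = [merged]      # adjacent pair becomes one candidate
        nxt = next(pool, None)               # top up with next-best segment
        if nxt is not None:
            cands.append((nxt, nxt))
```

For example, with scores [10, 9, 1, 8, 7] and a target number of 2, segments 0 and 1 are picked first, merged into the range (0, 1), and segment 3 (the next-best result) is added, giving [(0, 1), (3, 3)].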
Optionally, in the system according to any embodiment of the present disclosure, the line segment includes at least one line sentence; and
the server is further configured to:
acquire a preset line sentence set, where the play count and/or comment count of each line sentence in the preset line sentence set satisfies a first preset condition;
and, for each line segment in the second line segment set, determine the partial text to be presented for that segment based on the similarity between the line sentences included in the segment and the line sentences in the preset line sentence set.
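As a rough sketch of this similarity-based selection, the snippet below picks, from a segment's line sentences, the one most similar to any sentence in the preset ("popular line") set. The disclosure does not fix a similarity metric; `difflib.SequenceMatcher` is used here purely as a stand-in, and the function names are illustrative.

```python
from difflib import SequenceMatcher

def pick_presentation_text(segment_sentences, preset_sentences):
    """Return the line sentence in the segment that is most similar to
    any sentence in the preset (popular) line sentence set."""
    def best_similarity(sentence):
        return max(SequenceMatcher(None, sentence, preset).ratio()
                   for preset in preset_sentences)
    return max(segment_sentences, key=best_similarity)
```

The returned sentence (or part of it) can then serve as the partial text presented for the clip.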
Optionally, in the system according to any embodiment of the present disclosure, the first preset condition includes: the weighted sum of the play count and the comment count of the target line sentence is greater than or equal to a preset value.
Optionally, in the system according to any embodiment of the present disclosure, the server is specifically configured to:
generate the second line segment set based on the first line segment set, the bullet-screen comment counts, and the drag-to-play counts if the target video satisfies a second preset condition;
where the second preset condition includes at least one of the following:
the play count of the target video is greater than or equal to a preset play count threshold;
the playback duration of the target video is greater than or equal to a preset playback duration.
Optionally, in the system according to any embodiment of the present disclosure, the terminal is specifically configured to:
present at least part of the text in the line segment as the title of the corresponding video clip.
Optionally, in the system according to any embodiment of the present disclosure, the terminal is specifically configured to:
present a cover for the video clip, where the cover includes the title of the video clip, and the title includes at least part of the text in the corresponding line segment; and
the terminal is further configured to:
if a click operation on the cover is detected, determine that a play operation corresponding to the partial text has been detected.
In a second aspect, an embodiment of the present disclosure provides a video playing method, where the method is applied to a terminal, and the method includes:
sending a target play request to a server, where the server is in communication connection with the terminal;
receiving, from the server, the start playback time and the line segment corresponding to a video clip of the target video that the target play request indicates to play, where the start playback time corresponding to the video clip is the moment at which the video clip starts playing within the target video;
presenting at least part of the text in the line segment;
and, if a play operation corresponding to the partial text is detected, starting to play the target video from the start playback time.
Optionally, in the method according to any embodiment of the present disclosure, the start playback time is determined by:
dividing the lines of the target video to obtain a first line segment set;
determining the bullet-screen comment count and the drag-to-play count corresponding to each line segment in the first line segment set;
generating a second line segment set based on the first line segment set, the bullet-screen comment counts, and the drag-to-play counts, where the number of line segments in the second line segment set is smaller than the number of line segments in the first line segment set;
and determining the start playback time corresponding to each video clip that includes a line segment in the second line segment set.
Optionally, in the method according to any embodiment of the present disclosure, the first line segment set is a line segment sequence, and the number of line segments in the second line segment set is a target number; and
generating the second line segment set based on the first line segment set, the bullet-screen comment counts, and the drag-to-play counts includes:
calculating, for each line segment in the line segment sequence, the weighted sum of its bullet-screen comment count and its drag-to-play count;
selecting the target number of results from the calculated results in descending order;
determining the set of line segments corresponding to the selected results as a candidate segment set;
if the candidate segment set includes adjacent candidate segments, performing the following determination step:
determining the adjacent candidate segments as one new candidate segment;
selecting the result with the largest value from among the calculated but not yet selected results;
determining the line segment corresponding to that result as another new candidate segment;
replacing the adjacent candidate segments with the two newly determined candidate segments, thereby updating the candidate segment set;
if the updated candidate segment set still includes adjacent candidate segments, performing the determination step again on the updated candidate segment set;
if the candidate segment set includes no adjacent candidate segments, determining that candidate segment set as the second line segment set;
here, adjacent candidate segments are two line segments that are adjacent in the line segment sequence.
Optionally, in the method according to any embodiment of the present disclosure, the line segment includes at least one line sentence; and
the partial text is determined by:
acquiring a preset line sentence set, where the play count and/or comment count of each line sentence in the preset line sentence set satisfies a first preset condition;
and, for each line segment in the second line segment set, determining the partial text to be presented for that segment based on the similarity between the line sentences included in the segment and the line sentences in the preset line sentence set.
Optionally, in the method according to any embodiment of the present disclosure, the first preset condition includes: the weighted sum of the play count and the comment count of the target line sentence is greater than or equal to a preset value.
Optionally, in the method according to any embodiment of the present disclosure, generating the second line segment set based on the first line segment set, the bullet-screen comment counts, and the drag-to-play counts includes:
generating the second line segment set based on the first line segment set, the bullet-screen comment counts, and the drag-to-play counts if the target video satisfies a second preset condition;
where the second preset condition includes at least one of the following:
the play count of the target video is greater than or equal to a preset play count threshold;
the playback duration of the target video is greater than or equal to a preset playback duration.
Optionally, in the method according to any embodiment of the present disclosure, presenting at least part of the text in the line segment includes:
presenting at least part of the text in the line segment as the title of the corresponding video clip.
Optionally, in the method according to any embodiment of the present disclosure, presenting at least part of the text in the line segment as the title of the corresponding video clip includes:
presenting a cover for the video clip, where the cover includes the title of the video clip, and the title includes at least part of the text in the line segment; and
before detecting the play operation corresponding to the partial text, the method further includes:
if a click operation on the cover is detected, determining that a play operation corresponding to the partial text has been detected.
Optionally, in the method according to any embodiment of the present disclosure, the start playback time corresponding to the video clip is the midpoint of the moments at which two line segments start playing in the target video, where the two line segments are: the line segment included in the video clip, and the line segment included in the video clip immediately preceding that video clip in the target video.
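The midpoint rule above is straightforward; a one-line sketch (times in seconds from the start of the target video; names are illustrative):

```python
def clip_start_time(prev_segment_start, this_segment_start):
    """Start playback time for a video clip: the midpoint between the
    moment the previous clip's line segment starts playing and the
    moment this clip's line segment starts playing."""
    return (prev_segment_start + this_segment_start) / 2
```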
In a third aspect, an embodiment of the present disclosure provides a video playing method, where the method is applied to a server, and the method includes:
receiving a target play request sent by a terminal, where the terminal is in communication connection with the server;
determining the target video that the target play request indicates to play;
acquiring the start playback time and the line segment corresponding to a video clip of the target video, where the start playback time corresponding to the video clip is the moment at which the video clip starts playing within the target video;
and returning the start playback time and the line segment to the terminal, so that the terminal presents at least part of the text in the line segment and, when a play operation corresponding to the partial text is detected, starts playing the target video from the start playback time.
Optionally, in the method according to any embodiment of the present disclosure, the start playback time is determined by:
dividing the lines of the target video to obtain a first line segment set;
determining the bullet-screen comment count and the drag-to-play count corresponding to each line segment in the first line segment set;
generating a second line segment set based on the first line segment set, the bullet-screen comment counts, and the drag-to-play counts, where the number of line segments in the second line segment set is smaller than the number of line segments in the first line segment set;
and determining the start playback time corresponding to each video clip that includes a line segment in the second line segment set.
Optionally, in the method according to any embodiment of the present disclosure, the first line segment set is a line segment sequence, and the number of line segments in the second line segment set is a target number; and
generating the second line segment set based on the first line segment set, the bullet-screen comment counts, and the drag-to-play counts includes:
calculating, for each line segment in the line segment sequence, the weighted sum of its bullet-screen comment count and its drag-to-play count;
selecting the target number of results from the calculated results in descending order;
determining the set of line segments corresponding to the selected results as a candidate segment set;
if the candidate segment set includes adjacent candidate segments, performing the following determination step:
determining the adjacent candidate segments as one new candidate segment;
selecting the result with the largest value from among the calculated but not yet selected results;
determining the line segment corresponding to that result as another new candidate segment;
replacing the adjacent candidate segments with the two newly determined candidate segments, thereby updating the candidate segment set;
if the updated candidate segment set still includes adjacent candidate segments, performing the determination step again on the updated candidate segment set;
if the candidate segment set includes no adjacent candidate segments, determining that candidate segment set as the second line segment set;
here, adjacent candidate segments are two line segments that are adjacent in the line segment sequence.
Optionally, in the method according to any embodiment of the present disclosure, the line segment includes at least one line sentence; and
the partial text is determined by:
acquiring a preset line sentence set, where the play count and/or comment count of each line sentence in the preset line sentence set satisfies a first preset condition;
and, for each line segment in the second line segment set, determining the partial text to be presented for that segment based on the similarity between the line sentences included in the segment and the line sentences in the preset line sentence set.
Optionally, in the method according to any embodiment of the present disclosure, the first preset condition includes: the weighted sum of the play count and the comment count of the target line sentence is greater than or equal to a preset value.
Optionally, in the method according to any embodiment of the present disclosure, generating the second line segment set based on the first line segment set, the bullet-screen comment counts, and the drag-to-play counts includes:
generating the second line segment set based on the first line segment set, the bullet-screen comment counts, and the drag-to-play counts if the target video satisfies a second preset condition;
where the second preset condition includes at least one of the following:
the play count of the target video is greater than or equal to a preset play count threshold;
the playback duration of the target video is greater than or equal to a preset playback duration.
Optionally, in the method according to any embodiment of the present disclosure, the start playback time corresponding to the video clip is the midpoint of the moments at which two line segments start playing in the target video, where the two line segments are: the line segment included in the video clip, and the line segment included in the video clip immediately preceding that video clip in the target video.
In a fourth aspect, an embodiment of the present disclosure provides a video playing apparatus, where the apparatus is disposed in a terminal, and the apparatus includes:
a sending unit configured to send a target play request to a server, where the server is in communication connection with the terminal;
a first receiving unit configured to receive, from the server, the start playback time and the line segment corresponding to a video clip of the target video that the target play request indicates to play, where the start playback time corresponding to the video clip is the moment at which the video clip starts playing within the target video;
a presenting unit configured to present at least part of the text in the line segment;
and a playing unit configured to play the target video from the start playback time if a play operation corresponding to the partial text is detected.
Optionally, in the apparatus according to any embodiment of the present disclosure, the start playback time is determined by:
dividing the lines of the target video to obtain a first line segment set;
determining the bullet-screen comment count and the drag-to-play count corresponding to each line segment in the first line segment set;
generating a second line segment set based on the first line segment set, the bullet-screen comment counts, and the drag-to-play counts, where the number of line segments in the second line segment set is smaller than the number of line segments in the first line segment set;
and determining the start playback time corresponding to each video clip that includes a line segment in the second line segment set.
Optionally, in the apparatus according to any embodiment of the present disclosure, the first line segment set is a line segment sequence, and the number of line segments in the second line segment set is a target number; and
generating the second line segment set based on the first line segment set, the bullet-screen comment counts, and the drag-to-play counts includes:
calculating, for each line segment in the line segment sequence, the weighted sum of its bullet-screen comment count and its drag-to-play count;
selecting the target number of results from the calculated results in descending order;
determining the set of line segments corresponding to the selected results as a candidate segment set;
if the candidate segment set includes adjacent candidate segments, performing the following determination step:
determining the adjacent candidate segments as one new candidate segment;
selecting the result with the largest value from among the calculated but not yet selected results;
determining the line segment corresponding to that result as another new candidate segment;
replacing the adjacent candidate segments with the two newly determined candidate segments, thereby updating the candidate segment set;
if the updated candidate segment set still includes adjacent candidate segments, performing the determination step again on the updated candidate segment set;
if the candidate segment set includes no adjacent candidate segments, determining that candidate segment set as the second line segment set;
here, adjacent candidate segments are two line segments that are adjacent in the line segment sequence.
Optionally, in the apparatus according to any embodiment of the present disclosure, the line segment includes at least one line sentence; and
the partial text is determined by:
acquiring a preset line sentence set, where the play count and/or comment count of each line sentence in the preset line sentence set satisfies a first preset condition;
and, for each line segment in the second line segment set, determining the partial text to be presented for that segment based on the similarity between the line sentences included in the segment and the line sentences in the preset line sentence set.
Optionally, in the apparatus according to any embodiment of the present disclosure, the first preset condition includes: the weighted sum of the play count and the comment count of the target line sentence is greater than or equal to a preset value.
Optionally, in the apparatus according to any embodiment of the present disclosure, generating the second line segment set based on the first line segment set, the bullet-screen comment counts, and the drag-to-play counts includes:
generating the second line segment set based on the first line segment set, the bullet-screen comment counts, and the drag-to-play counts if the target video satisfies a second preset condition;
where the second preset condition includes at least one of the following:
the play count of the target video is greater than or equal to a preset play count threshold;
the playback duration of the target video is greater than or equal to a preset playback duration.
Optionally, in the apparatus according to any embodiment of the present disclosure, the presenting unit is further configured to:
present at least part of the text in the line segment as the title of the corresponding video clip.
Optionally, in the apparatus according to any embodiment of the present disclosure, the presenting unit is further configured to:
present a cover for the video clip, where the cover includes the title of the video clip, and the title includes at least part of the text in the line segment; and
the apparatus further includes:
a first determination unit configured to determine that a play operation corresponding to the partial text has been detected if a click operation on the cover is detected.
Optionally, in the apparatus according to any embodiment of the present disclosure, the start playback time corresponding to the video clip is the midpoint of the moments at which two line segments start playing in the target video, where the two line segments are: the line segment included in the video clip, and the line segment included in the video clip immediately preceding that video clip in the target video.
In a fifth aspect, an embodiment of the present disclosure provides a video playing apparatus, where the apparatus is disposed in a server, and the apparatus includes:
a second receiving unit configured to receive a target play request sent by a terminal, where the terminal is in communication connection with the server;
a second determination unit configured to determine the target video that the target play request indicates to play;
an acquisition unit configured to acquire the start playback time and the line segment corresponding to a video clip of the target video, where the start playback time corresponding to the video clip is the moment at which the video clip starts playing within the target video;
and a returning unit configured to return the start playback time and the line segment to the terminal, so that the terminal presents at least part of the text in the line segment and, when a play operation corresponding to the partial text is detected, starts playing the target video from the start playback time.
Optionally, in the apparatus according to any embodiment of the present disclosure, the start playback time is determined by:
dividing the lines of the target video to obtain a first line segment set;
determining the bullet-screen comment count and the drag-to-play count corresponding to each line segment in the first line segment set;
generating a second line segment set based on the first line segment set, the bullet-screen comment counts, and the drag-to-play counts, where the number of line segments in the second line segment set is smaller than the number of line segments in the first line segment set;
and determining the start playback time corresponding to each video clip that includes a line segment in the second line segment set.
Optionally, in the apparatus according to any embodiment of the present disclosure, the first speech segment set is a speech segment sequence, and the number of speech segments in the second speech segment set is a target number; and
generating a second speech segment set based on the first speech segment set, the number of the bullet screens, and the number of dragging and playing times, the method comprising:
calculating, for each speech segment in the speech segment sequence, a weighted summation result of the corresponding number of bullet screens and number of times of dragging and playing;
selecting the target number of results from the calculated results in descending order;
determining the set of speech segments corresponding to the target number of results as a candidate segment set;
if the candidate segment set comprises adjacent candidate segments, executing the following determination steps:
determining the adjacent candidate segment as a new candidate segment;
selecting the result with the largest numerical value from the calculated and unselected results;
determining the speech segment corresponding to the result with the maximum value as a new candidate segment;
updating the adjacent candidate segments into the two determined new candidate segments to update the candidate segment set;
if the updated candidate segment set comprises adjacent candidate segments, executing the determining step based on the updated candidate segment set;
if the candidate segment set does not comprise the adjacent candidate segment, determining the candidate segment set not comprising the adjacent candidate segment as a second speech segment set;
and the adjacent candidate segments are two adjacent speech segments in the speech segment sequence.
Optionally, in the apparatus of any embodiment of the present disclosure, the speech segment includes at least one speech sentence; and
the partial characters are determined by the following steps:
acquiring a preset speech-line sentence set, wherein the play amount and/or the comment amount of the speech-line sentences in the preset speech-line sentence set meet a first preset condition;
and for each line segment in the second line segment set, determining the partial characters used for presentation in the line segment based on the similarity between the line sentence included in the line segment and the line sentence in the preset line sentence set.
Optionally, in the apparatus according to any embodiment of the present disclosure, the first preset condition includes: the result of the weighted summation of the playing quantity and the comment quantity of the target speech is greater than or equal to a preset numerical value.
Optionally, in an apparatus according to any embodiment of the present disclosure, the generating a second speech segment set based on the first speech segment set, the number of bullet screens, and the number of times of dragging and playing includes:
if the target video meets a second preset condition, generating a second line segment set based on the first line segment set, the number of bullet screens and the dragging and playing times;
wherein the second preset condition includes at least one of the following:
the playing amount of the target video is greater than or equal to a preset playing amount threshold value;
the playing time of the target video is greater than or equal to the preset playing time.
Optionally, in the apparatus according to any embodiment of the present disclosure, the start playing time corresponding to the video segment is: the middle time between the moments at which two speech segments start playing in the target video; wherein the two speech segments include: the speech segment included in the video segment, and the speech segment included in the video segment previous to this video segment in the target video.
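A minimal sketch of this midpoint rule, assuming times are represented in seconds; the function name and signature are illustrative, not part of the disclosure:

```python
def clip_start_time(prev_segment_start: float, segment_start: float) -> float:
    """Start playing time for a video segment: the middle time between the
    moment the previous video segment's speech segment starts playing and
    the moment this video segment's speech segment starts playing."""
    return (prev_segment_start + segment_start) / 2.0
```

For example, if the previous segment's speech starts at 120 s and the current segment's speech starts at 180 s, playback of the clip would begin at 150 s.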
In a sixth aspect, an embodiment of the present disclosure provides an electronic device, including:
a memory for storing a computer program;
a processor, configured to execute the computer program stored in the memory, and when the computer program is executed, implement the method of any embodiment of the video playing method of the second aspect or the third aspect of the present disclosure.
In a seventh aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of any embodiment of the video playing method of the second or third aspect.
In an eighth aspect, an embodiment of the present disclosure provides a computer program comprising computer-readable code which, when run on a device, causes a processor in the device to execute instructions implementing the steps of the method of any embodiment of the video playing method of the second or third aspect described above.
The video playing system provided by the embodiment of the present disclosure includes a terminal and a server, where the terminal is in communication connection with the server. The terminal is configured to: send a target play request to the server. The server is configured to: determine the target video indicated to be played by the target play request; acquire a start playing time and a speech segment corresponding to a video segment of the target video, where the start playing time corresponding to the video segment is: the moment at which the video segment starts playing in the target video; and return the start playing time and the speech segment to the terminal. The terminal is further configured to: present at least part of the characters in the speech segment; and, if a playing operation corresponding to the partial characters is detected, play the target video starting from the start playing time. In this scheme, the terminal presents at least part of the characters in the speech segment and plays the target video from the start playing time returned by the server when the playing operation corresponding to the partial characters is detected, thereby realizing playing of the video segment and improving the generation efficiency of video segments.
In the video playing method applied to the terminal provided by the embodiment of the present disclosure, a target play request is sent to a server, where the server is in communication connection with the terminal; the start playing time and the speech segment, returned by the server, corresponding to a video segment of the target video indicated to be played by the target play request are then received, where the start playing time corresponding to the video segment is: the moment at which the video segment starts playing in the target video; at least part of the characters in the speech segment is then presented; and if a playing operation corresponding to the partial characters is detected, the target video is played starting from the start playing time. In this scheme, the terminal presents at least part of the characters in the speech segment and plays the target video from the start playing time returned by the server when the playing operation corresponding to the partial characters is detected, thereby realizing playing of the video segment and improving the generation efficiency of video segments.
In the video playing method applied to the server provided by the embodiment of the present disclosure, a target play request sent by a terminal is received, where the terminal is in communication connection with the server; the target video indicated to be played by the target play request is then determined; a start playing time and a speech segment corresponding to a video segment of the target video are acquired, where the start playing time corresponding to the video segment is: the moment at which the video segment starts playing in the target video; and the start playing time and the speech segment are then returned to the terminal so that the terminal presents at least part of the characters in the speech segment and plays the target video from the start playing time when a playing operation corresponding to the partial characters is detected. In this scheme, the terminal presents at least part of the characters in the speech segment and plays the target video from the start playing time returned by the server when the playing operation corresponding to the partial characters is detected, thereby realizing playing of the video segment and improving the generation efficiency of video segments.
Drawings
Fig. 1 is an interaction diagram of a video playing system according to an embodiment of the present disclosure;
fig. 2 is an interaction diagram of another video playing system provided in the embodiment of the present disclosure;
fig. 3 is a schematic flowchart of a video playing method applied to a terminal according to an embodiment of the present disclosure;
fig. 4 is a schematic flowchart of a video playing method applied to a server according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of a video playing device disposed in a terminal according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of a video playing apparatus disposed at a server according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of parts and steps, numerical expressions, and values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
It will be understood by those within the art that the terms "first," "second," and the like in the embodiments of the present disclosure are used only for distinguishing between different steps, devices, or modules, and do not denote any particular technical meaning or order therebetween.
It is also understood that in the present embodiment, "a plurality" may mean two or more, and "at least one" may mean one, two or more.
It is also to be understood that any reference to any component, data, or structure in the embodiments of the present disclosure may be generally understood as one or more, unless explicitly defined otherwise or indicated to the contrary hereinafter.
In addition, the term "and/or" in the present disclosure is only one kind of association relationship describing the association object, and indicates that three relationships may exist, for example, a and/or B, may indicate: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" in the present disclosure generally indicates that the former and latter associated objects are in an "or" relationship.
It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and the same or similar parts may be referred to each other, so that the descriptions thereof are omitted for brevity.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be discussed further in subsequent figures.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. For the purpose of facilitating an understanding of the embodiments of the present disclosure, the present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with the embodiments. It is to be understood that the described embodiments are only a few, and not all, of the disclosed embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
Fig. 1 is an interaction schematic diagram of a video playing system provided by an embodiment of the present disclosure, and as shown in fig. 1, the system includes a terminal and a server. And the terminal is in communication connection with the server. Wherein:
the terminal described above, configured to: sending a target playing request to the server;
the server is configured to: determine the target video indicated to be played by the target play request; acquire a start playing time and a speech segment corresponding to a video segment of the target video, where the start playing time corresponding to the video segment is: the moment at which the video segment starts playing in the target video; and return the start playing time and the speech segment to the terminal;
the terminal described above is further configured to: present at least part of the characters in the speech segment; and, if a playing operation corresponding to the partial characters is detected, play the target video starting from the start playing time.
As shown in fig. 1, in step 101, the terminal sends a target play request to the server.
In this embodiment, the terminal may send a target play request to the server.
The target playback request may be used to indicate to play a video (i.e., a target video described later).
In step 102, the server determines a target video indicated to be played by the target play request.
In this embodiment, the server may determine a target video indicated to be played by the target play request.
Optionally, after receiving the target play request sent by the terminal, the server may send video data (e.g., streaming media data) of a video indicated to be played by the target play request to the terminal, so that the terminal plays the video indicated to be played by the target play request.
In step 103, the server obtains a start playing time and a speech segment corresponding to the video segment of the target video.
In this embodiment, the server may obtain a start playing time and a speech segment corresponding to the video segment of the target video.
The start playing time corresponding to the video segment is: the moment at which the video segment starts playing in the target video.
The video clips can be any one or more video clips in the target video. In the case that the video clips are multiple video clips in the target video, the multiple video clips may be connected end to end.
The above-mentioned speech segment may be a speech included in the video segment.
In step 104, the start playing time and the speech segment are returned to the terminal.
In this embodiment, the server may return the start playing time and the speech segment to the terminal.
In step 105, the terminal presents at least some of the words in the line segment.
In this embodiment, the terminal may present at least some of the characters in the line segment.
The partial characters can be any partial characters in the lines segment. For example, the part of the text may be any one of the above-mentioned lines.
Optionally, the terminal may also present all the characters in the above-mentioned speech segment.
In addition, in some cases, if the number of words of at least part of the presented words is greater than or equal to the preset word number threshold, the terminal may present the at least part of the words in a scrolling manner.
In some cases, the terminal may reduce the influence of the presentation of the part of the text on other operations (for example, watching a target video) performed by the user by setting the transparency of the presented part of the text.
In some optional implementation manners of this embodiment, the terminal may execute the step 105 in the following manner: and at least part of characters in the speech segment are used as the title of the corresponding video segment for presentation.
It will be appreciated that in generating a video clip, it is often necessary to determine the title of the generated video clip and to make the user aware of the general content of the video clip by presenting the title. In the above optional implementation manner, at least part of the text in the lines segment may be used as the title of the corresponding video segment, so that the generation efficiency of the video segment may be further improved.
In some optional implementation manners of this embodiment, or in some application scenarios in the optional implementation manners, the terminal may perform the step 105 in the following manner:
a cover page of the video clip is presented.
Wherein the cover page includes a title of the video clip. The title comprises at least part of characters in the corresponding line segment.
In step 106, if a playing operation corresponding to the partial text is detected, the terminal starts to play the target video from the start playing time.
In this embodiment, if a playing operation corresponding to the partial text is detected, the target video is played from the start playing time.
The playing operation corresponding to the partial characters may be a clicking operation on the partial characters.
Here, before the terminal starts playing the target video from the start playing time, the server may send the video segment described in step 103 to the terminal so that the terminal can play the target video from the start playing time. The video segment may be sent either before or after the playing operation is detected. In some cases, if the video segment is sent after the playing operation is detected, and the terminal was already playing the target video before the playing operation corresponding to the partial characters was detected, the server may stop sending the terminal the video segments of the target video other than the video segment described in step 103.
In some optional implementations of this embodiment, in a case that the terminal performs step 105 by presenting a cover of a video clip, the terminal may further perform step 106 by: and if the clicking operation on the cover is detected, determining that the playing operation corresponding to the partial characters is detected.
It is to be understood that, in the above alternative implementation manner, the user may implement the playing of the video clip by clicking the cover page.
In some cases, if the terminal detects that the playing of the video segment is completed during the playing of the target video, the terminal may continue to play the target video from the ending playing time of the video segment, or play another video or video segment by selection of the user.
The video playing system provided by the embodiment of the present disclosure includes a terminal and a server, where the terminal is in communication connection with the server. The terminal is configured to: send a target play request to the server. The server is configured to: determine the target video indicated to be played by the target play request; acquire a start playing time and a speech segment corresponding to a video segment of the target video, where the start playing time corresponding to the video segment is: the moment at which the video segment starts playing in the target video; and return the start playing time and the speech segment to the terminal. The terminal is further configured to: present at least part of the characters in the speech segment; and, if a playing operation corresponding to the partial characters is detected, play the target video starting from the start playing time. In this scheme, the terminal presents at least part of the characters in the speech segment and plays the target video from the start playing time returned by the server when the playing operation corresponding to the partial characters is detected, thereby realizing playing of the video segment and improving the generation efficiency of video segments.
Fig. 2 is an interaction diagram of another video playing system provided by the embodiment of the present disclosure, and as shown in fig. 2, the system includes a terminal and a server. And the terminal is in communication connection with the server. Wherein:
the server is configured to: dividing video lines of a target video to obtain a first line segment set; determining the number of bullet screens and the dragging and playing times corresponding to the lines in the first line segment set; generating a second speech segment set based on the first speech segment set, the number of the bullet screens and the number of the dragging and playing times, wherein the number of the speech segments in the second speech segment set is smaller than that of the speech segments in the first speech segment set; and determining the starting playing time corresponding to the video clip comprising the line segment in the second line segment set.
The terminal described above, configured to: sending a target playing request to the server;
the server is configured to: determine the target video indicated to be played by the target play request; acquire a start playing time and a speech segment corresponding to a video segment of the target video, where the start playing time corresponding to the video segment is: the moment at which the video segment starts playing in the target video; and return the start playing time and the speech segment to the terminal;
the terminal described above is further configured to: present at least part of the characters in the speech segment; and, if a playing operation corresponding to the partial characters is detected, play the target video starting from the start playing time.
As shown in fig. 2, in step 201, the server divides the video lines of the target video to obtain a first line segment set.
In this embodiment, the server may divide the video lines of the target video to obtain a first line segment set.
The video lines are the lines included in the target video.
The first line segment set is the result of dividing the video lines of the target video.
The number of the speech-line sentences contained in each speech-line segment in the first speech-line segment set may be one or more. Each speech sentence may be a sentence.
In step 202, the server determines the number of bullet screens and the number of dragging and playing times corresponding to the lines in the first line segment set.
In this embodiment, the server may determine the number of bullet screens and the number of dragging and playing times corresponding to the speech-line segment in the first speech-line segment set.
The number of barrages corresponding to the speech segment may be the number of barrages sent for the video segment including the speech segment.
The number of times of dragging and playing corresponding to the line segment may be the number of times of playing the video segment including the line segment in a dragging manner.
The number of bullet screens and the number of times of dragging and playing may be collected by terminals (including the above terminal), sent by those terminals to the server, and stored at the server or at an electronic device in communication connection with the server.
In step 203, the server generates a second word segment set based on the first word segment set, the number of the bullet screens, and the number of dragging and playing times.
In this embodiment, the server may generate a second word segment set based on the first word segment set, the number of the bullet screens, and the number of dragging and playing times.
The number of the speech-line segments in the second speech-line segment set is smaller than the number of the speech-line segments in the first speech-line segment set.
As an example, the server may determine, as the second speech segment set, a set of speech segments in the first speech segment set, where the number of barrage is greater than a first preset value and the number of times of dragging and playing is greater than a second preset value.
In some optional implementations of this embodiment, the first speech segment set is a speech segment sequence. The number of the line segment in the second line segment set is the target number.
The target number may be a predetermined number, or may be a product of the number of the speech-line segments in the first speech-line segment set and a preset percentage.
On this basis, the server may also determine the second speech segment set by using the following method:
step one, calculating the weighted summation result of the number of bullet screens corresponding to each line segment in the line segment sequence and the dragging and playing times.
The weight of the number of bullet screens and the weight of the number of times of dragging and playing may be set to two preset numerical values, respectively.
And step two, selecting the target number of results from the calculated results according to the sequence from big to small.
And step three, determining the set of the speech segment corresponding to the target number of results as a candidate segment set.
In the first step, a result (that is, the result of the weighted sum) can be obtained for each speech segment in the sequence of speech segments, so that there is a one-to-one correspondence relationship between the speech segments in the sequence of speech segments and the result. According to the corresponding relation, the speech segment corresponding to the target number of results can be determined.
Step four, if the candidate segment set comprises adjacent candidate segments, executing the following determining steps (comprising substep one, substep two, substep three and substep four):
and a first sub-step of determining the adjacent candidate segment as a new candidate segment.
And a second substep of selecting the result with the largest value from the calculated results which are not selected.
And thirdly, determining the speech segment corresponding to the result with the maximum value as a new candidate segment.
And a substep four, updating the adjacent candidate segments into the two determined new candidate segments so as to update the candidate segment set.
And step five, if the updated candidate segment set comprises the adjacent candidate segment, executing the determining step based on the updated candidate segment set.
And step six, if the adjacent candidate segment is not included in the candidate segment set, determining the candidate segment set which does not include the adjacent candidate segment as a second speech segment set.
In the above optional implementation manner, the adjacent candidate segments are two adjacent lines in the line segment sequence.
It can be understood that, in the above alternative implementation manner, the generated second speech segment set does not include adjacent speech segments, and therefore, when the target video is divided into video segments, the divided video segments also do not include adjacent video segments.
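The selection and adjacency-resolution steps above can be sketched as follows. This is an illustrative reading of the procedure, not the patented implementation: speech segments are identified by their index in the speech segment sequence, the weights default to 1, and an adjacent candidate pair is merged by keeping its earlier index.

```python
def generate_second_set(barrage_counts, drag_counts, target_number,
                        w_barrage=1.0, w_drag=1.0):
    """Select target_number speech segments (by index in the speech segment
    sequence), scoring each by a weighted sum of its bullet-screen count and
    its dragging-and-playing count, then eliminating adjacent candidates."""
    # Step 1: weighted summation result for each speech segment.
    scores = [w_barrage * b + w_drag * d
              for b, d in zip(barrage_counts, drag_counts)]
    # Steps 2-3: the target number of highest results form the candidate set.
    order = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    candidates = set(order[:target_number])
    pool = iter(order[target_number:])   # unselected results, largest first

    def adjacent_pair(cand):
        """Return the smaller index of an adjacent candidate pair, or None."""
        for i in sorted(cand):
            if i + 1 in cand:
                return i
        return None

    # Steps 4-6: while adjacent candidates exist, merge the pair into one
    # candidate and refill the set with the next-highest unselected segment.
    while (i := adjacent_pair(candidates)) is not None:
        candidates.discard(i + 1)        # the adjacent pair counts as one
        for j in pool:                   # result with the largest value
            if j not in candidates:      # among those not yet selected
                candidates.add(j)
                break
        else:
            break                        # no unselected results remain
    return sorted(candidates)
```

Ties between equal scores are broken here by index order (the sort is stable); the disclosure does not specify tie handling, so that choice is an assumption.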
In some alternative implementations of this embodiment, the line segment includes at least one line sentence.
On this basis, the server may determine the partial text by using the following method:
first, a preset speech sentence set is obtained.
The play amount and/or the comment amount of the speech-line sentences in the preset speech-line sentence set meet a first preset condition.
Then, for each line segment in the second line segment set, based on the similarity between the line sentence included in the line segment and the line sentence in the preset line sentence set, the part of the text used for presentation in the line segment is determined.
As an example, the line sentence with the largest similarity to the line sentences in the preset line sentence set in the line sentences included in the line segment may be used as the partial characters for presentation in the line segment.
As another example, for each line sentence included in the line segment, a weighted summation result of the similarity between the line sentence and each line sentence in the preset line sentence set may be calculated, so that the line sentence corresponding to the maximum weighted summation result is used as the partial text for presentation in the line segment.
It can be understood that the preset speech sentence set can be used as a high-heat speech sentence library, and thus, the probability of the user executing the playing operation can be improved by calculating the similarity between each speech sentence in the second speech sentence set and the speech sentence in the preset speech sentence set to determine the partial characters, so that the playing amount of the video clip is increased.
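The two examples above can be sketched together as follows. The patent does not specify a similarity measure, so a character-set Jaccard similarity stands in here as an assumption; uniform weights reduce the weighted-summation variant to the simple maximum-similarity variant:

```python
def jaccard(a: str, b: str) -> float:
    """Character-set Jaccard similarity; a placeholder for whatever
    similarity measure is actually used (the disclosure leaves it open)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if (sa or sb) else 0.0

def pick_presentation_text(segment_sentences, preset_sentences, weights=None):
    """For one speech segment, return the speech sentence whose weighted sum
    of similarities to the preset (high-popularity) speech sentences is the
    largest; that sentence serves as the partial characters for presentation."""
    if weights is None:
        weights = [1.0] * len(preset_sentences)

    def score(sentence):
        return sum(w * jaccard(sentence, preset)
                   for w, preset in zip(weights, preset_sentences))

    return max(segment_sentences, key=score)
```

Any real deployment would likely use a stronger similarity measure (e.g. embedding distance), but the selection logic is unchanged.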
In some application scenarios of the above optional implementations, the first preset condition includes: the result of the weighted summation of the play amount and the comment amount of the target speech sentence is greater than or equal to a preset numerical value.
Optionally, the first preset condition may also include at least one of the following:
in the first item, the playing amount of the target speech is greater than or equal to a preset first numerical value.
And in the second item, the comment amount of the target speech is greater than or equal to a preset second numerical value.
In the third item, the play amount of the target line is greater than or equal to a preset third numerical value, and the comment amount of the target line is greater than or equal to a preset fourth numerical value.
In some optional implementation manners of this embodiment, if the target video meets a second preset condition, the server may generate a second speech segment set based on the first speech segment set, the number of bullet screens, and the number of dragging and playing times; and if the target video does not meet a second preset condition, the server does not generate a second speech segment set.
Wherein the second preset condition includes at least one of the following:
the playing amount of the target video is greater than or equal to a preset playing amount threshold value.
The playing time of the target video is greater than or equal to the preset playing time.
It is to be understood that, in the above alternative implementation manner, the generation timing of the second speech segment set may be determined based on the play amount and/or the play duration of the target video. For the video which does not meet the second preset condition, the video clip does not need to be generated, so that the computing resource of the server can be saved.
In step 204, the server determines an initial playing time corresponding to a video segment including a speech segment in the second speech segment set.
In this embodiment, the server may determine a start playing time corresponding to a video segment including each speech segment in the second speech segment set.
The start playing time corresponding to the video segment is: the moment at which the video segment starts playing in the target video.
In step 205, the terminal sends a target play request to the server.
In this embodiment, the terminal may send a target play request to the server.
In this embodiment, the execution manner of step 205 may refer to step 101, which is not described herein again.
In step 206, the server determines a target video indicated to be played by the target play request.
In this embodiment, the server may determine a target video indicated to be played by the target play request.
In this embodiment, the execution manner of step 206 may refer to step 102, which is not described herein again.
In step 207, the starting playing time and the speech segment corresponding to the video segment of the target video are obtained.
In this embodiment, the server may obtain a start playing time and a speech segment corresponding to the video segment of the target video.
The start playing time corresponding to the video segment is: the time at which the video segment starts to play in the target video.
The video segment described in step 207 may be a video segment including any speech segment in the above-mentioned second speech segment set.
In this embodiment, the execution manner of step 207 may refer to step 103, which is not described herein again.
In step 208, the server returns the start playing time and the speech segment to the terminal.
In this embodiment, the server may return the start playing time and the speech segment to the terminal.
In this embodiment, the execution manner of step 208 may refer to step 104, which is not described herein again.
In step 209, the terminal presents at least a portion of the text in the above-mentioned speech segment.
In this embodiment, the terminal may present at least some of the characters in the above-mentioned speech segment.
In this embodiment, the execution manner of step 209 may refer to step 105, which is not described herein again.
In step 210, if a playing operation corresponding to the partial text is detected, the terminal starts playing the target video from the starting playing time.
In this embodiment, if a playing operation corresponding to the partial text is detected, the terminal may start playing the target video from the starting playing time.
In this embodiment, the execution manner of step 210 may refer to step 106, and is not described herein again.
According to the video playing system provided by the embodiment of the present disclosure, the second speech segment set is generated based on the first speech segment set, the number of bullet screens, and the number of times of dragging and playing, and the video segments containing the speech segments in the second speech segment set are then played. The number of bullet screens and the number of times of dragging and playing thus assist in determining how to divide the target video into video segments, which further improves the user's interest in the played video segments and the play-out rate of the video segments.
The present embodiment is described below by way of an example application scenario. It should be noted, however, that while embodiments of the present disclosure may have the features described below, the following description should not be construed as limiting the scope of the embodiments of the present disclosure.
In the present application scenario, first, some high-heat lines of the video (i.e., the preset speech-line sentence set of the target video) are prepared. Then, the video is divided according to the following rule: each video segment contains one high-heat line, and all video segments, joined end to end, exactly reconstruct the original video. The title of each divided video segment is the high-heat line (i.e., the above at least partial text) contained in that segment. When the user plays the video (i.e., the target video), the titles of all divided segments may be listed in the player; when the user clicks a title, the corresponding video segment is played.
Specifically, the segmentation of the speech segment may be performed as follows:
first, a database of high-heat lines (i.e., the above-mentioned preset speech-line sentence set) is built. The database may be purchased commercially, annotated manually, or generated by artificial intelligence.
Then, when the long video (i.e., the target video) goes online, it is not divided at first. After the video has been playable for a period of time, or its play amount reaches a certain threshold, the long video is divided to obtain video segments. During the long-video playing stage, the terminal records the time offsets of all bullet-screen posts and the time offsets to which users drag playback, and transmits these records to the server for storage. The server determines the number of bullet screens corresponding to each line segment from the bullet-screen time offsets, and the number of times of dragging and playing corresponding to each line segment from the drag-play time offsets.
Then, the server may treat consecutive lines as one line segment (i.e., a line segment in the first speech segment set) and calculate the heat of each line segment in the long video using the formula:
Hot=λ×A+(1-λ)×B
where λ is a predetermined weight greater than 0 and less than 1, A is the number of bullet screens corresponding to the line segment, B is the number of times of dragging and playing corresponding to the line segment, and Hot characterizes the heat of the line segment.
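As a minimal sketch, the heat formula can be computed as follows; λ = 0.6 is an illustrative value, since the disclosure only requires 0 < λ < 1:

```python
def segment_heat(bullet_count: int, drag_play_count: int, lam: float = 0.6) -> float:
    """Hot = λ·A + (1 − λ)·B, where A is the bullet-screen count and
    B is the drag-and-play count of a line segment."""
    assert 0.0 < lam < 1.0
    return lam * bullet_count + (1.0 - lam) * drag_play_count
```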
Then, the k (i.e., the target number) line segments with the highest heat are selected from the obtained line segments (i.e., the first speech segment set). If two selected segments are adjacent, they are merged into one segment, and one additional segment with the highest heat is selected from the unselected segments.
In this way, k line segments (i.e., the second speech segment set) can be picked out.
Next, for each of the k line segments, the similarity between each line in the segment and each line in the high-heat line database is computed, and the line with the highest similarity is used as the title of the video segment containing that line segment.
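The title-selection step above can be sketched as follows; `SequenceMatcher` stands in for whatever text-similarity metric the system actually uses, which the disclosure does not specify:

```python
from difflib import SequenceMatcher

def pick_title(segment_lines, hot_line_db):
    """Return the line in the segment that is most similar to any line
    in the high-heat line database; that line becomes the clip title."""
    def best_match(line):
        # Highest similarity between this line and any database line.
        return max(SequenceMatcher(None, line, ref).ratio() for ref in hot_line_db)
    return max(segment_lines, key=best_match)
```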
In particular, in the application scenario described above, the server may be responsible for generating video clips. The terminal can be responsible for requesting data and controlling video playing. The server and the terminal may be software or hardware, respectively.
At the server side, N high-heat lines of the target video (i.e., the preset speech-line sentence set) can be prepared in advance and numbered, in order of appearance in the target video, as line_1, line_2, line_3, ..., line_n. The target video is then logically divided into N video segments, numbered, in order of start playing time, as clip_1, clip_2, clip_3, ..., clip_n. The division rule is: clip_n must contain line_n, and the end time of clip_n (i.e., the start playing time of clip_{n+1}) is the midpoint between the end time of line_n and the start playing time of line_{n+1}. That is, the start playing time corresponding to a video segment is the time at which the video segment starts to play in the target video. Subsequently, the title of each video segment is recorded as the high-heat line it contains, i.e., the title of clip_n is recorded as line_n. The titles and the start playing times of the video segments form auxiliary information, which is stored on the server together with the target video.
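The boundary rule above (each clip boundary is the midpoint between one high-heat line's end time and the next line's start time) can be sketched as follows, assuming line times are given in seconds:

```python
def clip_boundaries(line_times, video_end):
    """line_times: [(start_s, end_s)] for line_1 .. line_n in play order.
    clip_1 starts at 0; the boundary between clip_k and clip_{k+1} is the
    midpoint of line_k's end time and line_{k+1}'s start time; the last
    clip runs to the end of the video."""
    starts = [0.0]
    for (_, end_k), (start_next, _) in zip(line_times, line_times[1:]):
        starts.append((end_k + start_next) / 2.0)
    ends = starts[1:] + [float(video_end)]
    return list(zip(starts, ends))
```

Each clip therefore contains exactly one high-heat line, and the clips tile the whole video with no gaps or overlaps.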
On the terminal side, the target video and its corresponding auxiliary information can be requested from the server. After obtaining the auxiliary information, the terminal parses it to extract the start playing time and title of each video segment, and displays all the video segment titles on the player for the user to click. The player starts playing from the start playing time of the target video. If the user clicks the title of a video segment at some moment, the player jumps to the start playing time of that video segment and plays from there.
In this application scenario, the video is divided using high-heat lines, so that each video segment has a certain heat, and even classic video segments can be obtained. Furthermore, in this application scenario, the high-heat lines are used as the titles of the divided video segments, which can increase the user's click-through rate on the video segments.
Fig. 3 is a schematic flowchart of a video playing method applied to a terminal according to an embodiment of the present disclosure. As shown in Fig. 3, the method specifically includes:
301. Sending a target playing request to the server.
In this embodiment, the terminal may send a target play request to the server.
The server is in communication connection with the terminal.
302. Receiving the starting playing time and the speech segment corresponding to the video segment of the target video indicated to be played by the target playing request, as returned by the server.
In this embodiment, the terminal may receive the starting playing time and the speech segment corresponding to the video segment of the target video indicated to be played by the target playing request, where the starting playing time and the speech segment are returned by the server.
The start playing time corresponding to the video segment is: the time at which the video segment starts to play in the target video.
303. Presenting at least part of the characters in the speech segment.
In this embodiment, the terminal may present at least some of the characters in the above-mentioned speech segment.
304. Playing the target video from the starting playing time if a playing operation corresponding to the partial characters is detected.
In this embodiment, if a playing operation corresponding to the partial characters is detected, the terminal may start playing the target video from the starting playing time.
According to the video playing method applied to the terminal provided by the embodiment of the present disclosure, a target play request is sent to a server, where the server is in communication connection with the terminal; then the starting playing time and the speech segment corresponding to the video segment of the target video indicated to be played by the target play request are received from the server, where the starting playing time corresponding to the video segment is the time at which the video segment starts to play in the target video; at least part of the characters in the speech segment are presented; and if a playing operation corresponding to the partial characters is detected, the target video is played from the starting playing time. In this scheme, the terminal presents at least part of the characters in the speech segment and, upon detecting a playing operation corresponding to the partial characters, plays the target video from the starting playing time returned by the server, thereby realizing the playing of the video segment and improving the generation efficiency of video segments.
In some optional implementations of this embodiment, the start playing time is determined by:
dividing video lines of the target video to obtain a first line segment set;
determining the number of bullet screens and the dragging and playing times corresponding to the lines in the first line segment set;
generating a second speech segment set based on the first speech segment set, the number of the bullet screens and the number of the dragging and playing times, wherein the number of the speech segments in the second speech segment set is smaller than that of the speech segments in the first speech segment set;
and determining the starting playing time corresponding to the video clip comprising the speech clip in the second speech clip set.
In some optional implementation manners of this embodiment, the first speech segment set is a speech segment sequence, and the number of speech segments in the second speech segment set is a target number; and
generating a second speech segment set based on the first speech segment set, the number of the bullet screens, and the number of dragging and playing times, the method comprising:
calculating, for each speech segment in the speech segment sequence, the weighted sum of the corresponding number of bullet screens and number of times of dragging and playing;
selecting the target number of largest results from the calculated results, in descending order;
determining a set of the speech segment corresponding to the target number of results as a candidate segment set;
if the candidate segment set comprises adjacent candidate segments, executing the following determination steps:
determining the adjacent candidate segment as a new candidate segment;
selecting the result with the largest numerical value from the calculated and unselected results;
determining the speech segment corresponding to the result with the maximum value as a new candidate segment;
updating the adjacent candidate segments into the two determined new candidate segments to update the candidate segment set;
if the updated candidate segment set comprises adjacent candidate segments, executing the determining step based on the updated candidate segment set;
if the candidate segment set does not comprise the adjacent candidate segment, determining the candidate segment set not comprising the adjacent candidate segment as a second speech segment set;
wherein, the adjacent candidate segment is two adjacent lines segments in the line segment sequence.
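The determining steps above (merge adjacent candidates into one, top up with the highest-scoring unselected result, and repeat until no adjacent candidates remain) can be sketched as follows; representing candidates as contiguous index ranges is an assumption made for illustration:

```python
def select_segments(scores, k):
    """scores[i] is the weighted sum (bullet screens, drag-play count)
    of line segment i, in sequence order. Returns k candidate segments
    as (lo, hi) index ranges, with adjacent picks merged."""
    order = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    cands = sorted((i, i) for i in order[:k])  # top-k, as index ranges
    unpicked = order[k:]                       # remaining results, best first

    def find_adjacent(c):
        # Index of the first candidate whose range touches the next one.
        for j in range(len(c) - 1):
            if c[j][1] + 1 == c[j + 1][0]:
                return j
        return None

    j = find_adjacent(cands)
    while j is not None:
        cands[j] = (cands[j][0], cands[j + 1][1])  # merge the adjacent pair
        del cands[j + 1]
        if unpicked:
            i = unpicked.pop(0)                    # highest unselected result
            cands.append((i, i))
            cands.sort()
        j = find_adjacent(cands)
    return cands
```

Note that each merge removes two candidates and (while unselected results remain) adds one back, so the set converges to non-adjacent segments.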
In some optional implementations of this embodiment, the speech segment includes at least one speech statement; and
the partial characters are determined by the following steps:
acquiring a preset speech-line sentence set, wherein the play amount and/or the comment amount of the speech-line sentences in the preset speech-line sentence set meet a first preset condition;
and for each line segment in the second line segment set, determining the partial characters used for presentation in the line segment based on the similarity between the line sentence included in the line segment and the line sentence in the preset line sentence set.
In some optional implementations of the present embodiment, the first preset condition includes: the result of the weighted summation of the playing quantity and the comment quantity of the target speech is greater than or equal to a preset numerical value.
In some optional implementation manners of this embodiment, the generating a second speech segment set based on the first speech segment set, the number of bullet screens, and the number of times of dragging and playing includes:
if the target video meets a second preset condition, generating a second line segment set based on the first line segment set, the number of bullet screens and the dragging and playing times;
wherein the second preset condition includes at least one of the following:
the playing amount of the target video is greater than or equal to a preset playing amount threshold value;
the playing time of the target video is greater than or equal to the preset playing time.
In some optional implementations of this embodiment, the presenting at least part of the text in the above-mentioned speech segment includes:
and presenting at least part of characters in the speech segment as the title of the corresponding video segment.
In some optional implementations of this embodiment, the presenting at least a part of the text in the speech segment as a title of the corresponding video segment includes:
presenting a cover of the video clip, wherein the cover comprises a title of the video clip, and the title comprises at least part of characters in the lines clip; and
before the above step of detecting the play operation corresponding to the above-mentioned part of the text, the method further includes:
and if the clicking operation on the cover is detected, determining that the playing operation corresponding to the partial characters is detected.
In some optional implementations of this embodiment, the starting playing time corresponding to the video segment is: the middle time between the times at which two speech segments start to play in the target video; wherein the two speech segments include: the speech segment included in the video segment, and the speech segment included in the video segment preceding the video segment in the target video.
It should be noted that, besides the above-mentioned contents, the present embodiment may also include the technical features described in the above fig. 1 or fig. 2, and generate the technical effects described in the above fig. 1 or fig. 2, which are not described again here.
Fig. 4 is a schematic flowchart of a video playing method applied to a server according to an embodiment of the present disclosure, and as shown in fig. 4, the method specifically includes:
401. Receiving a target playing request sent by the terminal.
In this embodiment, the server may receive a target play request sent by the terminal.
The terminal is in communication connection with the server.
402. Determining the target video indicated to be played by the target playing request.
In this embodiment, the server may determine a target video indicated to be played by the target play request.
403. Acquiring the starting playing time and the speech segment corresponding to the video segment of the target video.
In this embodiment, the server may obtain the starting playing time and the speech segment corresponding to the video segment of the target video.
The start playing time corresponding to the video segment is: the time at which the video segment starts to play in the target video.
404. Returning the starting playing time and the speech segment to the terminal, so that the terminal presents at least part of the characters in the speech segment and plays the target video from the starting playing time when a playing operation corresponding to the partial characters is detected.
In this embodiment, the server may return the start playing time and the speech segment to the terminal, so that the terminal presents at least a part of the text in the speech segment, and start playing the target video from the start playing time when a playing operation corresponding to the part of the text is detected.
According to the video playing method applied to the server provided by the embodiment of the present disclosure, a target play request sent by a terminal is received, where the terminal is in communication connection with the server; the target video indicated to be played by the target play request is then determined; the starting playing time and the speech segment corresponding to the video segment of the target video are then obtained, where the starting playing time corresponding to the video segment is the time at which the video segment starts to play in the target video; and the starting playing time and the speech segment are then returned to the terminal, so that the terminal presents at least part of the characters in the speech segment and, when a playing operation corresponding to the partial characters is detected, plays the target video from the starting playing time. In this scheme, the terminal presents at least part of the characters in the speech segment and, upon detecting the playing operation corresponding to the partial characters, plays the target video from the starting playing time returned by the server, thereby realizing the playing of the video segment and improving the generation efficiency of video segments.
In some optional implementations of this embodiment, the start playing time is determined by:
dividing video lines of the target video to obtain a first line segment set;
determining the number of bullet screens and the dragging and playing times corresponding to the lines in the first line segment set;
generating a second speech segment set based on the first speech segment set, the number of the bullet screens and the number of the dragging and playing times, wherein the number of the speech segments in the second speech segment set is smaller than that of the speech segments in the first speech segment set;
and determining the starting playing time corresponding to the video clip comprising the speech clip in the second speech clip set.
In some optional implementation manners of this embodiment, the first speech segment set is a speech segment sequence, and the number of speech segments in the second speech segment set is a target number; and
generating a second speech segment set based on the first speech segment set, the number of the bullet screens, and the number of dragging and playing times, the method comprising:
calculating, for each speech segment in the speech segment sequence, the weighted sum of the corresponding number of bullet screens and number of times of dragging and playing;
selecting the target number of largest results from the calculated results, in descending order;
determining a set of the speech segment corresponding to the target number of results as a candidate segment set;
if the candidate segment set comprises adjacent candidate segments, executing the following determination steps:
determining the adjacent candidate segment as a new candidate segment;
selecting the result with the largest numerical value from the calculated and unselected results;
determining the speech segment corresponding to the result with the maximum value as a new candidate segment;
updating the adjacent candidate segments into the two determined new candidate segments to update the candidate segment set;
if the updated candidate segment set comprises adjacent candidate segments, executing the determining step based on the updated candidate segment set;
if the candidate segment set does not comprise the adjacent candidate segment, determining the candidate segment set not comprising the adjacent candidate segment as a second speech segment set;
wherein, the adjacent candidate segment is two adjacent lines segments in the line segment sequence.
In some optional implementations of this embodiment, the speech segment includes at least one speech statement; and
the partial characters are determined by the following steps:
acquiring a preset speech-line sentence set, wherein the play amount and/or the comment amount of the speech-line sentences in the preset speech-line sentence set meet a first preset condition;
and for each line segment in the second line segment set, determining the partial characters used for presentation in the line segment based on the similarity between the line sentence included in the line segment and the line sentence in the preset line sentence set.
In some optional implementations of the present embodiment, the first preset condition includes: the result of the weighted summation of the playing quantity and the comment quantity of the target speech is greater than or equal to a preset numerical value.
In some optional implementation manners of this embodiment, the generating a second speech segment set based on the first speech segment set, the number of bullet screens, and the number of times of dragging and playing includes:
if the target video meets a second preset condition, generating a second line segment set based on the first line segment set, the number of bullet screens and the dragging and playing times;
wherein the second preset condition includes at least one of the following:
the playing amount of the target video is greater than or equal to a preset playing amount threshold value;
the playing time of the target video is greater than or equal to the preset playing time.
In some optional implementations of this embodiment, the starting playing time corresponding to the video segment is: the middle time between the times at which two speech segments start to play in the target video; wherein the two speech segments include: the speech segment included in the video segment, and the speech segment included in the video segment preceding the video segment in the target video.
It should be noted that, besides the above-mentioned contents, the present embodiment may also include the technical features described in the above fig. 1 or fig. 2, and generate the technical effects described in the above fig. 1 or fig. 2, which are not described again here.
Fig. 5 is a schematic structural diagram of a video playing device disposed in a terminal according to an embodiment of the present disclosure, which specifically includes:
a sending unit 501, configured to send a target play request to a server, where the server is in communication connection with the terminal;
a first receiving unit 502, configured to receive a starting playing time and a speech segment corresponding to a video segment of a target video indicated to be played by the target playing request, which are returned by the server; the video clip corresponds to an initial playing time as follows: the moment when the video clip starts playing in the target video;
a presentation unit 503 configured to present at least part of the text in the above-mentioned speech segment;
a playing unit 504 configured to play the target video from the start playing time if a playing operation corresponding to the partial text is detected.
Optionally, in an apparatus according to any embodiment of the present disclosure, the start playing time is determined by:
dividing video lines of the target video to obtain a first line segment set;
determining the number of bullet screens and the dragging and playing times corresponding to the lines in the first line segment set;
generating a second speech segment set based on the first speech segment set, the number of the bullet screens and the number of the dragging and playing times, wherein the number of the speech segments in the second speech segment set is smaller than that of the speech segments in the first speech segment set;
and determining the starting playing time corresponding to the video clip comprising the speech clip in the second speech clip set.
Optionally, in the apparatus according to any embodiment of the present disclosure, the first speech segment set is a speech segment sequence, and the number of speech segments in the second speech segment set is a target number; and
generating a second speech segment set based on the first speech segment set, the number of the bullet screens, and the number of dragging and playing times, the method comprising:
calculating, for each speech segment in the speech segment sequence, the weighted sum of the corresponding number of bullet screens and number of times of dragging and playing;
selecting the target number of largest results from the calculated results, in descending order;
determining a set of the speech segment corresponding to the target number of results as a candidate segment set;
if the candidate segment set comprises adjacent candidate segments, executing the following determination steps:
determining the adjacent candidate segment as a new candidate segment;
selecting the result with the largest numerical value from the calculated and unselected results;
determining the speech segment corresponding to the result with the maximum value as a new candidate segment;
updating the adjacent candidate segments into the two determined new candidate segments to update the candidate segment set;
if the updated candidate segment set comprises adjacent candidate segments, executing the determining step based on the updated candidate segment set;
if the candidate segment set does not comprise the adjacent candidate segment, determining the candidate segment set not comprising the adjacent candidate segment as a second speech segment set;
wherein, the adjacent candidate segment is two adjacent lines segments in the line segment sequence.
Optionally, in the apparatus of any embodiment of the present disclosure, the speech segment includes at least one speech sentence; and
the partial characters are determined by the following steps:
acquiring a preset speech-line sentence set, wherein the play amount and/or the comment amount of the speech-line sentences in the preset speech-line sentence set meet a first preset condition;
and for each line segment in the second line segment set, determining the partial characters used for presentation in the line segment based on the similarity between the line sentence included in the line segment and the line sentence in the preset line sentence set.
Optionally, in the apparatus according to any embodiment of the present disclosure, the first preset condition includes: the result of the weighted summation of the playing quantity and the comment quantity of the target speech is greater than or equal to a preset numerical value.
Optionally, in an apparatus according to any embodiment of the present disclosure, the generating a second speech segment set based on the first speech segment set, the number of bullet screens, and the number of times of dragging and playing includes:
if the target video meets a second preset condition, generating a second line segment set based on the first line segment set, the number of bullet screens and the dragging and playing times;
wherein the second preset condition includes at least one of the following:
the playing amount of the target video is greater than or equal to a preset playing amount threshold value;
the playing time of the target video is greater than or equal to the preset playing time.
Optionally, in an apparatus according to any embodiment of the present disclosure, the presenting unit 503 is further configured to:
and presenting at least part of characters in the speech segment as the title of the corresponding video segment.
Optionally, in an apparatus according to any embodiment of the present disclosure, the presenting unit 503 is further configured to:
presenting a cover of the video clip, wherein the cover comprises a title of the video clip, and the title comprises at least part of characters in the lines clip; and
the above-mentioned device still includes:
a first determining unit (not shown in the figure) configured to determine that a play operation corresponding to the partial text is detected if a click operation on the cover is detected.
Optionally, in the apparatus according to any embodiment of the present disclosure, the starting playing time corresponding to the video segment is: the middle time between the times at which two speech segments start to play in the target video; wherein the two speech segments include: the speech segment included in the video segment, and the speech segment included in the video segment preceding the video segment in the target video.
The video playing apparatus provided in this embodiment may be the video playing apparatus shown in fig. 5, and may perform all the steps of the video playing method shown in fig. 3, so as to achieve the technical effect of the video playing method shown in fig. 3.
Fig. 6 is a schematic structural diagram of a video playing apparatus disposed at a server according to an embodiment of the present disclosure, which specifically includes:
a second receiving unit 601, configured to receive a target play request sent by a terminal, where the terminal is in communication connection with the server;
a second determining unit 602 configured to determine a target video indicated to be played by the target play request;
an obtaining unit 603 configured to obtain a start playing time and a speech segment corresponding to a video segment of the target video; the video clip corresponds to an initial playing time as follows: the moment when the video clip starts playing in the target video;
a returning unit 604 configured to return the start playing time and the speech segment to the terminal so that the terminal presents at least part of the text in the speech segment, and to start playing the target video from the start playing time when a playing operation corresponding to the part of the text is detected.
Optionally, in an apparatus according to any embodiment of the present disclosure, the start playing time is determined by:
dividing the video lines of the target video to obtain a first speech segment set;
determining the number of bullet screens and the number of dragging-and-playing times corresponding to each speech segment in the first speech segment set;
generating a second speech segment set based on the first speech segment set, the number of bullet screens, and the number of dragging-and-playing times, wherein the number of speech segments in the second speech segment set is smaller than the number of speech segments in the first speech segment set;
and determining the start playing time corresponding to each video clip that includes a speech segment in the second speech segment set.
Optionally, in the apparatus according to any embodiment of the present disclosure, the first speech segment set is a speech segment sequence, and the number of speech segments in the second speech segment set is a target number; and
the generating a second speech segment set based on the first speech segment set, the number of bullet screens, and the number of dragging-and-playing times includes:
calculating, for each speech segment in the speech segment sequence, a weighted sum of the number of bullet screens and the number of dragging-and-playing times corresponding to that speech segment;
selecting the target number of results from the calculated results in descending order;
determining the set of speech segments corresponding to the selected target number of results as a candidate segment set;
if the candidate segment set includes adjacent candidate segments, executing the following determination steps:
determining the adjacent candidate segments as one new candidate segment;
selecting the result with the largest value from the calculated but not yet selected results;
determining the speech segment corresponding to that result as another new candidate segment;
replacing the adjacent candidate segments with the two newly determined candidate segments to update the candidate segment set;
if the updated candidate segment set includes adjacent candidate segments, executing the determination steps again based on the updated candidate segment set;
if the candidate segment set does not include adjacent candidate segments, determining the candidate segment set that includes no adjacent candidate segments as the second speech segment set;
wherein adjacent candidate segments are two speech segments that are adjacent in the speech segment sequence.
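The candidate selection and adjacent-segment merging procedure described above can be sketched in Python as follows. The weighting coefficients, function names, and data layout are illustrative assumptions; the disclosure fixes only the steps, not the weights or representation.

```python
def build_second_segment_set(segments, danmaku_counts, drag_counts,
                             target_number, w_danmaku=0.6, w_drag=0.4):
    """Select the top-scoring speech segments, then repeatedly merge adjacent
    candidates (topping the set back up from the next-best unselected result)
    until no two candidates are adjacent in the original sequence."""
    # Weighted sum per speech segment (the weights are illustrative assumptions).
    results = [w_danmaku * d + w_drag * g
               for d, g in zip(danmaku_counts, drag_counts)]
    order = sorted(range(len(segments)), key=lambda i: results[i], reverse=True)
    remaining = order[target_number:]            # calculated but not yet selected
    # Each candidate is a (first_index, last_index) span of the sequence.
    candidates = sorted((i, i) for i in order[:target_number])

    def adjacent_pair(cands):
        for k in range(len(cands) - 1):
            if cands[k][1] + 1 == cands[k + 1][0]:
                return k
        return None

    while (k := adjacent_pair(candidates)) is not None:
        merged = (candidates[k][0], candidates[k + 1][1])  # one new candidate
        new = [merged]
        if remaining:                            # next-largest unselected result
            nxt = remaining.pop(0)
            new.append((nxt, nxt))               # the other new candidate
        candidates[k:k + 2] = new
        candidates.sort()
    return [segments[lo:hi + 1] for lo, hi in candidates]
```

With six segments scored so that segments 0, 1, and 3 rank highest, the adjacent pair (0, 1) merges into one candidate and the next-best segment (5) joins the set, leaving non-adjacent candidates spanning indices 0–1, 3, and 5.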
Optionally, in the apparatus of any embodiment of the present disclosure, the speech segment includes at least one speech sentence; and
the partial text is determined by the following steps:
acquiring a preset speech sentence set, wherein the play count and/or the comment count of the speech sentences in the preset speech sentence set meet a first preset condition;
and for each speech segment in the second speech segment set, determining the partial text to be presented for that speech segment based on the similarity between the speech sentences included in the speech segment and the speech sentences in the preset speech sentence set.
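The similarity-based selection of presentation text might look like the following sketch. The disclosure does not specify a similarity measure; character-level `difflib.SequenceMatcher` ratio is used here purely for illustration, and the function name is an assumption.

```python
import difflib

def pick_presentation_text(segment_sentences, preset_sentences):
    """For one speech segment, choose the sentence most similar to any
    sentence in the preset (high play/comment count) set as the text
    to present, e.g. as the clip title."""
    def best_similarity(sentence):
        # Highest similarity of this sentence against the whole preset set.
        return max(difflib.SequenceMatcher(None, sentence, p).ratio()
                   for p in preset_sentences)
    return max(segment_sentences, key=best_similarity)
```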
Optionally, in the apparatus according to any embodiment of the present disclosure, the first preset condition includes: the weighted sum of the play count and the comment count of the target speech sentence is greater than or equal to a preset value.
Optionally, in the apparatus according to any embodiment of the present disclosure, the generating a second speech segment set based on the first speech segment set, the number of bullet screens, and the number of dragging-and-playing times includes:
if the target video meets a second preset condition, generating the second speech segment set based on the first speech segment set, the number of bullet screens, and the number of dragging-and-playing times;
wherein the second preset condition includes at least one of the following:
the play count of the target video is greater than or equal to a preset play count threshold;
the play duration of the target video is greater than or equal to a preset play duration.
Optionally, in the apparatus according to any embodiment of the present disclosure, the start playing time corresponding to the video clip is: the middle time between the moments at which two speech segments start to play in the target video; wherein the two speech segments include: the speech segment included in the video clip, and the speech segment included in the video clip previous to the video clip in the target video.
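The midpoint rule above can be expressed as a one-line computation; the function name and the use of seconds are illustrative assumptions:

```python
def clip_start_play_time(previous_speech_start, current_speech_start):
    """Start playing time for a video clip: the middle time between the moment
    the previous clip's speech segment starts and the moment this clip's
    speech segment starts (both in seconds into the target video)."""
    return (previous_speech_start + current_speech_start) / 2.0
```

Starting playback at this midpoint gives the viewer a short lead-in from the preceding dialogue before the selected speech segment begins.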
The video playing apparatus provided in this embodiment may be the video playing apparatus shown in fig. 6, and may perform all the steps of the video playing method shown in fig. 4, so as to achieve the technical effect of the video playing method shown in fig. 4.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. The electronic device 700 shown in fig. 7 includes: at least one processor 701, a memory 702, at least one network interface 704, and other user interfaces 703. The various components in the electronic device 700 are coupled together by a bus system 705. It is understood that the bus system 705 is used to enable communications among these components. In addition to a data bus, the bus system 705 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, however, the various buses are labeled in fig. 7 as the bus system 705.
The user interface 703 may include, among other things, a display, a keyboard, or a pointing device (e.g., a mouse, trackball, touch pad, or touch screen).
It is to be understood that the memory 702 in embodiments of the present disclosure may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory 702 described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
In some embodiments, memory 702 stores the following elements, executable units or data structures, or a subset thereof, or an expanded set thereof: an operating system 7021 and application programs 7022.
The operating system 7021 includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, for implementing various basic services and processing hardware-based tasks. The application 7022 includes various applications, such as a Media Player (Media Player), a Browser (Browser), and the like, for implementing various application services. Programs that implement methods in accordance with embodiments of the present disclosure can be included within application program 7022.
In this embodiment, the processor 701 is configured to execute the method steps provided by the method embodiments by calling a program or an instruction stored in the memory 702, which may be, specifically, a program or an instruction stored in the application 7022.
As an example, it may include:
sending a target playing request to a server, wherein the server is in communication connection with the terminal;
receiving, from the server, the start playing time and the speech segment corresponding to a video clip of the target video that the target play request indicates to be played, where the start playing time corresponding to the video clip is the moment at which the video clip starts playing in the target video;
presenting at least part of the text in the speech segment;
and if a play operation corresponding to the partial text is detected, playing the target video from the start playing time.
Optionally, in the method according to any embodiment of the present disclosure, the start playing time is determined by:
dividing the video lines of the target video to obtain a first speech segment set;
determining the number of bullet screens and the number of dragging-and-playing times corresponding to each speech segment in the first speech segment set;
generating a second speech segment set based on the first speech segment set, the number of bullet screens, and the number of dragging-and-playing times, wherein the number of speech segments in the second speech segment set is smaller than the number of speech segments in the first speech segment set;
and determining the start playing time corresponding to each video clip that includes a speech segment in the second speech segment set.
Optionally, in the method according to any embodiment of the present disclosure, the first speech segment set is a speech segment sequence, and the number of speech segments in the second speech segment set is a target number; and
the generating a second speech segment set based on the first speech segment set, the number of bullet screens, and the number of dragging-and-playing times includes:
calculating, for each speech segment in the speech segment sequence, a weighted sum of the number of bullet screens and the number of dragging-and-playing times corresponding to that speech segment;
selecting the target number of results from the calculated results in descending order;
determining the set of speech segments corresponding to the selected target number of results as a candidate segment set;
if the candidate segment set includes adjacent candidate segments, executing the following determination steps:
determining the adjacent candidate segments as one new candidate segment;
selecting the result with the largest value from the calculated but not yet selected results;
determining the speech segment corresponding to that result as another new candidate segment;
replacing the adjacent candidate segments with the two newly determined candidate segments to update the candidate segment set;
if the updated candidate segment set includes adjacent candidate segments, executing the determination steps again based on the updated candidate segment set;
if the candidate segment set does not include adjacent candidate segments, determining the candidate segment set that includes no adjacent candidate segments as the second speech segment set;
wherein adjacent candidate segments are two speech segments that are adjacent in the speech segment sequence.
Optionally, in the method of any embodiment of the present disclosure, the speech segment includes at least one speech sentence; and
the partial text is determined by the following steps:
acquiring a preset speech sentence set, wherein the play count and/or the comment count of the speech sentences in the preset speech sentence set meet a first preset condition;
and for each speech segment in the second speech segment set, determining the partial text to be presented for that speech segment based on the similarity between the speech sentences included in the speech segment and the speech sentences in the preset speech sentence set.
Optionally, in the method according to any embodiment of the present disclosure, the first preset condition includes: the weighted sum of the play count and the comment count of the target speech sentence is greater than or equal to a preset value.
Optionally, in the method according to any embodiment of the present disclosure, the generating a second speech segment set based on the first speech segment set, the number of bullet screens, and the number of dragging-and-playing times includes:
if the target video meets a second preset condition, generating the second speech segment set based on the first speech segment set, the number of bullet screens, and the number of dragging-and-playing times;
wherein the second preset condition includes at least one of the following:
the play count of the target video is greater than or equal to a preset play count threshold;
the play duration of the target video is greater than or equal to a preset play duration.
Optionally, in a method according to any embodiment of the present disclosure, the presenting at least part of the text in the above-mentioned speech segment includes:
presenting at least part of the text in the speech segment as the title of the corresponding video clip.
Optionally, in the method according to any embodiment of the present disclosure, the presenting at least part of the text in the speech segment as a title of the corresponding video segment includes:
presenting a cover of the video clip, wherein the cover comprises a title of the video clip, and the title comprises at least part of the characters in the speech segment; and
before the step of detecting the play operation corresponding to the above-mentioned partial text, the method further includes:
if a click operation on the cover is detected, determining that the play operation corresponding to the partial text has been detected.
Optionally, in the method according to any embodiment of the present disclosure, the start playing time corresponding to the video clip is: the middle time between the moments at which two speech segments start to play in the target video; wherein the two speech segments include: the speech segment included in the video clip, and the speech segment included in the video clip previous to the video clip in the target video.
Or, as yet another example, it may also include:
receiving a target playing request sent by a terminal, wherein the terminal is in communication connection with the server;
determining the target video that the target play request indicates to be played;
acquiring a start playing time and a speech segment corresponding to a video clip of the target video, where the start playing time corresponding to the video clip is the moment at which the video clip starts playing in the target video;
and returning the start playing time and the speech segment to the terminal, so that the terminal presents at least part of the text in the speech segment and, when a play operation corresponding to the partial text is detected, plays the target video from the start playing time.
Optionally, in the method according to any embodiment of the present disclosure, the start playing time is determined by:
dividing the video lines of the target video to obtain a first speech segment set;
determining the number of bullet screens and the number of dragging-and-playing times corresponding to each speech segment in the first speech segment set;
generating a second speech segment set based on the first speech segment set, the number of bullet screens, and the number of dragging-and-playing times, wherein the number of speech segments in the second speech segment set is smaller than the number of speech segments in the first speech segment set;
and determining the start playing time corresponding to each video clip that includes a speech segment in the second speech segment set.
Optionally, in the method according to any embodiment of the present disclosure, the first speech segment set is a speech segment sequence, and the number of speech segments in the second speech segment set is a target number; and
the generating a second speech segment set based on the first speech segment set, the number of bullet screens, and the number of dragging-and-playing times includes:
calculating, for each speech segment in the speech segment sequence, a weighted sum of the number of bullet screens and the number of dragging-and-playing times corresponding to that speech segment;
selecting the target number of results from the calculated results in descending order;
determining the set of speech segments corresponding to the selected target number of results as a candidate segment set;
if the candidate segment set includes adjacent candidate segments, executing the following determination steps:
determining the adjacent candidate segments as one new candidate segment;
selecting the result with the largest value from the calculated but not yet selected results;
determining the speech segment corresponding to that result as another new candidate segment;
replacing the adjacent candidate segments with the two newly determined candidate segments to update the candidate segment set;
if the updated candidate segment set includes adjacent candidate segments, executing the determination steps again based on the updated candidate segment set;
if the candidate segment set does not include adjacent candidate segments, determining the candidate segment set that includes no adjacent candidate segments as the second speech segment set;
wherein adjacent candidate segments are two speech segments that are adjacent in the speech segment sequence.
Optionally, in the method of any embodiment of the present disclosure, the speech segment includes at least one speech sentence; and
the partial text is determined by the following steps:
acquiring a preset speech sentence set, wherein the play count and/or the comment count of the speech sentences in the preset speech sentence set meet a first preset condition;
and for each speech segment in the second speech segment set, determining the partial text to be presented for that speech segment based on the similarity between the speech sentences included in the speech segment and the speech sentences in the preset speech sentence set.
Optionally, in the method according to any embodiment of the present disclosure, the first preset condition includes: the weighted sum of the play count and the comment count of the target speech sentence is greater than or equal to a preset value.
Optionally, in the method according to any embodiment of the present disclosure, the generating a second speech segment set based on the first speech segment set, the number of bullet screens, and the number of dragging-and-playing times includes:
if the target video meets a second preset condition, generating the second speech segment set based on the first speech segment set, the number of bullet screens, and the number of dragging-and-playing times;
wherein the second preset condition includes at least one of the following:
the play count of the target video is greater than or equal to a preset play count threshold;
the play duration of the target video is greater than or equal to a preset play duration.
Optionally, in the method according to any embodiment of the present disclosure, the start playing time corresponding to the video clip is: the middle time between the moments at which two speech segments start to play in the target video; wherein the two speech segments include: the speech segment included in the video clip, and the speech segment included in the video clip previous to the video clip in the target video.
The method disclosed by the embodiments of the present disclosure may be applied to, or implemented by, the processor 701. The processor 701 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 701 or by instructions in the form of software. The processor 701 may be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present disclosure. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present disclosure may be directly executed by a hardware decoding processor, or executed by a combination of hardware and software units in the decoding processor. The software units may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EEPROM, or registers. The storage medium is located in the memory 702, and the processor 701 reads the information in the memory 702 and completes the steps of the above method in combination with its hardware.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the Processing units may be implemented in one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, micro-controllers, microprocessors, other electronic units configured to perform the above-described functions of the present disclosure, or a combination thereof.
For a software implementation, the techniques described herein may be implemented by means of units performing the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
The electronic device provided in this embodiment may be the electronic device shown in fig. 7, and may execute all the steps of the video playing methods shown in fig. 3 and fig. 4, so as to achieve the technical effects of the video playing methods shown in fig. 3 and fig. 4; for brevity, please refer to the descriptions related to fig. 3 and fig. 4, which are not repeated here.
The disclosed embodiments also provide a storage medium (computer-readable storage medium). The storage medium herein stores one or more programs. Among others, the storage medium may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid state disk; the memory may also comprise a combination of memories of the kind described above.
When the one or more programs in the storage medium are executed by one or more processors, the video playing method executed on the electronic device side is implemented.
The processor is configured to execute the video playing program stored in the memory, so as to implement the following steps of the video playing method executed on the electronic device side.
As an example, it may include:
sending a target playing request to a server, wherein the server is in communication connection with the terminal;
receiving, from the server, the start playing time and the speech segment corresponding to a video clip of the target video that the target play request indicates to be played, where the start playing time corresponding to the video clip is the moment at which the video clip starts playing in the target video;
presenting at least part of the text in the speech segment;
and if a play operation corresponding to the partial text is detected, playing the target video from the start playing time.
Optionally, in the method according to any embodiment of the present disclosure, the start playing time is determined by:
dividing the video lines of the target video to obtain a first speech segment set;
determining the number of bullet screens and the number of dragging-and-playing times corresponding to each speech segment in the first speech segment set;
generating a second speech segment set based on the first speech segment set, the number of bullet screens, and the number of dragging-and-playing times, wherein the number of speech segments in the second speech segment set is smaller than the number of speech segments in the first speech segment set;
and determining the start playing time corresponding to each video clip that includes a speech segment in the second speech segment set.
Optionally, in the method according to any embodiment of the present disclosure, the first speech segment set is a speech segment sequence, and the number of speech segments in the second speech segment set is a target number; and
the generating a second speech segment set based on the first speech segment set, the number of bullet screens, and the number of dragging-and-playing times includes:
calculating, for each speech segment in the speech segment sequence, a weighted sum of the number of bullet screens and the number of dragging-and-playing times corresponding to that speech segment;
selecting the target number of results from the calculated results in descending order;
determining the set of speech segments corresponding to the selected target number of results as a candidate segment set;
if the candidate segment set includes adjacent candidate segments, executing the following determination steps:
determining the adjacent candidate segments as one new candidate segment;
selecting the result with the largest value from the calculated but not yet selected results;
determining the speech segment corresponding to that result as another new candidate segment;
replacing the adjacent candidate segments with the two newly determined candidate segments to update the candidate segment set;
if the updated candidate segment set includes adjacent candidate segments, executing the determination steps again based on the updated candidate segment set;
if the candidate segment set does not include adjacent candidate segments, determining the candidate segment set that includes no adjacent candidate segments as the second speech segment set;
wherein adjacent candidate segments are two speech segments that are adjacent in the speech segment sequence.
Optionally, in the method of any embodiment of the present disclosure, the speech segment includes at least one speech sentence; and
the partial text is determined by the following steps:
acquiring a preset speech sentence set, wherein the play count and/or the comment count of the speech sentences in the preset speech sentence set meet a first preset condition;
and for each speech segment in the second speech segment set, determining the partial text to be presented for that speech segment based on the similarity between the speech sentences included in the speech segment and the speech sentences in the preset speech sentence set.
Optionally, in the method according to any embodiment of the present disclosure, the first preset condition includes: the weighted sum of the play count and the comment count of the target speech sentence is greater than or equal to a preset value.
Optionally, in the method according to any embodiment of the present disclosure, the generating a second speech segment set based on the first speech segment set, the number of bullet screens, and the number of dragging-and-playing times includes:
if the target video meets a second preset condition, generating the second speech segment set based on the first speech segment set, the number of bullet screens, and the number of dragging-and-playing times;
wherein the second preset condition includes at least one of the following:
the play count of the target video is greater than or equal to a preset play count threshold;
the play duration of the target video is greater than or equal to a preset play duration.
Optionally, in a method according to any embodiment of the present disclosure, the presenting at least part of the text in the above-mentioned speech segment includes:
presenting at least part of the text in the speech segment as the title of the corresponding video clip.
Optionally, in the method according to any embodiment of the present disclosure, the presenting at least part of the text in the speech segment as a title of the corresponding video segment includes:
presenting a cover of the video clip, wherein the cover comprises a title of the video clip, and the title comprises at least part of the characters in the speech segment; and
before the step of detecting the play operation corresponding to the above-mentioned partial text, the method further includes:
if a click operation on the cover is detected, determining that the play operation corresponding to the partial text has been detected.
Optionally, in the method according to any embodiment of the present disclosure, the start playing time corresponding to the video clip is: the middle time between the moments at which two speech segments start to play in the target video; wherein the two speech segments include: the speech segment included in the video clip, and the speech segment included in the video clip previous to the video clip in the target video.
Or, as yet another example, it may also include:
receiving a target playing request sent by a terminal, wherein the terminal is in communication connection with the server;
determining the target video that the target play request indicates to be played;
acquiring a start playing time and a speech segment corresponding to a video clip of the target video, where the start playing time corresponding to the video clip is the moment at which the video clip starts playing in the target video;
and returning the start playing time and the speech segment to the terminal, so that the terminal presents at least part of the text in the speech segment and, when a play operation corresponding to the partial text is detected, plays the target video from the start playing time.
Optionally, in the method according to any embodiment of the present disclosure, the start playing time is determined by:
dividing the video lines of the target video to obtain a first speech segment set;
determining the number of bullet screens and the number of dragging-and-playing times corresponding to each speech segment in the first speech segment set;
generating a second speech segment set based on the first speech segment set, the number of bullet screens, and the number of dragging-and-playing times, wherein the number of speech segments in the second speech segment set is smaller than the number of speech segments in the first speech segment set;
and determining the start playing time corresponding to each video clip that includes a speech segment in the second speech segment set.
Optionally, in the method according to any embodiment of the present disclosure, the first speech segment set is a speech segment sequence, and the number of speech segments in the second speech segment set is a target number; and
the generating a second speech segment set based on the first speech segment set, the number of bullet screens, and the number of dragging-and-playing times includes:
calculating, for each speech segment in the speech segment sequence, a weighted sum of the number of bullet screens and the number of dragging-and-playing times corresponding to that speech segment;
selecting the target number of results from the calculated results in descending order;
determining the set of speech segments corresponding to the selected target number of results as a candidate segment set;
if the candidate segment set comprises adjacent candidate segments, executing the following determination steps:
determining the adjacent candidate segment as a new candidate segment;
selecting the result with the largest numerical value from the calculated and unselected results;
determining the speech segment corresponding to the result with the maximum value as a new candidate segment;
updating the adjacent candidate segments into the two determined new candidate segments to update the candidate segment set;
if the updated candidate segment set comprises adjacent candidate segments, executing the determining step based on the updated candidate segment set;
if the candidate segment set does not comprise the adjacent candidate segment, determining the candidate segment set not comprising the adjacent candidate segment as a second speech segment set;
wherein, the adjacent candidate segment is two adjacent lines segments in the line segment sequence.
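The selection procedure above can be sketched in Python. This is a simplified, hypothetical reading rather than the patent's reference implementation: segments are scored by a weighted sum of bullet-screen count and dragging-and-playing count (the weights `w_bullet` and `w_drag` are assumed, not specified in the text), the top `target_count` segments are kept, and whenever two kept segments are adjacent in the sequence, the lower-scoring one is replaced by the best not-yet-selected segment (the text's treatment of an adjacent pair as one new candidate is simplified here to dropping one member of the pair).

```python
def generate_second_set(scores, target_count, w_bullet=0.5, w_drag=0.5):
    """scores: list of (bullet_screen_count, drag_play_count), one per
    speech segment, in sequence order. Returns the indices of the
    segments forming the second speech segment set, sorted."""
    weighted = [w_bullet * b + w_drag * d for b, d in scores]
    # Segment indices ranked by weighted score, descending.
    ranked = sorted(range(len(weighted)), key=lambda i: weighted[i], reverse=True)
    selected = ranked[:target_count]
    cursor = target_count
    # While two selected segments are adjacent, drop the lower-scoring
    # member of the pair and pull in the next-best unselected segment.
    # (If no unselected segments remain, the loop simply stops.)
    while cursor < len(ranked):
        selected.sort()
        adjacent = next((k for k in range(len(selected) - 1)
                         if selected[k + 1] - selected[k] == 1), None)
        if adjacent is None:
            break
        a, b = selected[adjacent], selected[adjacent + 1]
        drop = b if weighted[a] >= weighted[b] else a
        selected.remove(drop)
        selected.append(ranked[cursor])
        cursor += 1
    return sorted(selected)
```

For example, with five segments scored (10, 10), (9, 9), (1, 1), (8, 8), (0, 0) and a target number of 2, the top two picks (indices 0 and 1) are adjacent, so index 1 is swapped for index 3, yielding a non-adjacent set.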
Optionally, in the method of any embodiment of the present disclosure, the speech segment comprises at least one speech sentence; and
the partial characters are determined by the following steps:
acquiring a preset speech sentence set, wherein the play amount and/or the comment amount of each speech sentence in the preset speech sentence set meets a first preset condition;
and, for each speech segment in the second speech segment set, determining the partial characters used for presentation in that segment based on the similarity between the speech sentences included in the segment and the speech sentences in the preset speech sentence set.
Optionally, in the method of any embodiment of the present disclosure, the first preset condition comprises: a weighted sum of the play amount and the comment amount of a target speech sentence being greater than or equal to a preset value.
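As an illustration of the similarity-based selection described above, the following hypothetical sketch builds the preset "hot" sentence set from play and comment amounts (the first preset condition) and then, for each segment, presents the sentence closest to that set. Character-level Jaccard similarity stands in for whatever similarity measure an implementation would actually use; the weights and threshold are assumptions, not values from the text.

```python
def build_hot_sentences(stats, w_play=0.5, w_comment=0.5, threshold=100.0):
    """stats: list of (sentence, play_amount, comment_amount).
    Keep sentences whose weighted play/comment score meets the
    threshold -- the first preset condition."""
    return [s for s, plays, comments in stats
            if w_play * plays + w_comment * comments >= threshold]

def pick_display_text(segment_sentences, hot_sentences):
    """Return the segment sentence most similar to any hot sentence,
    using character-level Jaccard similarity as a stand-in measure."""
    def jaccard(a, b):
        sa, sb = set(a), set(b)
        return len(sa & sb) / len(sa | sb) if sa | sb else 0.0
    return max(segment_sentences,
               key=lambda s: max(jaccard(s, h) for h in hot_sentences))
```

In practice the displayed text could be the whole best-matching sentence or a truncation of it; the patent only requires that the presented characters come from the segment and be chosen by similarity to the preset set.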
Optionally, in the method according to any embodiment of the present disclosure, the generating of a second speech segment set based on the first speech segment set, the number of bullet screens, and the dragging and playing times comprises:
generating the second speech segment set based on the first speech segment set, the number of bullet screens and the dragging and playing times if the target video meets a second preset condition;
wherein the second preset condition comprises at least one of the following:
the play amount of the target video is greater than or equal to a preset play amount threshold;
and the playing duration of the target video is greater than or equal to a preset playing duration.
Optionally, in the method according to any embodiment of the present disclosure, the starting playing time corresponding to a video segment is: the middle time between the times at which two speech segments start to play in the target video; wherein the two speech segments comprise: the speech segment included in the video segment, and the speech segment included in the video segment preceding it in the target video.
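The midpoint rule above can be sketched as follows, assuming per-segment start times in seconds. The handling of the first selected segment, which has no preceding segment, is an assumption (it is taken to start where its own lines start), since the text does not specify that case.

```python
def clip_start_times(segment_starts):
    """segment_starts: the times (in seconds) at which the selected speech
    segments start to play in the target video, in order. Each clip after
    the first starts at the midpoint of its own segment's start time and
    the previous segment's start time; the first clip is assumed to start
    where its own lines start."""
    if not segment_starts:
        return []
    times = [segment_starts[0]]
    for prev, cur in zip(segment_starts, segment_starts[1:]):
        times.append((prev + cur) / 2)
    return times
```

Starting each clip at the midpoint gives the viewer a lead-in before the highlighted lines begin, rather than cutting in exactly on the first word.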
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), flash memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The foregoing embodiments describe the objects, technical solutions and advantages of the present disclosure in further detail. It should be understood that the foregoing is merely illustrative of the present disclosure and is not intended to limit its scope; any modifications, equivalent substitutions, improvements and the like made within the spirit and principles of the present disclosure shall fall within the scope of protection of the present disclosure.

Claims (29)

1. A video playing system, characterized in that the system comprises a terminal and a server, the terminal being in communication connection with the server, wherein:
the terminal is configured to: send a target playing request to the server;
the server is configured to: determine a target video indicated to be played by the target playing request; acquire a starting playing time and a speech segment corresponding to a video segment of the target video, the starting playing time corresponding to the video segment being the moment at which the video segment starts to play in the target video; and return the starting playing time and the speech segment to the terminal;
the terminal is further configured to: present at least part of the characters in the speech segment; and if a playing operation corresponding to the partial characters is detected, play the target video starting from the starting playing time.
2. The system of claim 1, wherein the server is further configured to:
divide the video lines of the target video to obtain a first speech segment set;
determine the number of bullet screens and the dragging and playing times corresponding to each speech segment in the first speech segment set;
generate a second speech segment set based on the first speech segment set, the number of bullet screens and the dragging and playing times, wherein the number of speech segments in the second speech segment set is smaller than the number of speech segments in the first speech segment set;
and determine the starting playing time corresponding to each video segment that includes a speech segment in the second speech segment set.
3. The system according to claim 2, wherein the first speech segment set is a speech segment sequence, and the number of speech segments in the second speech segment set is a target number; and
the server is specifically configured to:
calculate, for each speech segment in the speech segment sequence, a weighted sum of the number of bullet screens and the dragging and playing times corresponding to that segment;
select the target number of results from the calculated results in descending order;
determine the set of speech segments corresponding to the selected results as a candidate segment set;
if the candidate segment set comprises adjacent candidate segments, perform the following determining step:
determining the two adjacent candidate segments as one new candidate segment;
selecting the result with the largest value from the calculated results that have not yet been selected;
determining the speech segment corresponding to that result as another new candidate segment;
replacing the adjacent candidate segments with the two determined new candidate segments, thereby updating the candidate segment set;
if the updated candidate segment set still comprises adjacent candidate segments, perform the determining step again based on the updated candidate segment set;
if the candidate segment set does not comprise adjacent candidate segments, determine the candidate segment set as the second speech segment set;
wherein adjacent candidate segments are two speech segments that are adjacent in the speech segment sequence.
4. The system of claim 2, wherein the speech segment comprises at least one speech sentence; and
the server is further configured to:
acquire a preset speech sentence set, wherein the play amount and/or the comment amount of each speech sentence in the preset speech sentence set meets a first preset condition;
and, for each speech segment in the second speech segment set, determine the partial characters used for presentation in that segment based on the similarity between the speech sentences included in the segment and the speech sentences in the preset speech sentence set.
5. The system according to claim 4, wherein the first preset condition comprises: a weighted sum of the play amount and the comment amount of a target speech sentence being greater than or equal to a preset value.
6. The system of claim 2, wherein the server is specifically configured to:
generate the second speech segment set based on the first speech segment set, the number of bullet screens and the dragging and playing times if the target video meets a second preset condition;
wherein the second preset condition comprises at least one of:
the play amount of the target video is greater than or equal to a preset play amount threshold;
the playing duration of the target video is greater than or equal to a preset playing duration.
7. The system according to any one of claims 1 to 6, characterized in that the terminal is specifically configured to:
present at least part of the characters in the speech segment as the title of the corresponding video segment.
8. The system according to claim 7, characterized in that the terminal is specifically configured to:
present a cover of the video segment, wherein the cover comprises the title of the video segment, and the title comprises at least part of the characters in the corresponding speech segment; and
the terminal is further configured to:
if a click operation on the cover is detected, determine that the playing operation corresponding to the partial characters is detected.
9. The system according to any one of claims 1 to 6, wherein the starting playing time corresponding to a video segment is: the middle time between the times at which two speech segments start to play in the target video; wherein the two speech segments comprise: the speech segment included in the video segment, and the speech segment included in the video segment preceding it in the target video.
10. A video playing method, applied to a terminal, the method comprising:
sending a target playing request to a server, wherein the server is in communication connection with the terminal;
receiving, from the server, a starting playing time and a speech segment corresponding to a video segment of the target video indicated to be played by the target playing request, the starting playing time corresponding to the video segment being the moment at which the video segment starts to play in the target video;
presenting at least part of the characters in the speech segment;
and if a playing operation corresponding to the partial characters is detected, playing the target video starting from the starting playing time.
11. The method of claim 10, wherein the starting playing time is determined by:
dividing the video lines of the target video to obtain a first speech segment set;
determining the number of bullet screens and the dragging and playing times corresponding to each speech segment in the first speech segment set;
generating a second speech segment set based on the first speech segment set, the number of bullet screens and the dragging and playing times, wherein the number of speech segments in the second speech segment set is smaller than the number of speech segments in the first speech segment set;
and determining the starting playing time corresponding to each video segment that includes a speech segment in the second speech segment set.
12. The method according to claim 11, wherein the first speech segment set is a speech segment sequence, and the number of speech segments in the second speech segment set is a target number; and
the generating of a second speech segment set based on the first speech segment set, the number of bullet screens and the dragging and playing times comprises:
calculating, for each speech segment in the speech segment sequence, a weighted sum of the number of bullet screens and the dragging and playing times corresponding to that segment;
selecting the target number of results from the calculated results in descending order;
determining the set of speech segments corresponding to the selected results as a candidate segment set;
if the candidate segment set comprises adjacent candidate segments, performing the following determining step:
determining the two adjacent candidate segments as one new candidate segment;
selecting the result with the largest value from the calculated results that have not yet been selected;
determining the speech segment corresponding to that result as another new candidate segment;
replacing the adjacent candidate segments with the two determined new candidate segments, thereby updating the candidate segment set;
if the updated candidate segment set still comprises adjacent candidate segments, performing the determining step again based on the updated candidate segment set;
if the candidate segment set does not comprise adjacent candidate segments, determining the candidate segment set as the second speech segment set;
wherein adjacent candidate segments are two speech segments that are adjacent in the speech segment sequence.
13. The method of claim 11, wherein the speech segment comprises at least one speech sentence; and
the partial characters are determined by the following steps:
acquiring a preset speech sentence set, wherein the play amount and/or the comment amount of each speech sentence in the preset speech sentence set meets a first preset condition;
and, for each speech segment in the second speech segment set, determining the partial characters used for presentation in that segment based on the similarity between the speech sentences included in the segment and the speech sentences in the preset speech sentence set.
14. The method according to claim 13, wherein the first preset condition comprises: a weighted sum of the play amount and the comment amount of a target speech sentence being greater than or equal to a preset value.
15. The method of claim 11, wherein generating a second speech segment set based on the first speech segment set, the number of bullet screens and the dragging and playing times comprises:
generating the second speech segment set based on the first speech segment set, the number of bullet screens and the dragging and playing times if the target video meets a second preset condition;
wherein the second preset condition comprises at least one of:
the play amount of the target video is greater than or equal to a preset play amount threshold;
the playing duration of the target video is greater than or equal to a preset playing duration.
16. The method according to any one of claims 10 to 15, wherein presenting at least part of the characters in the speech segment comprises:
presenting at least part of the characters in the speech segment as the title of the corresponding video segment.
17. The method of claim 16, wherein presenting at least part of the characters in the speech segment as the title of the corresponding video segment comprises:
presenting a cover of the video segment, wherein the cover comprises the title of the video segment, and the title comprises at least part of the characters in the speech segment; and
before detecting the playing operation corresponding to the partial characters, the method further comprises:
if a click operation on the cover is detected, determining that the playing operation corresponding to the partial characters is detected.
18. The method according to any one of claims 10 to 15, wherein the starting playing time corresponding to a video segment is: the middle time between the times at which two speech segments start to play in the target video; wherein the two speech segments comprise: the speech segment included in the video segment, and the speech segment included in the video segment preceding it in the target video.
19. A video playing method, applied to a server, the method comprising:
receiving a target playing request sent by a terminal, wherein the terminal is in communication connection with the server;
determining a target video indicated to be played by the target playing request;
acquiring a starting playing time and a speech segment corresponding to a video segment of the target video, the starting playing time corresponding to the video segment being the moment at which the video segment starts to play in the target video;
and returning the starting playing time and the speech segment to the terminal, so that the terminal presents at least part of the characters in the speech segment and, upon detecting a playing operation corresponding to the partial characters, plays the target video from the starting playing time.
20. The method of claim 19, wherein the starting playing time is determined by:
dividing the video lines of the target video to obtain a first speech segment set;
determining the number of bullet screens and the dragging and playing times corresponding to each speech segment in the first speech segment set;
generating a second speech segment set based on the first speech segment set, the number of bullet screens and the dragging and playing times, wherein the number of speech segments in the second speech segment set is smaller than the number of speech segments in the first speech segment set;
and determining the starting playing time corresponding to each video segment that includes a speech segment in the second speech segment set.
21. The method of claim 20, wherein the first speech segment set is a speech segment sequence, and the number of speech segments in the second speech segment set is a target number; and
the generating of a second speech segment set based on the first speech segment set, the number of bullet screens and the dragging and playing times comprises:
calculating, for each speech segment in the speech segment sequence, a weighted sum of the number of bullet screens and the dragging and playing times corresponding to that segment;
selecting the target number of results from the calculated results in descending order;
determining the set of speech segments corresponding to the selected results as a candidate segment set;
if the candidate segment set comprises adjacent candidate segments, performing the following determining step:
determining the two adjacent candidate segments as one new candidate segment;
selecting the result with the largest value from the calculated results that have not yet been selected;
determining the speech segment corresponding to that result as another new candidate segment;
replacing the adjacent candidate segments with the two determined new candidate segments, thereby updating the candidate segment set;
if the updated candidate segment set still comprises adjacent candidate segments, performing the determining step again based on the updated candidate segment set;
if the candidate segment set does not comprise adjacent candidate segments, determining the candidate segment set as the second speech segment set;
wherein adjacent candidate segments are two speech segments that are adjacent in the speech segment sequence.
22. The method of claim 20, wherein the speech segment comprises at least one speech sentence; and
the partial characters are determined by the following steps:
acquiring a preset speech sentence set, wherein the play amount and/or the comment amount of each speech sentence in the preset speech sentence set meets a first preset condition;
and, for each speech segment in the second speech segment set, determining the partial characters used for presentation in that segment based on the similarity between the speech sentences included in the segment and the speech sentences in the preset speech sentence set.
23. The method of claim 22, wherein the first preset condition comprises: a weighted sum of the play amount and the comment amount of a target speech sentence being greater than or equal to a preset value.
24. The method of claim 20, wherein generating a second speech segment set based on the first speech segment set, the number of bullet screens and the dragging and playing times comprises:
generating the second speech segment set based on the first speech segment set, the number of bullet screens and the dragging and playing times if the target video meets a second preset condition;
wherein the second preset condition comprises at least one of:
the play amount of the target video is greater than or equal to a preset play amount threshold;
the playing duration of the target video is greater than or equal to a preset playing duration.
25. The method according to any one of claims 19 to 24, wherein the starting playing time corresponding to a video segment is: the middle time between the times at which two speech segments start to play in the target video; wherein the two speech segments comprise: the speech segment included in the video segment, and the speech segment included in the video segment preceding it in the target video.
26. A video playing device, characterized in that the device is provided at a terminal, the device comprising:
a sending unit configured to send a target playing request to a server, wherein the server is in communication connection with the terminal;
a first receiving unit configured to receive, from the server, a starting playing time and a speech segment corresponding to a video segment of the target video indicated to be played by the target playing request, the starting playing time corresponding to the video segment being the moment at which the video segment starts to play in the target video;
a presenting unit configured to present at least part of the characters in the speech segment;
a playing unit configured to play the target video starting from the starting playing time if a playing operation corresponding to the partial characters is detected.
27. A video playing device, characterized in that the device is provided at a server, the device comprising:
a second receiving unit configured to receive a target playing request sent by a terminal, wherein the terminal is in communication connection with the server;
a second determining unit configured to determine a target video indicated to be played by the target playing request;
an acquiring unit configured to acquire a starting playing time and a speech segment corresponding to a video segment of the target video, the starting playing time corresponding to the video segment being the moment at which the video segment starts to play in the target video;
a returning unit configured to return the starting playing time and the speech segment to the terminal, so that the terminal presents at least part of the characters in the speech segment and, upon detecting a playing operation corresponding to the partial characters, plays the target video from the starting playing time.
28. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program stored in the memory, wherein the computer program, when executed, implements the method of any one of claims 10 to 25.
29. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method of any one of claims 10 to 25.
CN202210764069.6A 2022-06-29 2022-06-29 Video playing system, method and device, electronic equipment and storage medium Pending CN115119039A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210764069.6A CN115119039A (en) 2022-06-29 2022-06-29 Video playing system, method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN115119039A true CN115119039A (en) 2022-09-27



Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104967910A (en) * 2014-10-29 2015-10-07 腾讯科技(深圳)有限公司 Multimedia play progress control method and apparatus
CN105163178A (en) * 2015-08-28 2015-12-16 北京奇艺世纪科技有限公司 Method and device for locating video playing position
CN109063101A (en) * 2018-07-27 2018-12-21 北京优酷科技有限公司 The generation method and device of video cover
CN111654754A (en) * 2020-04-22 2020-09-11 北京奇艺世纪科技有限公司 Video playing method and device, electronic equipment and readable storage medium
CN113194349A (en) * 2021-04-25 2021-07-30 腾讯科技(深圳)有限公司 Video playing method, commenting method, device, equipment and storage medium
CN113873323A (en) * 2021-07-28 2021-12-31 百度在线网络技术(北京)有限公司 Video playing method and device, electronic equipment and medium
CN114339375A (en) * 2021-08-17 2022-04-12 腾讯科技(深圳)有限公司 Video playing method, method for generating video directory and related product
CN115209233A (en) * 2022-06-25 2022-10-18 平安银行股份有限公司 Video playing method and related device and equipment



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination